Audition: Objects, Sources, Streams


1 Audition: Objects, Sources, Streams
Kevin Binz

2 Acoustics

3 Longitudinal Pressure Waves
There are two kinds of waves: transverse & longitudinal. Sound waves are the latter.

4 Spectrograms: Frequency, Amplitude, Time
[Example spectrograms: broadband noise; a “click”]
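A spectrogram of this kind can be sketched directly from a short-time Fourier transform. Below is a minimal NumPy version (window length and hop are illustrative choices, not values from the presentation); a click's energy appears in every frequency bin of a single time frame, which is what makes it "broadband":

```python
import numpy as np

def spectrogram(signal, fs, win_len=256, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform.
    Rows = frequency bins, columns = time frames."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack([signal[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1)).T        # (freq_bins, time)
    freqs = np.fft.rfftfreq(win_len, d=1 / fs)
    times = (np.arange(n_frames) * hop + win_len / 2) / fs
    return spec, freqs, times

# A click (a single impulse) concentrates energy at one instant in time
# but spreads it across all frequencies.
fs = 8000
sig = np.zeros(4096)
sig[2048] = 1.0                                          # the "click"
spec, freqs, times = spectrogram(sig, fs)
```

In the resulting matrix, only the frame containing the click is nonzero, and within that frame the magnitude is essentially flat across frequency.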

5 Auditory Sources
The photons we see come overwhelmingly from a single source: the sun. Pressure waves, however, emerge from a variety of sources, and these contributions are mixed together into a single soundscape.
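This mixing is literal addition of pressure waveforms. A small NumPy sketch (the two tone frequencies are arbitrary stand-ins for two sources) shows that the ear receives only the summed signal, yet a frequency analysis still recovers both contributors:

```python
import numpy as np

fs = 8000
t = np.arange(0, 0.5, 1 / fs)
source_a = np.sin(2 * np.pi * 440 * t)   # hypothetical source: 440 Hz tone
source_b = np.sin(2 * np.pi * 950 * t)   # hypothetical source: 950 Hz tone

# The ear receives only the sum: one pressure waveform, many sources.
soundscape = source_a + source_b

# Yet a Fourier view of the single waveform still reveals both contributors.
spectrum = np.abs(np.fft.rfft(soundscape))
freqs = np.fft.rfftfreq(len(soundscape), 1 / fs)
peaks = freqs[spectrum > 0.5 * spectrum.max()]   # frequencies with strong energy
```

Recovering *which source produced which energy* is the hard part; the sum itself carries no source labels.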

6 Fundamental & Harmonics
Some sounds comprise only a fundamental and a series of harmonics. Speech is one such sound.
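A sound of this kind can be synthesized as a fundamental plus integer-multiple harmonics. A minimal sketch (the 1/k amplitude roll-off is a simplifying assumption, not a property of speech):

```python
import numpy as np

def harmonic_complex(f0, n_harmonics, fs=16000, dur=0.25):
    """A fundamental plus its integer-multiple harmonics, with amplitudes
    rolling off as 1/k (an illustrative choice of spectral envelope)."""
    t = np.arange(int(fs * dur)) / fs
    return sum(np.sin(2 * np.pi * k * f0 * t) / k
               for k in range(1, n_harmonics + 1))

# Energy at 200, 400, 600, 800, and 1000 Hz -- and nowhere else.
tone = harmonic_complex(200, 5)
```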

7 Objects, Sources, Streams

8 What Is An Object? Objects encode useful commonalities within perceptual streams. [Diagram: many percepts mapping onto a single Object] Objects are a tagging system; they allow for the accretion of knowledge.

9 Objects vs Sources In the paper, the singer is an auditory object, source, and stream. I want to say the singer is an auditory Source, and that each word is an Object. Streams are systematic maps between a source and its objects. Objects can be grouped in this way even if the source is unknown/novel. [Diagram: Stream A maps Source A onto its Objects; Stream B maps Source B onto its Objects]

10 The Reconstruction of Spatial Information
Unlike haptics and vision, audition does not directly encode spatial information, at least not initially. Audition is transparent: auditory sources emerging from different locations are superimposed onto a single soundscape. Spatial information, however, is necessary in detecting predators & prey. We will present evidence that subcortical pathways construct spatial representations.

11 The Construction Of Perception
Perceptual features are constructed from physical sense data.
Visual: frequency → color; amplitude → brightness.
Audio: frequency → pitch; amplitude → loudness.
Some constructions are straightforward: amplitude → loudness. Others are more subtle: frequency → pitch.

12 Divergent Feature Sets
Sources and Objects have different featural components. [Diagram: Stream A links Source A to its Objects; candidate features include location, timbre, pitch, loudness, and temporal signature] Since subcortical pathways infer locations, we might suspect that: (1) representations of source are available subcortically; (2) prey don't really care about the acoustic content emitted by their predators.

13 Auditory Pathway

14 Auditory Pathway Overview
Ascending pathway, periphery to cortex: Basilar Membrane (BM) → Vestibulocochlear Nerve (CN8) → Cochlear Nuclear Complex (CNC) → Superior Olivary Complex (SOC) → Inferior Colliculus (IC) → Medial Geniculate Nucleus (MGN) → Primary Auditory Cortex (PAC). Processing begins low in the brainstem. CNC is more rightfully called A1; PAC might well be called A4.

15 Auditory Pathway Parcellation
CNC has three subcomponents: two ventral (AVCN, PVCN) & one dorsal (DCN).
SOC has three subcomponents: lateral (LSO), medial (MSO), and MNTS.
IC has three subcomponents: central nucleus (CNIC), dorsal (DIC), and lateral (LIC).
MGN has three subcomponents: ventral (VMGN), dorsal (DMGN), and medial (MMGN).
Three white matter tracts connect these stages: the Dorsal, Intermediate, and Ventral Acoustic Stria.

16 Basilar Membrane: Tonotopy
Audition is your brain on Fourier analysis. The basilar membrane is organized as a spectrogram.
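The "spectrogram" organization can be caricatured as a bank of bandpass filters, each place along the membrane responding to energy near its characteristic frequency. A crude sketch (rectangular FFT-domain bands and arbitrary center frequencies; real cochlear filters are asymmetric and level-dependent):

```python
import numpy as np

def tonotopic_response(signal, fs, centers, bandwidth=50.0):
    """Crude basilar-membrane sketch: total spectral power falling within
    a rectangular band around each place's characteristic frequency."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), 1 / fs)
    return np.array([power[(freqs >= cf - bandwidth) &
                           (freqs < cf + bandwidth)].sum()
                     for cf in centers])

fs = 8000
t = np.arange(4000) / fs
tone = np.sin(2 * np.pi * 1000 * t)
centers = [250, 500, 1000, 2000]      # hypothetical characteristic frequencies
response = tonotopic_response(tone, fs, centers)
# Only the 1000 Hz "place" responds strongly.
```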

17 Basilar Membrane: Temporal Dynamics
The basilar membrane’s temporal organization is subtle. [Figure: response to a single click vs. a click train]

18 CN8: Vestibulocochlear Nerve
There are 12 cranial nerves, which complement the information transmitted by the spinal cord. The eighth cranial nerve carries sound and balance data from the ear. It terminates very low in the brain stem, beneath the pons.

19 Subcortical Pathways
Tonotopy exists in all of these structures. These pathways together extract spatial information about Sources. Auditory pathways are heavily interwoven with haptic pathways. One possible motivation for this fact: filtering out self-generated sounds. [Diagram: CNC (AVCN, PVCN, DCN) → SOC (LSO, MSO, MNTS) → IC (CNIC, DIC, LIC) → MGN (VMGN, DMGN, MMGN) → PAC]

20 Cortical Streams
Three cortical streams: a dorsal “how” stream, a lateral “where” stream, and a ventral “what” stream. PAC seems involved in the construction of auditory objects, and in assigning these to sources. The ventral stream seems to link these objects to the VLPFC, with more amodal representations. [Diagram: dorsal stream reaching PMd/PMv; streams originating in Primary Auditory Cortex]

21 Subcortical Source Localization

22 Source Localization Mechanisms
Monaural Elevation: input from one ear is used to compute direction along the vertical axis. The pinna imposes elevation-contingent “spectral notches” on incoming sound. Binaural Azimuth: information across hemispheres is used to compute direction along the horizontal axis. This is done with two circuits, for ITD and ILD respectively. Raykar et al. (2005). Extracting the frequencies of the pinna spectral notches in measured head related impulse responses.
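For the binaural azimuth circuit, the ITD available at a given azimuth can be approximated with Woodworth's spherical-head model (a geometric idealization with an assumed head radius; it is not a model taken from the presentation):

```python
import numpy as np

HEAD_RADIUS = 0.0875     # metres; a typical adult value (assumption)
SPEED_OF_SOUND = 343.0   # m/s in air at ~20 degrees C

def itd_woodworth(azimuth_deg):
    """Interaural time difference under Woodworth's spherical-head model:
    ITD = (r / c) * (sin(theta) + theta), with theta the azimuth in radians.
    theta = 0 is straight ahead; theta = 90 degrees is directly lateral."""
    theta = np.radians(azimuth_deg)
    return HEAD_RADIUS / SPEED_OF_SOUND * (np.sin(theta) + theta)
```

Straight ahead the delay is zero; at 90 degrees it is roughly 650 microseconds, which is why the 10 µs discrimination thresholds reported for humans correspond to azimuth changes of only a degree or two.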

23 Tympanic Ear: Independent Evolution

24 Many Solutions To Source Localization
Some fish, for example, exploit particle motion caused by near-field sound stimulation directly exciting hair cells in the inner ear which are oriented parallel to the motion of the water particles. As different hair cells are oriented in different directions, the excitation pattern of the population of hair cells allows the fish to locate sound sources.

Certain flies possess hearing organs located at the center of the frontal thorax that, by mechanical means, enhance minute ITDs by several orders of magnitude, rendering them amenable to neuronal processing.

A further example is the frog ear, which effectively operates as a pressure difference receiver: sound impinges on the frog’s tympanic ears not only directly from outside, but also indirectly through the mouth and, via the open Eustachian tubes, from the opposite ear. As a consequence, the relatively small differences in distance traveled by the sound to the inner and outer sides of the eardrum generate location-dependent interference patterns in the movement of the tympanum, ultimately creating directional sensitivity at the level of the receptor organ.

Reptiles and birds use similar means of “widening” their effective interaural distance by acoustically (mechanically) coupling the tympani via a rigid tube that connects the middle-ear cavities. Accordingly, reptiles and birds experience larger ITDs than do mammals with equivalent head sizes.

25 Localization Performance
Three methods to infer locations in mammals: ITD, ILD, and spectral notch. So, how effective are these methods? Humans can discriminate changes of just 1–2 degrees in the angular location of a sound source, with thresholds as low as 10 µs for ITDs, or 1–2 decibels for ILDs, for presentations of binaural clicks. Duplex theory held that spatial localization is performed by two separate processes: ITD for low frequencies and ILD for high frequencies. Strong evidence for this was provided by Stevens & Newman (1934), who found that localization error peaks in the intermediate range (2–5 kHz). Stevens & Newman (1934). The Localization of Pure Tones.

26 Azimuth: ILD
Early mammals only had circuitry for ILD, not ITD.
LSO homologues are ubiquitous.

27 Azimuth: ITD

28 Corpora Quadrigemina
One of the first multimodal organs: believed to contain a supramodal representation of space. The SC is about much more than eye movement: it represents visual space in its shallow layers, and incorporates haptic and limbic information in its deep layers.

29 Source Differentiation

30 Source Inference: Onset Time
Location is not the only way to differentiate sources. Onset time is one of the strongest grouping cues. However, when multiple sources have the same onset, attentional resources can assist in parsing.

31 Source Inference: Alternating Bursts
In the ABA paradigm, you manipulate presentation rate and semitone separation. With this manipulation, you can achieve bistable percepts. (Demo) When pulse onset times are synchronous, the sound is reliably experienced as one stream; onset time is an important source-grouping cue.
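An ABA- triplet stream of the kind used in such demos can be generated in a few lines. The frequencies, rate, and semitone separation below are illustrative defaults, not values from the presentation:

```python
import numpy as np

def aba_sequence(f_a=500.0, semitones=6, rate_hz=8, n_triplets=4, fs=8000):
    """Van Noorden-style ABA- stream: tone A, tone B a fixed number of
    semitones above A, tone A again, then a silent slot of equal length.
    rate_hz is the triplet repetition rate; each slot is 1/4 of a period."""
    f_b = f_a * 2 ** (semitones / 12)    # semitone separation above A
    tone_dur = 1 / (4 * rate_hz)
    n = int(fs * tone_dur)
    t = np.arange(n) / fs
    tone = lambda f: np.sin(2 * np.pi * f * t)
    silence = np.zeros(n)
    triplet = np.concatenate([tone(f_a), tone(f_b), tone(f_a), silence])
    return np.tile(triplet, n_triplets)

seq = aba_sequence()
```

Raising the semitone separation or the rate pushes the percept from one galloping stream toward two segregated streams, which is where the bistability lives.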

32 Object Characteristics

33 Objects: Extended Across Time
While it consumes processing time to activate APPLE from its visual signature, the descriptions of visual objects don’t seem to be extended across time. Auditory objects, however, are meaningfully extended across time. For example, periodic sounds are often parsed into objects at their temporal boundaries. Streams are, of course, also extended across time, with objects being composed sequentially.

34 Objects: Contextually Invariant
Objects must generalize across identity-preserving changes. In vision, objects are invariant to occlusion; in audition, objects are invariant to noise. In both cases, this invariance produces continuity illusions. [Diagram: tone A interrupted by noise burst B is heard as continuing through it] (Demo)

35 Object Consolidation: Leveraging Harmonics
Since harmonics bear lawful relations to the fundamental, the brain can use this fact to consolidate objects. If a frequency deviates from these lawful predictions, it will be perceived as a separate object.
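This consolidation rule can be sketched as a mistuning check: a partial is grouped with the fundamental if it lies near an integer multiple of it, and flagged as a separate object otherwise. The 3% tolerance below is an illustrative threshold, not a figure from the presentation:

```python
import numpy as np

def flag_mistuned(partials, f0, tolerance=0.03):
    """Which partials deviate from integer multiples of the fundamental?
    A partial is 'lawful' if it lies within `tolerance` (as a fraction of
    f0) of some harmonic k*f0; deviant partials are flagged as candidate
    separate objects."""
    partials = np.asarray(partials, dtype=float)
    k = np.round(partials / f0)                 # nearest harmonic number
    deviation = np.abs(partials - k * f0) / f0  # fractional mistuning
    return deviation > tolerance                # True = separate object

# Harmonics of 200 Hz, with the third partial mistuned by 8 %:
# 616 Hz sits well off the lawful 600 Hz prediction, so it is flagged.
flags = flag_mistuned([200, 400, 616, 800], f0=200)
```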

36 Objects: Predictive Fecundity
(Demo) If you physically remove the fundamental frequency, the brain will reconstruct it during the construction of pitch. In this demo, the physical frequencies are falling, but the missing fundamental is rising. Is pitch rising or falling? [Figure: removed fundamental vs. remaining signal]
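The reconstruction can be illustrated computationally: build a complex from harmonics 2 through 5 only, confirm the spectrum contains nothing at f0, yet recover f0 from the waveform's periodicity via autocorrelation. This is one simple sketch of how a "missing" fundamental is recoverable in principle; the brain's pitch mechanism is more complex:

```python
import numpy as np

fs = 8000
f0 = 200                                  # the physically absent fundamental
t = np.arange(2000) / fs
# Harmonics 2-5 only: no energy at f0 itself.
signal = sum(np.sin(2 * np.pi * k * f0 * t) for k in range(2, 6))

# The spectrum has nothing at 200 Hz...
spec = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), 1 / fs)

# ...yet the waveform still repeats every 1/f0 seconds, and a simple
# autocorrelation pitch estimate recovers that period.
ac = np.correlate(signal, signal, mode='full')[len(signal) - 1:]
lag = np.argmax(ac[20:]) + 20             # skip the zero-lag peak
estimated_f0 = fs / lag                   # recovers 200 Hz
```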

37 Amodal Objects: Semantic Invariance
Rhesus monkey vocalizations: Grunt: low-quality food. Warble & harmonic arch: high-quality food. In VPC, semantic content is the differentiator, not the auditory stimulus.

38 Looking Forward

39 Is Azimuth Represented in Human PAC?
Spatial sensitivity in human PAC hasn’t been demonstrated by fMRI. But single-cell recordings have shown that primate PAC contains neurons with spatial sensitivity. These neurons are organized in a distributed rate code rather than a place code. Two hypotheses suggest themselves: either the human genome abandoned these evolutionary structures, or the rate code has been retained but is too distributed to be imaged. I suspect the latter is correct, and that PAC location codes facilitate binding objects to sources. Werner-Reiss & Groh (2008). A Rate Code for Sound Azimuth in Monkey Auditory Cortex: Implications for Human Neuroimaging Studies

