
1 Auditory scene analysis Day 15
Music Cognition (MUSC, NSCI 466, NSCI). Harry Howard, Barbara Jazwinski. Tulane University

2 Course administration
Spend provost's money
12/25/2018 Music Cognition - Jazwinski & Howard - Tulane University

3 Goals for today

4 Statement of the problem

5 The ball-room problem (Helmholtz, 1863)
"In the interior of a ball-room … there are a number of musical instruments in action, speaking men and women, rustling garments, gliding feet, clinking glasses, and so on … a tumbled entanglement [that is] complicated beyond conception. And yet … the ear is able to distinguish all the separate constituent parts of this confused whole." On the Sensations of Tone as a Physiological Basis for the Theory of Music, by Hermann von Helmholtz. Kessinger Publishing, 2005, p. 26.
HH: present inductively, from a sound recording (Wang uses beach)

6 … which is a well-known problem in speech perception
“One of our most important faculties is our ability to listen to, and follow, one speaker in the presence of others. This is such a common experience that we may take it for granted; we may call it ‘the cocktail party problem’…” (Cherry, 1957) “For ‘cocktail party’-like situations… when all voices are equally loud, speech remains intelligible for normal-hearing listeners even when there are as many as six interfering talkers” (Bronkhorst & Plomp, 1992) Wang 2004

7 What would the analog be in music?
The orchestra problem?

8 Model as sources of intrusion and distortion
additive noise from other sound sources
channel distortion
reverberation from surface reflections
Wang 2004

9 Some review, with new information about computational/mathematical modeling

10 The auditory periphery
A complex mechanism for transducing pressure variations in the air into neural impulses in auditory nerve fibers. Wang 2004

11 Traveling wave
Different frequencies of sound give rise to maximum vibrations at different places along the basilar membrane. The frequency of vibration at a given place is equal to that of the nearest stimulus component (resonance). Hence, the cochlea performs a frequency analysis. Wang 2004

12 Cochlear filtering model
The gammatone function approximates physiologically recorded impulse responses.
n = filter order (4)
b = bandwidth
f0 = centre frequency
φ = phase
Wang 2004
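Given these parameters, the gammatone is conventionally written g(t) = t^(n−1) e^(−2πbt) cos(2πf0·t + φ) for t ≥ 0. A minimal NumPy sketch of this impulse response (the sampling rate, duration, and peak normalization are illustrative choices, not from the slide):

```python
import numpy as np

def gammatone_ir(f0, b, n=4, phi=0.0, fs=16000, dur=0.025):
    """Gammatone impulse response:
    g(t) = t**(n-1) * exp(-2*pi*b*t) * cos(2*pi*f0*t + phi),  t >= 0.

    f0  : centre frequency (Hz)
    b   : bandwidth (Hz)
    n   : filter order (4 on the slide)
    phi : phase (radians)
    """
    t = np.arange(int(dur * fs)) / fs
    g = t**(n - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * f0 * t + phi)
    return g / np.max(np.abs(g))  # peak-normalize for display

ir = gammatone_ir(f0=1000.0, b=125.0)
```

The t^(n−1) term makes the envelope rise gradually from zero and then decay, mimicking the ringing of a point on the basilar membrane.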

13 Gammatone filterbank
Each position on the basilar membrane is simulated by a single gammatone filter with appropriate centre frequency and bandwidth. A small number of filters (e.g. 32) is generally sufficient to cover the range 50 Hz to 8 kHz. Note the variation in bandwidth with frequency (unlike Fourier analysis). Wang 2004
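A common way to choose the 32 centre frequencies and their frequency-dependent bandwidths, as the slide describes, is the ERB (equivalent rectangular bandwidth) scale of Glasberg and Moore (1990). The slide does not name a specific formula, so treat this as one standard realization, not the lecture's definition:

```python
import numpy as np

def erb(f):
    # Equivalent rectangular bandwidth in Hz (Glasberg & Moore, 1990)
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def erb_space(f_lo=50.0, f_hi=8000.0, n_channels=32):
    # Centre frequencies equally spaced on the ERB-rate scale,
    # which is quasi-logarithmic in frequency
    to_erb = lambda f: 21.4 * np.log10(0.00437 * f + 1.0)
    from_erb = lambda e: (10.0**(e / 21.4) - 1.0) / 0.00437
    return from_erb(np.linspace(to_erb(f_lo), to_erb(f_hi), n_channels))

cfs = erb_space()   # 32 centre frequencies covering 50 Hz - 8 kHz
bws = erb(cfs)      # bandwidth grows with centre frequency,
                    # unlike the fixed bins of a Fourier analysis
```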

14 Response to a pure tone
Many channels respond, but those closest to the target tone frequency respond most strongly (place coding). The interval between successive peaks also encodes the tone frequency (temporal coding). Note the propagation delay along the membrane model. Wang 2004

15 Spectrogram vs. cochleogram
Spectrogram: plot of log energy across time and frequency (linear frequency scale).
‘Cochleogram’: cochlear filtering by the gammatone filterbank (or other models of cochlear filtering); quasi-logarithmic frequency scale, and filter bandwidth is frequency-dependent. Previous work suggests better resilience to noise than the spectrogram. Let’s call it ‘cochleogram’. Wang 2004
OMITTED: cochlear filtering by the gammatone filterbank (or other models of cochlear filtering), followed by a stage of nonlinear rectification; the latter corresponds to hair cell transduction, by either a hair cell model or simple compression operations (log and cube root)

16 Beyond the periphery
The auditory system (Source: Arbib, 1989)
The auditory system is complex: four relay stations between periphery and cortex, rather than one as in the visual system. In comparison to the auditory periphery, the central parts of the auditory system are less well understood. The number of neurons in the primary auditory cortex is comparable to that in the primary visual cortex, despite the fact that the number of fibers in the auditory nerve is far fewer than in the optic nerve (thousands vs. millions). Wang 2004

17 Auditory scene analysis

18 Auditory scene analysis (ASA)
Listeners are capable of parsing an acoustic scene to form a mental representation of each sound source – a stream – in the perceptual process of auditory scene analysis (Bregman, 1990). From events to streams.
Two conceptual processes of ASA:
Segmentation: decompose the acoustic mixture into sensory elements (segments)
Grouping: combine segments into streams, so that segments in the same stream originate from the same source
Two sorts of temporal organization:
Simultaneous
Sequential
Wang 2004

19 Simultaneous organization
Groups sound components that overlap in time. Some cues for simultaneous organization:
Proximity in frequency (spectral proximity)
Common periodicity: harmonicity; fine temporal structure
Common spatial location
Common onset (and, to a lesser degree, common offset)
Common temporal modulation: amplitude modulation; frequency modulation
Wang 2004
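The harmonicity cue above can be made concrete with a toy example: partials that fall on the harmonic series of the same fundamental tend to be grouped into one source. The frequencies, candidate fundamentals, and tolerance below are made up for illustration; this is not an algorithm from the lecture:

```python
def group_by_harmonicity(partials, candidate_f0s, tol=0.03):
    """Toy simultaneous grouping: assign each partial (Hz) to the
    candidate fundamental whose harmonic series it fits within a
    relative mistuning tolerance; unmatched partials are left out."""
    groups = {f0: [] for f0 in candidate_f0s}
    for p in partials:
        best, best_err = None, tol
        for f0 in candidate_f0s:
            h = max(1, round(p / f0))      # nearest harmonic number
            err = abs(p - h * f0) / p      # relative mistuning
            if err < best_err:
                best, best_err = f0, err
        if best is not None:
            groups[best].append(p)
    return groups

# mixture of partials from a 200 Hz source and a 310 Hz source
mix = [200.0, 310.0, 400.0, 600.0, 620.0, 800.0, 930.0]
streams = group_by_harmonicity(mix, [200.0, 310.0])
# the two interleaved harmonic series are recovered as separate groups
```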

20 Sequential organization
Groups sound components across time. Some cues for sequential organization:
Proximity in time and frequency
Temporal and spectral continuity
Common spatial location; more generally, spatial continuity
Smooth pitch contour
Rhythmic structure: rhythmic attention theory (Large and Jones, 1999)
Wang 2004
OMITTED: Streaming demo: cycle of six tones. Smooth formant transition?
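The omitted streaming demo can be approximated with the classic ABA- "galloping" stimulus (van Noorden, 1975), which probes the frequency-proximity cue: a small A-B separation is heard as one stream, a large one as two. All parameters here are illustrative, not the lecture's demo:

```python
import numpy as np

def tone(freq, dur, fs=16000):
    t = np.arange(int(dur * fs)) / fs
    return np.sin(2 * np.pi * freq * t)

def aba_stream_stimulus(f_a=500.0, semitone_gap=2, tone_dur=0.1,
                        n_triplets=8, fs=16000):
    """ABA- triplets: tone A, tone B, tone A, silence, repeated.
    A small A-B frequency separation tends to fuse into one galloping
    stream; a large separation tends to split into two streams."""
    f_b = f_a * 2.0 ** (semitone_gap / 12.0)
    gap = np.zeros(int(tone_dur * fs))
    triplet = np.concatenate([tone(f_a, tone_dur, fs),
                              tone(f_b, tone_dur, fs),
                              tone(f_a, tone_dur, fs), gap])
    return np.tile(triplet, n_triplets)

one_stream = aba_stream_stimulus(semitone_gap=2)    # likely heard as fused
two_streams = aba_stream_stimulus(semitone_gap=12)  # likely heard as split
```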

21 Two processes for grouping
Primitive grouping (bottom-up)
 Innate, data-driven mechanisms, consistent with those described by Gestalt psychologists for visual perception (proximity, similarity, common fate, good continuation, etc.)
 Domain-general; exploits intrinsic structure of environmental sound
 The grouping cues described earlier are primitive in nature
Schema-driven grouping (model-based or top-down)
 Learned knowledge about speech, music, and other environmental sounds
 Domain-specific, e.g. organization of speech sounds into syllables

22 Organisation in speech: Broadband spectrogram
“… pure pleasure …”
Continuity. Some other cues we can’t see easily, e.g. same vocal tract, onset synchrony, offset synchrony, common AM, harmonicity.

23 Organisation in speech: Narrowband spectrogram
“… pure pleasure …”
Continuity. Some other cues we can’t see easily, e.g. same vocal tract, onset synchrony, offset synchrony, harmonicity.

24 CASA system architecture

25 Music cognition: Scheirer, E. D., "Bregman's chimerae: Music perception as auditory scene analysis"

26 The goal
“… is to explain the human ability to map incoming acoustic data into emotional, music-theoretical, or other high-level cognitive representations, and to provide evidence from psychological experimentation for these explanations.”

27 A bottom-up model of musical perception and cognition
Boxes contain "facilities" or processes which operate on streams of input and produce streams of output. Arrows denote these streams and are labeled with a rough indication of the types of information they might contain. Italicized labels beneath the "music perception" and "music cognition" boxes indicate into which of these categories various musical properties might fall.

28 More explanation
Acoustic events enter the ear as waves of varying sound-pressure level and are processed by the cochlea into streams of band-passed power levels at various frequencies. The harmonically-related peaks in the time-frequency spectrum specified by the channels of filterbank output are grouped into "notes" or "complex tones" using auditory grouping rules such as continuation, harmonicity, and common onset time. Properties of these notes such as timbre, pitch, loudness, and perhaps their rhythmic relationships over time, are determined by a low-level "music perception" facility. Once the properties of the component notes are known, the relationships they bear to each other and to the ongoing flow of time can be analyzed, and higher-level structures such as melodies, chords, and key centers can be constructed. These high-level descriptions give rise to the final "emotive" content of the listening experience as well as other forms of high-level understanding and modeling, such as recognition, affective response, and the capacity for theoretical analysis.

29 One assumption which bears examination
The explicitly mono-directional flow of data from "low-level" processes to "high-level" processes, that is, the implication that higher-level cognitive models have little or no impact on the stages of lower-level processing. We know from existing experimental data that this upward data-flow model is untrue in particular cases. For example, frequency contours in melodies can lead to a percept of accent structure, which in turn leads to the belief that the accented notes are louder than the unaccented ones. Thus, the high-level process of melodic understanding impacts the "lower-level" process of determining the loudness of notes.

30 Another assumption which bears examination
In computer-music research, the process of turning a digital-audio signal into a symbolic representation of the same musical content is termed the transcription problem, and has received much study. The assumption that "notes" are the fundamental mental representations of all musical perception and cognition requires that there be a transcription facility in the brain to produce them. This assumption, and especially the implicated requirement, are largely unsupported by experimental evidence. We have no percept of most of the individual notes which comprise the chords and rhythms in the densely-scored inner sections of a Beethoven symphonic development. While highly-trained individuals may be able to "hear out" some of the specific pitches and timbres through a difficult process of listening and deduction, this is surely not the way in which the general experience of hearing music unfolds.

31 A "top-down" or "prediction-driven" model of music perception and cognition
Boxes again represent processing facilities; arrows are unlabeled, to indicate less knowledge about the exact types of information being passed from box to box (adapted from a similar model in [Ellis 1996]).

32 More explanation
In this model, predictions based on the current musical context are compared against the incoming psychoacoustic cues. Prediction is dependent on what has been previously heard, and what is known about the musical domain from innate constraints and learned acculturation. The agreements and/or disagreements between prediction and realization are reconciled and reflected in a new representation of the musical situation. Note that within this model, the types of representations actually present in a mental taxonomy of musical context are as yet unspecified.

33 Auditory chimera
One element of the internal representation of music which has been somewhat underexamined is what Bregman calls an auditory chimera:
[Music often wants] the listener to accept the simultaneous roll of the drum, clash of the cymbal, and brief pulse of noise from the woodwinds as a single coherent event with its own striking properties. The sound is chimeric in the sense that it does not belong to any single environmental object. [Bregman 1990, p. 460, emphasis added]

34 An example
Again arguing from intuition, it seems likely that the majority of the inner-part content of a Beethoven symphony is perceived in exactly this manner. That is, multiple non-melodic voices are grouped together into a single virtual "orchestral" sound object which has certain properties analogous to "timbre" and "harmonic implication", and which is, crucially, irreducible into perceptually smaller units. It is the combined and continuing experience of these "chimeric" objects which gives the music its particular quality in the large, that is, what the music "sounds like" on a global level. In fact, it seems likely that a good portion of the harmonic and textural impact of a piece of complex music is carried by such objects.

35 Next Monday: Prediction in music

