# Hearing Complex Sounds (MSc Neuroscience, Jan Schnupp)

Hearing Complex Sounds: MSc Neuroscience, Jan Schnupp, jan.schnupp@dpag.ox.ac.uk

Sound Signals: Many physical objects emit sounds when they are “excited” (e.g. hit or rubbed). Sounds are just pressure waves rippling through the air, but they carry a lot of information about the objects that emitted them. (Example: what are these two objects? Which one is heavier, object A or object B?) The sound (or signal) emitted by an object (or system) when hit is known as the impulse response. Impulse responses of everyday objects can be quite complex, but the sine wave is a fundamental ingredient of these (or any) complex sounds (or signals).

Vibrations of a Spring-Mass System:

Undamped:

1. $F = -k \cdot y$ (Hooke’s Law)
2. $F = m \cdot a$ (Newton’s 2nd)
3. $a = dv/dt = d^2y/dt^2$

$$-k \cdot y = m \cdot \frac{d^2y}{dt^2} \;\Rightarrow\; y(t) = y_0 \cdot \cos\left(t \cdot \sqrt{k/m}\right)$$

Damped:

4. $-k \cdot y - r \cdot \frac{dy}{dt} = m \cdot \frac{d^2y}{dt^2}$

$$y(t) = y_0 \cdot e^{-r \cdot t/(2m)} \cdot \cos\left(t \cdot \sqrt{k/m - (r/2m)^2}\right)$$

Don’t worry about the formulae! Just remember that mass-spring systems like to vibrate at a rate proportional to the square root of their “stiffness” and inversely proportional to the square root of their mass. http://auditoryneuroscience.com/acoustics/simple_harmonic_motion
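The take-home message of the spring-mass slide can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names and the numerical values of stiffness `k`, mass `m` and damping `r` are hypothetical and chosen only for illustration.

```python
import math


def natural_frequency_hz(k, m):
    """Vibration rate (Hz) of an undamped spring-mass system: sqrt(k/m) / (2*pi)."""
    return math.sqrt(k / m) / (2 * math.pi)


def damped_displacement(t, y0, k, m, r):
    """Displacement y(t) of a damped (underdamped) spring-mass system."""
    omega_squared = k / m - (r / (2 * m)) ** 2
    if omega_squared <= 0:
        # With this much damping the system no longer oscillates.
        raise ValueError("system is critically damped or overdamped")
    envelope = y0 * math.exp(-r * t / (2 * m))
    return envelope * math.cos(t * math.sqrt(omega_squared))


# Hypothetical example values (arbitrary but consistent units).
f_reference = natural_frequency_hz(k=100.0, m=1.0)
f_stiffer = natural_frequency_hz(k=400.0, m=1.0)  # four times the stiffness
f_heavier = natural_frequency_hz(k=100.0, m=4.0)  # four times the mass

print(f_stiffer / f_reference)  # frequency ratio for 4x stiffness
print(f_heavier / f_reference)  # frequency ratio for 4x mass
```

Quadrupling the stiffness doubles the vibration rate, and quadrupling the mass halves it, which is the square-root relationship the slide asks you to remember.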

Resonant Cavities: In resonant cavities, “lumps of air” at the entrance/exit of the cavity oscillate under the elastic forces exerted by the air inside the cavity. The preferred resonance frequency is inversely proportional to the square root of the volume. (Large resonators => deeper sounds.)
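The square-root dependence on volume can be seen by treating the cavity as a spring-mass system, as in the textbook Helmholtz resonator. This is a sketch under idealized assumptions, and none of the symbols appear in the slides: $A$ is the cross-sectional area of the opening, $L$ its effective length, $V$ the cavity volume, $\rho$ the density of air and $v$ the speed of sound. The “lump of air” in the opening supplies the mass $m$, and the compressible air in the cavity supplies the stiffness $k$:

$$m = \rho A L, \qquad k = \frac{\rho v^2 A^2}{V}, \qquad f = \frac{1}{2\pi}\sqrt{\frac{k}{m}} = \frac{v}{2\pi}\sqrt{\frac{A}{V L}}$$

Doubling the volume therefore lowers the resonance frequency by a factor of $\sqrt{2}$.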

The Ear: [Figure: organ of Corti; cochlea “unrolled” and sectioned]

Modes of Vibration and Harmonic Complex Tones: An “ideal” string would maintain the triangular shape set up when the string is plucked throughout the oscillation. Fourier analysis approximates such vibrations as a sum (series) of sine waves. For large numbers of sine waves (“Fourier components”) the approximation becomes very good; for an infinite number of components it is exact. Natural sounds are often made up of sine components that are harmonically related, i.e. their frequencies are all integer multiples of a “fundamental frequency”. http://auditoryneuroscience.com/?q=acoustics/modes_of_vibration
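The way a Fourier series converges can be illustrated with a symmetric triangle wave, used here as a simple stand-in for the triangular shape of the plucked string. This is a sketch and not taken from the slides; the function names and the 100 Hz fundamental are hypothetical. The triangle wave contains only odd harmonics, with amplitudes falling off as $1/n^2$.

```python
import math


def triangle_partial_sum(t, f0, n_components):
    """Sum of the first n_components sine components of a unit triangle wave."""
    total = 0.0
    for k in range(n_components):
        n = 2 * k + 1  # only odd harmonics of the fundamental f0 contribute
        total += (-1) ** k * math.sin(2 * math.pi * n * f0 * t) / n ** 2
    return 8 / math.pi ** 2 * total


def triangle_exact(t, f0):
    """Exact unit triangle wave: 0 at t = 0, peak +1 a quarter period later."""
    phase = (t * f0) % 1.0
    if phase < 0.25:
        return 4 * phase
    if phase < 0.75:
        return 2 - 4 * phase
    return 4 * phase - 4


def max_error(f0, n_components, n_points=200):
    """Largest deviation of the partial sum from the exact wave over one period."""
    times = [i / (n_points * f0) for i in range(n_points)]
    return max(
        abs(triangle_partial_sum(t, f0, n_components) - triangle_exact(t, f0))
        for t in times
    )


# Hypothetical fundamental frequency of 100 Hz.
for n_components in (1, 5, 50):
    print(n_components, max_error(100.0, n_components))
```

The printed error shrinks as components are added, and every component sits at an integer multiple of the fundamental, so the sum is a harmonic complex in the sense used on the slide.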

Click Trains & the “30 Hz Transition”: At frequencies up to ca 30 Hz, each click in a click train is perceived as an isolated event. At frequencies above ca 30 Hz, individual clicks fuse, and one perceives a continuous hum with a strong pitch. [Figure: click train waveform; axes: time, sound pressure] https://mustelid.physiol.ox.ac.uk/drupal/?q=pitch/click_train

The Impulse (or “Click”): The “ideal click”, or impulse, is an infinitesimally short signal. The Fourier Transform encourages us to think of this click as an infinite series of sine waves, which have started at the beginning of time, continue until the end of time, and all just happen to pile up at the one moment when the click occurs.

Basilar Membrane Response to Clicks http://auditoryneuroscience.com/ear/bm_motion_2

Why Click Trains Have Pitch: If we represent each click in a train by its Fourier Transform, then it becomes clear that certain sine components will add (top) while others will cancel (bottom). This results in a strong harmonic structure.
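The adding and cancelling of sine components can be made concrete with a discrete Fourier transform of a short click train. This is a minimal sketch rather than anything from the slides: the sample rate, click rate and signal length are hypothetical values, chosen so that every harmonic falls exactly on a DFT bin.

```python
import cmath


def dft_magnitudes(signal):
    """Magnitudes of the DFT of a real signal, from 0 Hz up to the Nyquist bin."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


SAMPLE_RATE = 4000  # Hz (assumed)
CLICK_RATE = 200    # clicks per second (assumed)
N_SAMPLES = 200     # 50 ms of signal, i.e. 10 clicks

period = SAMPLE_RATE // CLICK_RATE  # samples between successive clicks
click_train = [1.0 if i % period == 0 else 0.0 for i in range(N_SAMPLES)]

magnitudes = dft_magnitudes(click_train)
bin_hz = SAMPLE_RATE / N_SAMPLES  # frequency spacing of the DFT bins

# Frequencies at which the components of the individual clicks add up
# instead of cancelling.
harmonic_freqs = [k * bin_hz for k, mag in enumerate(magnitudes) if mag > 1e-6]
print(harmonic_freqs)
```

Energy survives only at 0 Hz (the clicks are all positive, so there is a constant offset) and at integer multiples of the 200 Hz click rate; at every other frequency the contributions of the ten clicks cancel.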

Basilar Membrane Response to Click Trains auditoryneuroscience.com | The Ear

AN Phase Locking to Artificial “Single Formant” Vowel Sounds: [Figure: Cariani & Delgutte AN recordings; panels: phase locking to modulator (envelope), phase locking to carrier] https://mustelid.physiol.ox.ac.uk/drupal/?q=ear/bm_motion_3

Vocal Folds in Action auditoryneuroscience.com | Vocalizations

Articulation: Articulators (lips, tongue, jaw, soft palate) move to change resonance properties of the vocal tract. auditoryneuroscience.com | Vocalizations

Harmonics & formants of a vowel: [Figure labels: harmonics, formants]

Formants Arise From Resonances in the Vocal Tract: Speakers change formant frequencies by making resonance cavities in their mouth, nose and throat larger or smaller.

The “Neurogram”: As a crude approximation, one might say that it is the job of the ear to produce a spectrogram of the incoming sounds, and that the brain interprets the spectrogram to identify sounds. This figure shows histograms of auditory nerve fibre discharges in response to a speech stimulus. Discharge rates depend on the amount of sound energy near the neuron’s characteristic frequency.

Phase Locking: The discharges of cochlear nerve fibres to low-frequency sounds are not random; they occur at particular times (phase locking). Evans (1975) https://mustelid.physiol.ox.ac.uk/drupal/?q=ear/phase_locking

Quantifying Phase-Locking: Phase locking is usually measured as a “vector strength”, also known as “synchronicity index” (SI). To calculate this, a period histogram of the neural response is normalized, and each bin of the histogram is thought of as a vector (pointing in the direction of the bin’s stimulus phase, with a length equal to the bin’s normalized count). The SI is given by the length of the vector sum over all bins.
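The calculation can be sketched in a few lines. This is a minimal illustration, not the analysis code behind the slides: the function names, the number of histogram bins and the example spike times are all hypothetical. The first function sums one unit vector per spike; the second follows the period-histogram description and gives nearly the same value when the bins are narrow.

```python
import cmath
import math


def vector_strength(spike_times, stimulus_freq):
    """Length of the mean unit vector over all spike phases (0 = none, 1 = perfect)."""
    if not spike_times:
        raise ValueError("need at least one spike")
    resultant = sum(
        cmath.exp(2j * math.pi * stimulus_freq * t) for t in spike_times
    )
    return abs(resultant) / len(spike_times)


def vector_strength_from_histogram(spike_times, stimulus_freq, n_bins=32):
    """Same measure computed from a normalized period histogram."""
    counts = [0] * n_bins
    for t in spike_times:
        cycle_fraction = (t * stimulus_freq) % 1.0
        counts[min(int(cycle_fraction * n_bins), n_bins - 1)] += 1
    total = sum(counts)
    # Each bin is a vector: direction = bin centre phase, length = normalized count.
    resultant = sum(
        (count / total) * cmath.exp(2j * math.pi * (b + 0.5) / n_bins)
        for b, count in enumerate(counts)
    )
    return abs(resultant)


# Hypothetical spike trains for a 500 Hz tone.
locked_spikes = [j / 500.0 + 0.0003 for j in range(50)]    # same phase every cycle
spread_spikes = [j / (500.0 * 20) for j in range(20)]      # evenly spread over the cycle

print(vector_strength(locked_spikes, 500.0))
print(vector_strength(spread_spikes, 500.0))
print(vector_strength_from_histogram(locked_spikes, 500.0))
```

Spikes that always occur at the same stimulus phase give a vector strength close to 1, while spikes spread evenly over the cycle give a value close to 0.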

Phase Locking Limits: AN fibres are unable to phase lock to signal fluctuations at rates faster than 3 kHz. This is due to low-pass filtering in the hair cell receptors. [Figure: hair cell receptor potentials and AN fibre responses at 1000 Hz, 2000 Hz, 3000 Hz and 4000 Hz; synchrony index plotted against frequency (kHz)]

Frequency Coding: The Place Theory stipulates that frequencies are encoded by activity across the tonotopic array of fibers in the AN, as well as in tonotopic nuclei along the lemniscal auditory pathway. The Timing Theory posits that temporal information conveyed through phase locking provides the dominant cue to frequency information. Neither the Place nor the Timing Theory can account for all psychophysical data.

Missing Fundamental Sounds http://auditoryneuroscience.com/topics/missing-fundamental

The Pitch of “Complex” Sounds (Examples)

The Periodicity of a Signal is the Major Determinant of its Pitch: Iterated rippled noise can be made more or less periodic by increasing or decreasing the number of iterations. The less periodic the signal, the weaker the pitch.
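One common way to build iterated rippled noise is to delay a noise and add it back to itself, repeating the step several times. The sketch below assumes that construction, which the slide does not spell out; the function names, the delay of 40 samples, the gain of 1 and the noise length are hypothetical. Periodicity is measured as the normalized autocorrelation at the delay.

```python
import random


def iterated_rippled_noise(noise, delay_samples, n_iterations, gain=1.0):
    """Repeatedly add a delayed copy of the signal to itself (delay-and-add)."""
    signal = list(noise)
    for _ in range(n_iterations):
        delayed = [0.0] * delay_samples + signal[:-delay_samples]
        signal = [s + gain * d for s, d in zip(signal, delayed)]
    return signal


def autocorrelation_at_lag(signal, lag):
    """Normalized autocorrelation at one lag: near 0 for noise, 1 if fully periodic."""
    mean = sum(signal) / len(signal)
    centred = [s - mean for s in signal]
    numerator = sum(a * b for a, b in zip(centred[lag:], centred[:-lag]))
    denominator = sum(c * c for c in centred)
    return numerator / denominator


rng = random.Random(0)  # fixed seed so that the example is reproducible
noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]
DELAY = 40  # samples; the period of the resulting pitch (assumed value)

periodicity = {
    n: autocorrelation_at_lag(iterated_rippled_noise(noise, DELAY, n), DELAY)
    for n in (0, 1, 4, 16)
}
for n, value in periodicity.items():
    print(n, round(value, 2))
```

With no iterations the signal is plain noise and the autocorrelation at the delay is close to zero. Each further iteration makes the signal more periodic, which corresponds to the stronger pitch described on the slide.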

Sinusoidally Amplitude Modulated (SAM) Tones: Example with carrier frequency c: 5000 Hz, modulator frequency m: 400 Hz. Make a sinusoidally modulated tone by multiplying carrier with modulator:

$$\sin(2\pi t c) \cdot (0.5 \cdot \sin(2\pi t m) + 0.5)$$

or by adding sine components of frequencies c, c + m and c - m:

$$0.5 \cdot \sin(2\pi t c) + 0.25 \cdot \sin(2\pi t (c+m)) + 0.25 \cdot \sin(2\pi t (c-m))$$

(The sum as written corresponds to a cosine-phase modulator, $0.5 \cdot \cos(2\pi t m) + 0.5$; the component frequencies and amplitudes are the same either way.)

(!) The SAM tone has no energy at the modulation frequency. Nevertheless the modulator influences the perceived pitch.
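The claim that a SAM tone has no energy at the modulation frequency can be checked by projecting the signal onto individual frequencies. This is a minimal sketch: the carrier and modulator frequencies are those of the slide, while the sample rate, the signal length and the function names are assumptions made for the example.

```python
import cmath
import math

SAMPLE_RATE = 20000    # Hz (assumed)
N_SAMPLES = 2000       # 100 ms of signal, giving 10 Hz frequency resolution
CARRIER_HZ = 5000.0    # c, as on the slide
MODULATOR_HZ = 400.0   # m, as on the slide


def sam_tone(t, c, m):
    """Carrier multiplied by a raised sinusoidal modulator."""
    return math.sin(2 * math.pi * c * t) * (0.5 * math.sin(2 * math.pi * m * t) + 0.5)


def amplitude_at(signal, freq_hz, sample_rate):
    """Amplitude of the sinusoidal component of the signal at freq_hz."""
    projection = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * i / sample_rate)
        for i, x in enumerate(signal)
    )
    return 2 * abs(projection) / len(signal)


signal = [sam_tone(i / SAMPLE_RATE, CARRIER_HZ, MODULATOR_HZ) for i in range(N_SAMPLES)]

amp_carrier = amplitude_at(signal, CARRIER_HZ, SAMPLE_RATE)
amp_upper = amplitude_at(signal, CARRIER_HZ + MODULATOR_HZ, SAMPLE_RATE)
amp_lower = amplitude_at(signal, CARRIER_HZ - MODULATOR_HZ, SAMPLE_RATE)
amp_modulator = amplitude_at(signal, MODULATOR_HZ, SAMPLE_RATE)

print(amp_carrier, amp_upper, amp_lower, amp_modulator)
```

The carrier comes out with amplitude 0.5 and the two sidebands at c + m and c - m with amplitude 0.25 each, while the amplitude at the 400 Hz modulation frequency is zero up to rounding error.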

AN Phase Locking to SAM Tones: Medium SR fibers phase lock better to amplitude modulation than high or low SR fibers. The ability to phase lock to amplitude modulation declines with high sound levels. [Figure: panels arranged by spontaneous rate (high, medium, low) and sound level (high to low)]

Cell Types of the Cochlear Nucleus: [Figure: cell types (spherical bushy, stellate, globular bushy, octopus, fusiform) across the cochlear nucleus subdivisions AVCN, PVCN and DCN]

CN Phase Locking to Amplitude Modulations: Certain cell types in the CN (including Onset and some Chopper types) exhibit much stronger phase locking to AM than is seen in their AN inputs.

Encoding of Envelope Modulations in the Midbrain: Neurons in the midbrain or above show much less phase locking to AM than neurons in the brainstem. Transition from a timing to a rate code. Some neurons have bandpass MTFs and exhibit “best modulation frequencies” (BMFs). Topographic maps of BMF may exist within isofrequency laminae of the ICc (“periodotopy”).

Periodotopic maps via fMRI: Baumann et al. (Nat Neurosci 2011) described periodotopic maps in monkey IC obtained with fMRI. They used stimuli from 0.5 Hz (infra-pitch) to 512 Hz (mid-range pitch). Their sample size is quite small (3 animals; false positive?). The observed orientation of their periodotopic map (medio-dorsal to latero-ventral for high to low) appears to differ from that described by Schreiner & Langner (1988) in the cat (predominantly caudal to rostral).

Proposed Periodotopy in Gerbil A1: SAM tones should only activate the high frequency parts of the tonotopically organized A1. However, activity (presumed to correspond to the low pitch of these signals) is also seen in the low frequency parts of A1. This activity is thought to be organized in a concentric periodotopic map. However...

Periodotopy inconsistent in ferret cortex: Nelken, Bizley, Nodal, Ahmed, Schnupp, King (2008) J. Neurophysiol 99(4). [Figure: panels for SAM tones, hp clicks and hp IRN in animal 1 and animal 2]

A pitch area in primates? In marmoset, pitch sensitive neurons are most commonly found on the boundary between fields A1 and R. Fig 2 of Bendor & Wang, Nature 2005

A pitch sensitive neuron in marmoset A1? Apparently pitch sensitive neurons in marmoset A1. Fig 1 of Bendor & Wang, Nature 2005

Artificial Vowel Sounds

Responses to Artificial Vowels: Bizley et al., J Neurosci 2009

Joint Sensitivity to Formants and Pitch: Bizley, Walker, Silverman, King & Schnupp, J Neurosci 2009. [Figure axes: pitch (Hz), vowel type (timbre)]

Mapping cortical sensitivity to sound features: [Figure labels: neural sensitivity, timbre] Nelken et al., J Neurophys, 2004; Bizley et al., J Neurosci, 2009

Summary: Periodic signals (click trains, harmonic complexes) with repetition rates between ca. 30 and 3000 Hz tend to have a strong pitch. Aperiodic signals (noises, isolated clicks) have “weak” pitches or no pitch at all. Speech sounds include periodic (vowels, voiced consonants) and aperiodic (unvoiced consonants) sounds. There are competing place and timing (temporal) theories of pitch. Place theories depend on the representation of spectral peaks in tonotopic parts of the auditory pathway. Some important features (e.g. formants of speech sounds) are represented tonotopically, but place cannot represent harmonic structure well. Timing theories postulate that early stages of the auditory system measure spike intervals in phase locked discharges. In midbrain and cortex pitch appears to be represented through rate codes. Whether there are “periodotopic maps” or specialized pitch areas in the central auditory system is highly controversial.
