CS 551/651: Structure of Spoken Language Lecture 10: Overview of Sound Perception John-Paul Hosom Fall 2010.


1 CS 551/651: Structure of Spoken Language Lecture 10: Overview of Sound Perception John-Paul Hosom Fall 2010

2 Anatomy

3 The Outer Ear. Composed of: pinna (auricle) and external auditory meatus (ear canal). Functions: irregular shape of pinna directs high-frequency sounds into ear canal; shape of pinna helps with determining location of sound; ear canal acts as resonator (2.7 cm long), with broad resonance between 3 and 5 kHz. Implications: smaller animals are better at hearing high-frequency sounds.
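
The resonance range quoted for the ear canal can be roughly checked by treating the canal as a uniform tube, open at the pinna and closed at the ear drum, whose lowest resonance is at a quarter wavelength. This is only a sketch: the tube model and the speed of sound (343 m/s) are assumptions, not values from the lecture.

```python
# Assumption: ear canal modeled as a uniform tube, open at one end and
# closed at the other (quarter-wavelength resonator).
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: air at about 20 degrees C


def quarter_wave_resonance_hz(tube_length_m, speed=SPEED_OF_SOUND_M_PER_S):
    """Lowest resonance of a tube open at one end and closed at the other."""
    return speed / (4.0 * tube_length_m)


# 2.7 cm is the canal length given on the slide.
ear_canal_hz = quarter_wave_resonance_hz(0.027)
print(f"Estimated ear canal resonance: {ear_canal_hz:.0f} Hz")

# A shorter canal (hypothetical, half the length) resonates an octave higher.
print(f"Half-length canal: {quarter_wave_resonance_hz(0.0135):.0f} Hz")
```

Under these assumptions the estimate falls inside the 3 to 5 kHz range on the slide, and a shorter canal shifts the resonance upward, which is in line with the remark about smaller animals.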

4 Anatomy The Middle Ear. Composed of: chamber (tympanic cavity) containing the ossicular chain: malleus (hammer), incus (anvil), stapes (stirrup). The middle ear also contains the epitympanic recess, in which the ossicles are lodged. The tympanic membrane (ear drum) is the partition between ear canal (outer ear) and middle ear. Sound is transferred from tympanic membrane to cochlea (inner ear) via the ossicular chain; the footplate of the stapes connects to the “oval window,” which is a membrane of the inner ear. Functions: matches acoustic vibration of air to that of fluid in cochlea (if airborne sound hit the fluid-backed oval window directly, there would be about a 30 dB drop in energy); has low-pass filter effect.
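
The 30 dB figure can be approximated with the standard formula for the fraction of sound power transmitted across a boundary between two media at normal incidence. This is a sketch under assumptions: the cochlear fluid is treated as plain water, and the impedance values below are textbook approximations rather than numbers from the lecture.

```python
import math

# Assumed characteristic acoustic impedances (in rayl, i.e. Pa*s/m).
Z_AIR = 415.0       # approximate value for air at room temperature
Z_WATER = 1.48e6    # approximate value for water, standing in for cochlear fluid


def power_transmission(z1, z2):
    """Fraction of incident sound power transmitted at normal incidence."""
    return 4.0 * z1 * z2 / (z1 + z2) ** 2


def transmission_loss_db(z1, z2):
    """Transmitted power relative to incident power, in dB (negative = loss)."""
    return 10.0 * math.log10(power_transmission(z1, z2))


print(f"Transmitted fraction: {power_transmission(Z_AIR, Z_WATER):.5f}")
print(f"Level change: {transmission_loss_db(Z_AIR, Z_WATER):.1f} dB")
```

With these assumed impedances only about a tenth of a percent of the power would cross the boundary, which is close to the 30 dB drop stated on the slide.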

5 Anatomy The Inner Ear. Composed of: cochlea, semi-circular canals. Function: simply speaking, the inner ear performs a frequency analysis of the incoming sound, which is transmitted via the VIIIth cranial nerve to the CNS. Cochlea: spiral in shape; 35 mm long, wound in 2 ¾ turns; filled with “incompressible,” “water-like” fluid; separated into 3 parts by two membranes, the Basilar Membrane (BM) and Vestibular (Reissner’s) Membrane; thousands of hair cells are attached to BM; these cells are connected to neurons that fire in response to sound.

6 Anatomy The Inner Ear, cochlea: Sound from ear canal is amplified by middle ear. Vibration of bone against oval window is received by cochlea (not just at oval window, but by entire cochlea; no standing waves). Entire cochlea vibrates at the same frequency as the stimulus. Different locations of the cochlea respond better to particular frequencies; locations near the base of the cochlea respond more to higher frequencies. Cochlea and VIIIth nerve have tonotopic organization: direct mapping between place and frequency.
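
One common way to put numbers on the place-to-frequency mapping is Greenwood's function. The lecture does not name it, so it is brought in here purely as an illustration; the constants are the commonly quoted human values, and the 35 mm length is the one given for the cochlea in this lecture.

```python
# Greenwood's place-frequency function for the human cochlea (not from the
# lecture; commonly quoted constants): F = A * (10**(a * x) - k), where x is
# the fractional distance from the apex (0 = apex, 1 = base).
GREENWOOD_A = 165.4
GREENWOOD_ALPHA = 2.1
GREENWOOD_K = 0.88


def place_to_frequency_hz(distance_from_apex_mm, cochlea_length_mm=35.0):
    """Approximate characteristic frequency at a point along the BM."""
    x = distance_from_apex_mm / cochlea_length_mm
    return GREENWOOD_A * (10.0 ** (GREENWOOD_ALPHA * x) - GREENWOOD_K)


for mm in (0.0, 8.75, 17.5, 26.25, 35.0):
    print(f"{mm:5.2f} mm from apex -> {place_to_frequency_hz(mm):8.0f} Hz")
```

The output rises monotonically from the apex to the base, matching the statement that higher frequencies are represented near the base.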

7 Anatomy A schematic of the cochlea “unrolled” (middle) and basilar membrane (bottom). The top figure indicates the tonotopic organization. (from J.D. Durrant and J.H. Lovrinic, Bases of Hearing Science, 1977, in Daniloff p. 395)

8 Anatomy This figure shows instantaneous displacement of the BM for two instants in time, in response to 200-Hz sine wave, and the envelope of amplitude peaks for this wave. Each point on BM vibrates at a frequency equal to the input frequency (200 Hz).

9 Anatomy Tonotopic organization: BM varies in tautness and shape along its length, creating different frequency responses. Tautness at base responds well to high-frequency sounds; compliance at apex (tip) responds well to low-frequency sounds. Each point in BM has a “characteristic frequency” (CF) at which the frequency response is maximum. The bandpass shape of a CF filter has constant ratio of frequency to bandwidth, implying better resolution (lower bandwidth) at lower frequencies.
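
The constant ratio of frequency to bandwidth can be made concrete with a few lines of arithmetic. The ratio used below (Q = 8) is a hypothetical value chosen only for illustration; the lecture does not give one.

```python
# Constant-Q filter bank sketch: bandwidth = CF / Q.
ASSUMED_Q = 8.0  # hypothetical ratio of center frequency to bandwidth


def bandwidth_hz(characteristic_frequency_hz, q=ASSUMED_Q):
    """Bandwidth of a filter whose frequency-to-bandwidth ratio is q."""
    return characteristic_frequency_hz / q


for cf in (200.0, 1000.0, 4000.0):
    print(f"CF {cf:6.0f} Hz -> bandwidth {bandwidth_hz(cf):6.1f} Hz")
```

Because the ratio is fixed, the filter centered at the lowest CF has the narrowest bandwidth in Hz, which is the sense in which frequency resolution is better at low frequencies.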

10 Anatomy Transduction: Between BM and tectorial membrane (a thin, responsive, gelatinous membrane) are hair cells; about 25,000-30,000 outer hair cells, 3500-5000 inner hair cells in humans. (“Tunnel of Corti” separates outer from inner.) Each hair cell has 30-300 hair-like projections called stereocilia protruding from the surface into the fluid-filled cavity in a “bundle.” When BM vibrates up and down, it creates a “shearing” motion between tectorial membrane and stereocilia. This motion causes tips of stereocilia to be displaced, causing electrical action potentials in a hair cell; the electrical signal is then transmitted down the auditory nerve.

11 Anatomy Transduction: Most (95%) neurons connecting cochlea to higher levels in auditory system connect to inner hair cells. Function of outer hair cells is less clear; they provide amplification and sharp tuning (partially under the control of higher levels). Hair cells connect to neurons; about 30,000 neurons in one human auditory nerve.

12 Anatomy Organ of Corti contains hair cells and neurons. Figure labels: 1-Inner hair cell; 2-Outer hair cells (three parallel rows); 3-Tunnel of Corti; 4-Basilar membrane; 5-Habenula perforata; 6-Tectorial membrane; 7-Deiters' cells; 8-Space of Nuel; 9-Hensen's cells; 10-Inner spiral sulcus; 11-Nerves. (from: http://www.iurc.montp.inserm.fr/cric/audition/english/corti/corti.htm)

13 Anatomy The same picture from Gray’s Anatomy (1918):

14 Neural Response Each neuron in the auditory nerve responds to certain frequencies; the response to each frequency can be plotted by stimulating a neuron with a particular frequency and measuring the response rate (firing rate) of the neuron. The most sensitive frequency is the “Characteristic Frequency” (CF).

15 Neural Response Auditory firings are processed by two types of neurons: ones extracting precise temporal features (onset chopper units), others for spectral features (transient chopper units) (O’Shaughnessy p. 113). Each neuron has a spontaneous rate of firing; this rate depends on the sensitivity of the neuron (high spontaneous rates are associated with low threshold of firing). There are 3 “groups” of spontaneous rates: high rate (61%, 18 to 250 spikes/sec), medium rate (23%, 0.5 to 18 spikes/sec), low rate (16%, <0.5 spikes/sec).

16 Neural Response The firing rate of a neuron to a given stimulus can be plotted. Firing rate has a dynamic range; if intensity is below or above this range, firing rate won’t change. Typical range of 20 dB for low-threshold fibers, 40-50 dB for high-threshold fibers. (Figure labels: audio-visual detection level = threshold.)

17 Neural Response With three groups of neurons with different thresholds and firing rates, the auditory nerve can cover a wide range of signal levels at a given frequency. (Figure labels: high rate, low rate.)
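
The idea of covering a wide level range with several fiber groups can be sketched with a simplified rate-level function that is flat below threshold, rises linearly across the dynamic range, and is flat above it. All thresholds, spontaneous rates, and saturated rates below are hypothetical; only the dynamic-range widths (20 dB and 40-50 dB) echo values quoted in this lecture.

```python
def firing_rate(level_db, threshold_db, dynamic_range_db, spontaneous, saturated):
    """Piecewise-linear rate-level function (a simplification)."""
    fraction = (level_db - threshold_db) / dynamic_range_db
    fraction = min(1.0, max(0.0, fraction))  # clamp outside the dynamic range
    return spontaneous + fraction * (saturated - spontaneous)


# Hypothetical parameters for the three spontaneous-rate groups.
FIBER_GROUPS = {
    "high spontaneous rate": dict(threshold_db=0.0, dynamic_range_db=20.0,
                                  spontaneous=60.0, saturated=250.0),
    "medium spontaneous rate": dict(threshold_db=15.0, dynamic_range_db=30.0,
                                    spontaneous=5.0, saturated=250.0),
    "low spontaneous rate": dict(threshold_db=30.0, dynamic_range_db=45.0,
                                 spontaneous=0.2, saturated=250.0),
}


def groups_still_encoding(level_db):
    """Groups whose firing rate still changes with level at this intensity."""
    return [name for name, g in FIBER_GROUPS.items()
            if g["threshold_db"] < level_db < g["threshold_db"] + g["dynamic_range_db"]]


for level in (10.0, 25.0, 50.0, 70.0):
    print(f"{level:4.0f} dB -> {groups_still_encoding(level)}")
```

In this sketch no single group spans the whole range, but at each level between 0 and 75 dB at least one group is still within its dynamic range.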

18 Neural Response Phase Locking: In addition to encoding frequency according to place along the BM, information is encoded in the timing (rate) of neuron firing. Upper limit of 4 to 5 kHz for phase locking. This figure shows the number of neuron firings over time in response to three different tones; the timing of the firings is related to the frequency of the tone. (Figure: count of firings vs. time in msec; 2.45 msec/group for a 408 Hz tone, 1.18 msec/group for an 850 Hz tone, 1.0 msec/group for a 1000 Hz tone.)
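
The relation between the spacing of firing groups and the tone frequency is simply frequency = 1 / period. A small sketch using the three spacings read from the figure (the function name is hypothetical):

```python
def interval_to_frequency_hz(interval_msec):
    """Frequency implied by the spacing between groups of firings."""
    return 1000.0 / interval_msec


# Group spacings (msec) quoted for the three tones in the figure.
for interval in (2.45, 1.18, 1.0):
    print(f"{interval:.2f} msec/group -> {interval_to_frequency_hz(interval):.0f} Hz")
```

The computed values land at or close to the tone frequencies given in the figure, which is what phase locking predicts.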

19 Neural Response Neural Recruitment: Another “method” for increasing dynamic range is for multiple neurons to fire in response to the same stimulus. If stimulus is low in energy, a small number of neurons, located near the CF, fire. More intense stimuli cause more neurons, located farther from the CF, to fire. (Figure: weak stimulus vs. strong stimulus at the same frequency; 1 line = 50–100 fibers.)

20 Neural Response Adaptation: If stimulus remains, neurons “adapt” to it, decreasing the firing rate with an exponential rate of decay (time constant of about 40 msec). Most of decay occurs within 15-20 msec of stimulus onset. When stimulus is removed, firing rate falls to near zero and then exponentially increases back to “spontaneous” rate. There may be two classes of neurons: neurons that respond to steady-state sounds, and neurons that respond to changes in frequency, with frequency sensitivity greatest at levels near human speech (O’Shaughnessy p. 119).

21 Hearing Threshold This figure shows the absolute thresholds of hearing, as a function of frequency.

22 JND “Just Noticeable Difference”: measure of ability to perceive a difference. JND tests: Two stimuli differ along one dimension, otherwise identical. Subjects are asked if two sounds are the same or different (“AX” test, is X = A?), or subjects are asked which of two sounds most resembles a third (“ABX” or “AXB” test, is X = A or B?). The JND occurs when 75% of responses are “different” (AX) or correctly identified (ABX). People are able to discriminate between 100 Hz and 101 Hz, but can’t identify if a tone is 100, 101, …, 109 Hz without making pairwise comparisons.
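
The 75% criterion can be applied to AX-test results by interpolating between the measured stimulus differences. The response data below are invented for illustration, and linear interpolation is just one simple choice; the lecture does not specify how the 75% point is estimated.

```python
def jnd_at_criterion(deltas, proportion_different, criterion=0.75):
    """Stimulus difference at which the response proportion reaches criterion.

    Linearly interpolates between adjacent measured points; returns None if
    the criterion is never crossed.
    """
    pairs = list(zip(deltas, proportion_different))
    for (d0, p0), (d1, p1) in zip(pairs, pairs[1:]):
        if p0 <= criterion <= p1 and p1 > p0:
            return d0 + (criterion - p0) * (d1 - d0) / (p1 - p0)
    return None


# Hypothetical AX results: frequency difference (Hz) vs. share of "different".
deltas_hz = [0.5, 1.0, 2.0, 3.0, 4.0]
share_different = [0.10, 0.30, 0.60, 0.90, 1.00]

print(jnd_at_criterion(deltas_hz, share_different))
```

For these made-up data the criterion is crossed between the 2 Hz and 3 Hz conditions, so the estimated JND falls between those two differences.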

23 JND JND Trivia: JND is greater for louder sounds and for sounds with duration under 250 msec. Sounds of equal intensity increase in loudness up to 200 msec. Below 1 kHz, two tones must be different by 1 to 3 Hz to be perceived as different. At higher frequencies, JND is larger (approx. 8 kHz tones require a 100 Hz separation to be heard as different). Entire frequency range has 1600 distinguishable frequencies and 350 intensities, or about 300,000 tones of frequency and intensity that can be identified in pairwise combination (for durations > 200 msec). For duration < 250 msec, there are 850 frequency levels; for duration < 10 msec, only 120 levels and 170 intensities. Identification of frequencies in isolation yields far fewer tones.

24 JND JND Trivia, Timing Information: Onsets of two signals must differ by at least 2 msec to be heard as separate sounds. To identify order of two signals, about 17 msec gap is required and sounds must be 125-200 msec long. However, people use rise and fall of amplitude to segment speech; they cannot identify order of 4 vowels of 200 msec duration in repeating sequence, but can identify much shorter vowels if there are amplitude onsets and offsets as well as 50 msec gap between vowels. Sounds with energy onset < 20 msec are heard as “plucks”; otherwise, heard as “bow.”

