CS 551/651: Structure of Spoken Language Lecture 11: Overview of Sound Perception, Part II John-Paul Hosom Fall 2010.

Equal-Loudness Contours
The subjective loudness of a sine wave may be determined as a function of frequency, creating an equal-loudness contour.
Instead of measuring loudness directly, what is measured is how intense a 1000-Hz tone must be to sound as loud as a tone at the test frequency, or, conversely, how intense a tone at the test frequency must be to sound as loud as a 1000-Hz tone.
The perceived equal loudness at each frequency can then be plotted.
Results are easily influenced by test conditions; “exact shapes … should not be taken too seriously” (Moore, p. 54).
These contours apply only to steady sounds at a single frequency; such sounds are almost never of practical interest.

Equal-Loudness Contours
[Figure: equal-loudness contours, with the minimum audible field curve labeled; from Moore, p. 55.]

Critical Bands
Fletcher (1940) conducted an experiment using band-pass noise and a single sine wave. The frequency of the sine wave was always at the center frequency of the noise, and the power density of the noise was fixed.
The bandwidth of the noise was varied, and for each bandwidth the minimum intensity at which the sine wave could be perceived was determined. (With increasing bandwidth, the total energy of the noise increased.)
[Figure: two spectra, power (dB) versus frequency from 0 Hz to Nyquist, each showing a sine wave within a band of noise.]

Critical Bands
The result was that the threshold for minimum detectability increased with bandwidth up to a limit; beyond that limit, the noise could increase in bandwidth and power without affecting the perception of the sine wave.
This implies a band-pass filter structure to sound perception: sound is filtered by a bank of overlapping band-pass filters called auditory filters. This filter structure is based on the structure of the basilar membrane (BM).
The bandwidth of the filter is related to the bandwidth of the noise at which the sine-wave threshold no longer increases. This bandwidth is called the critical bandwidth (CB).
Within one band, if the signal-to-noise ratio exceeds some fixed threshold, the signal (sine wave) will be heard, independent of what is happening in other bands. This threshold may vary from person to person; a typical value is 0.4.
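The outcome of the experiment can be sketched numerically under the idealized assumption of a rectangular auditory filter: only noise inside the critical band contributes to masking, and the tone is heard once its power reaches a fixed fraction of that noise power. The function name, the parameter names, and the example values (a 160-Hz critical band, unit noise density) are illustrative assumptions, not part of Fletcher's procedure.

```python
def tone_threshold_power(noise_bandwidth_hz, critical_bandwidth_hz, n0=1.0, k=0.4):
    """Threshold power of a tone centered in band-pass noise.

    Assumes an idealized rectangular auditory filter: only the noise that
    falls inside the critical band contributes to masking.
    n0 is the noise power per 1-Hz band; k is the signal-to-noise ratio
    needed for detection within the band (0.4 is the typical value quoted).
    """
    effective_bandwidth = min(noise_bandwidth_hz, critical_bandwidth_hz)
    return k * n0 * effective_bandwidth


# Hypothetical example: critical bandwidth of 160 Hz, unit noise density.
for bandwidth in (40, 80, 160, 320, 640):
    print(bandwidth, tone_threshold_power(bandwidth, critical_bandwidth_hz=160.0))
```

In this sketch the threshold grows in proportion to the noise bandwidth until the bandwidth reaches the critical band, and stays flat for wider noise, which is the pattern the experiment reported.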

Critical Bands
The auditory filters can be approximated by rectangular filters, but better determination of the filter shape is possible.
The critical bandwidth at a particular frequency can be estimated using the formula
$$W = \frac{P}{K N_0}$$
where $P$ is the intensity of the signal, $N_0$ is the intensity of the noise over a 1-Hz range, $K$ is the threshold of detectability (usually 0.4), and $W$ is the CB.
For example, the CB at 1000 Hz is 160 Hz. In reality, however, rectangular filters are not accurate; the filter shape changes with frequency and amplitude.
Better approximations of the auditory filters look like this:

Critical Bands
[Figure: the shape of auditory filters as a function of frequency (as determined from masking experiments) and of amplitude (as determined from “notched-noise” experiments); from Moore, pp. 105, 110.]
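As a rough check on the rectangular-filter estimate, the threshold condition can be written as $P = K \cdot W \cdot N_0$, so that $W = P / (K N_0)$. The sketch below assumes this form. The function name is hypothetical, and the input ratio $P/N_0 = 64$ is an assumed value chosen so that, with $K = 0.4$, the result matches the 160-Hz figure quoted for 1000 Hz.

```python
def critical_bandwidth(p_threshold, n0, k=0.4):
    """Estimate the critical bandwidth W (Hz) from W = P / (K * N0).

    p_threshold: tone power at the detection threshold.
    n0: noise power in a 1-Hz band (same units as p_threshold).
    k: threshold of detectability (assumed 0.4 by default).
    """
    if n0 <= 0 or k <= 0:
        raise ValueError("n0 and k must be positive")
    return p_threshold / (k * n0)


# Assumed illustrative inputs: P / N0 = 64 with K = 0.4.
w = critical_bandwidth(p_threshold=64.0, n0=1.0)
print(f"Estimated critical bandwidth: {w:.1f} Hz")
```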

Critical Bands
In general, CBs are nearly symmetric at levels below 45 dB; at higher levels, the bandwidth increases and the cutoff “skirt” is sharper above the center frequency.
The cutoff skirt ranges from 65 dB/oct. at 500 Hz to 100 dB/oct. at 8 kHz.
For frequencies above 500 Hz, the CB increases with frequency, reaching a bandwidth of 700 Hz near 4 kHz.
For frequencies above 1000 Hz, the ratio of frequency to bandwidth is roughly constant, with Q = 5 or 6.
If two sounds occur within one critical band, the sound with much higher energy dominates perception and masks the other sound.
From critical bands, models of perceptual non-linear warping of the frequency scale have been developed, namely the Bark and Mel scales.
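The roughly constant Q above 1000 Hz gives a quick way to approximate the critical bandwidth as center frequency divided by Q. This is a minimal sketch of that rule of thumb only; the function name is an assumption, and the default Q of 5.5 is simply the midpoint of the quoted range.

```python
def constant_q_bandwidth(center_hz, q=5.5):
    """Approximate critical bandwidth (Hz) as center frequency / Q.

    Intended only for center frequencies above about 1000 Hz, where the
    ratio of frequency to bandwidth is described as roughly constant.
    """
    if center_hz <= 1000:
        raise ValueError("constant-Q approximation is assumed valid only above 1000 Hz")
    return center_hz / q


for f in (2000, 4000, 8000):
    print(f, round(constant_q_bandwidth(f)))
```

At 4 kHz, Q values of 6 and 5 give bandwidths of roughly 667 Hz and 800 Hz, which bracket the 700 Hz quoted for that frequency.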

Mel Scale and Bark Scale Mel scale: Bark scale:
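The formulas for the two scales are not reproduced in this transcript. As a sketch, the functions below use two commonly cited approximations, which are assumptions here and not necessarily the forms shown on the original slide: $\text{mel} = 2595 \log_{10}(1 + f/700)$, and the Zwicker and Terhardt form $\text{Bark} = 13 \arctan(0.00076 f) + 3.5 \arctan((f/7500)^2)$, with $f$ in Hz.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (assumed form: 2595 * log10(1 + f/700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def hz_to_bark(f_hz):
    """Convert frequency in Hz to Bark (assumed Zwicker and Terhardt approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


for f in (100, 500, 1000, 4000, 8000):
    print(f"{f:5d} Hz  {hz_to_mel(f):7.1f} mel  {hz_to_bark(f):5.2f} Bark")
```

Both mappings are close to linear at low frequencies and compress higher frequencies, in line with critical bands that widen as frequency increases.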

Masking
Masking is a phenomenon in which perception of one sound is obscured by the presence of another sound.
Masking can occur in both the time and frequency domains, and prevents us from computing the perceived loudness of a complex tone as the sum of its components.
Frequency masking can be shown by fixing one sound (a sine wave at a given frequency and intensity) and varying a second sine wave’s intensity to determine at what intensity it can be perceived.
For example, given one tone at 1200 Hz and 80 dB, how loud does a second tone at X Hz have to be in order to be heard?
This threshold can be plotted, along with a description of what the perceived sound is like above the threshold.

Masking
[Figure: from O’Shaughnessy, p. 126 (1929).]

Masking
Masking occurs in both the time and frequency domains; two successive signals within the same CB may show masking effects.
For example, noise will mask a following tone if the noise is sufficiently loud and the delay between noise and tone is short (forward masking).
The energy needed to mask the tone increases with the delay and with the duration of the tone; beyond 100-200 msec no masking occurs.
Backward masking also occurs (a tone may be masked by a subsequent noise), but only within a 20-msec window.
Backward masking may not be purely involuntary; trained subjects may show no backward masking (Moore, p. 129).

Masking: Complex Stimuli
These CB and masking experiments typically use simple stimuli: sine waves, clicks, and/or band-pass white noise.
Processing of complex sounds is not as simple as extrapolating from the effects of simple sounds.
For example, phase-locking is suppressed in regions between speech formants, enhancing the effect of formants.
Complex sounds are louder if they extend over more than one critical band.
Two tones, both individually below the threshold of hearing, may be perceived when played simultaneously; roughly speaking, the total energy of all tones within a CB determines the threshold.
In general, most phenomena can be understood in terms of banks of overlapping band-pass filters.
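The statement that total energy within a CB determines the threshold can be illustrated with a small calculation. The sketch below assumes that the tones add in power (appropriate for components at different frequencies) and expresses each level in dB relative to the single-tone detection threshold; the function name and the example levels are hypothetical.

```python
import math


def combined_level_db(levels_db):
    """Level (dB) of the summed power of several components.

    Assumes the components add in power, not in amplitude.
    """
    total_power = sum(10.0 ** (level / 10.0) for level in levels_db)
    return 10.0 * math.log10(total_power)


# Two tones in the same critical band, each 2 dB below the detection
# threshold (levels given in dB relative to that threshold).
level = combined_level_db([-2.0, -2.0])
print(f"Combined level: {level:+.2f} dB re threshold; audible: {level >= 0.0}")
```

Two equal components raise the total by about 3 dB, so tones that are each slightly below threshold can exceed it together.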

Temporal Integration
Absolute thresholds and loudness depend on the duration of the stimulus.
For durations < 200 msec, the intensity necessary for detection increases as the duration decreases.
For detecting short-duration tones, the threshold for detection is the product of the tone intensity relative to an intensity-detection threshold for longer sounds, and a time-integration constant:
$$(I - I_L) \cdot \tau = \text{detection threshold (energy)}$$
where $I$ is the intensity of the short stimulus, $I_L$ is the intensity at which the stimulus is detected with duration > 200 msec, and $\tau$ is the integration time of the auditory system.
Some experiments show that $\tau$ varies with frequency; other experiments show that it does not. At any rate, $\tau$ has a value between 150 and 375 msec.
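The relation on this slide does not show the stimulus duration explicitly. A commonly cited form of the same idea, assumed here and not taken from the slide, is $(I - I_L) \cdot t = I_L \cdot \tau$, where $t$ is the tone duration and $\tau$ the integration time; it gives $I / I_L = 1 + \tau / t$. The sketch below uses that assumed form to show how much the detection threshold rises for short tones. The function name and the choice of 200 msec for the integration time (within the quoted 150 to 375 msec range) are also assumptions.

```python
import math


def threshold_shift_db(duration_ms, tau_ms=200.0):
    """Rise in detection threshold (dB) of a short tone relative to a long one.

    Assumes (I - I_L) * t = I_L * tau, i.e. I / I_L = 1 + tau / t.
    """
    if duration_ms <= 0:
        raise ValueError("duration must be positive")
    return 10.0 * math.log10(1.0 + tau_ms / duration_ms)


for t in (10, 20, 50, 100, 200):
    print(f"{t:4d} msec: threshold raised by {threshold_shift_db(t):.1f} dB")
```

Under this assumption the required intensity grows steadily as the tone gets shorter, which matches the qualitative statement that shorter durations need higher intensity for detection.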

Temporal Integration
Detecting a change in intensity is slightly different.
Usually a two-alternative forced-choice (2AFC) test is used: (a) two successive stimuli are presented, which differ in one target aspect; (b) the subject indicates which of the two stimuli has more of the target aspect (e.g. loudness, amplitude modulation).
The point at which 75% of responses are correct is the threshold for detecting the change.
The threshold for detecting intensity has the following model:
For band-pass filtered noise, the ratio $\Delta I / I$ is constant; this is an example of Weber’s law, which states that the smallest detectable change is proportional to the magnitude of the stimulus. Usually, $\Delta L$ is 0.5 to 1 dB.

Temporal Integration
For pure tones, Weber’s law does not quite hold; a plot of $\Delta I$ against $I$ (both on logarithmic axes) yields a line with slope 0.9 instead of 1.0 (discrimination improves slightly at higher intensity levels).
The time-domain window sizes for computing $I$ and $\Delta I$ are not specified by the model; within one frequency band, a large window for computing $I$ and a smaller window for computing $\Delta I$ can show relative energy changes within the band.
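The link between a just-detectable level change in dB and the Weber fraction follows from the definition of a level difference, $\Delta L = 10 \log_{10}((I + \Delta I)/I)$. The sketch below converts between the two; the function names are illustrative assumptions.

```python
import math


def weber_fraction_from_db(delta_l_db):
    """Weber fraction (delta I / I) for a level change of delta_l_db decibels."""
    return 10.0 ** (delta_l_db / 10.0) - 1.0


def db_from_weber_fraction(fraction):
    """Level change in dB that corresponds to a Weber fraction delta I / I."""
    return 10.0 * math.log10(1.0 + fraction)


for delta_l in (0.5, 1.0):
    print(f"delta L = {delta_l} dB  ->  delta I / I = {weber_fraction_from_db(delta_l):.3f}")
```

A level change of 0.5 to 1 dB therefore corresponds to an intensity change of roughly 12% to 26%.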

Pitch Perception
Pitch is the perceived main frequency of a sound; it is closely related to the objective measure of F0.
Timing Theory of Pitch: lower frequencies have pitch estimated based on phase-locking (time-synchronous firing of neurons).
Place Theory of Pitch: all frequencies have pitch estimated based on the location on the BM at which neurons fire most frequently.
Pitch can be perceived even when the fundamental frequency is not present; pitch determination is then based on higher harmonics (e.g. telephone-band speech, which has energy only > 300 Hz).
Harmonics in the F1 region are especially important for pitch.
Both timing and place are important; timing allows fine resolution, and place allows pitch perception at higher frequencies.

Pitch Perception
Frequency discrimination of pure tones: the smallest detectable change in frequency increases with frequency.
Tests are done using two-alternative forced choice (2AFC): “which has higher pitch?”

Pitch Perception
Zwicker’s place model: detection of the pitch of pure tones is equivalent to detecting a change in excitation level on the low-frequency side of the excitation pattern.
However, this model does not account for all observed phenomena, lending support to the phase-lock time theory, in which pitch is measured directly by the inverse of the time between neural firings (at least for frequencies < 4 kHz).

Pitch Perception
Pitch perception of complex tones depends on harmonics of the fundamental; if clicks occur with harmonics every 200 Hz, and if all harmonics other than those at 1800, 2000, and 2200 Hz are filtered out (removed), the perceived pitch is still 200 Hz.
Pattern recognition models. Basic idea: find a fundamental frequency which has harmonics that match the existing harmonics. Details can become very complex (Moore, pp. 189-192).
Temporal models. Basic idea: the pitch value is determined by the periodicity of the total waveform containing harmonics; in other words, it is determined by constructive interference of several harmonics in the time domain.
No model alone accounts for all phenomena of perceived pitch.
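The basic idea behind pattern recognition models can be illustrated with a toy calculation: look for the largest frequency of which every remaining component is an integer multiple. The sketch below uses a greatest common divisor for this, which is a deliberate simplification chosen for illustration and not one of the models described by Moore; it only works for exact integer frequencies, and the function name is hypothetical.

```python
from functools import reduce
from math import gcd


def implied_fundamental_hz(harmonics_hz):
    """Largest frequency (Hz) of which every listed component is an integer multiple.

    Toy stand-in for a pattern recognition model; requires integer
    frequencies that are exact multiples of a common fundamental.
    """
    if not harmonics_hz:
        raise ValueError("need at least one harmonic")
    return reduce(gcd, harmonics_hz)


# The components left after filtering in the example above.
print(implied_fundamental_hz([1800, 2000, 2200]))  # prints 200
```

Real harmonics are never exact integers, so practical models must tolerate mistuning and noise, which is part of why the details become complex.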
