perceptual constancy in hearing
speech played in a room, several metres from the listener, has much the same phonetic content as when played nearby
- despite a substantial difference between the amounts of reflected sound, which gives different temporal envelopes to the two signals
this seems like a ‘constancy’ effect - through a ‘taking account’ of reverb. in the preceding context

or not? Nielsen & Dau (2010) JASA 128:
- context effects with speech are ‘interference’
- interference effects from preceding contexts are ubiquitous - specifically, from modulation masking; Wojtczak & Viemeister (2005) JASA
- such effects don’t arise from constancy

grouping after (visual shape) constancy
grouping before (visual shape) constancy
Palmer, S.E., Brooks, J.L. & Nelson, R. (2003) When does grouping happen? Acta Psychologica, 114

constancy effects are interference effects
for example, in the second demo:
- contexts interfere in that they distort the ovoid’s perceived shape
and when hearing ‘takes account’ of the context’s reverb.:
- contexts interfere in that they distort the subsequent words’ identities

interference effects on this time scale are not particularly ubiquitous
(in speech, ‘extrinsic’ effects, from beyond the syllable, tend to be weak)
forward modulation masking:
- does occur at high(ish) modulation frequencies (>20 Hz)
- unlikely to affect modulation frequencies important in speech (<16 Hz)
(Wojtczak & Viemeister, 2005)

the main sticking point for Nielsen & Dau:
if there’s no information from a preceding speech context,
- how come there appears to be compensation for effects of reverb.?
however, compensation is likely to be the system’s ‘default’ setting
- i.e. it should ‘expect’ high(ish) reverb. in sounds when it’s in a room
- just as completion is the default in the first demonstration:

such behaviour is very common in perceptual systems
‘Bayesian’ approaches capture this:
- the general idea is that ‘prior’ probabilities influence what we see
for example, the probability that the middle column here is full dots is 0.5
(10 full-dots on the left, and 10 half-dots on the right)
but the prior probability of a full dot is much greater than 0.5
so we see the middle column as full dots - and group accordingly
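as a quick illustration (not from the slides), the dots example can be written in Bayes’-rule terms; a toy sketch in Python, where the 0.5 likelihoods reflect the ambiguous display and the prior is an assumed value:

```python
# posterior probability that the middle column is 'full' rather than 'half' dots
p_data_given_full = 0.5   # display is ambiguous: 10 full dots left, 10 half dots right
p_data_given_half = 0.5
prior_full = 0.9          # assumed prior: full dots are far more common in the world
prior_half = 0.1

posterior_full = (p_data_given_full * prior_full) / (
    p_data_given_full * prior_full + p_data_given_half * prior_half)
print(posterior_full)     # 0.9 -> the middle column is 'seen' as full dots
```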

compensation for reverb. in speech seems similarly ‘Bayesian’
- i.e. compensation is effected when reverb. in test words is probable
the context’s reverb. largely governs this probability
but when there’s no context, prior probabilities are more influential
here, the perceptual system is in a room
- so the prior probability of a dry test word is low
- and the prior probability of a reverberant test word is higher
- so the relatively high probability of test-word reverb. → compensation

here, ‘sir’ vs. ‘stir’ test words are distinguished by the sounds’ temporal envelopes:
e.g. the gap in ‘stir’ before voicing onset
11-step continuum:
- end-point ‘stir’ (step 10) from amplitude modulation of the other end-point, ‘sir’ (step 0)
- the prominent effect of this AM is the gap
- intermediate steps, 1-9, by varying modulation depth
[figure: 200-ms waveforms (amplitude vs. time) of step 0 ‘sir’ and step 10 ‘stir’, with the AM function]
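a minimal sketch of how such a continuum could be generated, assuming a dry ‘sir’ recording; the AM function here (zero over the gap, one elsewhere) and the gap timing are illustrative stand-ins, not the authors’ exact parameters:

```python
import numpy as np

def continuum_step(sir, am, step, n_steps=10):
    """Morph 'sir' towards 'stir' by scaling the depth of an AM function.

    sir : dry 'sir' waveform (1-D array)
    am  : amplitude-modulation function, same length as sir;
          0 over the 'stir' gap, 1 elsewhere
    step: 0 leaves 'sir' unmodulated; n_steps applies the full AM ('stir')
    """
    depth = step / n_steps                # modulation depth, 0..1
    gain = 1.0 - depth * (1.0 - am)       # interpolate between 1 and am
    return sir * gain

fs = 16000
sir = 0.1 * np.random.randn(int(0.5 * fs))    # placeholder for a real recording
am = np.ones_like(sir)
gap_start, gap_len = int(0.15 * fs), int(0.1 * fs)
am[gap_start:gap_start + gap_len] = 0.0       # the gap that cues the [t]
stir_like = continuum_step(sir, am, step=10)  # step 10 = 'stir' end-point
```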

real-room reflection patterns:
taken from an office room, volume = 183.6 m³
recorded with dummy-head transducers, facing each other
room’s impulse response obtained at different distances
- this varies the amount of reflected sound in signals
- i.e. the early (50 ms) to late energy ratio: 18 dB at 0.32 m → 2 dB at 10 m
- with an A-weighted energy decay rate of 60 dB per 960 ms at 10 m
impulse responses convolved with ‘dry’ speech recordings
headphone presentation → monaural ‘real-room’ listening
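a rough sketch of the two processing steps, assuming a measured impulse response array; the onset detection and the 50-ms split are simplifications, and the function names are ours:

```python
import numpy as np

def early_to_late_db(ir, fs, split_ms=50.0):
    """Early (first 50 ms) to late energy ratio of an impulse response, in dB."""
    onset = np.argmax(np.abs(ir))               # take the direct-sound peak as onset
    split = onset + int(split_ms * 1e-3 * fs)
    early = np.sum(ir[onset:split] ** 2)
    late = np.sum(ir[split:] ** 2)
    return 10.0 * np.log10(early / late)

def reverberate(dry, ir):
    """Convolve a dry recording with the room's impulse response."""
    return np.convolve(dry, ir)
```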

perceptual effects of room reflections, from the category boundary:
‘extrinsic’ context: “next you’ll get _ to click on”
increase the test-word’s distance: more ‘sir’ responses, which increases the category boundary
increase the context’s distance as well: ‘perceptual constancy’ effect
- i.e., fewer ‘sir’ responses, which restores the category boundary
[figure: mean proportion of ‘sir’ responses vs. continuum step (0-10, “sir” to “stir”), with the mean category boundary]
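one way a category boundary can be estimated from such responses is by fitting a logistic psychometric function; a sketch with hypothetical response proportions:

```python
import numpy as np
from scipy.optimize import curve_fit

steps = np.arange(11)     # continuum steps 0 ('sir') to 10 ('stir')
p_sir = np.array([.98, .95, .90, .80, .65, .50, .35, .20, .10, .05, .02])

def logistic(x, boundary, slope):
    return 1.0 / (1.0 + np.exp(slope * (x - boundary)))

(boundary, slope), _ = curve_fit(logistic, steps, p_sir, p0=[5.0, 1.0])
print(boundary)   # the step at which 'sir' and 'stir' are equally likely
```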

speech processed with an 8-band noise-excited vocoder
temporal envelope in each band from gammatone-filtered speech
(η = 4, and bandwidths = ‘Cambridge ERBs’)
each envelope applied to a (similarly) gammatone-filtered noise band
centre-frequencies in kHz = 0.25 × 2^((7/12)(n−1)), where n = band number, n = 1, 2, …, 8
[figure: spectrograms (frequency, kHz, log scale, vs. time, ms) of ‘sir’ at step 0 and step 10, illustrating the grouping effect]
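the centre frequencies follow directly from that formula; in the sketch below a Butterworth band-pass and a Hilbert envelope stand in for the 4th-order gammatone filters and ERB bandwidths, as a simplification:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 16000
n = np.arange(1, 9)
cf = 250.0 * 2.0 ** ((7.0 / 12.0) * (n - 1))  # centre frequencies, Hz
# -> 250, 375, 561, 841, 1260, 1888, 2828, 4238 Hz (approx.)

def vocode_band(speech, noise, fc, fs):
    """Impose the band envelope of `speech` on the matching `noise` band."""
    # band edges fc/1.2 .. fc*1.2 are a crude stand-in for ERB bandwidths
    sos = butter(4, [fc / 1.2, fc * 1.2], btype='bandpass', fs=fs, output='sos')
    env = np.abs(hilbert(sosfiltfilt(sos, speech)))   # band temporal envelope
    return sosfiltfilt(sos, noise) * env              # noise-excited band

speech = np.random.randn(fs)    # placeholder for a dry speech recording
noise = np.random.randn(fs)
vocoded = np.sum([vocode_band(speech, noise, c, fs) for c in cf], axis=0)
```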

what is the relative importance of the different bands in the test word?
context held at 0.32 m throughout
[diagram: the test word’s n bands; some bands varied between 0.32 m and 10 m, others held at 0.32 m in all conditions]

[figure: category boundary (step) vs. condition number (cond = 1…6), at test distances 0.32 m and 10 m, giving boundary shifts S_1 … S_6 and band weights W_n,1 … W_n,6]
importance of band n = Σ_{cond=1…6} S_cond × W_{n,cond}
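in effect this is a weighted sum over conditions; a minimal sketch with hypothetical boundary shifts S and a 0/1 design matrix W recording which bands were varied in each condition:

```python
import numpy as np

# hypothetical boundary shifts for the 6 conditions
S = np.array([2.0, 1.6, 1.1, 0.6, 0.3, 0.1])
# W[n, cond] = 1 if band n was varied in condition cond (placeholder design)
W = np.zeros((8, 6))
W[np.arange(6), np.arange(6)] = 1.0

importance = W @ S   # importance of band n = sum over cond of S[cond] * W[n, cond]

# for the context experiment, S is replaced by the differences S_a - S_b
# between the two test distances before taking the same weighted sum
```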

[figure: FFTs of “sir” [sɜ]: levels (dB) of the consonant [s] and the vowel [ɜ] vs. frequency, kHz (log scale), by band number, together with their difference]

what is the relative importance of the different bands in the context?
all the test-word’s bands varied between 0.32 m and 10 m
[diagram: the context’s n bands; some bands varied between 0.32 m and 10 m, others held at 0.32 m in all conditions]

[figure: category boundary (step) vs. context’s distance (m) for conditions 1…6, at test distances 0.32 m and 10 m, giving boundaries S_a,cond and S_b,cond and band weights W_n,1 … W_n,6]
importance of band n = Σ_{cond=1…6} (S_a,cond − S_b,cond) × W_{n,cond}

[figure: FFTs of “sir” [sɜ]: levels (dB) of the consonant [s] and the vowel [ɜ] vs. frequency, kHz (log scale), by band number, together with their difference]

both importance functions are high-pass
this could arise from a band-by-band mechanism, as the test-word’s [s] is essentially high-frequency noise

effects of removing bands from the context:
if the ‘default’ (a priori) setting of each band is compensation,
- effects should resemble those of increasing the bands’ distance to 10 m
all the test word’s bands present, and varied between 0.32 m and 10 m
[diagram: the context’s n bands; some bands not present, others held at 0.32 m in all conditions]

[figure: category boundary (step) vs. condition number (cond = 1…6), at test distances 0.32 m and 10 m, giving boundary shifts S_1 … S_6 and band weights W_n,1 … W_n,6]
importance of band n = Σ_{cond=1…6} S_cond × W_{n,cond}

[figure: FFTs of “sir” [sɜ]: levels (dB) of the consonant [s] and the vowel [ɜ] vs. frequency, kHz (log scale), by band number, together with their difference]

removing bands also gives a high-pass importance function
- effects are similar to adding reverb. (increasing distance)
this suggests:
- effective contexts should have power in the important bands
- i.e. those bands where the [s] has most energy
which might explain why some wide-band contexts are ineffective (Watkins, 2005; Nielsen & Dau, 2010)
the alternative suggestion was:
- the wide-band temporal envelope is too ‘smooth’
- so extra smoothing by reverb. is not apparent

8-band sparse-NV speech
for the 8 bands of the preceding context (‘next you’ll get …’):
- each band given the same, wide-band temporal envelope → ‘wide band’ condition
the sound’s overall power is the same as the other wide-band contexts’,
but here the energy is concentrated in the 8 bands,
so the spectrum level near the 8 centre-frequencies is higher

both 8-band and wide-band contexts are very effective
and both give substantial constancy effects
so, the ‘sharpness’ of temporal envelopes in the 8-band conditions is not too crucial
[figure: category boundary (step) vs. context’s distance (m) for unprocessed, 8-band, and wide-band contexts]

some other continua
- modulation depth varied as for sir-stir
- but here, a substantial influence of onset characteristics
[figure: category boundary (step) vs. context’s distance (m) for rose-roads, knees-needs, and wash-watch continua, at test distances 0.32 m, 2.5 m, and 10 m]

[figure: wash-watch: proportion of ‘wash’ responses vs. continuum step, for context & test near (0.32 m), context near with test far (10 m), and context & test far (10 m)]

wash to watch continuum - a progressive increase in modulation depth
this has a substantial effect on the test words’ identity
little or no effect of test-word reverb.
only small effects of the context’s reverb.
difficult to understand in terms of modulation processing:
- no apparent effects of reverb. on the test-word’s modulation
- little effect of anything resembling modulation masking
easy to understand in terms of reverberant ‘tails’:
- onsets are important for this distinction
- tails don’t affect onsets much

The idea that constancy precedes grouping of the vocoder’s bands is also consistent with the difficulties that cochlear-implant users encounter in cocktail-party situations: the grouping of the bands is largely of the type that comes after constancy, so the factors responsible for this grouping are of limited use in segregating sources (Nelson et al., 2003; Qin and Oxenham, 2003; Stickney et al., 2004). A related finding is that interactions between reverberation effects and masking effects are less apparent with vocoder simulations than with unprocessed speech (Poissant et al., 2006). This pattern of results seems to come about because reverberation progressively scrambles the fine-structure segregation cues in unprocessed speech, whereas in vocoder simulations these ‘primitive’ segregation cues are much less prevalent to begin with.