Modelling compensation for reverberation: work done and planned Amy Beeston and Guy J. Brown Department of Computer Science University of Sheffield EPSRC.

Modelling compensation for reverberation: work done and planned Amy Beeston and Guy J. Brown Department of Computer Science University of Sheffield EPSRC 12-month meeting, Sheffield: 23rd Oct 2009

1 of 36 Overview
1. Work done
  • sir-stir framework
  • 3 across-channel model configurations
2. Work planned
  • Across-channel model
  • Within-channel model
  • Further questions

2 of 36 Part 1: Work done
• sir-stir framework
• 3 across-channel model configurations

3 of 36 Watkins’ sir/stir paradigm
[Figure: category boundary (step) for human listeners against context distance (m). Annotations: effect of reverberation; effect of compensation; more ‘sir’ responses; more ‘stir’ responses.]

4 of 36 Efferent auditory processing
• Reverberating a speech signal reduces its dynamic range
• Reflections fill gaps in the temporal envelope
• The efferent system helps control dynamic range (Guinan & Gifford, 1988)
• Could compensation be characterised as restoration of dynamic range?
[Figure: dry signal, small mean, mean/peak = 0.1216; reverberated signal, larger mean, mean/peak = 0.2142]

5 of 36 Mean-to-peak ratio (MPR)
• measured over some time window
• peak does not vary greatly with source-receiver distance
• mean increases with source-receiver distance
• MPR = mean/peak, therefore MPR increases with distance
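
A minimal sketch of the MPR computation, assuming the input is a non-negative envelope (or firing-rate signal) already cut to the analysis window. The function name `mean_to_peak_ratio` and the toy envelopes are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def mean_to_peak_ratio(envelope):
    """Return mean/peak of a non-negative envelope or firing-rate signal."""
    envelope = np.asarray(envelope, dtype=float)
    peak = envelope.max()
    if peak <= 0.0:
        return float("nan")  # ratio is undefined for an all-zero window
    return float(envelope.mean() / peak)

# Toy envelopes (hypothetical): a 'dry' burst followed by a silent gap, and a
# 'reverberated' version in which a decaying tail fills the gap.
dry = np.concatenate([np.ones(100), np.zeros(300)])
tail = np.exp(-np.arange(300) / 100.0)
reverberated = np.concatenate([np.ones(100), tail])

mpr_dry = mean_to_peak_ratio(dry)
mpr_rev = mean_to_peak_ratio(reverberated)
```

In this toy case the peak is unchanged while the tail raises the mean, so `mpr_rev` exceeds `mpr_dry`, which is the direction of change the slide describes.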

6 of 36 Modelling framework

7 of 36 Stimuli
Watkins, JASA 2005, experiment 5
• forward/reversed speech carrier
• forward/reversed reverberation
Watkins, JASA 2005, experiment 4
• reverberate, then flip polarities: noise after
• flip polarities, then reverberate: noise before
[Figure: experiment 5 panels by speech carrier (fwd/rev) and reverberation (fwd/rev); experiment 4 panels for noise before and noise after]

8 of 36 Auditory periphery
• Outer/middle ear
  Simulates human data from Huber et al. (2001)
• Basilar membrane
  Dual resonance nonlinear (DRNL) filterbank
  Originally proposed by Meddis, O’Mard and Lopez-Poveda (2001)
  Human parameters from Meddis (2006)
  Efferent attenuation introduced by Ferry and Meddis (2007)
• Hair cell
  Linear output between threshold and saturated firing rate (Messing 2007)
  Does not model adaptation in the auditory nerve
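
The hair-cell stage described above can be pictured as a clipped linear function. The sketch below is one possible reading, assuming a generic input "drive" variable; the function name and the threshold, saturation and maximum-rate values are hypothetical placeholders, not the parameters of the model in the talk.

```python
import numpy as np

def hair_cell_rate(drive, threshold=0.0, saturation=1.0, max_rate=300.0):
    """Piecewise-linear rate: 0 below threshold, max_rate above saturation.

    All parameter values are hypothetical placeholders.
    """
    drive = np.asarray(drive, dtype=float)
    fraction = (drive - threshold) / (saturation - threshold)
    return max_rate * np.clip(fraction, 0.0, 1.0)

rates = hair_cell_rate([-0.5, 0.0, 0.5, 1.0, 2.0])
```

Because the function has no memory, its output depends only on the current input, which is consistent with the slide's note that adaptation is not modelled.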

9 of 36 Spectro-temporal excitation patterns (STEP)
[Figure: auditory nerve STEP for the context phrase "… ok, next you’ll get to click on …" followed by the test word {sir, stir}. Axes: time; best frequency (Hz, log-spaced, 100 to 8000).]

10 of 36 Efferent attenuation based on dynamic range
• Idea: control the amount of efferent attenuation applied in the model according to the dynamic range of the context
• Dynamic range is measured according to the mean-to-peak ratio in the AN response
[Figure panels: kurtosis; negative differentials; offsets; mean-to-peak ratio]

11 of 36 Auditory nerve response
• Across-channel model: the auditory nerve response is summed across all frequency channels prior to implementation of the efferent system component
• Within-channel model: the auditory nerve response is NOT summed across all frequency channels

12 of 36 Across channel
• the auditory nerve response is summed across all frequency channels prior to implementation of the efferent system component
[Figure: frequency-by-time AN response → Σ → MPR → ATT]

13 of 36 Within channel
• the auditory nerve response in each frequency channel influences the efferent system component
[Figure: frequency-by-time AN response, with a separate MPR → ATT path for each frequency channel]
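
The difference between the two configurations comes down to where the summation happens. A minimal sketch, assuming the AN response is held as a channels-by-time array; the random array below is only a stand-in for a real model output, and the helper name `mpr` is an assumption.

```python
import numpy as np

def mpr(x, axis=-1):
    """Mean-to-peak ratio along the given axis."""
    x = np.asarray(x, dtype=float)
    return x.mean(axis=axis) / x.max(axis=axis)

rng = np.random.default_rng(0)
# Hypothetical AN response: 8 frequency channels x 1000 time samples.
an = rng.uniform(1.0, 100.0, size=(8, 1000))

# Across-channel: sum over frequency first, then one MPR for the whole signal.
across_mpr = mpr(an.sum(axis=0))

# Within-channel: no summation, one MPR per frequency channel.
within_mpr = mpr(an, axis=1)
```

The across-channel variant yields a single value that drives one attenuation setting, whereas the within-channel variant yields one value per channel.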

14 of 36 Efferent attenuation: MPR → ATT
• Linear map from MPR (of summed AN response) to efferent attenuation, ATT
• ATT turns down the gain on the nonlinear pathway of the DRNL
• The rate-intensity curve shifts to the right
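
As a rough illustration of "turning down the gain", an attenuation in dB can be converted to a linear amplitude factor. This sketch assumes ATT acts as a simple scalar gain on the input to the nonlinear path; the function name is hypothetical and the DRNL itself is not reproduced here.

```python
def attenuation_to_gain(att_db):
    """Convert an attenuation in dB to a linear amplitude scaling factor."""
    return 10.0 ** (-att_db / 20.0)

gain_0 = attenuation_to_gain(0.0)    # no attenuation: unity gain
gain_20 = attenuation_to_gain(20.0)  # 20 dB attenuation: one tenth of the amplitude
```

Under this assumption, reaching the same drive on the nonlinear path needs an input that is ATT dB more intense, which is one way to picture the rightward shift of the rate-intensity curve.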

15 of 36 Recognition
• Efferent attenuation helps to recover the dip in the temporal envelope corresponding to the ‘t’ closure in ‘stir’
Templates: sir, stir
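
A toy sketch of template-based recognition, assuming a nearest-template rule with Euclidean distance. The slide does not state the distance measure, so this is an assumption, and the one-dimensional "envelopes" below are hypothetical stand-ins for the stored ‘sir’ and ‘stir’ patterns.

```python
import numpy as np

def classify(pattern, templates):
    """Return the label of the template closest to the pattern (Euclidean)."""
    distances = {label: float(np.linalg.norm(pattern - template))
                 for label, template in templates.items()}
    return min(distances, key=distances.get)

# Hypothetical templates: 'stir' has a deep dip (the 't' closure), 'sir' has none.
sir = np.ones(50)
stir = np.ones(50)
stir[20:30] = 0.1
templates = {"sir": sir, "stir": stir}

# A dip that reverberation has largely filled in, and one that has been recovered.
shallow_dip = np.ones(50)
shallow_dip[20:30] = 0.8
deep_dip = np.ones(50)
deep_dip[20:30] = 0.3

label_shallow = classify(shallow_dip, templates)
label_deep = classify(deep_dip, templates)
```

In this toy setting the filled-in dip is labelled ‘sir’ and the recovered dip ‘stir’, which mirrors why restoring the dip matters for recognition.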

16 of 36 3 model configurations
• Open loop
• Semi-closed loop: amount of attenuation is estimated during the one second preceding the test word, and held constant thereafter
• Closed loop: amount of attenuation is estimated continually in a sliding time window, and updated on a sample-by-sample basis (or at a specified control rate)

17 of 36 Efferent system: Open loop
• Open loop
  • Many simulations were run: the amount of attenuation applied was varied across a range of values (0-30 dB), and the resulting category boundaries were recorded in calibration charts.
  • The ‘best match’ to human results was found (manually) for each condition.
  • Near contexts match best with low attenuation values, while far contexts match best with higher attenuation values.

18 of 36 Results: Open loop
[Figure: category boundary results against attenuation applied (dB) for near and far contexts, with calibration curves for tuning at attenuations of 0, 0.5, …, 29.5, 30 dB. Values marked: 12.5, 9.0, 22.0, 21.5.]
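
The open-loop calibration amounts to a grid search over fixed attenuation values. The slides say the best match was found manually; the sketch below automates the same idea under stated assumptions. `toy_category_boundary` is a hypothetical stand-in for a full model run, and the "human" target boundaries are invented for illustration only.

```python
import numpy as np

def toy_category_boundary(att_db, context):
    """Hypothetical stand-in for a model run: boundary falls as attenuation rises."""
    base = 5.0 if context == "near" else 9.0
    return base - 0.2 * att_db

# Attenuation grid matching the tuning steps on the slide: 0, 0.5, ..., 30 dB.
attenuations = np.arange(0.0, 30.5, 0.5)

# Invented target boundaries, for illustration only (not measured data).
human_boundary = {"near": 3.0, "far": 4.6}

best = {}
for context, target in human_boundary.items():
    errors = np.abs(toy_category_boundary(attenuations, context) - target)
    best[context] = float(attenuations[np.argmin(errors)])

best_near, best_far = best["near"], best["far"]
```

With these made-up numbers the search picks a lower attenuation for the near context than for the far one, which is the qualitative pattern reported for the real calibration.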

19 of 36 Efferent system: Semi-closed loop
• Semi-closed loop
  • amount of attenuation is estimated during the one second preceding the test word, and held constant thereafter

20 of 36 Semi-closed loop
• Examine context within time window to derive a metric value
• Use metric value to determine the efferent attenuation
[Figure: context phrase "… ok, next you’ll get to click on …" followed by the test word {sir, stir}; ATT is derived from the context window]
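
A minimal sketch of the semi-closed loop estimate, assuming a summed AN response, a known test-word onset sample, and the linear mapping quoted elsewhere in this deck for the semi-closed loop (ATTENUATION = 38.36 * MPR + 13.77). The function name, sample rate and random signal are hypothetical.

```python
import numpy as np

def semi_closed_attenuation(summed_an, test_onset, fs,
                            slope=38.36, intercept=13.77, window_s=1.0):
    """Attenuation (dB) from the MPR of the window just before the test word."""
    start = max(0, test_onset - int(round(window_s * fs)))
    context = np.asarray(summed_an[start:test_onset], dtype=float)
    mpr = context.mean() / context.max()
    return slope * mpr + intercept

fs = 1000  # hypothetical sample rate of the AN response, Hz
rng = np.random.default_rng(1)
summed_an = rng.uniform(10.0, 100.0, size=3 * fs)  # stand-in signal

# Estimated once from the second preceding the test word, then held constant.
att_db = semi_closed_attenuation(summed_an, test_onset=2 * fs, fs=fs)
```

Because only samples before `test_onset` enter the estimate, nothing in the test word itself can change the attenuation that is applied to it.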

21 of 36 Metric: Semi-closed loop MPR
MPR in experiment 5 (JASA 2005)
• across-channel AN response measured over 1 s window before test word
• increases with context distance
• small difference for reversed carrier (squares/circles)
• larger difference (decrease) for reversed reverb (black/white)

22 of 36 Results: Semi-closed loop
• Tuned to match near-near and far-far (fwd fwd) conditions
• experiment 5 achieves qualitative (not quantitative) match to human data…
• …but experiment 4 conditions do not match well
ATTENUATION = (38.36 * MPR) + 13.77
[Figure: experiment 5 panels by speech carrier (fwd/rev) and reverberation (forward/reverse); experiment 4 panels for noise before and noise after]
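
Tuning to two conditions fixes a straight line: two (MPR, attenuation) points determine the slope and intercept. The sketch below shows that arithmetic only. The MPR values and target attenuations are hypothetical, chosen so that they lie on the mapping quoted on the slide; the slides do not give the actual calibration points.

```python
import numpy as np

# Hypothetical calibration points (not from the slides): MPR of the near-near
# and far-far contexts, and the attenuation that best matched each condition.
mpr_points = np.array([0.10, 0.20])
target_att_db = np.array([17.606, 21.442])

# A degree-1 fit through two points returns the line joining them.
slope, intercept = np.polyfit(mpr_points, target_att_db, deg=1)

def mpr_to_attenuation(mpr):
    """Linear MPR-to-attenuation map using the fitted coefficients."""
    return slope * mpr + intercept
```

With only two tuning conditions the line is fully determined, so any mismatch on the remaining conditions reflects the metric or the linear form rather than the fit.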

23 of 36 Efferent system: Closed loop
• Closed loop
  • amount of attenuation is estimated continually in a sliding time window, and updated on a sample-by-sample basis (or at a specified control rate)

24 of 36 Closed loop
• Examine context within time window to derive a metric value
• Use metric value to determine the efferent attenuation applied
• Window slides forward, process repeats…
[Figure: context phrase "… ok, next you’ll get to click on …" followed by the test word {sir, stir}; ATT is updated as the window slides]
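
A minimal sketch of the closed-loop update, assuming a rectangular one-second window over the summed AN response, a 1 kHz control rate, and the closed-loop mapping quoted elsewhere in this deck (ATTENUATION = 45 * MPR + 18). The function name, sample rate and random signal are hypothetical, and in the real model the attenuation would feed back into the AN response, which this open sketch does not simulate.

```python
import numpy as np

def closed_loop_attenuation(summed_an, fs, control_rate=1000.0,
                            window_s=1.0, slope=45.0, intercept=18.0):
    """Attenuation (dB) re-estimated from a sliding window at the control rate."""
    summed_an = np.asarray(summed_an, dtype=float)
    hop = max(1, int(round(fs / control_rate)))  # samples between updates
    win = int(round(window_s * fs))              # window length in samples
    att_db = []
    for end in range(hop, len(summed_an) + 1, hop):
        segment = summed_an[max(0, end - win):end]  # most recent window
        mpr = segment.mean() / segment.max()
        att_db.append(slope * mpr + intercept)
    return np.array(att_db)

fs = 2000  # hypothetical sample rate, Hz
rng = np.random.default_rng(2)
summed_an = rng.uniform(10.0, 100.0, size=2 * fs)  # stand-in signal
att_track = closed_loop_attenuation(summed_an, fs)
```

Here the window is shorter than its nominal length until one second of signal has accumulated, which is one simple way to handle the start of the context.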

25 of 36 Metric: Closed loop MPR (expt. 5)
MPR in experiment 5 (JASA 2005)
• measured continually over window, shifting forwards from start of context
• Insignificant change with reversal of speech carrier (left/right)
• Significant change with reversal of reverberation (top/bottom)
[Figure: MPR through time; columns: speech carrier (fwd/rev); rows: reverberation (forward/reverse)]

26 of 36 Closed loop (expt 5)
• tuned to ‘best’ match near-near and far-far (fwd fwd) conditions
• variation possible due to granularity of model (± 0.5)
ATTENUATION = (45 * MPR) + 18
ATTENUATION = (45 * MPR) + 19
[Figure: one panel per mapping, each arranged by speech carrier (fwd/rev) and reverberation (fwd/rev)]

27 of 36 Closed loop (expt 4)
• MPR mapping does not generalise for experiment 4 noise contexts
ATTENUATION = (45 * MPR) + 18
[Figure: panels labelled by speech carrier (fwd/rev), reverberation (fwd/rev), and noise before/after]

28 of 36 Part 2: Work planned
• Across-channel model
• Within-channel model
• Further questions

29 of 36 Across-channel model
Practical considerations
• Control rate specified to speed up the simulation (usually 1 kHz, i.e., attenuation parameter is updated every 1 ms)
• Time window over which to determine metric (usually previous 1 second; different values under investigation at present)
• Shape of window (rectangular at present; should have a ‘forgetting function’)
Question to Tony et al.
• What data can we use to determine the shape/duration of the window?

30 of 36 Across-channel model
• window shape/duration?
[Figure: frequency-by-time AN response → Σ → MPR → ATT, with a window weight plotted against time]
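
One candidate ‘forgetting function’ is an exponential window that weights recent samples most heavily. The sketch below is only one possible choice: the time constant is hypothetical, and defining the weighted MPR as weighted mean over weighted peak is an assumption rather than something specified in the slides.

```python
import numpy as np

def exponential_window(n, fs, tau_s=0.25):
    """Weights that decay with age; the most recent sample has weight 1."""
    age_s = np.arange(n - 1, -1, -1) / fs  # seconds before 'now'
    return np.exp(-age_s / tau_s)

def weighted_mpr(x, weights):
    """Weighted mean divided by weighted peak (one possible definition)."""
    x = np.asarray(x, dtype=float)
    weighted_mean = np.sum(weights * x) / np.sum(weights)
    weighted_peak = np.max(weights * x)
    return weighted_mean / weighted_peak

fs = 1000  # control rate of 1 kHz, as on the slide
rng = np.random.default_rng(3)
x = rng.uniform(10.0, 100.0, size=fs)  # stand-in for 1 s of summed AN response

rect_mpr = weighted_mpr(x, np.ones(fs))               # rectangular window
forgetting_weights = exponential_window(fs, fs)
forgetting_mpr = weighted_mpr(x, forgetting_weights)  # exponential forgetting
```

With all weights equal to one this reduces to the plain mean/peak, so the rectangular window currently in use is a special case of the weighted form.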

31 of 36 Within-channel model
• Previously we asked what duration and shape the metric window has in time.
• Now we ask what duration and shape the metric window has in frequency.
• /t/ is defined by a sharp onset burst from 2 to 8 kHz (Régnier & Allen, 2008)
• template matching over restricted areas of the frequency domain
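
Restricting the template match to part of the frequency axis can be sketched as a channel mask. This assumes log-spaced best frequencies from 100 to 8000 Hz, as on the STEP axis earlier in the deck; the channel count, the random patterns and the Euclidean distance are hypothetical choices.

```python
import numpy as np

n_channels = 32  # hypothetical channel count
best_freqs = np.geomspace(100.0, 8000.0, n_channels)  # log-spaced BFs, Hz

# Keep only channels in the region where the /t/ burst is reported (2 to 8 kHz).
band_mask = (best_freqs >= 2000.0) & (best_freqs <= 8000.0)

def restricted_distance(pattern, template, mask):
    """Euclidean distance computed over the selected channels only."""
    return float(np.linalg.norm(pattern[mask] - template[mask]))

rng = np.random.default_rng(4)
pattern = rng.uniform(size=(n_channels, 200))   # stand-in STEP, channels x time
template = rng.uniform(size=(n_channels, 200))  # stand-in template

d_restricted = restricted_distance(pattern, template, band_mask)
d_full = restricted_distance(pattern, template, np.ones(n_channels, dtype=bool))
```

Dropping channels can only remove terms from the sum of squares, so the restricted distance never exceeds the full-band distance; what matters for recognition is how the ranking of templates changes.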

32 of 36 Within-channel model
Frequency-dependent suppression:
• Feedback from efferent system appears to be fairly narrowly tuned
• fall-off in the effect of efferent-induced threshold shift at low BFs [data from cat, Guinan & Gifford (1988)]
• improves representation of low-frequency speech structure when efferent attenuation is high
Modelling implications:
• Need no longer be a pooled auditory nerve (STEP) response for metric/map to attenuation
• Each channel can react quasi-independently to the audio context it hears

33 of 36 Within-channel model
• window shapes/durations?
[Figure: frequency-by-time AN response with a separate MPR → ATT path for each frequency channel, and a window weight plotted against time]

34 of 36 Questions
Is there a time analogy to the frequency gaps in 8-band stimuli?
- imposing gaps so that bits are missing from the freq/time pattern in the context window.
- might allow an importance weighting for time bands, like that for the frequency bands.

35 of 36 Implication?
What happens with a silent context?
• Physiology predicts that the efferent system is not activated
• Model predicts small dynamic range:
  - maximum mean/peak ratio
  - high efferent attenuation
  - low category boundary (more stirs)
• specifically, if (when) context is shorter than metric window:
  - should we shorten the metric window?
  - zero pad the utterances?
  - count previous trial as context?
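
Two of these cases can be checked with simple arithmetic. The sketch assumes the model's AN response to silence is a flat, non-zero (spontaneous) rate, which is an assumption rather than something stated on the slide; all signal values are hypothetical.

```python
import numpy as np

def mpr(x):
    """Mean-to-peak ratio of a non-negative signal."""
    x = np.asarray(x, dtype=float)
    return float(x.mean() / x.max())

# Silent context: a flat response has mean equal to peak, so MPR is at its maximum.
silent = np.full(1000, 5.0)  # hypothetical constant spontaneous rate
mpr_silent = mpr(silent)

# Short context: zero padding up to the window length lowers the mean but not
# the peak, so it lowers the MPR.
short_context = np.array([2.0, 8.0, 4.0, 6.0] * 125)  # 500 samples
padded = np.concatenate([np.zeros(500), short_context])
mpr_short = mpr(short_context)
mpr_padded = mpr(padded)
```

Padding a half-length context with zeros halves the MPR here, so zero padding and shortening the metric window would lead to different attenuation values for the same utterance.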

36 of 36 Thanks
• Tony Watkins, Simon Makin and Andrew Raimond of Reading University for all the data.
• Ray Meddis and Robert Ferry of Essex University for the DRNL program code.
• Kalle Palomäki, Hynek Hermansky and Roger Moore for discussion.

The end
