Using Fo and vocal-tract length to attend to one of two talkers. Chris Darwin, University of Sussex. With thanks to: Rob Hukin, John Culling, John Bird. MRC.

1

2 Using Fo and vocal-tract length to attend to one of two talkers. Chris Darwin, University of Sussex. With thanks to: Rob Hukin, John Culling, John Bird. MRC & EPSRC

3 1. Review past work on the way that the human auditory system uses differences in Fo to separate two voices; 2. Present new data on the use of Fo, vocal-tract length and their combination to allow listeners to select one of two simultaneous messages. Something old, something new, something borrowed, background blue.

4 Three types of experiment. Difference in Fo leads to: 1. binaural separation of sound sources; 2. increase in intelligibility; 3. ability to track a sound source over time.

5 Three types of experiment. Difference in Fo leads to: 1. binaural separation of sound sources; 2. increase in intelligibility; 3. ability to track a sound source over time.

6 Broadbent & Ladefoged (1957). PAT-generated sentence “What did you say before that?” (F1, F2). When Fo was the same, 125 Hz (either natural or monotone), listeners heard: one voice only, 16/18; in one place, 18/18. When Fo was different, 125/135 Hz (monotone), listeners heard: two voices, 15/18; in two places, 12/18.

7 ... Harvey Fletcher (1953) was there first! (almost). p. 216 describes experiment (suggested by Arnold). Speech fuses, but polyphonic music sounds weird since different notes are heard at different ears. LP @ 1 kHz, HP @ 1 kHz.

8 B & L Conclusion: Common Fo integrates broadband frequency regions of a single voice, coming simultaneously to different ears, into a single voice heard in one position.

9 Is a common Fo sufficient for fusion? Broadbent & Ladefoged's stimuli used formant resonators with broad low-frequency skirts. Sharply-filtered sounds sometimes give impression of two sound sources even with common Fo.

10 Formant T(f) & abs difference

11 Dichotic: same Fo. Original -> PSOLA (Fo -> 0%) -> LP filter -> left ear; original -> PSOLA (Fo -> 0%) -> HP filter -> right ear. (Apologies to Hideki.)

12 Dichotic: different Fo. Original -> PSOLA (Fo -> -4%) -> LP filter -> left ear; original -> PSOLA (Fo -> +4%) -> HP filter -> right ear.
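To put the ±4% shifts of slide 12 on the same scale as the semitone steps used later in the talk, the ratio between the two shifted Fos can be converted to semitones. This is only a minimal sketch; the function name `ratio_to_semitones` is hypothetical and does not come from the slides.

```python
import math

def ratio_to_semitones(ratio):
    """Convert a frequency ratio to an interval in semitones (12 per octave)."""
    return 12.0 * math.log2(ratio)

# One band is shifted up by 4% and the other down by 4% relative to the original Fo.
separation = ratio_to_semitones(1.04 / 0.96)
print(round(separation, 2))
```

Under this reading, the two bands differ by roughly 1.4 semitones.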

13 Complementary LP/HP filters Variable bandwidth

14 Complementary LP/HP filters (dB)
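The slides do not say how the complementary LP/HP filters were designed. As a rough illustration of what "complementary" can mean, the sketch below splits a signal with a one-pole low-pass and takes the high-pass band as the remainder, so that the two bands sum back to the input. The one-pole design, the 16 kHz sampling rate and all function names are assumptions made for this example; they are far shallower than the variable-bandwidth filters the slides refer to.

```python
import math

def one_pole_lowpass(x, cutoff_hz, fs_hz):
    """Simple first-order recursive low-pass (an assumed design, not the talk's)."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs_hz)
    y, out = 0.0, []
    for s in x:
        y = (1.0 - a) * s + a * y
        out.append(y)
    return out

def complementary_split(x, cutoff_hz, fs_hz):
    """Return (low, high) bands; high is the input minus low, so low + high == input."""
    low = one_pole_lowpass(x, cutoff_hz, fs_hz)
    high = [s - l for s, l in zip(x, low)]
    return low, high

# A 100 Hz tone split at a 1 kHz crossover (the crossover quoted on slide 15).
fs = 16000.0  # assumed sampling rate
tone = [math.sin(2.0 * math.pi * 100.0 * n / fs) for n in range(1600)]
low, high = complementary_split(tone, 1000.0, fs)
```

In a dichotic presentation, `low` would go to one ear and `high` to the other.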

15 Dichotic Results (female voice) Filter X-over @ 1 kHz

16 Dichotic Results (male voice) Dichotic

17 -|Level difference| between ears (dB)

18 Higher filter cut-offs need wider bandwidths Same Fo

19 Low-frequency overlap cf natural ILDs higher for low frequency sounds

20 ITD: same Fo. Original -> PSOLA (Fo -> 0%) -> LP filter; original -> PSOLA (Fo -> 0%) -> HP filter. Outputs: left ear, right ear; delay ±571 µs.

21 ITD: different Fo. Original -> PSOLA (Fo -> -4%) -> LP filter; original -> PSOLA (Fo -> +4%) -> HP filter. Outputs: left ear, right ear; delay ±571 µs.
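An interaural time difference such as the ±571 µs quoted above has to be turned into a whole number of samples before it can be applied to a digital signal. The sketch below shows one way this might be done; the 44.1 kHz sampling rate and the helper names are assumptions, since the slides do not give these details.

```python
def itd_in_samples(itd_us, fs_hz):
    """Round an ITD given in microseconds to the nearest whole sample."""
    return round(itd_us * 1e-6 * fs_hz)

def apply_itd(x, lag):
    """Return (left, right). A positive lag delays the right ear, a negative lag the left."""
    pad = [0.0] * abs(lag)
    leading = list(x) + pad   # undelayed copy, padded to equal length
    lagging = pad + list(x)   # delayed copy
    return (leading, lagging) if lag >= 0 else (lagging, leading)

fs = 44100.0  # assumed sampling rate
lag = itd_in_samples(571.0, fs)
left, right = apply_itd([1.0, 0.5, 0.25], lag)
```

Giving the low-pass band a positive lag and the high-pass band a negative lag would place the two bands on opposite sides.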

22 ITD Results (female voice) ±570 µs ITD

23 ITD Results (male voice) ±570 µs ITD

24 Summary. Dichotic: Fusion at same Fo? Low-frequency overlap needed. Fusion at different Fo (±4%)? No. But what about Fo’s ability to separate different voices? (original B & L question)

25 Three types of experiment. Difference in Fo leads to: 1. binaural separation of sound sources; 2. increase in intelligibility; 3. ability to track a sound source over time.

26 ∆Fo improves identification of double vowels and sentences. Double vowels: improvement is over by 1 semitone. Sentences: improve for longer.

27 Mechanisms of ∆Fo improvement. A. Global: across-formant grouping by Fo (as originally conceived by B & L). B. Local: better definition of individual formants, especially F1, where harmonics are resolved. At small ∆Fos, B is more important than A for double vowels (Culling & Darwin, JASA 1993). Also true for sentences?

28 ∆Fo between two sentences (Bird & Darwin 1998; after Brokx & Nooteboom, 1982). Target sentence Fo = 140 Hz. Masking sentence Fo = 140 Hz ± 0, 1, 2, 5, 10 semitones. Two sentences (same talker), only voiced consonants (with very few stops). Task: write down target sentence. Replicates & extends Brokx & Nooteboom.
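The masker Fos implied by "140 Hz ± 0, 1, 2, 5, 10 semitones" follow from the equal-tempered relation f = f0 · 2^(n/12). The sketch below just works through that arithmetic; the function and variable names are hypothetical.

```python
def shift_semitones(f0_hz, semitones):
    """Shift a frequency by a (possibly negative) number of equal-tempered semitones."""
    return f0_hz * 2.0 ** (semitones / 12.0)

target_f0 = 140.0
offsets = [0, 1, 2, 5, 10]

# For each offset, the masker Fo below and above the 140 Hz target.
masker_f0s = {n: (shift_semitones(target_f0, -n), shift_semitones(target_f0, n))
              for n in offsets}

for n, (below, above) in masker_f0s.items():
    print(f"{n:>2} semitones: {below:6.1f} Hz / {above:6.1f} Hz")
```

Applied to a 100 Hz base, the same conversion gives about 106, 112, 133 and 178 Hz for 1, 2, 5 and 10 semitones, which matches the Fo pairs listed for the chimeric sentences on slide 29.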

29 Chimeric sentences (Bird & Darwin, Grantham Meeting 1998). [Figure: Fo pairs 100-100, 100-106, 100-112, 100-133, 100-178; panels: Fo below 800 Hz, Fo above 800 Hz.]

30 Paired sentences' Fos
Condition                         Sentence 1 (Low Pass, High Pass)   Sentence 2 (Low Pass, High Pass)
Normal                            100, 100                           112, 112
Same Fo in High                   100, 100                           112, 100
Same Fo in Low                    100, 100                           100, 112
Swapped (gives wrong grouping)    100, 112                           112, 100

31 Segregating sentence pairs by Fo: all the action is in the low-frequency region (<800 Hz); no strong evidence of across-formant grouping.

32 Adding Fo-swapped: inappropriate pairing of Fo is only detrimental above 4 semitones.

33 Summary of Fo-differences. Across-formant grouping only significant for large Fo differences (> ~4 semitones). Most of the improvement with small Fo differences happens in the F1 frequency region.

34 Another caveat for autocorrelation: improvement in identification of double vowels for small ∆Fos is about as good when each vowel is made up of alternating harmonics of the two Fos (Culling & Darwin). Autocorrelation would pull out completely wrong envelopes.
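The point about alternating harmonics can be made concrete with a small bookkeeping sketch. It assumes, purely for illustration, that vowel 1 takes the odd-numbered harmonics of one Fo and the even-numbered harmonics of the other, with vowel 2 taking the complement; the exact assignment used by Culling & Darwin is not given on the slide. Collecting components by the Fo they belong to, as a periodicity-based grouping would, then yields sets that mix the two vowels.

```python
def alternating_vowels(f0_a, f0_b, n_harmonics):
    """Build two vowels as lists of (Fo, harmonic number), alternating between the Fos."""
    vowel_1, vowel_2 = [], []
    for h in range(1, n_harmonics + 1):
        if h % 2 == 1:   # odd harmonic numbers (assumed assignment)
            vowel_1.append((f0_a, h))
            vowel_2.append((f0_b, h))
        else:            # even harmonic numbers
            vowel_1.append((f0_b, h))
            vowel_2.append((f0_a, h))
    return vowel_1, vowel_2

def group_by_f0(vowel_1, vowel_2):
    """Collect components by their Fo, recording which vowel each one came from."""
    groups = {}
    for label, components in (("vowel 1", vowel_1), ("vowel 2", vowel_2)):
        for f0, h in components:
            groups.setdefault(f0, []).append((h, label))
    return groups

v1, v2 = alternating_vowels(100.0, 106.0, 6)
groups = group_by_f0(v1, v2)
for f0, members in groups.items():
    print(f0, sorted(members))
```

Each Fo group here contains harmonics drawn from both vowels, so a spectral envelope read off one Fo group would belong to neither vowel.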

35 No simultaneous effect of FM (different Frequency Modulations of Fo). Although separation by Fo shows strong effects, there is no detectable effect of simultaneous separation by different Frequency Modulations of Fo. Listeners are unable to discriminate correlated from uncorrelated FM in simultaneous inharmonic sine waves (Carlyon).

36 Summary of ∆Fo effects in separating competing voices. Intelligibility increased by small ∆Fo only in F1 region (and harmonic alternation tolerated)... … but not by ∆Fo in only higher freq. region. Across-formant consistency of Fo only important at larger ∆Fo. FM produces no additional separation.

37 Three types of experiment. Difference in Fo leads to: 1. binaural separation of sound sources; 2. increase in intelligibility; 3. ability to track a sound source over time.

38 Tracking by Fo. We can also use continuity of an Fo contour to track a particular sound source over time.

39 CRM task (tracking a sound source) (Bolia et al., 2000). 2 simultaneous sentences, each of the form “Ready (Call Sign) go to (Color) (Number) now.” Same talker (TT); same sex (TS); different sex (TD). Target denoted by call sign “Baron”. 8 talkers in corpus, 2048 tokens.
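The figure of 2048 tokens is consistent with every talker recording every combination of call sign, color and number. The sketch below enumerates that design. The word lists (8 call signs, 4 colors, digits 1 to 8) are taken from the published description of the CRM corpus as best recalled, not from the slides, so treat them as an assumption; only "Baron", the 8 talkers and the total of 2048 appear in the talk.

```python
from itertools import product

# Assumed word lists for the CRM corpus (not stated on the slides).
CALL_SIGNS = ["Arrow", "Baron", "Charlie", "Eagle", "Hopper", "Laker", "Ringo", "Tiger"]
COLORS = ["blue", "green", "red", "white"]
NUMBERS = list(range(1, 9))
N_TALKERS = 8

def crm_sentence(call_sign, color, number):
    """Fill the fixed CRM sentence frame."""
    return f"Ready {call_sign} go to {color} {number} now."

# One token per (talker, call sign, color, number) combination.
tokens = [(talker, crm_sentence(cs, color, num))
          for talker, cs, color, num in product(range(N_TALKERS), CALL_SIGNS, COLORS, NUMBERS)]

# Target sentences are those carrying the call sign "Baron".
targets = [t for t in tokens if " Baron " in t[1]]
print(len(tokens), len(targets))
```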

40 CRM task (Bolia et al., 2000). Listeners responded by selecting the appropriate colored digit with the computer mouse.

41 CRM task results (Brungart et al)

42 Effect of change in Fo

43

44 Fo contours for 2 individuals. Individuals with the most constant Fo contours show the most improvement with ∆Fo.

45 Effect of change of VT

46 Effect of joint change of Fo and VT Original: male

47 Effect of joint change of Fo and VT Original: female

48 Superadditivity of ∆Fo and ∆VT. [Figure: actual d' plotted against predicted d', both axes 0.00 to 1.50; male and female talkers.] ∆Fo & ∆VT superadditive … and still less than real different-sex talkers.
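The slide compares a predicted d' with the d' actually obtained when ∆Fo and ∆VT are combined, but the transcript does not say how the prediction was formed. The sketch below shows the standard d' calculation from hit and false-alarm rates and, as one possible baseline, a simple additive prediction. The additive rule, the function names and the example values are all assumptions for illustration; none of the numbers are results from the talk.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index: z(hits) - z(false alarms); rates must lie strictly between 0 and 1."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

def additive_prediction(d_f0, d_vt):
    """Assumed baseline: the two cues contribute independently and their d' values add."""
    return d_f0 + d_vt

def is_superadditive(d_actual, d_f0, d_vt):
    """True when the combined cue does better than the additive baseline."""
    return d_actual > additive_prediction(d_f0, d_vt)

# Arbitrary illustrative values, not data from the slides.
example = is_superadditive(d_actual=1.4, d_f0=0.5, d_vt=0.6)
print(example)
```

A different baseline, such as combining the two d' values as a root sum of squares, would give a smaller prediction and make superadditivity easier to reach, so the choice of rule matters when reading the plot.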

49 Conclusions. Same Fo is not a sufficient condition for dichotic fusion of complementarily filtered speech. Intelligibility increase for small ∆Fo is confined to the F1 region; across-formant grouping only for larger ∆Fo. Fo & VT-size are useful for tracking sources across time. Superadditive.

