Pitch perception in auditory scenes


2 Pitch perception in auditory scenes

3 2 Papers on pitch perception… of a single sound source: LOTS (too many?); of more than one sound source: almost none (± a few).

4 3 Need for sound segregation. Ears receive a mixture of sounds. We hear each sound source as having its own appropriate timbre, pitch, location. Stored information about sounds concerns a single source.

5 4 Bach: Musical Offering (strings)

6 5 Talk outline
Low-level grouping cues in pitch perception:
- harmonic structure (conjoint or disjoint allocation)
- onset-time
- context
- spatial cues
Use of a difference in harmonic structure between two sound sources to help identify, track over time, and localise simultaneous sound sources.

7 6 Excitation pattern of complex tone on basilar membrane. [Figure: excitation pattern along the basilar membrane, base to apex, on a log (ish) frequency axis marked at 200, 400, 600, 800 and 1600 Hz, with resolved and unresolved regions labelled. Insets: output of 200 Hz filter and output of 1600 Hz filter, each marked 1/200 s = 5 ms.]

8 7 Mistuned harmonic's contribution to pitch declines as Gaussian function of mistuning. Experimental evidence from mistuning experiments: Moore, Glasberg & Peters, JASA (1985); Darwin (1992), in M. E. H. Schouten (Ed.), The auditory processing of speech: from sounds to words. Berlin: Mouton de Gruyter. [Figure: harmonic complex on a frequency axis (Hz), labelled "Match low pitch"; plot of ∆F0 (Hz) against frequency of mistuned harmonic f (Hz), 540 to 680 Hz.] Fitted function: $\Delta F_0 = a - k\,\Delta f\,\exp(-\Delta f^2 / 2s^2)$, with $s = 19.9$, $k = 0.073$.
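The fitted function on this slide can be evaluated directly. The following is a minimal Python sketch, assuming the offset a is zero (the slide does not give its value) and keeping the sign convention exactly as printed; the function name `pitch_shift` is illustrative only.

```python
import math

def pitch_shift(delta_f, a=0.0, k=0.073, s=19.9):
    """Shift in matched F0 (Hz) for a harmonic mistuned by delta_f (Hz).

    Implements dF0 = a - k * df * exp(-df^2 / (2 s^2)) with the slide's
    fitted k and s. a = 0.0 is an assumption, not a value from the slide.
    """
    return a - k * delta_f * math.exp(-delta_f ** 2 / (2 * s ** 2))

if __name__ == "__main__":
    # Harmonic nominally at 600 Hz, swept over the range shown on the plot.
    for f in range(540, 681, 20):
        print(f, round(pitch_shift(f - 600.0), 3))
```

Under this form the magnitude of the shift grows with mistuning up to |∆f| = s and then falls away, which is the sense in which the mistuned harmonic's contribution declines.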

9 8 Harmonic Sieve. Only consider frequencies that are close enough to harmonic. Useful as front-end to a Goldstein-type model of pitch perception. Duifhuis, Willems & Sluyter, JASA (1982). [Figure: sieve with 200 Hz spacing on a frequency axis (Hz), 0 to 2400 Hz, with one component marked "blocked".]
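The accept/reject step of a sieve can be sketched in a few lines of Python. This is a simplified illustration, not the Duifhuis, Willems & Sluyter procedure itself: it tests components against one candidate F0 rather than searching over F0s, uses a hard-edged window, and the 3% tolerance and the name `harmonic_sieve` are assumptions made for the example.

```python
def harmonic_sieve(components_hz, f0_hz, tolerance=0.03):
    """Return the components lying within +/- tolerance of a harmonic of f0_hz.

    tolerance is a proportion of the nearest harmonic's frequency; 0.03 is an
    illustrative assumption, not a value taken from the slide.
    """
    passed = []
    for f in components_hz:
        n = max(1, round(f / f0_hz))   # number of the nearest harmonic
        harmonic = n * f0_hz
        if abs(f - harmonic) <= tolerance * harmonic:
            passed.append(f)           # close enough: falls through the sieve
    return passed

if __name__ == "__main__":
    # 610 Hz is within 3% of the 3rd harmonic of 200 Hz; 650 Hz is not.
    print(harmonic_sieve([200, 400, 610, 650, 800], 200.0))
```

Only the components that pass would then be handed to the pitch estimator.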

10 9 Is "harmonic sieve" necessary with autocorrelation models?
Autocorrelation could in principle explain mistuning effect:
- mistuned harmonic initially shifts autocorrelation peak
- then produces its own peak
But the numbers do not work out:
- Meddis & Hewitt model is too tolerant of mistuning.

11 10 Disjoint allocation? Will the 200-Hz series grab the 600-Hz component, and so make it give less of a pitch shift to the 155-Hz series? [Figure: double complex, ipsilateral; target and match; components labelled 155, 465, 775, 200 and 600 Hz.] Darwin, Buffa, Williams and Ciocca (1992), in Auditory physiology and perception, edited by Cazals, Horner and Demany (Pergamon, Oxford).

12 11 Disjoint allocation… (2): No evidence of 600-Hz component being grabbed by "in-tune" 200-Hz complex.

13 12 Pitch determined by conjoint allocation: decision on each simultaneous pitch taken independently.

14 13 Mistuning & pitch. [Figure: mean pitch shift (Hz) against % mistuning of 4th harmonic (0, 1, 2, 3, 5, 8), for vowel and complex; 8 subjects; 90 ms.]

15 14 Onset asynchrony & pitch. [Figure: mean pitch shift (Hz) against onset asynchrony T (ms) at 0, 80, 160, 240 and 320, for vowel and complex; 8 subjects; ±3% mistuning; 90 ms.]

16 15 Adaptation? Increases effect of mistuning.

17 16 Use of onset-time. A long onset-time removes a harmonic from pitch calculation. Not just due to adaptation, because of regrouping experiment. NB: onset-time much longer for pitch (100s of ms) than for hearing sound out as separate harmonic or for timbre judgements of complex (10s of ms).

18 17 Repeated context and pitch. Darwin, Hukin & Al-Khatib (1995). "Grouping in pitch perception: evidence for sequential constraints," J. Acoust. Soc. Am. 98, 880-885.

19 18 Pitch perception is not a bacon slicer (Speckschneidermaschine). Pitch perception does not operate on independent time-slices. It uses, for example, the "Old + New" heuristic to parse which components are temporally relevant.

20 19 These conclusions apply to resolved harmonics. Separate simultaneous pitches are only audible with unresolved harmonics when each pitch comes predominantly from a separate frequency region. Problem for autocorrelation models. And the Old + New heuristic breaks down: no advantage for onset-time.

21 20 Superimposing unresolved harmonics. [Figure: three waveforms over 0 to 0.1 s in the 2 kHz - 3 kHz region: F0 = 100 Hz, F0 = 125 Hz, and their sum.] Carlyon, R. P. (1996). "Encoding the fundamental frequency of a complex tone in the presence of a spectrally overlapping masker," J. Acoust. Soc. Am. 99, 517-24.

22 21 Spatial cues? Contralateral mistuned component contributes almost as much as ipsilateral (k = 0.061 contralateral vs 0.073 ipsilateral). Cf. Beerends and Houtsma, JASA (1989). [Figure: ∆F0 (Hz) against frequency of mistuned harmonic f (Hz), 540 to 680 Hz, for ipsilateral and contralateral presentation.] Fitted function: $\Delta F_0 = a - k\,\Delta f\,\exp(-\Delta f^2 / 2s^2)$, with $s = 19.9$, $k = 0.073$ (ipsilateral) and $s = 17.4$, $k = 0.061$ (contralateral).

23 22 Summary of constraints for pitch
- Harmonic (Gaussian s.d. 3%)
- Onset time (~100-200 ms)
- Repeated context
- Very weak spatial constraint
- Independent estimates of different simultaneous F0s (conjoint allocation)
N.B. These constraints are different for unresolved harmonics and for e.g. timbre / vowel perception.
Old + New

24 23 Using pitch differences to separate different objects
- Helps intelligibility of simultaneous speech
- Helps to track one voice in presence of others
- Helps to separate objects for independent localisation

25 24 ∆F0 between two sentences (Bird & Darwin 1998; after Brokx & Nooteboom, 1982). Two sentences (same talker), only voiced consonants (with very few stops), thus maximising F0 effect. Task: write down target sentence. Replicates & extends Brokx & Nooteboom. Target sentence F0 = 140 Hz; masking sentence F0 = 140 Hz ± 0, 1, 2, 5, 10 semitones.
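The masker F0s implied by these semitone offsets can be worked out with a short Python sketch. It assumes equal-tempered semitones (a frequency ratio of 2^(1/12) per semitone) and reads "±" as covering both upward and downward shifts; the function name `shift_semitones` is illustrative.

```python
def shift_semitones(f0_hz, semitones):
    """Return f0_hz shifted by a (possibly negative) number of semitones."""
    return f0_hz * 2.0 ** (semitones / 12.0)

if __name__ == "__main__":
    target_f0 = 140.0  # target sentence F0 from the slide
    for st in (0, 1, 2, 5, 10):
        down = shift_semitones(target_f0, -st)
        up = shift_semitones(target_f0, st)
        print(f"{st:>2} semitones: {down:6.1f} Hz below / {up:6.1f} Hz above")
```

Because the scale is logarithmic, a shift of n semitones up is a larger step in Hz than the same shift down.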

26 25 Example of using harmonicity for grouping for another property. How do we localise complex sounds when more than one sound source is present? Maybe we group sounds first and then pool the localisation estimates of the component frequencies for that sound. More stable than grouping by location?

27 26 Localisation by ITD: Jeffress / Trahiotis & Stern. 500 Hz narrow-band noise, right ear leads by 1.5 ms: hear on left (phase ambiguity). 500 Hz wider-band noise, right ear leads by 1.5 ms: hear on right (consistent ITD).
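The phase ambiguity can be made concrete with a little arithmetic. At 500 Hz the period is 2 ms, so a 1.5 ms lead at the right ear is indistinguishable in phase from a 0.5 ms lead at the left ear. The Python sketch below wraps an ITD to the equivalent value within half a period of zero; the sign convention (positive means right ear leads), the example frequencies 400 and 600 Hz, and the name `wrapped_itd_ms` are assumptions for illustration.

```python
def wrapped_itd_ms(itd_ms, freq_hz):
    """Map an ITD onto the phase-equivalent value within half a period of zero.

    Positive values mean the right ear leads (a convention assumed here).
    """
    period_ms = 1000.0 / freq_hz
    return (itd_ms + period_ms / 2.0) % period_ms - period_ms / 2.0

if __name__ == "__main__":
    # At 500 Hz alone, a 1.5 ms right lead looks like a 0.5 ms left lead.
    print(wrapped_itd_ms(1.5, 500.0))
    # Across a wider band the wrapped values disagree with each other,
    # whereas the true 1.5 ms ITD fits every frequency.
    for freq in (400.0, 500.0, 600.0):
        print(freq, round(wrapped_itd_ms(1.5, freq), 3))
```

With only a narrow band around 500 Hz the smaller, leftward value is available as an interpretation; with a wider band only the true ITD is consistent across frequency.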

28 27 Resolution of phase ambiguity

29 28 Asynchrony & localisation. [Figure: pointer IID (dB), -15 to 15, against asynchrony of 500 Hz tone (ms) at 0, 20, 40 and 80; 6 Ss.]

30 29 Mistuning & localisation. [Figure: pointer IID (dB), -15 to 15, against percent mistuning at 0, 1, 3 and 6; 6 Ss.]

