The cocktail-party effect: How is the listener able to understand speech at all in this situation? What are the factors and dimensions of the cocktail-party effect?

Presentation transcript:

1

2 The cocktail-party effect. How is the listener able to understand speech at all in this situation? What are the factors and dimensions of the cocktail-party effect? Can the researcher simplify and narrow down the real situation in order to run parametric experiments? Can the results of such experiments be traced back to the full, unsimplified real situation?

3 Help comes from Albert Bregman (“Auditory Scene Analysis”, 1990). “Stream segregation”: the separation of sound streams. Two kinds of segregation: (1) automatic, primitive (peripheral in origin, bottom-up); (2) schema-driven (originating at a high level, top-down). Grouping principle: sounds or components of sounds are regarded as coming from one source if they can be grouped on the basis of shared characteristic(s), e.g. harmonics of the same fundamental, or the same temporal envelope, or the same angle of incidence, etc.

4 The “cocktail-party effect”: (trying to) follow one particular talker’s speech in a crowd [Figure: two panels with frequency axes from 0.5 to 4 kHz]

5 Auditory Segregation: Definitions. The psychophysical space of auditory segregation dimensions. Part I: the problem of dimensionality; 1D data: discrimination in informational masking; prediction of 2D segregation from 1D informational masking estimates. Part II: correlation between pairs of segregation dimensions computed from obtained and predicted 2D data.

6 THE "COCKTAIL PARTY EFFECT": One speech source (= the "target") is segregated from other simultaneous speech sources. FACT: simultaneous speech sources differ along multiple dimensions; differences along dimensions have to be resolved; values on all dimensions have to be correctly associated with a given source.

7 DEFINITION OF SEGREGATION: Two simultaneous sounds that differ along two dimensions are segregated when (1) the differences along both dimensions can be resolved and (2) the correct values of each dimension are associated with either sound. Thus, if Speaker “A” utters “X” and Speaker “B” utters “Y”, saying that “A → X” and “B → Y” indicates segregation, but “A → Y” and “B → X” does not.
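The two conditions in this definition can be sketched as a small check. This is only an illustration: the function name `is_segregated` and the dictionary representation (talker mapped to utterance) are assumptions made for the example, not part of the original slides.

```python
def is_segregated(presented, reported):
    """Return True if a listener's report counts as segregation.

    presented: dict mapping each talker to what was actually uttered,
               e.g. {"A": "X", "B": "Y"} (hypothetical representation).
    reported:  dict mapping each talker to what the listener attributes to it.
    """
    # Condition (1): the difference on each dimension is resolved, i.e. the
    # listener reports two distinct talkers and two distinct utterances.
    talkers_resolved = len(set(reported.keys())) == 2
    utterances_resolved = len(set(reported.values())) == 2

    # Condition (2): each value is bound to the correct source.
    correctly_bound = all(reported.get(t) == u for t, u in presented.items())

    return talkers_resolved and utterances_resolved and correctly_bound


presented = {"A": "X", "B": "Y"}
print(is_segregated(presented, {"A": "X", "B": "Y"}))  # correct binding
print(is_segregated(presented, {"A": "Y", "B": "X"}))  # swapped binding
```

A swapped report resolves both dimensions but fails the binding condition, which is the case the slide singles out.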

8 Dimensions: pitch and (unique) formant peak frequency [Figure legend: high formant (F hi); low formant]

9 THREE CARDINAL DIMENSIONS OF THE AUDITORY SCENE: “WHAT”, “WHEN”, “WHERE”

10 THREE CARDINAL DIMENSIONS OF THE AUDITORY SCENE: “WHAT”, “WHEN”, “WHERE” [Figure; recoverable labels: 0° azimuth, 0° elevation (subject’s own HRTFs); frequency (Hz) 400 to 700 vs. amplitude; random masker (P = P_MSK) or signal (P = P_SIG); 150 ms, 300 ms; F (spectral region); f0 (pitch)]

11 [Figure; recoverable axis labels: (t), (f)]

12 THREE CARDINAL DIMENSIONS OF THE AUDITORY SCENE: “WHAT”, “WHEN”, “WHERE”. Outside the “WHAT”/“WHEN”/“WHERE” space: SEGREGATION. Inside the “WHAT”/“WHEN”/“WHERE” space: FUSION. Between the “WHAT”/“WHEN”/“WHERE” dimensions: TRADE-OFF.

13 TRADE-OFF: The Heisenberg-Gabor principle $\Delta f\,\Delta t = k$, extended: $\Delta f\,\Delta t\,\Delta\theta = k$ or $\Delta f\,\Delta t\,\Delta\theta\,[(1-\rho_{ft})(1-\rho_{f\theta})(1-\rho_{t\theta})]^{-1} = k$
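As a hedged reading of the extended relation: the symbols here are assumptions of this sketch, with $\Delta\theta$ standing for the spatial ("where") uncertainty and $\rho_{ft}$, $\rho_{f\theta}$, $\rho_{t\theta}$ for the pairwise correlations between dimensions. Isolating the product shows how the orthogonal case is recovered when all three correlations vanish:

```latex
\Delta f\,\Delta t\,\Delta\theta
  = k\,(1-\rho_{ft})(1-\rho_{f\theta})(1-\rho_{t\theta}),
\qquad
\rho_{ft}=\rho_{f\theta}=\rho_{t\theta}=0
  \;\Rightarrow\;
\Delta f\,\Delta t\,\Delta\theta = k .
```

Under this reading, any nonzero positive correlation between two dimensions lowers the attainable product of the three uncertainties below $k$.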

14 Questions: Are the three dimensions orthogonal? Why is orthogonality (or correlation) important? Can we determine the correlation between the dimensions?

15 To psychophysically measure segregation of two speech sources and to determine how much each dimension contributes to the segregation of speech sources, we must first reduce the complexity of speech in a "cocktail-party" situation to a degree sufficient for studying it in the lab: Two sources → two “streams”. Keep only vestigial features of speech (f0, modulation). Look at two dimensions at once.

16 Dimensions: pitch and azimuth [Figure legend: left of midline (Left); right of midline (Right)]

17 Hypothesis: 1D resolution in “informational noise” is a prerequisite for segregation, where “informational noise” could be: (1) informational masking within one dimension between streams; (2) interference of information between dimensions. Informational noise by dimension: Pitch: many f0’s, each with many components (same location and flat envelope). Location: many locations (same spectrum/pitch and flat envelope). Envelope structure: random pattern of bursts (same spectrum/pitch and location). Goal: Compare thresholds obtained for different dimensions.

18 [Figure: pitch difference (3-component signals); informational maskers]

19 [Figure: azimuth difference (multicomponent signals), spectrum < 1 kHz; informational maskers]

20 [Figure: rhythmic pattern (3-component signal); informational maskers: different rhythmic patterns (3-component signals)]

21 Finding: because the masking functions are (quasi-)linear in log, i.e., $b \log \Delta D \approx \text{constant}$, informational masking in 1D resolution seems to obey the power law $\Delta D^{b} = C$. Use $b$ obtained from 1D informational masking results to transform 2D thresholds $\Delta D$ into informational masking S/N thresholds in dB.
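A minimal sketch of how the exponent b might be estimated and then used, assuming the 1D masking function is a straight line when the threshold S/N in dB is plotted against log10 of the difference ΔD. The function names `fit_masking_slope` and `threshold_to_db`, the intercept `a`, and the data values are all hypothetical; the slides do not give the actual fitting procedure.

```python
import math


def fit_masking_slope(delta_d, snr_db):
    """Ordinary least-squares fit of snr_db = a + b * log10(delta_d).

    Returns the intercept a and the slope b (assumed here to play the
    role of the exponent b in the power law).
    """
    x = [math.log10(d) for d in delta_d]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(snr_db) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, snr_db))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def threshold_to_db(delta_d, a, b):
    """Map a threshold difference delta_d onto the fitted dB scale."""
    return a + b * math.log10(delta_d)


# Hypothetical, noise-free data generated from a = 2 dB and b = -10.
delta_d = [0.01, 0.02, 0.05, 0.1, 0.2]
snr_db = [2.0 - 10.0 * math.log10(d) for d in delta_d]

a, b = fit_masking_slope(delta_d, snr_db)
print(a, b, threshold_to_db(0.1, a, b))
```

With real thresholds the fit would not be exact, so the residuals are worth inspecting before trusting the "quasi-linear" description.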

22 Since $b \log \Delta D \approx \text{constant}$, and informational masking in 1D resolution approximately obeys the power law $\Delta D^{b} = C$, 2D segregation on dimensions $D_1$, $D_2$ can be predicted from one-dimensional observations through the trade-off $\Delta D_1^{b_1}\,\Delta D_2^{b_2} = k$ or $b_1 \log \Delta D_1 = \log k - b_2 \log \Delta D_2$
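The trade-off on this slide, ΔD1^b1 · ΔD2^b2 = k, can be sketched numerically. The function names and the parameter values below are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the data.

```python
import math


def predicted_delta_d2(delta_d1, b1, b2, k):
    """Solve delta_d1**b1 * delta_d2**b2 = k for delta_d2."""
    return (k / delta_d1 ** b1) ** (1.0 / b2)


def loglog_slope(b1, b2):
    """Slope of log(delta_d2) against log(delta_d1) along the trade-off line.

    Follows from b1*log(d1) = log(k) - b2*log(d2).
    """
    return -b1 / b2


# Hypothetical exponents and constant.
b1, b2, k = 2.0, 1.0, 0.001
d1 = 0.1
d2 = predicted_delta_d2(d1, b1, b2, k)

# Both forms of the relation should agree up to rounding error.
lhs = b1 * math.log(d1)
rhs = math.log(k) - b2 * math.log(d2)
print(d2, loglog_slope(b1, b2), abs(lhs - rhs) < 1e-9)
```

On log-log axes the predicted trade-off is a straight line whose slope depends only on the two 1D exponents, which is what makes a comparison with observed 2D slopes possible.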

23 Spectrum < 1 kHz Azimuth vs. rhythm in 1D (predicted)

24 Spectrum 1<2.5 kHz Azimuth vs. frequency in 1D (predicted)

25 Frequency vs. rhythm in 1D (predicted)

26 Now let’s see real 2D segregation data. First use Δx/x scales for both dimensions. Then show the same data with both scales transformed to dB as indicated by the 1D informational masking data.

27 Spectrum < 2.4 kHz Azimuth vs. Pitch (rhythm same)

28 Rhythm vs. Spectrum/Pitch (azimuth same) [Figure; axis label: 2D informational masking for spectrum/pitch segregation (dB); average f_mod = 4.375 Hz]

29 Spectrum < 2.4 kHz Azimuth vs. rhythm (pitch/spectrum same)

30 Now let us compare predicted and obtained slopes of informational masking of one dimension by another: The difference between predicted and observed slopes will be estimated by changing the angle between the x and y axes of the 1D data lines until they overlap with the 2D data lines. The difference between predicted (=orthogonal) and obtained 2D slopes for each subject thus provides an estimate of the correlation between segregation information carried by a particular pair of dimensions in the “cocktail-party” effect for that subject

31

32 Azimuth vs. rhythm (pitch and spectrum same), spectrum < 1 kHz [Figure: observed vs. predicted/orthogonal lines; plotted values: 0.217, 0.017, 0.251]

33 Spectrum/Pitch vs. Rhythm (location same), Spectrum 1< kHz [Figure: observed vs. predicted/orthogonal lines; plotted values: 0.307, 0.340, 0.053]

34 ORTHOGONAL DIMENSIONS – MADE-UP DATA Temporal envelope plane Pitch plane Azimuth plane

35 Pitch plane Temporal envelope plane Azimuth plane SUBJECT 1

36 Azimuth plane Temporal envelope plane Pitch plane SUBJECT 2

37 Azimuth plane Temporal envelope plane Pitch plane SUBJECT 3

38 Conclusions: By and large, segregation cues provided by the three cardinal dimensions are not independent. To segregate two streams, listeners will obtain cues from whatever dimension yields them the most easily. Non-optimal choice of cues leads to interference between streams and between dimensions. Segregation is likely to be helped by highlighting streams rather than by aiding the processing of a given dimension.

39 The End (can you segregate these?)

