The cocktail-party effect. How can the listener understand speech at all in this situation? What are the factors and dimensions of the cocktail-party effect?

The cocktail-party effect. How can the listener understand speech at all in this situation? What are the factors and dimensions of the cocktail-party effect? Can the researcher simplify and narrow the real-life situation enough to run parametric experiments on it? Can the results of such experiments be carried back to the full, unsimplified real-life situation?

Help arrives from Albert Bregman ("Auditory Scene Analysis", 1990). "Stream segregation" = separating ongoing auditory streams. Two kinds of segregation: (1) automatic, primitive (peripheral in origin, bottom-up); (2) schema-driven (high-level in origin, top-down). Grouping principle: sounds, or components of sounds, are treated as coming from a single source if they can be grouped by shared feature(s), e.g., harmonics of the same fundamental, the same temporal envelope, the same angle of incidence, etc.
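The harmonicity grouping cue above can be sketched in code: components whose frequencies fall on near-integer multiples of a common fundamental get assigned to the same source. A minimal sketch, not from the talk; the function name, candidate list, and 3% mistuning tolerance are all made up for illustration:

```python
def group_by_harmonicity(freqs, f0_candidates, tol=0.03):
    """Assign each spectral component to the candidate f0 whose
    harmonic series it fits best (within a relative mistuning
    tolerance), or to None if it fits no series."""
    groups = {f0: [] for f0 in f0_candidates}
    for f in freqs:
        best, best_err = None, tol
        for f0 in f0_candidates:
            n = max(1, round(f / f0))          # nearest harmonic number
            err = abs(f - n * f0) / (n * f0)   # relative mistuning
            if err < best_err:
                best, best_err = f0, err
        groups.setdefault(best, []).append(f)
    return groups

# Two interleaved harmonic series: f0 = 100 Hz and f0 = 130 Hz
mix = [100, 130, 200, 260, 300, 390, 400, 520]
g = group_by_harmonicity(mix, [100, 130])
```

On this made-up mixture the sketch recovers the two harmonic series, illustrating how a shared fundamental lets components be attributed to one source.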

The "cocktail-party effect": (trying to) follow one particular talker's speech in a crowd. [Spectrograms of the mixture; frequency axes in kHz]

Auditory Segregation: Definitions. The psychophysical space of auditory segregation dimensions.
Part I:
-- The problem of dimensionality
-- 1D data: discrimination in informational masking
-- Prediction of 2D segregation from 1D informational masking estimates
Part II:
-- Correlation between pairs of segregation dimensions computed from obtained and predicted 2D data

THE "COCKTAIL PARTY EFFECT": One speech source (= the "target") is segregated from other simultaneous speech sources.
FACT:
-- Simultaneous speech sources differ along multiple dimensions
-- Differences along dimensions have to be resolved
-- Values on all dimensions have to be correctly associated with a given source

DEFINITION OF SEGREGATION: Two simultaneous sounds that differ along two dimensions are segregated when (1) the differences along both dimensions can be resolved and (2) the correct value of each dimension is associated with either sound. Thus, if Speaker "A" utters "X" and Speaker "B" utters "Y", reporting "A→X" and "B→Y" indicates segregation, but "A→Y" and "B→X" does not.
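The two clauses of this definition can be mirrored in a toy check: segregation requires both that the differences are resolved and that the reported talker-to-utterance pairing matches the true one. A hypothetical sketch (names made up, not from the talk):

```python
def is_segregated(true_pairing, reported_pairing, resolved):
    """Per the slide's definition: segregated only if both dimension
    differences were resolved AND the reported pairing is correct."""
    return resolved and reported_pairing == true_pairing

truth = {"A": "X", "B": "Y"}
# Correct association: segregation
ok = is_segregated(truth, {"A": "X", "B": "Y"}, resolved=True)
# Swapped association (A->Y, B->X): resolved, but NOT segregation
swapped = is_segregated(truth, {"A": "Y", "B": "X"}, resolved=True)
```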

Left: high formant (F_hi); Right: low formant. Dimensions: pitch and (unique) formant peak frequency.

THREE CARDINAL DIMENSIONS OF THE AUDITORY SCENE: Δ"WHAT", Δ"WHEN", Δ"WHERE"

THREE CARDINAL DIMENSIONS OF THE AUDITORY SCENE: Δ"WHAT", Δ"WHEN", Δ"WHERE". [Stimulus schematic: amplitude vs. frequency (Hz) spectra marking F (spectral region) and f0 (pitch); 150-ms signal (P = P_SIG) or random masker (P = P_MSK) within a 300-ms window; presented at 0° azimuth, 0° elevation through the subject's own HRTFs]

(t) ()() (f)

THREE CARDINAL DIMENSIONS OF THE AUDITORY SCENE: Δ"WHAT", Δ"WHEN", Δ"WHERE"
Outside the Δ"WHAT"/Δ"WHEN"/Δ"WHERE" space: SEGREGATION
Inside the Δ"WHAT"/Δ"WHEN"/Δ"WHERE" space: FUSION
Between the Δ"WHAT"/Δ"WHEN"/Δ"WHERE" dimensions: TRADE-OFF

TRADE-OFF: The Heisenberg-Gabor principle Δf · Δt = k, extended: Δf · Δt · Δθ = k, or Δf · Δt · Δθ · [(1 − ρ_ft)(1 − ρ_fθ)(1 − ρ_tθ)]⁻¹ = k
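The basic Δf · Δt = k form can be verified numerically: for a Gaussian envelope, the energy-weighted RMS duration and RMS bandwidth trade off so that their product stays pinned at the Gabor limit 1/(4π), regardless of the window width. A sketch with made-up sampling parameters:

```python
import numpy as np

def rms_widths(sigma, fs=8000.0, dur=4.0):
    """Energy-weighted RMS duration and RMS bandwidth of a Gaussian
    envelope exp(-t^2 / (2 sigma^2)); their product equals 1/(4*pi)."""
    t = np.arange(-dur / 2, dur / 2, 1 / fs)
    g = np.exp(-t**2 / (2 * sigma**2))
    e_t = g**2 / np.sum(g**2)                 # normalized energy in time
    dt = np.sqrt(np.sum(e_t * t**2))          # RMS duration
    G = np.fft.fft(g)
    f = np.fft.fftfreq(len(g), 1 / fs)
    e_f = np.abs(G)**2 / np.sum(np.abs(G)**2) # normalized energy in frequency
    df = np.sqrt(np.sum(e_f * f**2))          # RMS bandwidth
    return dt, df

for sigma in (0.01, 0.05, 0.2):
    dt, df = rms_widths(sigma)
    print(sigma, dt * df)  # product stays near 1/(4*pi) ~ 0.0796
```

Narrower windows (small sigma) give finer time resolution but proportionally coarser frequency resolution, which is exactly the trade-off the slide invokes.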

Questions: Are the three dimensions orthogonal? Why is orthogonality (or correlation) important? Can we determine the correlation between the dimensions?

Goal: To psychophysically measure segregation of two speech sources and to determine how much each dimension contributes to the segregation of speech sources. To do this, we must first reduce the complexity of speech in a "cocktail-party" situation to a degree sufficient for studying it in the lab:
-- Two sources → two "streams"
-- Keep only vestigial features of speech (f0, modulation)
-- Look at two dimensions at once

Left: left of midline; Right: right of midline. Dimensions: pitch and azimuth.

Hypothesis: 1D resolution in "informational noise" is a prerequisite for segregation, where "informational noise" could be:
1. Informational masking within one dimension between streams
2. Interference of information between dimensions
Informational noise per dimension:
-- Pitch: many f0's, each with many components (same location and flat envelope)
-- Location: many locations (same spectrum/pitch and flat envelope)
-- Envelope structure: random pattern of bursts (same spectrum/pitch and location)
Goal: Compare thresholds obtained for different dimensions
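The "pitch" flavor of informational noise described above (many random f0's, each carrying several components, with location and envelope held constant) can be sketched as a stimulus generator. All parameter values here are made up for illustration, not taken from the experiment:

```python
import numpy as np

def pitch_masker(n_f0=8, n_harm=5, fs=16000, dur=0.3, rng=None):
    """Sketch of a 'pitch' informational masker: n_f0 random
    fundamentals, each contributing n_harm harmonics with flat
    envelopes (location and envelope held constant)."""
    rng = np.random.default_rng(rng)
    t = np.arange(int(fs * dur)) / fs
    x = np.zeros_like(t)
    for f0 in rng.uniform(80.0, 400.0, n_f0):       # random f0's in Hz
        for h in range(1, n_harm + 1):
            x += np.cos(2 * np.pi * h * f0 * t + rng.uniform(0, 2 * np.pi))
    return x / np.max(np.abs(x))                    # normalize to +/-1

masker = pitch_masker(rng=0)
```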

[Figure: pitch difference (3-component signals) vs. number of informational maskers]

[Figure: azimuth difference (multicomponent signals) vs. number of informational maskers; spectrum < 1 kHz]

[Figure: rhythmic pattern (3-component signal) vs. informational maskers with different rhythmic patterns (3-component signals)]

Finding: because the masking functions are (quasi-)linear in log, i.e., b · log ΔD ≈ constant, informational masking in 1D resolution seems to obey the power law ΔD^b = C. Use the b obtained from the 1D informational masking results to transform the 2D thresholds ΔD into informational masking S/N thresholds in dB.
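Since ΔD^b = C is a straight line in log-log coordinates, the exponent b can be read off as the (negated) slope of a linear fit. A sketch on made-up, noiseless data obeying the power law exactly:

```python
import numpy as np

# Synthetic 1D masking data obeying dD**b = C (values made up)
b_true, C = 2.0, 4.0
dD = np.array([0.5, 1.0, 2.0, 4.0])       # resolution thresholds
level = C / dD**b_true                    # stand-in for the masked threshold

# Straight-line fit in log-log coordinates recovers -b as the slope
slope, intercept = np.polyfit(np.log(dD), np.log(level), 1)
b_est = -slope
```

With real data the fit would be over noisy points; here it recovers b exactly because the synthetic data lie on the line.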

Since b · log ΔD ≈ constant, and informational masking in 1D resolution approximately obeys the power law ΔD^b = C, 2D segregation on dimensions D_1, D_2 can be predicted from one-dimensional observations through the trade-off ΔD_1^(b_1) · ΔD_2^(b_2) = k, or b_1 log ΔD_1 = log k − b_2 log ΔD_2.
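The predicted trade-off line can be computed directly: given the slopes b_1, b_2 from the 1D fits and a constant k, solve ΔD_1^(b_1) · ΔD_2^(b_2) = k for ΔD_2. All numeric values below are made up for illustration:

```python
import numpy as np

def predicted_tradeoff(dD1, b1, b2, k):
    """Solve dD1**b1 * dD2**b2 = k for dD2 along the predicted
    2D trade-off line (b1, b2 would come from the 1D fits)."""
    return (k / dD1**b1) ** (1.0 / b2)

b1, b2, k = 1.5, 2.0, 8.0                 # hypothetical parameters
dD1 = np.array([0.5, 1.0, 2.0])
dD2 = predicted_tradeoff(dD1, b1, b2, k)
# The product dD1**b1 * dD2**b2 stays at k along the whole line
```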

Spectrum < 1 kHz Azimuth vs. rhythm in 1D (predicted)

Spectrum 1–2.5 kHz. Azimuth vs. frequency in 1D (predicted)

Frequency vs. rhythm in 1D (predicted)

Now let's see real 2D segregation data. First use Δx/x scales for both dimensions, then show the same data with both scales transformed to dB as indicated by the 1D informational masking data.

Spectrum < 2.4 kHz Azimuth vs. Pitch (rhythm same)

Rhythm vs. Spectrum/Pitch (azimuth same). [Axis: 2D informational masking for spectrum/pitch segregation (dB); average f_mod in Hz]

Spectrum < 2.4 kHz Azimuth vs. rhythm (pitch/spectrum same)

Now let us compare predicted and obtained slopes of informational masking of one dimension by another. The difference between predicted and observed slopes is estimated by changing the angle between the x and y axes of the 1D data lines until they overlap with the 2D data lines. The difference between the predicted (= orthogonal) and obtained 2D slopes for each subject thus provides an estimate of the correlation between the segregation information carried by a particular pair of dimensions in the "cocktail-party" effect for that subject.

Azimuth vs. rhythm (pitch and spectrum same); spectrum < 1 kHz. Observed vs. predicted/orthogonal lines; ρ = 0.217, ρ = 0.017, ρ = 0.251.

 =0.307  =0.340  =0.053 pred./orth. obs. Spectrum/Pitch vs. Rhythm (location same) Spectrum 1< kHz

ORTHOGONAL DIMENSIONS – MADE-UP DATA Temporal envelope plane Pitch plane Azimuth plane

Pitch plane Temporal envelope plane Azimuth plane SUBJECT 1

Azimuth plane Temporal envelope plane Pitch plane SUBJECT 2

Azimuth plane Temporal envelope plane Pitch plane SUBJECT 3

Conclusions:
-- By and large, segregation cues provided by the three cardinal dimensions are not independent
-- To segregate two streams, listeners will obtain cues from whatever dimension yields them the most easily
-- Non-optimal choice of cues leads to interference between streams and between dimensions
-- Segregation is likely to be helped by highlighting streams rather than by aiding the processing of a given dimension

The End (can you segregate these?)