Approaches of Interest in Blind Source Separation of Speech

Julien Bourgeois, DaimlerChrysler AG, Research and Technology, RIC/AD

Background
- Need for a speech-based human-machine interface in cars.
- Road noise and passengers' speech create adverse conditions for automatic speech recognition.

4 Approaches to the Cocktail Party Problem
1 - Computational Auditory Scene Analysis (CASA)
2 - Sparse Decomposition Approach
3 - Statistical Blind Source Separation
4 - Beamforming
Conclusion & Future plans

Computational Auditory Scene Analysis (CASA) - Generalities
Aim: obtain an algorithmic description of the higher auditory functions. Strong biological inspiration. One or two sensors (microphones) are considered. The microphone signal is filtered as in the human ear. Most systems are variations on a segmentation-grouping scheme.
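The slide only says that the microphone signal is filtered as in a human ear. A common way to do this in CASA is a gammatone filterbank whose centre frequencies are spaced on the ERB scale. The sketch below assumes that model (4th order, bandwidth factor 1.019, Glasberg & Moore ERB formula); the function names and channel settings are hypothetical and not necessarily the front end used in this work.

```python
import numpy as np

def erb(f_hz):
    # Equivalent rectangular bandwidth in Hz (Glasberg & Moore formula)
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def hz_to_erb_rate(f_hz):
    return 21.4 * np.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_rate_to_hz(e):
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def gammatone_ir(fc, fs, duration=0.05, order=4, b=1.019):
    # Impulse response t^(n-1) * exp(-2*pi*b*ERB(fc)*t) * cos(2*pi*fc*t), peak-normalised
    t = np.arange(int(duration * fs)) / fs
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * erb(fc) * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))

def filterbank(x, fs, n_channels=32, f_low=100.0, f_high=3800.0):
    # Centre frequencies equally spaced on the ERB-rate scale; one band-pass output per channel
    fcs = erb_rate_to_hz(np.linspace(hz_to_erb_rate(f_low), hz_to_erb_rate(f_high), n_channels))
    bands = np.stack([np.convolve(x, gammatone_ir(fc, fs))[: len(x)] for fc in fcs])
    return fcs, bands

fs = 8000
t = np.arange(int(0.5 * fs)) / fs
x = np.sin(2 * np.pi * 1000 * t)        # 1 kHz test tone
fcs, bands = filterbank(x, fs)
energy = np.sum(bands ** 2, axis=1)
print(round(fcs[np.argmax(energy)]))    # the channel centred near 1 kHz responds most
```

The output `bands` (channels by samples) is the kind of time-frequency representation on which the segmentation and grouping stages operate.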

CASA - Segmentation
[Figure: time-frequency representation; axes: Time, Frequency Index]
Segmentation is based on temporal continuity.
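A minimal sketch of segmentation by temporal continuity, assuming the front end has already marked each time-frequency unit as active or not (the activity criterion, for example an energy threshold, is an assumption here). A segment is a run of consecutive active frames within one channel; fuller systems also merge neighbouring channels.

```python
import numpy as np

def segment(active):
    """active: boolean array (n_channels, n_frames), True where a T-F unit is judged active.
    Returns segments as (channel, start_frame, end_frame) with end exclusive: runs of
    consecutive active frames in one channel (temporal continuity)."""
    segments = []
    for ch, row in enumerate(np.asarray(active, dtype=bool)):
        # +1 marks the start of a run, -1 marks the frame just after its end
        edges = np.diff(np.concatenate(([0], row.astype(int), [0])))
        starts = np.flatnonzero(edges == 1)
        ends = np.flatnonzero(edges == -1)
        segments.extend((ch, int(s), int(e)) for s, e in zip(starts, ends))
    return segments

active = np.array([[0, 1, 1, 0, 1],
                   [1, 1, 1, 1, 0]], dtype=bool)
print(segment(active))  # [(0, 1, 3), (0, 4, 5), (1, 0, 4)]
```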

CASA - Grouping
[Figure: time-frequency representation; axes: Time, Frequency Index]
The grouping rules are (1) harmonicity and (2) synchronous onset or offset. These rules agree with psychoacoustic observations on how listeners group sound components.

CASA - Audio example
[Audio: mixture, separated]

1 - Computational Auditory Scene Analysis (CASA)
2 - Sparse Decomposition Approach
3 - Statistical Blind Source Separation
4 - Beamforming
Conclusion & Future plans

Sparse Decomposition - Generalities
Two sensor signals $x_1$ and $x_2$, mixtures of $N$ acoustic sources $s_i$, are given.
Aim: find an invertible transform T such that the N sources are disjoint in the transformed domain.
DUET: T = STFT (windowed short-time Fourier transform) works. Indeed, statistically, the product $S_1(\omega,t)\,S_2(\omega,t)$ is small: at most time-frequency points, at most one speaker has significant energy.
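To illustrate the disjointness argument, the sketch below computes a windowed STFT with NumPy and a normalised overlap score $\sum|S_1 S_2| / \sqrt{\sum|S_1|^2 \sum|S_2|^2}$. This score is an illustrative measure chosen here, not the one defined in the DUET papers. Two pure tones stand in for speech; real speech is only approximately disjoint.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    # Windowed short-time Fourier transform: rows are frequency bins, columns are frames
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=0)

def overlap(S1, S2):
    # In [0, 1]: 0 for disjoint supports, 1 when |S1| and |S2| are proportional
    num = np.sum(np.abs(S1 * S2))
    return num / np.sqrt(np.sum(np.abs(S1) ** 2) * np.sum(np.abs(S2) ** 2))

fs = 8000
t = np.arange(fs) / fs
s1 = np.sin(2 * np.pi * 500 * t)    # stand-in for source 1
s2 = np.sin(2 * np.pi * 1500 * t)   # stand-in for source 2
S1, S2 = stft(s1), stft(s2)
print(overlap(S1, S2))  # small: the two tones occupy different bins
print(overlap(S1, S1))  # 1, up to rounding: a fully overlapping pair
```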

Sparse Decomposition - DUET
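Assumption: "At each point $(\omega,t)$ of the spectrogram, only one source is active."
Which source $S_i$ is active at $(\omega,t)$? Look at the phase difference between $X_1(\omega,t)$ and $X_2(\omega,t)$: the relative delay $\angle\big(X_1(\omega,t)/X_2(\omega,t)\big)/\omega$ clusters around one value per source.
[Figure: histogram of this delay with two peaks, "group delay 1" and "group delay 2"; spectrogram with axes Time and Frequency Index]
Then set $S_i(\omega,t) = X_1(\omega,t)$ at the points assigned to source $i$, and 0 elsewhere.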
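A minimal sketch of the DUET masking step. It assumes the per-source delays are already known (in practice they come from the peaks of the delay histogram) and that mic 2 receives each source delayed by $d$ relative to mic 1, so $X_2 = X_1 e^{-j\omega d}$ when a single source is active. The function name is hypothetical, and the check uses synthetic STFT values with exactly disjoint supports.

```python
import numpy as np

def duet_separate(X1, X2, omega, delays):
    """X1, X2: STFTs of the two mics, shape (F, T); omega: angular frequency of each bin
    (rad/sample, non-zero, with |omega * delay| < pi); delays: one known delay per source."""
    # angle(X1 / X2) / omega, computed via X1 * conj(X2) so that empty points give 0, not 0/0
    d_hat = np.angle(X1 * np.conj(X2)) / omega[:, None]
    delays = np.asarray(delays, dtype=float)
    # Assign each T-F point to the source whose delay is closest
    labels = np.argmin(np.abs(d_hat[None] - delays[:, None, None]), axis=0)
    # S_i(w, t) = X1(w, t) where source i is judged active, 0 elsewhere (binary mask)
    return [np.where(labels == k, X1, 0) for k in range(len(delays))]

# Synthetic check: two sources with disjoint T-F supports and known inter-mic delays
rng = np.random.default_rng(0)
F, T = 64, 100
omega = np.linspace(0.05, 1.0, F)
support = rng.random((F, T)) < 0.5
S1 = (rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))) * support
S2 = (rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))) * ~support
d1, d2 = 1.0, -1.5
X1 = S1 + S2
X2 = S1 * np.exp(-1j * omega[:, None] * d1) + S2 * np.exp(-1j * omega[:, None] * d2)
Y1, Y2 = duet_separate(X1, X2, omega, [d1, d2])
print(np.allclose(Y1, S1), np.allclose(Y2, S2))
```

With real speech the supports only approximately disjoint, so the recovered sources contain some residual crosstalk.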

Sparse Decomposition - Audio Example
[Audio: Mix 1, Mix 2, Out 1, Out 2]

Statistical Blind Source Separation
Assumption: "The sources are decorrelated." or "The sources are independent."
ICA = Independent Component Analysis. It generally needs (at least) as many sensors as sources.
Permutation and scale ambiguities: if $s_1$ and $s_2$ are independent, so are $s_2$ and $b\,s_1$ for any nonzero scalar $b$, so the sources can only be recovered up to order and scale.

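Mixture model (convolutive): $x(n) = A(0)s(n) + \dots + A(K)s(n-K) = (A * s)(n)$. In the time-frequency domain, $X(\omega,t) \approx A(\omega)S(\omega,t)$ (valid when the STFT window is long compared with the mixing filters).
Separation filters W: find $W(\omega)$ such that the components of $Y(\omega,t) = W(\omega)X(\omega,t)$ are independent or decorrelated ($Y$ estimates the sources $S$).
With a decorrelation criterion, the output $Y$ must be decorrelated at each $t$. Because speech is non-stationary, each time block gives new constraints: one can find $W$ minimizing the off-diagonal terms of $R_{YY}(\omega,t) = E[Y(\omega,t)Y^H(\omega,t)] = W(\omega)R_{XX}(\omega,t)W^H(\omega)$ jointly for all $t$.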
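For one frequency bin, the decorrelation criterion can be illustrated in the simplest case of two sources and two time blocks. There, $W$ follows in closed form from the generalized eigenproblem $R_{XX}(\omega,t_2)v = \lambda R_{XX}(\omega,t_1)v$. This exact two-matrix solution is swapped in for illustration; it is not the iterative minimisation over many blocks used by Parra and Spence. The covariances are synthetic, built as $A R_{SS} A^H$ with diagonal $R_{SS}$, and the function name is hypothetical.

```python
import numpy as np

def diagonalizing_w(R1, R2):
    """Return W (up to row scaling and ordering) with W R_t W^H diagonal for t = 1, 2.
    Eigenvectors v of R1^{-1} R2 satisfy R2 v = lambda R1 v; with V = [v_1 v_2], W = V^H."""
    _, V = np.linalg.eig(np.linalg.solve(R1, R2))
    return V.conj().T

def offdiag_energy(M):
    # Sum of squared magnitudes of the off-diagonal terms (what the criterion drives to zero)
    return float(np.sum(np.abs(M - np.diag(np.diag(M))) ** 2))

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))  # unknown A(w) at one bin
D1 = np.diag([1.0, 2.0])   # source powers in time block 1
D2 = np.diag([3.0, 1.0])   # source powers in time block 2 (non-stationary sources)
R1 = A @ D1 @ A.conj().T   # R_XX(w, t1) = A R_SS A^H
R2 = A @ D2 @ A.conj().T   # R_XX(w, t2)
W = diagonalizing_w(R1, R2)
for R in (R1, R2):
    print(offdiag_energy(W @ R @ W.conj().T))  # close to zero, up to rounding
print(np.round(np.abs(W @ A), 3))  # one non-zero per row and column: scale/permutation ambiguity
```

The last line shows the ambiguity directly: `W @ A` is a scaled permutation matrix rather than the identity.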

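Advantage: very few assumptions on the sources.
But: in the frequency domain, the permutation and scale ambiguities occur independently in each frequency bin $\omega$, so the separated outputs must be re-aligned across bins. The method can also be CPU-expensive because of the iterative optimization.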
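One common heuristic for the per-bin permutation problem, not described on the slide, compares the amplitude envelopes of the outputs across frequency, since one speaker's envelope tends to be correlated across bins. The sketch assumes two outputs per bin, aligns each bin to the previous one, and takes bin 0 as the reference ordering. Chaining bin to bin can propagate errors, so practical systems often use a more global reference. The scale ambiguity is not handled, and the function name is hypothetical.

```python
import numpy as np

def align_permutations(Y):
    """Y: complex array (F, 2, T) with the two separated outputs of each frequency bin.
    Returns a copy where output k tracks the same source in every bin (envelope heuristic)."""
    Y = Y.copy()
    for f in range(1, Y.shape[0]):
        ref = np.abs(Y[f - 1])  # envelopes of the previous, already aligned bin
        cur = np.abs(Y[f])
        keep = np.corrcoef(ref[0], cur[0])[0, 1] + np.corrcoef(ref[1], cur[1])[0, 1]
        swap = np.corrcoef(ref[0], cur[1])[0, 1] + np.corrcoef(ref[1], cur[0])[0, 1]
        if swap > keep:
            Y[f] = Y[f, ::-1].copy()  # exchange the two outputs of this bin
    return Y

# Synthetic check: the same two envelopes in every bin, order flipped in random bins
rng = np.random.default_rng(2)
F, T = 32, 200
env = rng.random((2, T)) + 0.1
Y = env[None] * np.exp(2j * np.pi * rng.random((F, 2, T)))
flipped = rng.random(F) < 0.5
flipped[0] = False
Y[flipped] = Y[flipped][:, ::-1]
aligned = align_permutations(Y)
print(all(np.allclose(np.abs(aligned[f]), env) for f in range(F)))
```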

Audio example
[Audio: Mix 1, Mix 2, Out 1, Out 2]

1 - Computational Auditory Scene Analysis (CASA)
2 - Sparse Decomposition Approach
3 - Statistical Blind Source Separation
4 - Beamforming
Conclusion & Future plans

Beamforming - Array signal processing
Spatial locations of the sources (direction of arrival, DOA) are mapped onto delays between sensors.
Array signal processing addresses three estimation problems: (1) the number of sources, (2) their spatial locations, (3) spatial filtering.
It can require more sensors than sources, depending on the desired spatial resolution.
[Figure: sources $s_1$, $s_2$ and a microphone array $x_1, \dots, x_i, \dots, x_N$]
$x_i(t) = s_1(t - d_{1,i}) + s_2(t - d_{2,i})$, where $d_{k,i}$ is the delay from source $k$ to sensor $i$.
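Under a far-field assumption with a uniform linear array (the slide does not fix a geometry), the direction of arrival $\theta$ maps to the delays $d_i = i\,\Delta\sin\theta / c$, with mic spacing $\Delta$ and speed of sound $c$. The sketch computes these delays relative to mic 0 and inverts the mapping for one mic pair. The sign convention, $c = 343$ m/s and the function names are assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def ula_delays(theta_deg, n_mics, spacing):
    """Far-field delays d_i (s) at each mic of a uniform linear array, relative to mic 0.
    theta = 0 is broadside (all delays zero), theta = 90 is endfire."""
    positions = np.arange(n_mics) * spacing  # mic coordinates along the array axis (m)
    return positions * np.sin(np.deg2rad(theta_deg)) / SPEED_OF_SOUND

def doa_from_delay(delay, spacing):
    """Inverse mapping for one mic pair `spacing` metres apart: delay (s) -> angle (deg)."""
    return np.rad2deg(np.arcsin(np.clip(delay * SPEED_OF_SOUND / spacing, -1.0, 1.0)))

d = ula_delays(30.0, n_mics=4, spacing=0.05)
print(d)                                   # delays grow linearly along the array
print(doa_from_delay(d[1], spacing=0.05))  # recovers the 30 degree direction of arrival
```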

Beamforming - Source Location
1/ Energy-based: search for the delays $d_i$ that maximize the output power $\sigma_y^2$ of a delay-and-sum beamformer, $y(t) = x_1(t + d_1) + \dots + x_N(t + d_N)$.
2/ Correlation-based: search for the delay $d$ that maximizes $E[x_i(t)\,x_j(t - d)]$ for some pairs $(i,j)$.
3/ High resolution: with $X(\omega,t) = A(\omega)S(\omega,t)$, the eigendecomposition of $R_{XX} = A R_{SS} A^H$ provides information on $A$, i.e. on the source locations ($R_{SS}$ is diagonal if the sources are decorrelated).
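The correlation-based method can be sketched for one microphone pair with integer lags, using the sample sum as the estimate of $E[x_i(t)\,x_j(t-d)]$. White noise stands in for a source. In reverberant rooms, practical systems often whiten the cross-spectrum first (for example GCC-PHAT), which this sketch does not do. The function name is hypothetical.

```python
import numpy as np

def correlation_delay(xi, xj, max_lag):
    """Return the integer lag d in [-max_lag, max_lag] maximising sum_t xi[t] * xj[t - d]
    (xi and xj must have the same length)."""
    n = len(xi)
    lags = np.arange(-max_lag, max_lag + 1)
    scores = []
    for d in lags:
        if d >= 0:
            scores.append(np.dot(xi[d:], xj[:n - d]))   # pairs (t, t - d) for t >= d
        else:
            scores.append(np.dot(xi[:n + d], xj[-d:]))  # pairs (t, t - d) for t - d < n
    return int(lags[np.argmax(scores)])

rng = np.random.default_rng(3)
s = rng.standard_normal(4000)               # white-noise stand-in for a source
xj = s
xi = np.concatenate([np.zeros(5), s[:-5]])  # mic i hears the source 5 samples later
print(correlation_delay(xi, xj, max_lag=20))  # 5
```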

Beamforming - Spatial Filtering
[Figure: sensor signals $x_1, \dots, x_i, \dots, x_N$ are delayed by $d_1, \dots, d_i, \dots, d_N$, filtered by $F_1, \dots, F_i, \dots, F_N$ and summed; the beam points in the direction of interest]
1/ Data-independent: e.g. delay-and-sum beamforming.
2/ Statistically optimal: constrain the response in the direction of interest and minimize the output power.
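The two spatial-filtering options can be compared at a single frequency with narrowband weights, $y = w^H x$. Delay-and-sum uses $w = a/N$ for the steering vector $a$ of the direction of interest. The statistically optimal option with a unit-response constraint is sketched here as the MVDR (Capon) beamformer $w = R^{-1}a / (a^H R^{-1} a)$. The array geometry, the frequency, the synthetic covariance $R$ (target, strong interferer, weak noise) and the function names are assumptions.

```python
import numpy as np

def steering_vector(omega, delays):
    # a_i = exp(-j * omega * d_i) for a source whose delay at mic i is d_i
    return np.exp(-1j * omega * np.asarray(delays))

def delay_sum_weights(a):
    # Data-independent: phase-align the channels and average, so that w^H a = 1
    return a / len(a)

def mvdr_weights(R, a):
    # Minimise the output power w^H R w subject to the constraint w^H a = 1
    Ri_a = np.linalg.solve(R, a)
    return Ri_a / (a.conj() @ Ri_a)

omega = 2 * np.pi * 1000.0  # analysed frequency: 1 kHz (rad/s), assumed
c, spacing, n_mics = 343.0, 0.05, 4
pos = np.arange(n_mics) * spacing
a = steering_vector(omega, pos * np.sin(np.deg2rad(0.0)) / c)   # direction of interest
b = steering_vector(omega, pos * np.sin(np.deg2rad(50.0)) / c)  # interfering talker
# Synthetic spatial covariance: target + strong interferer + weak sensor noise
R = np.outer(a, a.conj()) + 100.0 * np.outer(b, b.conj()) + 0.01 * np.eye(n_mics)

w_ds, w_mvdr = delay_sum_weights(a), mvdr_weights(R, a)
for name, w in (("delay-sum", w_ds), ("MVDR", w_mvdr)):
    print(name,
          "target gain:", round(abs(w.conj() @ a), 3),
          "interferer gain:", round(abs(w.conj() @ b), 3),
          "output power:", round(float((w.conj() @ R @ w).real), 3))
```

Both beamformers keep unit gain toward the target. MVDR additionally places a near-null on the interferer, because that is how it lowers the output power under the constraint.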

Beamforming - Audio example
[Audio: Mix 1, Mix 2, Out 1, Out 2]

Conclusion & Questions
Different definitions of "source": perceptual (CASA), topological (sparse decomposition), statistical (BSS), spatial (beamforming). These approaches are complementary. There is no perfect solution to the cocktail party problem.

Future plans in Hoarse
Combination of existing methods:
- DUET if the sources are disjoint; ICA or beamforming if they overlap.
Investigation of specific open questions:
- Estimation of the number of sources at each $(\omega,t)$ point.
- Sparse decomposition: optimal transform T? Extension to more than 2 mics?
- Theoretical boundaries?
- Equivalences between these approaches (e.g. second-order BSS and beamforming)?
