Audio Source Separation And ICA by Mike Davies & Nikolaos Mitianoudis Digital Signal Processing Lab Queen Mary, University of London.


Outline of Talk
Introduction
- Audio Source Separation: Beamforming, ICA & CASA
ICA for source separation
- dealing with convolutive mixtures
A Frequency Domain Framework
- unmixing in the frequency domain
- source modelling & the permutation problem
Beamforming for Source Separation
- using geometric information
A Reverberant BSS Example
- ICA as a beamformer
- real reverberant transfer functions
- using beamforming with ICA
- moving sources

Cocktail Party Problem

Computational Cocktail Party Problem

Audio Source Separation Problem

Computational tools for audio source separation
- Computational Auditory Scene Analysis (CASA): typically extracts one source from a single channel of audio using heuristic psychological grouping rules (pattern matching).
- Blind Source Separation (BSS, aka ICA): uses spatial diversity based on source independence. Extensions include convolutive mixing and overcomplete mixtures.
- Beamforming: uses spatial diversity based on the known geometry of the microphone array and the directions of arrival (DOA) of the source signals.

Review of ICA. The ICA model: x(t) = A s(t). Aim: estimate s(t) from x(t) (mixing matrix A unknown). If the number of sources equals the number of observations, we can estimate s(t) by estimating W = A^{-1}, giving s = W x. A is identifiable (up to scaling and permutation of the sources) if we assume the sources are statistically independent, p(s) = ∏_i p_i(s_i), and non-Gaussian.
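The instantaneous model on this slide can be sketched in a few lines of NumPy. This is only an illustration: the mixing matrix `A`, the Laplacian sources and the sample count are made-up values, and `W` is computed from the known `A`, which a blind method would not have.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent, non-Gaussian (Laplacian) sources, 1000 samples each.
s = rng.laplace(size=(2, 1000))

# Hypothetical mixing matrix A (unknown in a real BSS problem).
A = np.array([[1.0, 0.6],
              [0.5, 1.0]])

x = A @ s                 # observations: x(t) = A s(t)
W = np.linalg.inv(A)      # ideal unmixing matrix W = A^{-1}
s_hat = W @ x             # recovered sources: s = W x

print(np.allclose(s_hat, s))
```

ICA has to estimate W from x alone, so in practice the sources are recovered only up to an unknown scaling and ordering.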

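ICA for Audio Source Separation. Audio observations are a linear convolution of the sources (plus additive noise): x_i(t) = Σ_j Σ_k a_ij(k) s_j(t − k) + n_i(t). The unmixing filter uses an FIR approximation (complete case): s_i(t) = Σ_j Σ_k w_ij(k) x_j(t − k).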
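A minimal sketch of convolutive mixing, assuming noise-free observations and short made-up FIR responses. The function name `convolutive_mix` and the filter taps are hypothetical; real room responses are far longer.

```python
import numpy as np

def convolutive_mix(sources, filters):
    """Mix sources through FIR filters.

    sources: (n_src, T) array of source signals.
    filters: (n_mic, n_src, L) array; filters[i, j] is the FIR response
             from source j to microphone i.
    Returns the microphone signals, shape (n_mic, T + L - 1).
    """
    n_mic, n_src, L = filters.shape
    T = sources.shape[1]
    x = np.zeros((n_mic, T + L - 1))
    for i in range(n_mic):
        for j in range(n_src):
            # Each microphone sums every source convolved with its own path.
            x[i] += np.convolve(filters[i, j], sources[j])
    return x

rng = np.random.default_rng(1)
s = rng.laplace(size=(2, 8))

# Hypothetical responses: a direct path plus one delayed, attenuated cross path.
filters = np.zeros((2, 2, 3))
filters[0, 0, 0] = 1.0   # source 1 -> mic 1, no delay
filters[0, 1, 2] = 0.5   # source 2 -> mic 1, delayed by 2 samples
filters[1, 0, 1] = 0.5   # source 1 -> mic 2, delayed by 1 sample
filters[1, 1, 0] = 1.0   # source 2 -> mic 2, no delay

x = convolutive_mix(s, filters)
```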

Frequency (subband) filtering. [Figure: block diagram with L-point STFT and L-point ISTFT blocks, inputs x1, x2, x3, per-subband unmixing matrices W1 ... Wn, outputs s1, s2, s3; signals labelled X(ω) and S(ω).] The unmixing filtering can be efficiently performed within a subband framework. This does not necessarily imply a frequency domain model for the sources.
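Subband unmixing amounts to one small matrix product per frequency bin. The sketch below assumes the STFT has already been taken and stored as an array of shape (bins, channels, frames). The per-bin mixing matrices `H` are random stand-ins for a room response, and `unmix_per_bin` is a hypothetical helper name.

```python
import numpy as np

def unmix_per_bin(W, X):
    """Apply one unmixing matrix per frequency bin.

    W: (n_bins, n_src, n_mic) complex unmixing matrices.
    X: (n_bins, n_mic, n_frames) complex STFT of the microphone signals.
    Returns S with S[f] = W[f] @ X[f], shape (n_bins, n_src, n_frames).
    """
    return np.matmul(W, X)   # matmul broadcasts over the leading bin axis

rng = np.random.default_rng(0)
n_bins, n_ch, n_frames = 8, 2, 50

# Made-up source coefficients and well-conditioned per-bin mixing matrices.
S = (rng.laplace(size=(n_bins, n_ch, n_frames))
     + 1j * rng.laplace(size=(n_bins, n_ch, n_frames)))
pert = (rng.uniform(-0.3, 0.3, size=(n_bins, n_ch, n_ch))
        + 1j * rng.uniform(-0.3, 0.3, size=(n_bins, n_ch, n_ch)))
H = np.eye(n_ch) + pert

X = np.matmul(H, S)          # narrowband mixing in each bin
W = np.linalg.inv(H)         # ideal per-bin unmixing (H is known only in this demo)
S_hat = unmix_per_bin(W, X)
```

An inverse STFT of each row of `S_hat` would then give the time-domain source estimates.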

ML Natural Gradient Algorithm. Various authors have suggested a simple gradient-based algorithm for ICA: ΔW = η (I − φ(s) s^T) W, where s = W x and η is a step size. This can be viewed as a Maximum Likelihood estimator with φ(s) = −∂ log p(s)/∂s. φ(s) often takes a tanh-like shape ⇒ super-Gaussian prior. For convolutive mixing this can be adapted to the unmixing filters (time domain source model).
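A minimal batch sketch of the natural gradient update for the instantaneous case, W ← W + η (I − E[φ(y) y^T]) W with φ = tanh and y = W x. The step size, iteration count, mixing matrix and Laplacian test sources are all assumptions chosen for the demo. This is not the convolutive algorithm from the talk.

```python
import numpy as np

def natural_gradient_ica(x, n_iter=3000, lr=0.05):
    """Batch natural-gradient ICA with a tanh nonlinearity.

    x: (n, T) array of mixtures. Returns the (n, n) unmixing matrix W.
    """
    n, T = x.shape
    W = np.eye(n)
    for _ in range(n_iter):
        y = W @ x                        # current source estimates
        phi = np.tanh(y)                 # score function for a super-Gaussian prior
        # Natural-gradient step: (I - <phi(y) y^T>) W, averaged over the batch.
        W += lr * (np.eye(n) - (phi @ y.T) / T) @ W
    return W

rng = np.random.default_rng(0)
s = rng.laplace(scale=1 / np.sqrt(2), size=(2, 5000))   # unit-variance sources
A = np.array([[1.0, 0.6],
              [0.5, 1.0]])                              # hypothetical mixing matrix

W = natural_gradient_ica(A @ s)
G = W @ A    # global system: close to a scaled permutation when separation works
print(np.round(G, 2))
```

Inspecting `G = W A` is only possible in simulation, where A is known. Each row should have one dominant entry if the sources have been separated.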

Frequency (subband) filtering. [Figure: block diagram with L-point STFT and L-point ISTFT blocks, inputs x1, x2, x3, per-subband unmixing matrices W1 ... Wn, outputs s1, s2, s3; signals labelled X(ω) and S(ω); additional blocks labelled "Source model" and "STFT".] Time domain modelling, e.g. Lee et al. 1997.

Frequency Domain Source Model. An alternative strategy is to model the sources in the frequency domain (e.g. Smaragdis 1997). Advantages: computational efficiency; sparser statistics (→ better estimates).

Frequency (subband) filtering. [Figure: block diagram with L-point STFT and L-point ISTFT blocks, inputs x1, x2, x3, per-subband unmixing matrices W1 ... Wn, outputs s1, s2, s3; signals labelled X(ω) and S(ω).] Frequency domain modelling (e.g. Smaragdis 1997). Disadvantage: the permutation problem.

Solutions to Permutation Problem
Source Modelling Solutions
- Time Domain → no permutation problem (Lee et al. 1997).
- Time-Frequency → couples adaptive filters,
  - using signal envelopes (Ikeda et al. 1999) or
  - TF generative models (Mitianoudis & Davies 2001).
- Permutation problem can persist with gradient learning (Davies 2002).
Channel Modelling Solutions
- Constrained Unmixing Filters → couples adaptive filters
  - Heuristic (Smaragdis 1997)
  - Constrained filter model (Parra & Spence 1998)
- Solutions tend to get trapped in local minima (Ikram & Morgan 2000)
  - Directivity patterns to resolve permutation (Kurita et al. 2000)
- Problems at high frequencies and with high reverberation
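To make the envelope idea concrete, here is a simplified two-source sketch that aligns each bin by correlating its amplitude envelopes with a running reference built from the bins already aligned. It is written in the spirit of envelope-based coupling and is not the algorithm of any of the cited papers. The function name `align_by_envelope` and the synthetic data are assumptions.

```python
import numpy as np

def align_by_envelope(S):
    """Resolve per-bin output order for two sources using amplitude envelopes.

    S: (n_bins, 2, n_frames) complex separated coefficients with arbitrary
       per-bin ordering. Bin 0 is taken as the reference ordering.
    Returns (aligned copy of S, boolean array marking the bins that were swapped).
    """
    S = S.copy()
    ref = np.abs(S[0])                       # running reference envelopes, (2, n_frames)
    swapped = np.zeros(S.shape[0], dtype=bool)
    for f in range(1, S.shape[0]):
        env = np.abs(S[f])
        keep = np.sum(ref * env)             # envelope match with the current order
        swap = np.sum(ref * env[::-1])       # envelope match with outputs exchanged
        if swap > keep:
            S[f] = S[f, ::-1].copy()
            swapped[f] = True
            env = env[::-1]
        ref = ref + env                      # accumulate the aligned envelope
    return S, swapped

# Synthetic test: source 1 is active in the first half, source 2 in the second.
rng = np.random.default_rng(0)
n_bins, n_frames = 16, 40
e1 = np.where(np.arange(n_frames) < 20, 1.0, 0.05)
env = np.stack([e1, e1[::-1]])
gains = rng.uniform(0.5, 1.5, size=(n_bins, 2, 1))
phases = np.exp(2j * np.pi * rng.random((n_bins, 2, n_frames)))
S_true = gains * env * phases

perm = rng.random(n_bins) < 0.5              # bins whose outputs get exchanged
perm[0] = False
S_mixed = S_true.copy()
S_mixed[perm] = S_mixed[perm][:, ::-1]

aligned, swapped = align_by_envelope(S_mixed)
```

Real speech envelopes overlap far more than this toy example, which is one reason envelope-only criteria can fail in some bins.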

Permutation Problem Example. Two speech signals mixed with a single echo of about 5 ms. [Figure panels: Mitianoudis & Davies Alg.; Smaragdis Alg.]

Beamforming for Source Separation. A traditional approach to microphone array processing is to use beamforming: microphone outputs are combined to amplify signals from a desired direction while suppressing signals from other directions. Hence ICA is a blind beamformer! Note that beamformer directivity patterns are frequency dependent. [Figure: narrowband beamformer directivity pattern against direction of arrival, with main lobe and nulls marked.]

ICA as a Beamformer. FD-ICA is essentially an FD beamformer, i.e. it places nulls on the other sources so as to separate one source at a time. ICA employs statistical information only; beamforming employs geometrical information, i.e. Directions Of Arrival (DOA). One can perform permutation alignment for FD-ICA using DOA, i.e. by aligning the directivity patterns. [Figure: microphone pair with spacing d and arrival angle θ; null direction marked.]

Ideal Directivity Patterns. Single delay transfer function ~ anechoic room. Ideal situation for permutation alignment. Multiple ripples around c/d Hz (c: speed of sound, d: microphone spacing). A null around 25°.
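The ideal single-delay case can be reproduced with a two-microphone far-field model. In the sketch below, the speed of sound, the spacing `d = 1.0` m and the two test frequencies are assumed values. The helper names (`directivity`, `null_weights`, `find_nulls`) are hypothetical.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s

def directivity(w, freq, d, thetas_deg):
    """Magnitude response |w . a(theta)| of a 2-microphone unmixing row."""
    tau = d * np.sin(np.deg2rad(thetas_deg)) / C         # inter-microphone delay
    a = np.stack([np.ones_like(tau, dtype=complex),
                  np.exp(-2j * np.pi * freq * tau)])     # steering vectors, (2, n_angles)
    return np.abs(w @ a)

def null_weights(freq, d, theta0_deg):
    """Weights that cancel a far-field source arriving from theta0_deg."""
    tau0 = d * np.sin(np.deg2rad(theta0_deg)) / C
    return np.array([np.exp(-2j * np.pi * freq * tau0), -1.0])

def find_nulls(pattern, thetas_deg, thresh=0.2):
    """Angles of interior local minima of the pattern that fall below thresh."""
    i = np.arange(1, len(pattern) - 1)
    is_min = ((pattern[i] < pattern[i - 1]) & (pattern[i] < pattern[i + 1])
              & (pattern[i] < thresh))
    return thetas_deg[i[is_min]]

thetas = np.linspace(-90.0, 90.0, 181)   # 1 degree grid
d = 1.0                                  # hypothetical spacing, so c/d = 343 Hz

low = find_nulls(directivity(null_weights(160.0, d, 25.0), 160.0, d, thetas), thetas)
high = find_nulls(directivity(null_weights(750.0, d, 25.0), 750.0, d, thetas), thetas)
print(low, high)
```

With this spacing the low-frequency pattern has a single null at the source direction, while the higher frequency shows several nulls. That ambiguity is what makes DOA harder to read off above roughly c/d Hz.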

A real room experiment. We recorded a 2 microphone, 2 speaker setup in a real lecture room, to explore the application of beamforming to BSS. [Figure: room layout with labelled dimensions 2 m, ~7.5 m, ~6 m, 1 m, 1.5 m.]

Real Directivity Patterns. Directivity pattern for source 1, estimated and aligned by Likelihood Ratio (amplitude-only criterion). DOA around 22°. Observations: more smeared than the single-delay case; a main DOA is still apparent. Questions: How can we accurately estimate DOA from a directivity pattern? How can we align the permutation to form a consistent beam-pattern? Can we approximate with a single delay?

DOA estimation ambiguity. Multiple nulls appear above c/d Hz, which makes it difficult to estimate DOA. Saruwatari et al. used null statistics along all frequencies to estimate DOA. Ikram and Morgan used only lower frequencies to estimate DOA. Estimate the average, along frequency, of the directivity pattern for several frequency bands. The average directivity pattern between 0 and 2 kHz can give a consistent DOA.
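A sketch of band-averaged DOA estimation under a two-microphone far-field model: average the directivity patterns of one unmixing row over the bins below a cut-off and take the deepest null of the average. The spacing, frequency grid, cut-off and the small deterministic null drift used to mimic reverberation are all assumptions. `estimate_doa` is a hypothetical helper.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s

def pattern(w, freq, d, thetas_deg):
    """Directivity pattern of one 2-microphone unmixing row at one frequency."""
    tau = d * np.sin(np.deg2rad(thetas_deg)) / C
    return np.abs(w[0] + w[1] * np.exp(-2j * np.pi * freq * tau))

def estimate_doa(W_rows, freqs, d, thetas_deg, f_max=2000.0):
    """Average the patterns of all bins up to f_max; return (angle of minimum, average)."""
    band = freqs <= f_max
    avg = np.mean([pattern(w, f, d, thetas_deg)
                   for w, f in zip(W_rows[band], freqs[band])], axis=0)
    return thetas_deg[np.argmin(avg)], avg

d = 1.0                                        # hypothetical microphone spacing (m)
freqs = np.arange(50.0, 4000.0, 50.0)
thetas = np.linspace(-90.0, 90.0, 181)

# Synthetic unmixing rows: nulls drift by up to 1.5 degrees around 25 degrees.
jitter = 1.5 * np.sin(np.arange(len(freqs)))
tau0 = d * np.sin(np.deg2rad(25.0 + jitter)) / C
W_rows = np.stack([np.exp(-2j * np.pi * freqs * tau0),
                   -np.ones(len(freqs))], axis=1)

doa, avg = estimate_doa(W_rows, freqs, d, thetas)
print(doa)
```

The extra nulls at higher frequencies fall at different angles in each bin and average out, while the null near the true direction is common to all bins.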

DOA estimation ambiguity (cont). The exact low-frequency range is dependent on d: multiple nulls appear at higher frequencies. For small d the recorded signals will be more similar ⇒ low separation quality. For more accurate DOA estimation, one can use extra sensors and subspace methods like MUSIC (Parra and Alvino 2002). Sensor spacing choice is a trade-off between separation quality and beampattern clarity.

Permutation alignment using DOA. Basic problem: the nulls drift slightly around the DOA, due to reverberation. Solution: look for a null in a “neighbourhood” around the DOA. Drawbacks: not accurate enough; the neighbourhood has to be defined; classification is really difficult at mid-to-high frequencies. Remedy: use beamforming (phase information) at lower-to-mid frequencies and LR (amplitude information) at mid-to-high frequencies.
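The neighbourhood search can be sketched for the 2 x 2 case as follows: in each bin, the row whose pattern has the deeper null near the DOA of source 2 is taken to be the row that extracts source 1. The spacing, DOAs, neighbourhood width, frequency range (kept below the spatial aliasing limit) and the simulated null drift are assumptions. `align_with_doa` and `null_row` are hypothetical names.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s

def pattern(w, freq, d, thetas_deg):
    """Directivity pattern of one unmixing row, normalised by the row norm."""
    tau = d * np.sin(np.deg2rad(thetas_deg)) / C
    return np.abs(w[0] + w[1] * np.exp(-2j * np.pi * freq * tau)) / np.linalg.norm(w)

def align_with_doa(W, freqs, d, doa2_deg, width_deg=10.0):
    """Order rows so that row 0 of every bin places its null near doa2_deg.

    W: (n_bins, 2, 2) complex unmixing matrices with arbitrary row order.
    Returns (aligned copy of W, boolean array marking the bins that were swapped).
    """
    W = W.copy()
    hood = np.linspace(doa2_deg - width_deg, doa2_deg + width_deg, 41)
    swapped = np.zeros(len(freqs), dtype=bool)
    for k, f in enumerate(freqs):
        # Depth of the deepest null each row has inside the neighbourhood.
        depth = [pattern(W[k, r], f, d, hood).min() for r in range(2)]
        if depth[1] < depth[0]:
            W[k] = W[k, ::-1].copy()
            swapped[k] = True
    return W, swapped

def null_row(freq, d, theta_deg):
    """Row that cancels a far-field source from theta_deg."""
    tau0 = d * np.sin(np.deg2rad(theta_deg)) / C
    return np.array([np.exp(-2j * np.pi * freq * tau0), -1.0])

d = 0.2                                     # hypothetical spacing (m)
freqs = np.arange(100.0, 601.0, 50.0)       # low-to-mid frequencies only
jitter = 2.0 * np.sin(np.arange(len(freqs)))  # simulated null drift (degrees)

# Row 0 nulls the source near 25 degrees, row 1 the source near -30 degrees;
# row 1 is rescaled to mimic the scaling ambiguity of ICA.
W_true = np.array([[null_row(f, d, 25.0 + j), 3.0 * null_row(f, d, -30.0 + j)]
                   for f, j in zip(freqs, jitter)])

perm = np.arange(len(freqs)) % 2 == 1       # exchange the rows of every other bin
W_mixed = W_true.copy()
W_mixed[perm] = W_mixed[perm][:, ::-1]

W_aligned, swapped = align_with_doa(W_mixed, freqs, d, doa2_deg=25.0)
```

Above the aliasing limit both rows can show a null inside the neighbourhood, which is where this simple test breaks down.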

Permutation alignment using DOA. Sound samples: mixtures; separated using LR; separated using BF.

Sensitivity analysis. Effects of a misplaced beamformer: to test the beamformer's sensitivity to movement, we repeated the recordings with source 2 displaced by 50 cm. We unmixed the 50 cm recordings and compared the beampatterns. We observed the following:

Sensitivity analysis (cont). A moving source will not greatly affect our beamformer at lower frequencies, but mainly at higher frequencies. [Figure panels: 160 Hz and 750 Hz.]

Sensitivity analysis (cont). Distortion introduced due to movement. We used the original filters to unmix the 50 cm case. Distortion is a function of frequency.

Sensitivity analysis (cont). Distortion introduced due to movement. The source that moved can still be separated, but sounds a bit more echoic due to incorrect mapping. The source that didn’t move won’t be separated, due to incorrect beamforming, but it will still be mapped back correctly.

Conclusions
- Beamforming is a useful tool for permutation alignment. It is a semi-blind method since it exploits the known array configuration.
- Phase information seems more important at lower frequencies. Amplitude information seems more important at higher frequencies. (Lord Rayleigh’s Law of Hearing)
- Distortion introduced due to movement is a function of frequency.
- Problems when aligning at higher frequencies.
