Audio Source Separation And ICA by Mike Davies & Nikolaos Mitianoudis Digital Signal Processing Lab Queen Mary, University of London.

Outline of Talk
Introduction:
- Audio Source Separation: Beamforming, ICA & CASA
ICA for source separation
- dealing with convolutive mixtures
A Frequency Domain Framework
- unmixing in the frequency domain
- source modelling & the permutation problem
Beamforming for Source Separation
- using geometric information
A Reverberant BSS Example:
- ICA as a beamformer
- real reverberant transfer functions
- using beamforming with ICA
- moving sources

Cocktail Party Problem

Computational Cocktail Party Problem

Audio Source Separation Problem

Computational tools for audio source separation
Computational Auditory Scene Analysis (CASA): typically extracts one source from a single channel of audio using heuristic psychological grouping rules (pattern matching).
Blind Source Separation (BSS, aka ICA): uses spatial diversity based on source independence. Extensions include convolutive mixing and overcomplete mixtures.
Beamforming: uses spatial diversity based on the known geometry of the microphone array and the directions of arrival (DOA) of the source signals.

Review of ICA
The ICA model: x(t) = A s(t). Aim: estimate s(t) from x(t) (mixing matrix A unknown). If the number of sources equals the number of observations, we can estimate s(t) by estimating W = A^-1 to give s = Wx. A is identifiable if we assume the sources are statistically independent and non-Gaussian.
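As a minimal sketch of the square (complete) case described on this slide, the following NumPy snippet mixes two independent, non-Gaussian sources and recovers them with W = A^-1. The mixing matrix values, the Laplacian source distribution and all variable names are assumptions made for illustration; in the blind setting A is unknown and W has to be estimated.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000

# Two independent, non-Gaussian (Laplacian) sources: an illustrative choice.
s = rng.laplace(size=(2, n_samples))

# Hypothetical mixing matrix; unknown in a real blind separation problem.
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])

x = A @ s                  # observations
W = np.linalg.inv(A)       # ideal unmixing matrix W = A^-1
s_hat = W @ x              # recovered sources, s = W x

max_error = np.max(np.abs(s_hat - s))
```

With the true A the recovery is exact up to rounding; an ICA algorithm can only recover the sources up to a reordering and rescaling.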

ICA for Audio Source Separation
Audio observations are a linear convolution of the sources (plus additive noise). The unmixing filter uses an FIR approximation (complete case).
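The convolutive observation model can be sketched as follows, assuming a noise-free two-source, two-microphone case with short, randomly drawn FIR room filters. The filter length, decay and the helper name `convolutive_mix` are hypothetical; the same routine would apply a bank of FIR unmixing filters to the observations.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_taps = 2000, 8

s = rng.laplace(size=(2, n_samples))

# a[i, j] is a hypothetical FIR filter from source j to microphone i,
# drawn at random with an exponentially decaying tail.
a = rng.normal(size=(2, 2, n_taps)) * np.exp(-0.5 * np.arange(n_taps))

def convolutive_mix(filters, signals):
    """Apply a matrix of FIR filters: out_i = sum_j filters[i, j] * signals[j]."""
    n_out, n_in, _ = filters.shape
    n = signals.shape[1]
    out = np.zeros((n_out, n))
    for i in range(n_out):
        for j in range(n_in):
            # Full convolution truncated to the input length.
            out[i] += np.convolve(signals[j], filters[i, j])[:n]
    return out

x = convolutive_mix(a, s)   # microphone observations (noise omitted)
```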

Frequency (subband) filtering
[Figure: block diagram. Mixtures x1, x2, x3 each pass through an L-point STFT to give X(ω); unmixing matrices W1 ... Wn are applied in the subbands to give S(ω); L-point ISTFTs return the sources s1, s2, s3.]
The unmixing filtering can be efficiently performed within a subband framework. This does not necessarily imply a frequency domain model for the sources.
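A minimal sketch of the subband idea: once the signals are in the STFT domain, the convolutive problem reduces to one small matrix multiplication per frequency bin. The snippet below assumes the STFT coefficients are already available as an array of shape (bins, channels, frames) and uses random complex data and random per-bin mixing matrices purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_bins, n_src, n_frames = 5, 2, 50

def complex_normal(shape):
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)

S = complex_normal((n_bins, n_src, n_frames))   # source STFT coefficients
A = complex_normal((n_bins, n_src, n_src))      # hypothetical per-bin mixing matrices

X = A @ S                 # per-bin instantaneous mixing: X(w) = A(w) S(w)
W = np.linalg.inv(A)      # one unmixing matrix per bin (ideal, for illustration)
S_hat = W @ X             # batched matmul applies W(w) in every bin
```

In practice each W(ω) is estimated independently by ICA, which is what gives rise to the permutation problem discussed later in the talk.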

ML Natural Gradient Algorithm
Various authors have suggested a simple gradient-based algorithm for ICA. This can be viewed as a Maximum Likelihood estimator in which the nonlinearity φ(s) often takes a tanh-like shape (→ super-Gaussian prior). For convolutive mixing this can be adapted using a time domain source model.
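The slide's update rule did not survive in the transcript. As a hedged illustration, the sketch below uses the standard batch natural-gradient form W ← W + η (I − E[φ(y) yᵀ]) W with φ = tanh on an instantaneous (not convolutive) mixture. The mixing matrix, learning rate, iteration count and Laplacian sources are assumptions, and this is not necessarily the exact variant shown in the talk.

```python
import numpy as np

rng = np.random.default_rng(3)
n_samples = 5000

s = rng.laplace(size=(2, n_samples))      # super-Gaussian sources (assumed)
A = np.array([[1.0, 0.6],
              [0.5, 1.0]])                # hypothetical mixing matrix
x = A @ s

W = np.eye(2)
learning_rate = 0.1                       # illustrative value
for _ in range(500):
    y = W @ x
    phi = np.tanh(y)                      # tanh-like score function
    # Natural-gradient direction: (I - E[phi(y) y^T]) W
    grad = np.eye(2) - (phi @ y.T) / n_samples
    W += learning_rate * grad @ W

# The global system W A should approach a scaled permutation matrix.
G = np.abs(W @ A)
G_sorted = np.sort(G, axis=1)
crosstalk = G_sorted[:, 0] / G_sorted[:, 1]   # per-output leakage ratio
```

A small `crosstalk` value for every row indicates that each output is dominated by a single source.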

Frequency (subband) filtering
[Figure: block diagram. Mixtures x1, x2, x3 each pass through an L-point STFT to give X(ω); unmixing matrices W1 ... Wn are applied in the subbands to give S(ω); L-point ISTFTs return the sources s1, s2, s3.]
Source model: time domain modelling (e.g. Lee et al.).

Frequency Domain Source Model
An alternative strategy is to model the sources in the frequency domain (e.g. Smaragdis 1997).
Advantages: computational efficiency; sparser statistics (→ better estimates).

Frequency (subband) filtering
[Figure: block diagram. Mixtures x1, x2, x3 each pass through an L-point STFT to give X(ω); unmixing matrices W1 ... Wn are applied in the subbands to give S(ω); L-point ISTFTs return the sources s1, s2, s3.]
Source model: frequency domain modelling (e.g. Smaragdis 1997). Disadvantage: the Permutation Problem.

Solutions to Permutation Problem
Source Modelling Solutions
Time Domain → no permutation problem (Lee et al. 1997).
Time-Frequency → couples adaptive filters,
- using signal envelopes (Ikeda et al. 1999) or
- TF generative models (Mitianoudis & Davies 2001).
Permutation problem can persist with gradient learning (Davies 2002).
Channel Modelling Solutions
Constrained Unmixing Filters → couples adaptive filters
- Heuristic (Smaragdis 1997)
- Constrained filter model (Parra & Spence 1998). Solutions tend to get trapped in local minima (Ikram & Morgan 2000).
- Directivity patterns to resolve permutation (Kurita et al. 2000). Problems at high frequencies and with high reverberation.
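To make the signal-envelope idea concrete, here is a rough two-source sketch in the spirit of envelope-based alignment. It is not a reproduction of any of the cited algorithms. It assumes the separated subband signals are available as an array of shape (bins, 2, frames), and it swaps the two outputs in a bin whenever that improves the correlation of their amplitude envelopes with a running reference. The synthetic on/off envelopes and the function name `align_by_envelope` are illustrative.

```python
import numpy as np

def align_by_envelope(S):
    """Align output order across bins using amplitude-envelope correlation.

    S: complex array (n_bins, 2, n_frames). Returns (aligned copy, swap flags).
    """
    S = S.copy()
    n_bins = S.shape[0]
    ref = np.abs(S[0])                         # running reference envelopes
    swapped = np.zeros(n_bins, dtype=bool)
    for f in range(1, n_bins):
        env = np.abs(S[f])
        # c[i, j] = correlation between reference i and envelope j
        c = np.corrcoef(np.vstack([ref, env]))[:2, 2:]
        if c[0, 1] + c[1, 0] > c[0, 0] + c[1, 1]:
            S[f] = S[f, ::-1].copy()
            env = env[::-1]
            swapped[f] = True
        ref += env                             # accumulate aligned envelopes
    return S, swapped

# Synthetic check: sources share an on/off envelope across all bins.
rng = np.random.default_rng(4)
n_bins, n_frames = 16, 2000
gates = (rng.random((2, n_frames)) < 0.5) + 0.05
noise = rng.normal(size=(n_bins, 2, n_frames)) + 1j * rng.normal(size=(n_bins, 2, n_frames))
S_true = gates[None] * noise

flip = rng.random(n_bins) < 0.5
flip[0] = False                                # bin 0 defines the reference order
S_perm = S_true.copy()
S_perm[flip] = S_perm[flip][:, ::-1]

S_aligned, swapped = align_by_envelope(S_perm)
```

Real speech envelopes are less cleanly correlated across frequency than this synthetic example, which is one reason the alternative approaches listed on the slide exist.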

Permutation Problem Example
Two speech signals mixed with a single echo of about 5 ms.
[Figure: separation results for the Mitianoudis & Davies algorithm and the Smaragdis algorithm.]

Beamforming for Source Separation
A traditional approach to microphone array processing is to use beamforming. Microphone outputs are combined to amplify signals from a desired direction while suppressing signals from other directions. Hence ICA is a blind beamformer! Note that beamformer directivity patterns are frequency dependent.
[Figure: narrowband beamformer directivity pattern against direction of arrival, showing the main lobe and nulls.]
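A narrowband directivity pattern of the kind described here can be sketched as below, assuming a far-field plane-wave model and a simple delay-and-sum design. The 4-microphone line array, 10 cm spacing, 1 kHz frequency, 20° look direction and function names are all hypothetical choices for illustration.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed)

def steering_vector(freq_hz, theta_rad, mic_positions_m):
    """Far-field phase shifts at each microphone for a plane wave from theta."""
    delays = mic_positions_m * np.sin(theta_rad) / C
    return np.exp(-2j * np.pi * freq_hz * delays)

def directivity(weights, freq_hz, thetas_rad, mic_positions_m):
    """Magnitude response |w^H a(theta)| of a narrowband beamformer."""
    return np.array([
        abs(np.vdot(weights, steering_vector(freq_hz, th, mic_positions_m)))
        for th in thetas_rad
    ])

mics = np.arange(4) * 0.1        # hypothetical line array, 10 cm spacing
freq = 1000.0                    # analysis frequency in Hz
look_deg = 20.0                  # desired direction

# Delay-and-sum weights: phase-align the look direction, then average.
weights = steering_vector(freq, np.deg2rad(look_deg), mics) / mics.size

thetas_deg = np.linspace(-90.0, 90.0, 181)
pattern = directivity(weights, freq, np.deg2rad(thetas_deg), mics)
peak_deg = thetas_deg[np.argmax(pattern)]
```

Re-running with a different `freq` changes the lobe widths and null positions, which is the frequency dependence noted on the slide.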

ICA as a Beamformer
FD-ICA is essentially an FD beamformer, i.e. it places nulls on the other sources so as to separate one source at a time. ICA employs statistical information only; beamforming employs geometrical information, i.e. Directions Of Arrival (DOA). One can perform permutation alignment for FD-ICA using DOA, i.e. align the directivity patterns.
[Figure: two-microphone array with spacing d and arrival angle θ, with the null direction marked.]
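The null-placing behaviour can be illustrated with an idealised anechoic two-microphone, two-source case: if W(f) inverts the mixing matrix at one frequency, each row of W has unit response towards its own source and a null towards the other. The spacing, frequency and source angles below are hypothetical, and the ideal inverse stands in for an ICA estimate.

```python
import numpy as np

C = 343.0      # speed of sound in m/s (assumed)
D = 0.2        # hypothetical microphone spacing in metres
FREQ = 1000.0  # analysis frequency in Hz

def steering(theta_rad):
    """Two-microphone far-field steering vector at FREQ."""
    return np.array([1.0, np.exp(-2j * np.pi * FREQ * D * np.sin(theta_rad) / C)])

theta1, theta2 = np.deg2rad(-30.0), np.deg2rad(25.0)   # illustrative DOAs

# Anechoic mixing matrix: one steering vector per source.
A = np.column_stack([steering(theta1), steering(theta2)])
W = np.linalg.inv(A)            # stand-in for a perfect FD-ICA solution

def row_response(w_row, theta_rad):
    """Directivity of one unmixing row: |w . a(theta)|."""
    return abs(w_row @ steering(theta_rad))

own = row_response(W[0], theta1)      # response towards the extracted source
other = row_response(W[0], theta2)    # response towards the interfering source
```

Here `own` is 1 and `other` is 0 up to rounding, so output 1 is a beam with its null steered at source 2.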

Ideal Directivity Patterns
Single-delay transfer function ~ anechoic room. Ideal situation for permutation alignment. Multiple ripples around c/d Hz. A null around 25°.

A real room experiment
We recorded a two-microphone, two-speaker setup in a real lecture room to explore the application of beamforming to BSS.
[Figure: room layout with labelled dimensions ~7.5 m, ~6 m, 2 m, 1.5 m and 1 m.]

Real Directivity Patterns
Directivity pattern for source 1, estimated and aligned by Likelihood Ratio (amplitude-only criterion). DOA around 22°.
Observations: more smeared than the single-delay case; a main DOA is still apparent.
Questions: How can we accurately estimate DOA from a directivity pattern? How can we align the permutation to form a consistent beam-pattern? Can we approximate with a single delay?

DOA estimation ambiguity
Multiple nulls appear after c/d Hz, making it difficult to estimate DOA. Saruwatari et al. used null statistics along all frequencies to estimate DOA. Ikram and Morgan used only lower frequencies to estimate DOA.
Estimate the average directivity pattern along frequency for several frequency bands. The average directivity pattern between 0 and 2 kHz can give a consistent DOA.
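The appearance of extra nulls at higher frequencies can be checked with a small calculation. The sketch assumes an ideal two-microphone null-steering pattern under a far-field, single-delay model, for which nulls satisfy sin θ = sin θ0 − n·c/(f·d) for integer n. The 0.5 m spacing and the 25° DOA are illustrative values, and the function name is hypothetical.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed)

def null_angles_deg(freq_hz, spacing_m, doa_deg):
    """All null directions of an ideal two-mic pattern with a null steered at doa_deg."""
    k = C / (freq_hz * spacing_m)
    s0 = np.sin(np.deg2rad(doa_deg))
    # Integers n for which |s0 - n k| <= 1, i.e. a physical angle exists.
    n_min = int(np.ceil((s0 - 1.0) / k))
    n_max = int(np.floor((s0 + 1.0) / k))
    sines = s0 - k * np.arange(n_min, n_max + 1)
    return np.rad2deg(np.arcsin(np.clip(sines, -1.0, 1.0)))

spacing = 0.5                                       # hypothetical d in metres (c/d = 686 Hz)
nulls_low = null_angles_deg(200.0, spacing, 25.0)   # well below c/d
nulls_high = null_angles_deg(2000.0, spacing, 25.0) # well above c/d
```

At the low frequency the only null is the steered one, so the DOA can be read off directly; at the high frequency several nulls coexist and the true DOA is ambiguous.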

DOA estimation ambiguity (cont)
The exact low-frequency range depends on d: multiple nulls appear at higher frequencies. For small d the recorded signals will be more similar => low separation quality.
For more accurate DOA estimation, one can use extra sensors and subspace methods like MuSIC (Parra and Alvino 2002).
Sensor spacing choice is a trade-off between separation quality and beampattern clarity.

Permutation alignment using DOA
Basic Problem: the nulls drift slightly around the DOA, due to reverberation.
Solution: look for a null in a “neighbourhood” around the DOA. Drawbacks: not accurate enough; the neighbourhood has to be defined; classification is really difficult at mid-to-high frequencies.
Remedy: use beamforming (phase information) at lower-to-mid frequencies and LR (amplitude information) at mid-to-high frequencies.
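The neighbourhood search can be sketched for the two-source, two-microphone case as follows. For each frequency bin the code measures how deep a null each unmixing row has within a small angular window around each DOA, and swaps the rows if that gives the better match. The spacing, DOAs, window width, the synthetic drifting nulls and the function names are all assumptions for illustration, and only low-to-mid frequencies are used.

```python
import numpy as np

C = 343.0   # speed of sound in m/s (assumed)
D = 0.2     # hypothetical microphone spacing in metres

def steering(freq_hz, theta_rad):
    """Two-mic far-field steering vector(s); theta may be a scalar or an array."""
    phase = 2 * np.pi * freq_hz * D * np.sin(theta_rad) / C
    return np.stack([np.ones_like(phase, dtype=complex), np.exp(-1j * phase)])

def null_depth(w_row, freq_hz, doa_rad, half_width_rad, n_grid=11):
    """Smallest response of one unmixing row within a window around doa_rad."""
    thetas = doa_rad + np.linspace(-half_width_rad, half_width_rad, n_grid)
    return np.min(np.abs(w_row @ steering(freq_hz, thetas)))

def needs_swap(W, freq_hz, doa1, doa2, half_width):
    """True if row 0 nulls source 1 (and row 1 nulls source 2), i.e. outputs are swapped."""
    keep = null_depth(W[0], freq_hz, doa2, half_width) + null_depth(W[1], freq_hz, doa1, half_width)
    swap = null_depth(W[0], freq_hz, doa1, half_width) + null_depth(W[1], freq_hz, doa2, half_width)
    return swap < keep

# Synthetic test: ideal unmixing matrices with slightly drifting nulls
# and randomly permuted rows.
rng = np.random.default_rng(5)
doa1, doa2 = np.deg2rad(-30.0), np.deg2rad(25.0)
freqs = np.linspace(200.0, 1000.0, 9)
truth = rng.random(freqs.size) < 0.5
decisions = []
for f, flipped in zip(freqs, truth):
    drift = np.deg2rad(rng.uniform(-3.0, 3.0, size=2))   # reverberation-like drift
    A = np.column_stack([steering(f, doa1 + drift[0]), steering(f, doa2 + drift[1])])
    W = np.linalg.inv(A)
    if flipped:
        W = W[::-1]
    decisions.append(needs_swap(W, f, doa1, doa2, np.deg2rad(5.0)))
decisions = np.array(decisions)
```

At higher frequencies additional nulls can fall inside the window, which is consistent with the slide's remark that classification becomes difficult there.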

Permutation alignment using DOA
Sound samples: mixtures; separated using LR; separated using BF.

Sensitivity analysis
Effects of a misplaced beamformer: we repeated the recordings with source 2 displaced by 50 cm.
Beamformer’s sensitivity to movement: we unmixed the 50 cm recordings and compared the beampatterns. We observed the following:

Sensitivity analysis (cont) A moving source will not greatly affect our beamformer at lower frequencies, but mainly at

Sensitivity analysis (cont) Distortion introduced due to movement. We used the original filters to unmix the 50cm case. Distortion is a function of frequency.

Sensitivity analysis (cont)
Distortion introduced due to movement. The source that moved can still be separated, but is a bit more echoic due to incorrect mapping. The source that didn’t move won’t be separated due to incorrect beamforming. It will still be mapped back correctly.

Conclusions
Beamforming is a useful tool for permutation alignment. It is a semi-blind method since it exploits the known array configuration.
Phase information seems more important at lower frequencies. Amplitude information seems more important at higher frequencies. (Lord Rayleigh’s Law of Hearing)
Distortion introduced due to movement is a function of frequency.
Problems when aligning at higher frequencies.