1 A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds Paris Smaragdis, Madhusudana Shashanka, Bhiksha Raj NIPS 2009

2 Introduction Problem: Single channel signal separation – Separating out the signals from individual sources in a mixed recording General approach – Derive a generalizable model that captures the salient features of each source – Separation is achieved by abstracting components from the mixed signal that conform to the characterization of the individual sources

3 Physical Intuition Recover sources by reweighting the frequency subbands of a single recording
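
As a rough illustration of this idea (not from the slides; the array names are hypothetical), a minimal NumPy sketch of subband reweighting: the mixture magnitude spectrogram is split among sources by per-bin gains that sum to one.

    import numpy as np

    # Hypothetical mixture spectrogram of shape (F, T); a stand-in for |STFT| values.
    rng = np.random.default_rng(0)
    F, T = 513, 200
    mixture_spec = rng.random((F, T))

    gains = rng.random((2, F, T))                 # one gain mask per source
    gains /= gains.sum(axis=0, keepdims=True)     # masks sum to 1 in every time-frequency bin

    source_specs = gains * mixture_spec           # reweighted subbands, one spectrogram per source
    assert np.allclose(source_specs.sum(axis=0), mixture_spec)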

4 Latent Variable Model Given the magnitude spectrogram of a single source, each spectral frame is modeled as a histogram of repeated draws from a multinomial distribution over the frequency bins – At a given time frame t, P_t(f) represents the probability of drawing frequency f – The model assumes that P_t(f) is composed of bases indexed by a latent variable z
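
The slide's equation was not transcribed; from the definitions above, the per-frame model has the standard PLCA form (a reconstruction, not a verbatim copy of the slide):

    P_t(f) = Σ_z P_t(z) P(f|z)

where the bases P(f|z) are shared across frames and the mixture weights P_t(z) are estimated per frame.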

5 Latent Variable Model (Contd.) Now let the matrix V of size F×T with entries v_ft represent the magnitude spectrogram of the mixture sound and v_t represent time frame t (the t-th column vector of V) First we assume that we have an already trained model in the form of basis vectors P_s(f|z) – These bases represent a dictionary of spectra that best describe each source
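
For a mixture of sources, the corresponding model (again a reconstruction of the untranscribed equation, consistent with the quantities estimated on the next slide) is

    P_t(f) = Σ_s P_t(s) Σ_z P_t(z|s) P_s(f|z)

with per-frame source priors P_t(s) and source-dependent mixture weights P_t(z|s).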

6 Source separation Decompose a new mixture of these known sources in terms of the contributions of the dictionaries of each source – Use the EM algorithm to estimate P_t(z|s) and P_t(s) The contribution of source s to the mixture is then reconstructed by reweighting the mixture spectrogram with the posterior probability of source s, as sketched below
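
A minimal NumPy sketch of this decomposition (my own code, not the authors'; the function and variable names are assumptions), taking the mixture spectrogram V and one fixed dictionary of bases per source:

    import numpy as np

    def separate(V, dictionaries, n_iter=100, eps=1e-12):
        # V            : (F, T) magnitude spectrogram of the mixture
        # dictionaries : list of (F, K_s) arrays whose columns are the bases P_s(f|z)
        # Returns      : list of (F, T) reconstructions, one per source
        F, T = V.shape
        S = len(dictionaries)
        W = [D / (D.sum(axis=0, keepdims=True) + eps) for D in dictionaries]  # bases sum to 1 over f
        H = [np.full((Ws.shape[1], T), 1.0 / Ws.shape[1]) for Ws in W]        # P_t(z|s), uniform init
        p = np.full((S, T), 1.0 / S)                                          # P_t(s), uniform init

        for _ in range(n_iter):
            approx = [p[s] * (W[s] @ H[s]) for s in range(S)]   # P_t(s) * sum_z P_t(z|s) P_s(f|z)
            mix = sum(approx) + eps                             # model of the mixture frame distributions
            ratio = V / mix                                     # E-step: data over model
            for s in range(S):
                A = H[s] * (W[s].T @ ratio)                     # expected counts for (z, t) under source s
                p[s] = p[s] * A.sum(axis=0)                     # unnormalized update of P_t(s)
                H[s] = A / (A.sum(axis=0, keepdims=True) + eps) # M-step: renormalized P_t(z|s)
            p /= p.sum(axis=0, keepdims=True) + eps             # M-step: renormalized P_t(s)

        approx = [p[s] * (W[s] @ H[s]) for s in range(S)]
        mix = sum(approx) + eps
        return [V * a / mix for a in approx]                    # reweight the mixture by each source's posterior

The last line is the reconstruction referred to above: each source's estimate is the mixture spectrogram scaled, bin by bin, by that source's share of the model.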

7 Contribution of this paper Use the training data directly as a dictionary – The authors argue that, given any sufficiently large collection of data from a source, the best possible characterization of that data is quite simply the data themselves (e.g., non-parametric density estimation with Parzen windows) – Side-steps the need for a separate model training step – A large dictionary provides a better description of the sources than the less expressive learned basis models – Source estimates are guaranteed to lie on the source manifold, as opposed to trained approaches, which can produce arbitrary outputs that are not necessarily plausible source estimates

8 Using Training data as Dictionary Use each frame of the spectrograms of the training sequences as the bases P_s(f|z) (a construction sketched below) – Let W^(s) be the training spectrogram from source s. In this case, the latent variable z for source s takes T^(s) values, and the z-th basis function is given by the z-th column vector of W^(s) With the above model one would ideally want to use only one dictionary element per source at any point in time – Ensures outputs lie on the source manifold – Similar to a nearest-neighbor model (an explicit search is computationally very expensive) – In this paper the authors propose using sparsity instead
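
A sketch of building such a dictionary (my own helper; not from the paper), where every training frame becomes one basis:

    import numpy as np

    def frames_as_dictionary(train_spec, eps=1e-12):
        # Each column (frame) of the (F, T_s) training spectrogram becomes one basis P_s(f|z),
        # so the latent variable z ranges over the T^(s) training frames.
        return train_spec / (train_spec.sum(axis=0, keepdims=True) + eps)

    # Usage with the separate() sketch above, one dictionary per known source:
    # dictionaries = [frames_as_dictionary(train_spec_1), frames_as_dictionary(train_spec_2)]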

9 Entropic prior Given a probability distribution θ, the entropic prior is defined as shown below – α is a weighting factor and determines the level of sparsity – A sparse representation has low entropy (since only a few elements are 'active') – Imposing this prior during MAP estimation is a way to minimize entropy during estimation, which results in a sparse representation of θ
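
The prior itself did not survive transcription; in the form used in the sparse PLCA literature (Brand's entropic prior) it reads

    P(θ) ∝ e^{-α H(θ)},   with   H(θ) = -Σ_i θ_i log θ_i

so a large positive α pushes the MAP estimate of θ toward low-entropy (sparse) distributions.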

10 Sparse approximation We would like to minimize the entropies of both the speaker-dependent mixture weights and the source priors at every frame. However, the joint entropy decomposes as shown below – Thus reducing the entropy of the joint distribution is equivalent to reducing the conditional entropy of the source-dependent mixture weights and the entropy of the source priors
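
The omitted step is presumably the chain rule for entropy applied to the per-frame joint distribution over sources and bases:

    H(P_t(s, z)) = H(P_t(s)) + H(P_t(z|s))

so a single entropic prior on the joint weights P_t(s, z) penalizes both terms at once.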

11 Sparse approximation The model is rewritten in terms of the per-frame joint parameter P_t(s, z); to impose sparsity we apply the entropic prior to this parameter; EM is applied to estimate it; and the reconstructed source follows by posterior reweighting, as sketched below
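
A plausible reconstruction of the omitted equations from the surrounding definitions (not a verbatim copy of the slide): writing θ_t(s, z) = P_t(s) P_t(z|s) for the per-frame joint weights,

    P_t(f) = Σ_{s,z} θ_t(s, z) P_s(f|z)                                            (the model)
    P(θ_t) ∝ e^{-α H(θ_t)}                                                         (the sparsity-inducing prior)
    v̂_ft^(s) = v_ft · Σ_z θ_t(s, z) P_s(f|z) / Σ_{s',z} θ_t(s', z) P_{s'}(f|z)      (the reconstruction)

EM with this prior keeps the usual E-step but modifies the M-step for θ_t (in the sparse PLCA literature the resulting fixed point is written with the Lambert W function).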

12 Results on real data

13 (figures only)

14 Comments The use of sparsity ensures that the output is a plausible speech signal devoid of artifacts like distortion and musical noise An unfortunate side effect is the need to use a very large dictionary – However, a significant reduction in dictionary size can be achieved by using an energy threshold to select the loudest frames of the training spectrogram as bases, as sketched below – Outperforms trained basis models of the same size
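
A sketch of that dictionary pruning (my own helper; the fraction of frames to keep is a free parameter, not a value from the paper):

    import numpy as np

    def prune_by_energy(train_spec, keep_fraction=0.2):
        # Keep only the loudest frames of the (F, T_s) training spectrogram as bases.
        energy = (train_spec ** 2).sum(axis=0)                     # per-frame energy
        n_keep = max(1, int(keep_fraction * train_spec.shape[1]))
        keep = np.argsort(energy)[-n_keep:]                        # indices of the loudest frames
        return train_spec[:, np.sort(keep)]                        # preserve the temporal order of frames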

