Survey on ICA Technical Report, Aapo Hyvärinen, 1999.

1 Survey on ICA Technical Report, Aapo Hyvärinen, 1999. http://www.icsi.berkeley.edu/~jagota/NCS

2 Outline
2nd-order methods
–PCA / factor analysis
Higher-order methods
–Projection pursuit / blind deconvolution
ICA
–Definitions
–Criteria for identifiability
–Relations to other methods
Applications
Contrast functions
Algorithms

3 General model
x = As + n
where x are the observations, A is the mixing matrix, n is the noise, and s are the latent variables (factors, independent components).
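
As a concrete illustration (not from the report; the dimensions, distributions, and noise level are all assumed), data can be sampled from this model as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_sources, n_obs = 1000, 2, 3        # assumed sizes, for illustration only
S = rng.laplace(size=(n_sources, n_samples))    # non-Gaussian independent components s
A = rng.normal(size=(n_obs, n_sources))         # mixing matrix A (unknown in practice)
N = 0.1 * rng.normal(size=(n_obs, n_samples))   # additive noise n

X = A @ S + N                                   # observations x, one column per sample
```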

4 Find a transformation s = f(x). Consider only linear transformations:
s = Wx

5 Principal component analysis
Find the direction(s) w where the variance of w^T x is maximized.
Equivalent to finding the eigenvectors of C = E(xx^T) corresponding to the k largest eigenvalues.
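
A minimal NumPy sketch of this eigenvector computation (the function name and rows-as-variables data layout are my own choices):

```python
import numpy as np

def pca_directions(X, k):
    """Top-k principal directions of X (rows = variables, cols = samples)."""
    Xc = X - X.mean(axis=1, keepdims=True)   # center each variable
    C = Xc @ Xc.T / Xc.shape[1]              # sample covariance, estimate of E(xx^T)
    eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest
    return eigvecs[:, order], eigvals[order]
```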

6 Principal component analysis

7 Factor analysis
Closely related to PCA: x = As + n
Method of principal factors:
–Assumes knowledge of the covariance matrix of the noise, E(nn^T)
–PCA on C = E(xx^T) − E(nn^T)
Factors are not defined uniquely, but only up to a rotation
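
A sketch of the method of principal factors under the stated assumption that E(nn^T) is known (the function name and data layout are mine):

```python
import numpy as np

def principal_factors(X, noise_cov, k):
    """Method of principal factors: PCA on C - E(nn^T), with noise covariance given."""
    Xc = X - X.mean(axis=1, keepdims=True)
    C = Xc @ Xc.T / Xc.shape[1] - noise_cov      # subtract the known noise covariance
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1][:k]
    # factor loadings; any rotation of these columns is an equally valid solution
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))
```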

8 Higher-order methods
Projection pursuit
Redundancy reduction
Blind deconvolution
All require the assumption that the data are not Gaussian.

9 Projection pursuit
Find directions w such that w^T x has an 'interesting' distribution.
It is argued that the interesting directions are those where the distribution of w^T x is least Gaussian.

10 Differential entropy
H(y) = −∫ f(y) log f(y) dy
For a fixed variance, H is maximized when f is a Gaussian density.
Minimize H(w^T x) to find projection pursuit directions (y = w^T x).
It is difficult to estimate the density of w^T x.
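
To illustrate the difficulty, here is a crude histogram-based estimator of H(y) (my own sketch; the bin count is arbitrary and the estimate is biased for small samples):

```python
import numpy as np

def entropy_estimate(y, bins=50):
    """Histogram approximation of the differential entropy H(y) = -int f log f dy."""
    p, edges = np.histogram(y, bins=bins, density=True)
    widths = np.diff(edges)
    nz = p > 0                                        # avoid log(0) on empty bins
    return -np.sum(p[nz] * np.log(p[nz]) * widths[nz])

# projection pursuit idea: among unit vectors w, prefer those minimizing
# entropy_estimate(w @ X), i.e. the least Gaussian projections
```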

11 Example: projection pursuit

12 Blind deconvolution
Observe a filtered version of s(t): x(t) = s(t) * g(t)
Find a filter h(t) such that s(t) = h(t) * x(t)
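
A minimal sketch of the observation model (the signal, filter, and lengths are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.laplace(size=500)              # unknown source signal s(t)
g = np.array([1.0, 0.5, -0.3])         # unknown filter g(t), assumed here
x = np.convolve(s, g, mode="full")     # observed x(t) = s(t) * g(t)
# blind deconvolution: recover an inverse filter h with s(t) = h(t) * x(t),
# knowing neither s nor g
```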

13 Example: blind deconvolution
Seismic: "statistical deconvolution"

14 Blind deconvolution (3)
(figure: filter g(t) and signal s(t) plotted against time t)

15 Blind deconvolution (4)

16 ICA definitions
Definition 1 (General definition): ICA of a random vector x consists of finding a linear transformation, s = Wx, so that the components s_i are as independent as possible, in the sense of maximizing some function F(s_1, ..., s_m) that measures independence.

17 ICA definitions
Definition 2 (Noisy ICA): ICA of a random vector x consists of estimating the following model for the data: x = As + n, where the latent variables s_i are assumed independent.
Definition 3 (Noise-free ICA): x = As

18 Statistical independence
ICA requires statistical independence.
Distinguish between statistically independent and uncorrelated variables:
Statistically independent: f(y_1, y_2) = f_1(y_1) f_2(y_2)
Uncorrelated: E(y_1 y_2) = E(y_1) E(y_2)
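
A quick numerical illustration of the distinction (my own, not from the report): y_2 = y_1^2 is uncorrelated with a symmetric y_1, yet completely dependent on it:

```python
import numpy as np

rng = np.random.default_rng(0)
y1 = rng.normal(size=100_000)
y2 = y1 ** 2                          # fully determined by y1, hence dependent

print(np.corrcoef(y1, y2)[0, 1])      # ~0: uncorrelated, since E[y1^3] = 0
print(np.corrcoef(y1**2, y2)[0, 1])   # 1: a non-linear transform exposes the dependence
```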

19 Identifiability of the ICA model
All the independent components, except possibly one, must be non-Gaussian.
The number of observed mixtures must be at least as large as the number of independent components, m >= n.
The matrix A must be of full column rank.
Note: with m < n, A may still be identifiable.

20 Relations to other methods
Redundancy reduction
Noise-free case:
–Find 'interesting' projections
–Special case of projection pursuit
Blind deconvolution
Factor analysis for non-Gaussian data
Related to non-linear PCA

21 Relations to other methods (2)

22 Applications of ICA
Blind source separation
–Cocktail party problem
Feature extraction
Blind deconvolution

23 Blind source separation
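
As an illustrative sketch of the cocktail-party problem, the following uses scikit-learn's FastICA (a modern implementation of the FastICA algorithm mentioned later; the library, the toy signals, and the mixing matrix are my assumptions, not from the report):

```python
import numpy as np
from sklearn.decomposition import FastICA  # assumes scikit-learn is installed

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]  # two "speakers" (sine, square wave)
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])                        # assumed room mixing matrix
X = S @ A.T                                       # two "microphone" recordings

S_hat = FastICA(n_components=2, random_state=0).fit_transform(X)
# S_hat recovers the sources, up to permutation, sign, and scaling
```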

24 Objective (contrast) functions
ICA method = objective function + optimization algorithm
Multi-unit contrast functions
–Find all independent components
One-unit contrast functions
–Find one independent component (at a time)

25 Mutual information
I(y_1, ..., y_m) = Σ_i H(y_i) − H(y)
Mutual information is zero if and only if the y_i are independent.
Difficult to estimate, but approximations exist.

26 Mutual information (2)
Alternative definition: I(X, Y) = H(X) − H(X|Y) = H(X) + H(Y) − H(X, Y)

27 Mutual information (3)
(Venn diagram: H(X) and H(Y) with conditional entropies H(X|Y), H(Y|X), and overlap I(X, Y))

28 Non-linear PCA
Add a non-linear function g(·) in the formula for PCA.

29 One-unit contrast functions
Find one vector w so that w^T x equals one of the independent components, s_i.
Related to projection pursuit.
Prior knowledge of the number of independent components is not needed.

30 Negentropy
J(y) = H(y_gauss) − H(y), the difference between the differential entropy of a Gaussian variable with the same variance as y and the differential entropy of y.
If the y_i are uncorrelated, the mutual information can be expressed as I(y_1, ..., y_n) = J(y) − Σ_i J(y_i).
J(y) can be approximated by higher-order cumulants, but the estimation is sensitive to outliers.
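
For instance, a classical cumulant approximation for standardized y is J(y) ≈ E{y^3}^2/12 + kurt(y)^2/48. A sketch (function name mine), whose reliance on 3rd and 4th sample moments makes the outlier sensitivity plain:

```python
import numpy as np

def negentropy_cumulant(y):
    """Cumulant approximation J(y) ~ E[y^3]^2/12 + kurt(y)^2/48 for standardized y."""
    y = (y - y.mean()) / y.std()        # standardize to zero mean, unit variance
    skew_term = np.mean(y ** 3) ** 2 / 12.0
    kurt = np.mean(y ** 4) - 3.0        # excess kurtosis; a few outliers dominate this
    return skew_term + kurt ** 2 / 48.0
```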

31 Algorithms
Have x = As, want to find s = Wx.
Preprocessing:
–Centering of x
–Sphering (whitening) of x: find a transformation v = Qx such that E(vv^T) = I; found via PCA / SVD
Sphering alone does not solve the problem.
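
A minimal whitening sketch via eigendecomposition of the covariance (one common choice of Q; the function name and data layout are mine, and a full-rank covariance is assumed):

```python
import numpy as np

def whiten(X):
    """Center and sphere X (rows = variables, cols = samples):
    returns v = Qx with E(vv^T) = I, plus the whitening matrix Q."""
    Xc = X - X.mean(axis=1, keepdims=True)      # centering
    C = Xc @ Xc.T / Xc.shape[1]                 # covariance, assumed full rank
    d, E = np.linalg.eigh(C)                    # eigendecomposition (PCA)
    Q = E @ np.diag(1.0 / np.sqrt(d)) @ E.T     # Q = E D^(-1/2) E^T
    return Q @ Xc, Q
```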

32 Algorithms (2)
Non-linear decorrelation: Jutten-Hérault
–Cancel non-linear cross-correlations
–The non-diagonal terms of W are updated by a rule of the form ΔW_ij ∝ f(y_i) g(y_j), i ≠ j, for odd non-linear functions f and g
–The y_i are updated iteratively as y = (I + W)^-1 x
Non-linear PCA
FastICA, ..., etc.
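
For concreteness, a sketch of the FastICA one-unit fixed-point iteration on whitened data, with the non-linearity g = tanh (the function name, convergence test, and iteration cap are my choices):

```python
import numpy as np

def fastica_one_unit(V, n_iter=100, seed=0):
    """One-unit FastICA on whitened data V (rows = variables, cols = samples)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=V.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ V
        # fixed-point update: w <- E[v g(w^T v)] - E[g'(w^T v)] w, with g = tanh
        w_new = (V * np.tanh(y)).mean(axis=1) - (1 - np.tanh(y) ** 2).mean() * w
        w_new /= np.linalg.norm(w_new)
        if abs(np.dot(w_new, w)) > 1 - 1e-8:    # converged (up to sign)
            return w_new
        w = w_new
    return w
```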

33 Summary
Definitions of ICA
Conditions for identifiability of the model
Relations to other methods
Contrast functions
–One-unit / multi-unit
–Mutual information / negentropy
Applications of ICA
Algorithms

34 Future research
Noisy ICA
Tailor-made methods for certain applications
Use of time correlations if x is a stochastic process
Time delays / echoes in the cocktail-party problem
Non-linear ICA

