# Independent Component Analysis

CMPUT 466/551 Nilanjan Ray

The Origin of ICA: Factor Analysis
Multivariate data are often thought of as indirect measurements arising from underlying sources that cannot be measured or observed directly.

Examples:
- Educational and psychological tests use the answers to questionnaires to measure the underlying intelligence and other mental abilities of subjects.
- EEG brain scans measure the neuronal activity in various parts of the brain indirectly, via electromagnetic signals recorded at sensors placed at various positions on the head.

Factor analysis is a classical technique from the statistical literature that aims to identify these latent sources. Independent component analysis (ICA) is a kind of factor analysis that can identify the latent variables uniquely, up to the order and sign of the components.

Latent Variables and Factor Analysis
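Latent variable model:

$$X_j = a_{j1} S_1 + a_{j2} S_2 + \cdots + a_{jp} S_p, \quad j = 1, \ldots, p,$$

or, in matrix form,

$$X = AS,$$

where $X$ is the vector of observed variables, $S$ is the vector of latent components, and $A$ is the mixing matrix. Factor analysis attempts to find both the mixing coefficients and the latent components, given some instances of the observed variables.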

Latent Variables and Factor Analysis…
Typically we require the latent variables to have unit variance and to be uncorrelated. Thus, in the model $X = AS$, $\mathrm{cov}(S) = I$. This representation has an ambiguity. Consider, for example, an orthogonal matrix $R$:

$$X = AS = A R^T R S = A^* S^*, \qquad A^* = A R^T, \quad S^* = RS, \quad \mathrm{cov}(S^*) = R\,\mathrm{cov}(S)\,R^T = I.$$

So $X = A^* S^*$ is also a factor model with unit-variance, uncorrelated latent variables. Classical factor analysis cannot remove this ambiguity; ICA can.
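The rotation ambiguity can be checked numerically. The following is a minimal sketch, assuming NumPy; the mixing matrix `A`, the uniform sources, and the sample size are illustrative choices, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 200_000

# Latent sources with unit variance and no correlation
# (uniform on [-sqrt(3), sqrt(3)] has variance 1).
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(p, n))
A = rng.normal(size=(p, p))          # hypothetical mixing matrix
X = A @ S                            # observed variables, X = A S

# Any orthogonal R gives another factor model X = (A R^T)(R S).
R, _ = np.linalg.qr(rng.normal(size=(p, p)))
A_star, S_star = A @ R.T, R @ S

same_data = np.allclose(A_star @ S_star, X)
cov_S_star = np.cov(S_star)          # rows are variables
print(same_data)
print(np.round(cov_S_star, 2))
```

Both factorizations reproduce the same observations, and the rotated sources still have (approximately) identity covariance, so second-order constraints alone cannot tell them apart.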

Classical Factor Analysis
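Model:

$$X_j = a_{j1} S_1 + \cdots + a_{jq} S_q + \varepsilon_j, \quad j = 1, \ldots, p, \qquad \text{i.e.,} \quad X = AS + \varepsilon.$$

- The $\varepsilon_j$'s are zero-mean, uncorrelated Gaussian noise terms.
- $q < p$, i.e., the number of underlying latent factors is assumed to be smaller than the number of observed components.

The covariance matrix takes this form:

$$\Sigma = A A^T + D_\varepsilon, \qquad D_\varepsilon = \mathrm{diag}[\mathrm{Var}(\varepsilon_1), \ldots, \mathrm{Var}(\varepsilon_p)] \ \text{(a diagonal matrix)}.$$

Maximum likelihood estimation is used to estimate $A$. However, the ambiguity described earlier remains here too…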
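As a sanity check on the covariance structure of the classical factor model (a loading matrix times its transpose, plus a diagonal noise covariance), the sketch below simulates the model and compares the sample covariance with that form. It assumes NumPy; the dimensions, the loading matrix `A`, and the noise variances are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, n = 5, 2, 400_000

A = rng.normal(size=(p, q))                  # hypothetical p x q loading matrix
noise_var = rng.uniform(0.2, 1.0, size=p)    # Var(eps_j), the diagonal of D

S = rng.normal(size=(q, n))                  # latent factors, cov(S) = I
eps = rng.normal(size=(p, n)) * np.sqrt(noise_var)[:, None]
X = A @ S + eps                              # factor model with additive noise

model_cov = A @ A.T + np.diag(noise_var)     # A A^T + D
sample_cov = np.cov(X)
rel_err = np.abs(sample_cov - model_cov).max() / np.abs(model_cov).max()
print(rel_err)
```

The relative error shrinks as `n` grows, which is what one would expect if the covariance really has the low-rank-plus-diagonal form.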

Independent Component Analysis
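- Step 1: Center the data: $x_i \leftarrow x_i - \bar{x}$, where $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$.
- Step 2: Whiten the data: compute the SVD of the centered data matrix, $X = U D V^T$, and rescale so that $\mathrm{cov}(x) = I$. After whitening, the factor model gives $\mathrm{cov}(x) = A\,\mathrm{cov}(s)\,A^T = A A^T = I$, so $A$ becomes orthogonal.
- Step 3: Find the orthogonal $A$ and the unit-variance, non-Gaussian, independent $S$.

Steps 1 and 2 correspond to PCA.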
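The centering and whitening steps can be sketched as follows. This assumes NumPy, a data matrix with observations in rows, and a covariance estimate with divisor `n`; the exact scaling used on the original slide is not recoverable from the transcript, so the `sqrt(n)` factor here is one common convention, not necessarily the author's.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 5000, 3

# Hypothetical data: non-Gaussian sources mixed by a random matrix, plus an offset.
S = rng.laplace(size=(n, p))
X = S @ rng.normal(size=(p, p)).T + np.array([1.0, -2.0, 0.5])

# Step 1: center each column.
Xc = X - X.mean(axis=0)

# Step 2: whiten with the SVD of the centered data matrix, Xc = U D V^T.
U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt.T * (np.sqrt(n) / d)          # whitening matrix (scales the columns of V)
Z = Xc @ W                           # whitened data, equal to sqrt(n) * U

cov_Z = Z.T @ Z / n                  # sample covariance of the whitened data
print(np.round(cov_Z, 6))
```

After this transform the remaining unknown in the model is only an orthogonal matrix, which is what Step 3 searches for.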

Example: PCA and ICA
Blind source separation (cocktail party problem). Model: $x = As$.

PCA vs. ICA
PCA: find projections that minimize reconstruction error.
- The variance of the projected data is as large as possible.
- Only 2nd-order statistics are needed (cov(x)).

ICA: find “interesting” projections.
- The projected data look as non-Gaussian and independent as possible.
- Higher-order statistics are needed to measure the degree of independence.
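To illustrate why second-order statistics are not enough for ICA, the sketch below mixes two independent uniform sources with an orthogonal matrix. It assumes NumPy and uses excess kurtosis as one simple higher-order statistic; the slides do not prescribe this particular measure.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Two independent, unit-variance uniform sources (already white).
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, n))

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def excess_kurtosis(y):
    y = (y - y.mean()) / y.std()
    return np.mean(y**4) - 3.0

X = rotation(np.pi / 4) @ S          # orthogonal mixing keeps cov(X) = I

cov_S, cov_X = np.cov(S), np.cov(X)
kurt_S = excess_kurtosis(S[0])       # about -1.2 for a uniform source
kurt_X = excess_kurtosis(X[0])       # about -0.6: the mixture is closer to Gaussian
print(np.round(cov_S, 2), np.round(cov_X, 2), kurt_S, kurt_X)
```

The covariance is (approximately) the identity both before and after mixing, so PCA sees no difference, while the fourth-order statistic changes and can be used to find the unmixed directions.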

Computing ICA
Model: $x = As$. Step 3: Find the orthogonal $A$ and the unit-variance, non-Gaussian, independent $S$.

The computational approaches are mostly based on information-theoretic criteria:
- Kullback-Leibler (KL) divergence
- Negentropy

A different approach that emerged more recently is called the “Product Density Approach”.

ICA: KL Divergence Criterion
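$x$ is zero-mean and whitened. KL divergence measures the “distance” between two probability densities. With $s = A^T x$, find $A$ such that $KL(\cdot)$ is minimized:

$$KL\Big(p(s) \,\Big\|\, \prod_j p_j(s_j)\Big) = \int p(s) \log \frac{p(s)}{\prod_j p_j(s_j)}\, ds = \sum_j H(s_j) - H(s),$$

where $p(s)$ is the joint density, $\prod_j p_j(s_j)$ is the independent density, and $H$ is differential entropy:

$$H(y) = -\int p(y) \log p(y)\, dy.$$

- Before whitening, ICA means finding components that are as non-Gaussian as possible.
- After the whitening procedure, Cov(X) = I, and ICA means finding components that are as independent as possible.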

ICA: KL Divergence Criterion…
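The theorem for the transformation of random variables says that, for $s = A^T x$:

$$p_s(s) = \frac{p_x(x)}{|\det A|}.$$

So,

$$H(s) = H(x) + \log |\det A|.$$

Note on the sign: the term is $+\log|\det A|$ in $H(s)$ and therefore $-\log|\det A|$ in the KL divergence. Either way it vanishes, because $|\det A| = 1$ for orthogonal $A$. Hence,

$$KL\Big(p(s) \,\Big\|\, \prod_j p_j(s_j)\Big) = \sum_j H(s_j) - H(x) - \log|\det A| = \sum_j H(s_j) - H(x).$$

Since $H(x)$ does not depend on $A$, minimize $\sum_j H(s_j)$ with respect to orthogonal $A$.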
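The sign of the log-determinant term can be checked in a case where the entropy has a closed form. The sketch below assumes NumPy and a Gaussian $x$, chosen only because its differential entropy is known exactly; ICA itself targets non-Gaussian sources. The matrices are arbitrary illustrative values.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (in nats) of N(0, cov)."""
    p = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (p * np.log(2 * np.pi * np.e) + logdet)

rng = np.random.default_rng(4)
p = 3
B = rng.normal(size=(p, p))
cov_x = B @ B.T + np.eye(p)          # hypothetical covariance of x

# General invertible A: s = A^T x has covariance A^T cov_x A.
A = rng.normal(size=(p, p))
_, log_abs_det_A = np.linalg.slogdet(A)
lhs = gaussian_entropy(A.T @ cov_x @ A)          # H(s)
rhs = gaussian_entropy(cov_x) + log_abs_det_A    # H(x) + log|det A|

# Orthogonal A: log|det A| = 0, so the entropy is unchanged.
Q, _ = np.linalg.qr(A)
_, log_abs_det_Q = np.linalg.slogdet(Q)
print(lhs - rhs, log_abs_det_Q)
```

In this Gaussian case the entropy of the transformed variable picks up a positive log-determinant term, and that term is zero for an orthogonal matrix.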

ICA: Negentropy Criterion
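Differential entropy $H(\cdot)$ is not invariant to scaling of the variable. Negentropy is a scale-normalized version of $H(\cdot)$:

$$J(s_j) = H(z_j) - H(s_j),$$

where $z_j$ is a Gaussian random variable with the same variance as $s_j$. Negentropy measures the departure of a random variable $s$ from a Gaussian random variable with the same variance.

Optimization criterion: maximize $\sum_j J(s_j)$ with respect to orthogonal $A$.

Speaker's remark: I would think that normalized entropy would be better if (Gaussian) noise is present (not sure whether this claim is true).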

ICA: Negentropy Criterion…
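Approximate the negentropy from data by:

$$J(s_j) \approx \big[ E\,G(s_j) - E\,G(z_j) \big]^2, \qquad G(u) = \frac{1}{a} \log\cosh(au), \quad 1 \le a \le 2,$$

where the expectation of $G(s_j)$ is replaced by a sample average. FastICA (http://www.cis.hut.fi/projects/ica/fastica/) is based on negentropy. Free software is available in Matlab, C++, Python…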
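A small two-dimensional illustration of the negentropy criterion follows. It is a sketch under stated assumptions: NumPy, the log-cosh contrast $G(u) = \log\cosh(u)$ (that is, $a = 1$), the approximation $[E\,G(y) - E\,G(z)]^2$ with $z$ standard Gaussian, and a brute-force grid search over the rotation angle. The grid search stands in for FastICA's fixed-point iteration and is not the FastICA algorithm; the sources and the true angle are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50_000

def G(u, a=1.0):
    return np.log(np.cosh(a * u)) / a

# E[G(z)] for a standard Gaussian z, estimated once by Monte Carlo.
EG_GAUSS = G(rng.normal(size=1_000_000)).mean()

def negentropy_approx(y):
    """[E G(y) - E G(z)]^2 for a zero-mean, unit-variance sample y."""
    return (G(y).mean() - EG_GAUSS) ** 2

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Independent unit-variance sources: one uniform, one Laplace.
S = np.vstack([rng.uniform(-np.sqrt(3), np.sqrt(3), n),
               rng.laplace(scale=1 / np.sqrt(2), size=n)])
true_theta = 0.6
X = rotation(true_theta) @ S         # whitened observations, orthogonal A

# Grid search over the angle for the most non-Gaussian components s = A^T x.
thetas = np.linspace(0, np.pi / 2, 181)
scores = [sum(negentropy_approx(y) for y in rotation(t).T @ X) for t in thetas]
theta_hat = thetas[int(np.argmax(scores))]
print(theta_hat)
```

The search is restricted to a quarter turn because rotations that differ by a multiple of 90 degrees only permute or flip the sign of the recovered components.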

ICA Filter Bank for Image Processing
An image patch is modeled as a weighted sum of basis images (basis functions):

$$x = As = \sum_i s_i a_i, \qquad s = A^T x,$$

where $x$ is the image patch, the columns $a_i$ of $A$ are the basis functions (a.k.a. the ICA filter bank), and $s$ contains the filter responses. The rows of $A^T$, i.e., the columns of $A$, are the filters.

Jenssen and Eltoft, “ICA filter bank for segmentation of textured images,” 4th International Symposium on ICA and BSS, Nara, Japan, 2003
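The patch model can be sketched in a few lines. This assumes NumPy, 12x12 non-overlapping patches, and a random orthogonal matrix `A` as a placeholder; a real filter bank would be learned by running ICA on training patches, which is not done here. The random "image" is likewise a stand-in.

```python
import numpy as np

rng = np.random.default_rng(6)
patch = 12                                   # 12x12 patches
image = rng.normal(size=(48, 48))            # hypothetical stand-in for a texture image

# Hypothetical orthogonal basis; a real filter bank would come from ICA on training patches.
A, _ = np.linalg.qr(rng.normal(size=(patch * patch, patch * patch)))

# Cut the image into non-overlapping patches and flatten each into a column of X.
nb = 48 // patch
blocks = image.reshape(nb, patch, nb, patch).swapaxes(1, 2)
X = blocks.reshape(-1, patch * patch).T      # shape (144, 16)

S = A.T @ X                                  # filter responses, one column per patch
X_rec = A @ S                                # each patch as a weighted sum of basis images
print(S.shape, np.allclose(X_rec, X))
```

Because the basis is orthogonal, applying the filters and then summing the weighted basis images reproduces each patch.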

Texture and ICA Filter Bank
[Figure: training textures and the 12x12 ICA basis functions (ICA filters).] (Jenssen and Eltoft, 2003)

Segmentation By ICA FB
[Diagram: image I → ICA filter bank with n filters → filter responses I1, I2, …, In → clustering → segmented image C.]

The above is an unsupervised setting. Segmentation (i.e., classification in this context) can also be performed by a supervised method on the output feature images I1, I2, …, In.

[Figure: a texture image and its segmentation.] (Jenssen and Eltoft, 2003)
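The unsupervised pipeline (filter bank, filter responses, clustering) can be sketched on a toy image. Everything specific here is an assumption for illustration: NumPy, a synthetic two-texture image, two hand-made stripe filters standing in for learned ICA filters, absolute responses per non-overlapping patch as features, and a plain k-means with k = 2. It is not the procedure of Jenssen and Eltoft.

```python
import numpy as np

rng = np.random.default_rng(7)
size, patch = 64, 8
yy, xx = np.mgrid[0:size, 0:size]

# Synthetic two-texture image: vertical stripes on the left, horizontal on the right.
stripes_v = np.sin(2 * np.pi * xx / patch)
stripes_h = np.sin(2 * np.pi * yy / patch)
image = np.where(xx < size // 2, stripes_v, stripes_h) + 0.1 * rng.normal(size=(size, size))

# Hypothetical two-filter bank (stand-in for learned ICA filters), one filter per row.
filters = np.stack([stripes_v[:patch, :patch].ravel(),
                    stripes_h[:patch, :patch].ravel()])
filters /= np.linalg.norm(filters, axis=1, keepdims=True)

# Filter responses per non-overlapping patch; |response| is the feature.
nb = size // patch
blocks = image.reshape(nb, patch, nb, patch).swapaxes(1, 2).reshape(nb * nb, -1)
features = np.abs(blocks @ filters.T)        # shape (64, 2)

# Plain k-means with k = 2, initialized from the first and last patch.
centers = features[[0, -1]].copy()
for _ in range(10):
    dist = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dist.argmin(axis=1)
    centers = np.stack([features[labels == k].mean(axis=0) for k in range(2)])

segmentation = labels.reshape(nb, nb)        # one cluster label per patch
print(segmentation)
```

In a supervised variant, the same feature vectors would be passed to a classifier trained on labeled texture patches instead of to k-means.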

On PCA and ICA
PCA and ICA differ in how they choose projection directions:
- Different principles: least squares (PCA), independence (ICA).
- For data compression, PCA would be a good choice.
- For discovering structure in data, ICA would be a reasonable choice.