Blind Information Processing: Microarray Data. Hyejin Kim, Dukhee Kim, Seungjin Choi. Department of Computer Science and Engineering, Department of Chemical Engineering, POSTECH, Korea
Outline: Blind Information Processing? Independent Component Analysis (ICA). Application of ICA to Microarray Data: time courses, yeast cell cycle data.
Information Processing Blind Information Processing Little Prior Knowledge
Latent Variable Models Data Space (observation) Latent Variable Space Generative Model (FA, PPCA, ICA, GTM) Recognition Model (PCA, ICA, SOM)
What is ICA? ICA is a statistical method whose goal is to decompose given multivariate data into a linear sum of statistically independent components. For example, given a two-dimensional vector $x = [x_1 \; x_2]^T$, ICA aims at finding the decomposition $x = a_1 s_1 + a_2 s_2$, where $a_1, a_2$ are basis vectors and $s_1, s_2$ are basis coefficients. Constraint: the basis coefficients $s_1$ and $s_2$ are statistically independent.
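The two-dimensional decomposition described above can be tried on synthetic data. The following is a minimal sketch, not the method used in these slides: it assumes a symmetric fixed-point (FastICA-style) update with a tanh nonlinearity, and the mixing matrix `A`, the uniform sources, and the function names `whiten` and `fast_ica` are all hypothetical choices made for illustration.

```python
import numpy as np

def whiten(X):
    """Center X (n_signals x n_samples) and rescale it to identity covariance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    V = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return V @ Xc

def symmetric_orthogonalize(W):
    """Return (W W^T)^(-1/2) W, computed through the SVD of W."""
    U, _, Vt = np.linalg.svd(W)
    return U @ Vt

def fast_ica(X, n_iter=200, seed=0):
    """Illustrative symmetric fixed-point ICA with g(u) = tanh(u)."""
    Z = whiten(X)
    n, n_samples = Z.shape
    rng = np.random.default_rng(seed)
    W = symmetric_orthogonalize(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        Y = W @ Z
        g = np.tanh(Y)
        g_prime = 1.0 - g ** 2
        # Row-wise update: w <- E[z g(w^T z)] - E[g'(w^T z)] w
        W_new = g @ Z.T / n_samples - np.diag(g_prime.mean(axis=1)) @ W
        W = symmetric_orthogonalize(W_new)
    return W @ Z

# Hypothetical example: two independent uniform sources, mixed linearly.
rng = np.random.default_rng(1)
S = rng.uniform(-1, 1, size=(2, 5000))      # basis coefficients s_1, s_2
A = np.array([[1.0, 0.5], [0.3, 1.0]])      # columns are basis vectors a_1, a_2
X = A @ S                                   # x = a_1 s_1 + a_2 s_2

S_hat = fast_ica(X)
# Absolute correlation between true and estimated sources
# (ICA recovers sources only up to order and sign).
corr = np.abs(np.corrcoef(np.vstack([S, S_hat]))[:2, 2:])
```

Because ICA leaves the order and sign of the components undetermined, the estimate is compared with the true sources through absolute correlations rather than by direct subtraction.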
Information Geometry of ICA [Figure: points s, y, and y_p; labels: mutual information, marginal mismatch, product manifold]
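The labels on this slide match a standard decomposition of the Kullback-Leibler divergence. As a sketch, assume that $p_y$ is the density of the output $y$, that $\tilde{p}_y$ (presumably the point marked yp) is the product of the marginals of $y$, and that $p_s$ is a source model with independent components. These symbols are assumptions introduced here, not notation taken from the slide.

```latex
\begin{aligned}
\mathrm{KL}\left(p_y \,\|\, p_s\right)
  &= \underbrace{\mathrm{KL}\left(p_y \,\|\, \tilde{p}_y\right)}_{\text{mutual information } I(y)}
   + \underbrace{\mathrm{KL}\left(\tilde{p}_y \,\|\, p_s\right)}_{\text{marginal mismatch}}, \\
\tilde{p}_y(y) &= \prod_i p_{y_i}(y_i), \qquad
p_s(s) = \prod_i p_{s_i}(s_i), \\
\mathrm{KL}\left(\tilde{p}_y \,\|\, p_s\right)
  &= \sum_i \mathrm{KL}\left(p_{y_i} \,\|\, p_{s_i}\right).
\end{aligned}
```

The first term vanishes exactly when the components of $y$ are independent, that is, when $p_y$ lies on the product manifold. The second term measures how far the marginals of $y$ are from the assumed source marginals.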
PCA vs ICA. Both: linear transform; used for compression and classification. PCA: orthogonal transform; second-order statistics; optimal coding in the mean-square sense. ICA: non-orthogonal transform; higher-order statistics; related to projection pursuit; better than PCA in classification tasks?
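The orthogonal versus non-orthogonal contrast can be checked numerically. The sketch below assumes synthetic data generated from a hypothetical non-orthogonal basis `A`. It computes PCA from the covariance matrix (second-order statistics only) and confirms that the principal axes are orthogonal and the PCA coordinates are uncorrelated, while the generating basis vectors are not orthogonal and so cannot both coincide with PCA axes.

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, size=(2, 5000))      # independent sources
A = np.array([[1.0, 0.6], [0.2, 1.0]])      # hypothetical non-orthogonal basis
X = A @ S

# PCA: eigen-decomposition of the sample covariance matrix.
Xc = X - X.mean(axis=1, keepdims=True)
cov = Xc @ Xc.T / Xc.shape[1]
eigvals, P = np.linalg.eigh(cov)            # columns of P are principal axes
Y = P.T @ Xc                                # data in PCA coordinates

axes_gram = P.T @ P                         # identity if the axes are orthonormal
cov_Y = Y @ Y.T / Y.shape[1]                # diagonal if Y is decorrelated

# Cosine of the angle between the two generating basis vectors.
a1, a2 = A[:, 0], A[:, 1]
cos_basis = a1 @ a2 / (np.linalg.norm(a1) * np.linalg.norm(a2))

print("PCA axes orthonormal:", np.allclose(axes_gram, np.eye(2)))
print("off-diagonal covariance of PCA coordinates:", cov_Y[0, 1])
print("cosine between generating basis vectors:", cos_basis)
```

Decorrelation alone does not identify the generating basis here: any rotation of the whitened PCA coordinates is also uncorrelated, which is why ICA has to use statistics beyond second order.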
Example of PCA
PCA vs ICA PCA (orthogonal coordinate) ICA (non-orthogonal coordinate)
PCA vs ICA [Figure: axes x1 and x2; labeled directions: ICA, PCA]
Microarray Data (1)
Microarray Data Analysis (1) [Figure: decomposition of the gene-by-sample expression matrix; labels: gene expression profile, expression mode of a sample, gene influence profile, influence]
ICA: Time Courses (1). Time courses: yeast cell cycle data, 77 by 6178 ORF expression (Spellman et al.). Each mode shows specific cell-cycle behavior. ICA modes remain inactive within some of the experiments. Dimension reduction improves the prediction of cell-cycle regulated genes.
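One common way to reduce dimension before ICA is a truncated SVD, sketched below. This is an assumption about how such a step could look, not a description of the procedure behind the slide. The matrix is a synthetic stand-in with the same 77 by 6178 shape (not the real yeast data), and `k = 12` is chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_genes, k = 77, 6178, 12

# Synthetic stand-in for the expression matrix (NOT the real yeast data):
# k heavy-tailed latent modes mixed into 77 samples, plus a little noise.
modes = rng.laplace(size=(k, n_genes))
mixing = rng.standard_normal((n_samples, k))
X = mixing @ modes + 0.1 * rng.standard_normal((n_samples, n_genes))

# Center each sample (row) across genes, then take a thin SVD.
Xc = X - X.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Project onto the k leading left singular vectors: a k x n_genes matrix
# that a subsequent ICA step could take as input.
X_reduced = U[:, :k].T @ Xc

# Fraction of total variance retained by the k leading components.
explained = (s[:k] ** 2).sum() / (s ** 2).sum()
print(X_reduced.shape, round(float(explained), 4))
```

Discarding the low-variance directions removes mostly noise in this synthetic setting. Whether the same holds for real expression data depends on the data and on the choice of `k`.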
ICA: Time Courses (2), by Liebermeister [Figure panels: Mode 1 (76 components), Mode 2 (76 components), Mode 1 (12 components), Mode 1 (12 components); experiments: alpha, elutriation, cdc15, cdc28]
PCA Results
ICA Results(I)
ICA Results (II)
Conclusion. Linear models of gene expression: model assumptions. Matrix decomposition serves simultaneously to interpret expression patterns and to cluster co-activated genes. ICA advantages: more biologically meaningful analysis; no ordering and no orthogonality constraint on the components; more sensitive in detecting expression patterns.
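One simple way to turn a mode into a cluster of co-activated genes is to keep the genes whose loading on that mode is unusually large. The sketch below assumes a threshold of three standard deviations and a synthetic mode vector with a few planted genes. The function name `active_genes`, the threshold, and the data are hypothetical, and the slides do not specify which rule was actually used.

```python
import numpy as np

def active_genes(mode, n_std=3.0):
    """Split genes by loading: above +n_std or below -n_std standard deviations."""
    z = (mode - mode.mean()) / mode.std()
    up = np.flatnonzero(z > n_std)       # strongly induced genes
    down = np.flatnonzero(z < -n_std)    # strongly repressed genes
    return up, down

# Synthetic mode over 1000 genes (illustration only, not real data).
rng = np.random.default_rng(0)
mode = rng.standard_normal(1000)
mode[[10, 20, 30]] += 8.0    # hypothetical co-activated genes
mode[[40, 50]] -= 8.0        # hypothetical co-repressed genes

up, down = active_genes(mode)
print("induced:", up.tolist())
print("repressed:", down.tolist())
```

Since each mode is thresholded separately, a gene can belong to the clusters of several modes, which a hard partition of the genes would not allow.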