Linking Genetic Profiles to Biological Outcome. Paul Fogel, Consultant, Paris; S. Stanley Young, National Institute of Statistical Sciences. NISS, NMF Workshop.

1 Linking Genetic Profiles to Biological Outcome. Paul Fogel, Consultant, Paris; S. Stanley Young, National Institute of Statistical Sciences. NISS NMF Workshop, February 23, 2007.

2 Scotch whisky database: Original matrix = Prototypical flavor patterns × Mixing levels (weights) + Residual.
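
Written as a matrix equation, the decomposition reads as follows. This is a sketch under an assumed orientation: rows of $A$ are the $n$ whiskies, columns are the $p$ flavor descriptors, and the rows of $H$ are the $k$ prototypical flavor patterns, so row $i$ of $W$ holds the mixing levels for whisky $i$.

```latex
\begin{aligned}
A &= W H + E, \qquad A \in \mathbb{R}_{\ge 0}^{n \times p},\; W \in \mathbb{R}_{\ge 0}^{n \times k},\; H \in \mathbb{R}_{\ge 0}^{k \times p},\\
a_{i\cdot} &= \sum_{r=1}^{k} w_{ir}\, h_{r\cdot} + e_{i\cdot}, \qquad i = 1, \dots, n.
\end{aligned}
```

Each whisky's flavor profile is thus a non-negative mix of the $k$ patterns plus a residual $e_{i\cdot}$, which is not constrained in sign.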

3 How many flavor patterns? Scree plot; profile likelihood (Zhu and Ghodsi); volume filled (determinant).
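
The profile-likelihood criterion can be sketched as follows, assuming the input is a scree sequence (for example eigenvalues or singular values). The values are split into a leading group of size q and a trailing group; each group is modelled as normal with its own mean and a shared variance, and the q with the highest log-likelihood is kept. This is the common-variance variant of Zhu and Ghodsi's idea; the function name `profile_likelihood_dim` is hypothetical.

```python
import numpy as np

def profile_likelihood_dim(values):
    """Choose the number q of leading components from a scree sequence.

    Split the sorted values into d[:q] and d[q:], model each group as normal
    with its own mean and a pooled (shared) variance, and return the q that
    maximizes the profile log-likelihood.
    """
    d = np.sort(np.asarray(values, dtype=float))[::-1]  # descending order
    p = d.size
    if p < 3:
        raise ValueError("need at least 3 values")
    best_q, best_ll = None, -np.inf
    for q in range(1, p):
        g1, g2 = d[:q], d[q:]
        # within-group sums of squares around each group's own mean
        ss = np.sum((g1 - g1.mean()) ** 2) + np.sum((g2 - g2.mean()) ** 2)
        # pooled variance; the floor avoids log(0) for a perfect split
        sigma2 = max(ss / (p - 2), np.finfo(float).tiny)
        ll = -ss / (2 * sigma2) - 0.5 * p * np.log(2 * np.pi * sigma2)
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q
```

For example, `profile_likelihood_dim([10, 9, 8, 1, 0.9, 0.8, 0.7])` returns 3, the position of the visible elbow in that scree sequence.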

4 AnCnoc: Floral, Sweetness, Fruity, Malty, Nutty

5 Balmenach: Winey, Body, Honey, Sweetness, Nutty, Malty

6 Glen Garioch: Spicy, Fruity, Sweetness, Body, Malty

7 Lagavulin & Laphroaig: Medicinal, Smoky, Body

8 Statistical Issues: 1. Massive testing: hundreds of “omic” predictors and several questions per sample. 2. Family-wise error rate versus false discovery rate. 3. Missing data and outliers. Don’t fool yourself.
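
To make the family-wise versus false-discovery contrast concrete, here is a minimal sketch of the two textbook procedures: Bonferroni (controls the family-wise error rate) and the Benjamini-Hochberg step-up rule (controls the false discovery rate). The function names are hypothetical; these are standard procedures, not necessarily the ones used in the talk.

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    """FWER control: reject hypothesis i when p_i <= alpha / m."""
    p = np.asarray(pvals, dtype=float)
    return p <= alpha / p.size

def benjamini_hochberg(pvals, alpha=0.05):
    """FDR control (Benjamini-Hochberg step-up): find the largest rank k with
    p_(k) <= k * alpha / m and reject the k smallest p-values."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                       # indices of p-values, ascending
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])        # largest rank passing its threshold
        reject[order[: k + 1]] = True
    return reject
```

With p-values `[0.001, 0.008, 0.012, 0.035, 0.3]` at alpha = 0.05, Bonferroni rejects two hypotheses while Benjamini-Hochberg rejects four, which illustrates the power gained by controlling the FDR instead of the FWER.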

9 Matrix Factorization Methods: 1. Principal component analysis. 2. Singular value decomposition. 3. Non-negative matrix factorization. 4. Independent component analysis. 5. Robust MF. An area of active research.
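
The first two methods are closely linked: principal component analysis can be computed from the singular value decomposition of the column-centred data matrix. A minimal NumPy sketch (the function name `pca_via_svd` is hypothetical):

```python
import numpy as np

def pca_via_svd(X, k):
    """PCA of X (samples x variables) via the SVD of the centred matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # centre each variable (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * s[:k]                  # sample coordinates on the first k PCs
    loadings = Vt[:k].T                        # eigenvectors of the covariance matrix
    explained = s**2 / np.sum(s**2)            # fraction of variance per component
    return scores, loadings, explained[:k]
```

Because SVD places no sign constraints on the loadings, each component can mix positive and negative contributions from many variables, which is the contrast with NMF drawn in the key papers.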

10 Key Papers: 1. Good (1969) Technometrics – SVD. 2. Liu et al. (2003) PNAS – rSVD. 3. Lee and Seung (1999) Nature – NMF. 4. Kim and Tidor (2003) Genome Research. 5. Brunet et al. (2004) PNAS – microarray. SVD eigenvectors come from a composite of mechanisms; NMF commits one vector to each mechanism.

11 NMF Algorithm: A = WH + E, where A is the data matrix with samples along one side and genes or compounds along the other. Green are the “spectra”; red are the “weights”. Start with random elements in red and green, then optimize so that $\sum_{i,j} (a_{ij} - (WH)_{ij})^2$ is minimized.
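
One well-known way to carry out this optimization is Lee and Seung's multiplicative update rules for the squared error. The sketch below starts from random non-negative W and H and alternates the two updates; it is not necessarily the algorithm used by the authors, and the function name, iteration count, and `eps` guard are assumptions.

```python
import numpy as np

def nmf(A, k, n_iter=500, seed=0, eps=1e-10):
    """Factor a non-negative matrix A (n x p) as A ~ W H with W (n x k) and
    H (k x p) non-negative, using multiplicative updates that reduce
    sum_ij (a_ij - (WH)_ij)^2."""
    A = np.asarray(A, dtype=float)
    if np.any(A < 0):
        raise ValueError("A must be non-negative")
    rng = np.random.default_rng(seed)
    n, p = A.shape
    W = rng.random((n, k))   # random non-negative starting values
    H = rng.random((k, p))
    for _ in range(n_iter):
        # multiplying by non-negative ratios keeps every entry non-negative;
        # eps only guards against division by zero
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

The residual E is then `A - W @ H`. Different seeds can converge to different local optima, which is why NMF is usually run from several random starts.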

12 Inference: Test each variable sequentially within an ordered set. Each set corresponds to a particular eigenvector, whose elements are ordered by decreasing value. The result is an increase in statistical power, illustrated with a genomic example and a simulation.
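
The slide does not spell out the sequential rule. A standard choice is fixed-sequence testing: order the variables by decreasing loading on one factor, test each at the level allotted to the set, and stop at the first non-rejection. This controls the family-wise error rate within the set. The sketch below assumes that rule; the authors' exact procedure may differ, and `fixed_sequence_test` is a hypothetical name.

```python
import numpy as np

def fixed_sequence_test(pvals, loadings, alpha=0.05):
    """Fixed-sequence testing within one ordered set (assumed rule).

    Variables are ordered by decreasing loading on a single factor and tested
    at level alpha in that order; testing stops at the first p-value above
    alpha. Returns the indices of rejected variables, in testing order.
    """
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(-np.asarray(loadings, dtype=float), kind="stable")
    rejected = []
    for j in order:
        if p[j] > alpha:
            break            # stop: variables further down are not tested
        rejected.append(int(j))
    return rejected
```

With loadings `[0.1, 0.9, 0.5, 0.7]` and p-values `[0.001, 0.01, 0.2, 0.03]`, variables 1 and 3 are rejected. Variable 0 is never tested despite its small p-value because it sits after the first failure. The gain in power comes from testing each variable at the full set level rather than a Bonferroni-reduced one.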

13 Microarray Example: Group AML: patients with acute myeloid leukemia. Group ALL: patients with acute lymphoblastic leukemia, subdivided into ALL-T (T-cell subtype) and ALL-B (B-cell subtype). Golub, T.R. et al. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531–537.

14 Clustering: NMF clusters the samples correctly and reveals an additional subgroup of ALL-B. Brunet et al. (2004), PNAS 101(12), 4164–4169.
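
In the Brunet et al. approach, each sample is assigned to the factor (“metagene”) with the largest coefficient in its column of H, and stability across random restarts is summarized by a consensus matrix. A minimal sketch, assuming H is k × samples; the function names are hypothetical.

```python
import numpy as np

def assign_clusters(H):
    """Assign each sample (column of H, shape k x n_samples) to the factor
    with the largest coefficient."""
    return np.argmax(np.asarray(H, dtype=float), axis=0)

def connectivity(labels):
    """1 where two samples share a cluster label, else 0."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def consensus(label_runs):
    """Average connectivity over several NMF runs (e.g. different seeds);
    entries near 0 or 1 indicate stable cluster membership."""
    return np.mean([connectivity(l) for l in label_runs], axis=0)
```

For `H = [[0.9, 0.1, 0.8, 0.0], [0.2, 0.7, 0.1, 0.5]]`, `assign_clusters` gives `[0, 1, 0, 1]`.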

17 Sequential testing: Cluster 3 = ALL-B2 (169 genes); Cluster 1 = ALL-B1 (33 genes). Functional categories: Immune Response, 10 genes (p = 0.00019); Cell Growth and Proliferation, 61 genes; RNA Processing, 11 genes (p = 0.00260); Cell Cycle, 12 genes; Transcription, 16 genes; DNA Repair and Replication, 11 genes (p = 0.01519); MHC class II, 5 genes; MHC class I & II, 6 genes (p = 0.00018); Proteasome, 7 genes (p = 0.00054); Immune Response, 28 genes (p = 0.00047). Genes upregulated in ALL-B2 point to higher rates of transcription and replication and, compared with ALL-B1, a more proliferative nature, more proteasomal activity, and more energy production.

18 Simulation

19 Genes 1–5: up-regulated by T1. Genes 6–10: up-regulated by T2. Genes 11–20: up-regulated by both T1 and T2. Intragroup correlation structure.
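
A data generator with the slide's gene blocks can be sketched as below. Only the block layout comes from the slide. Everything else is a hypothetical assumption: the three sample groups (control, T1, T2), group size, effect size, the unit-variance normal noise, the equicorrelation `rho` within each block, and the extra null genes. Indices are 0-based, so genes 1–5 on the slide are columns 0–4.

```python
import numpy as np

def simulate(n_per_group=10, n_null=80, effect=1.0, rho=0.5, seed=0):
    """Hypothetical simulation: samples in groups control (0), T1 (1), T2 (2);
    columns 0-4 up in T1, 5-9 up in T2, 10-19 up in both, then n_null
    unaffected genes. Genes within a block have correlation rho."""
    rng = np.random.default_rng(seed)
    blocks = [list(range(0, 5)), list(range(5, 10)), list(range(10, 20))]
    groups = np.repeat([0, 1, 2], n_per_group)
    X = rng.normal(size=(groups.size, 20 + n_null))   # independent unit-variance noise
    for b in blocks:
        # intragroup correlation: mix in one shared factor per block,
        # keeping unit variance and giving pairwise correlation rho
        shared = rng.normal(size=(groups.size, 1))
        X[:, b] = np.sqrt(1 - rho) * X[:, b] + np.sqrt(rho) * shared
    X[np.ix_(groups == 1, blocks[0])] += effect       # up-regulated by T1
    X[np.ix_(groups == 2, blocks[1])] += effect       # up-regulated by T2
    X[np.ix_(groups != 0, blocks[2])] += effect       # up-regulated by T1 and T2
    return X, groups
```

Repeating such draws and counting true and false rejections is one way to compare power and FDR across testing strategies.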

20 Simulation results: increased power at the same level of FDR. For more details, see the paper.

21 Summary: The strategy is conceptually simple: – Non-negative matrix factorization is used to create groups of genes that move together in the dataset. – The error rate to be controlled is allocated over these groups. – Within each group, genes are tested sequentially. The strategy should be effective when there are sets of genes moving together, so that group formation reflects biological reality. Areas of research: robust algorithms; multiblock NMF (e.g., relating active motifs to differentially expressed genes); speed.
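
The three steps of the strategy can be combined in one sketch. Several choices here are assumptions not stated in the slides: each gene joins the factor where its loading in W (genes × k) is largest; alpha is split equally across the k groups (a Bonferroni-style allocation); and within each group genes are tested in decreasing loading order with fixed-sequence testing. This version controls the family-wise error rate; the authors' procedure, evaluated in terms of FDR, may allocate the error rate differently. `grouped_sequential_test` is a hypothetical name.

```python
import numpy as np

def grouped_sequential_test(pvals, W, alpha=0.05):
    """Hypothetical sketch of the summary strategy.

    1. Group genes by the NMF factor with the largest loading in W.
    2. Allocate alpha equally: each of the k groups gets alpha / k.
    3. Within a group, test genes by decreasing loading at alpha / k and
       stop at the first non-rejection.
    """
    p = np.asarray(pvals, dtype=float)
    W = np.asarray(W, dtype=float)
    k = W.shape[1]
    group = np.argmax(W, axis=1)
    level = alpha / k
    reject = np.zeros(p.size, dtype=bool)
    for g in range(k):
        members = np.flatnonzero(group == g)
        ordered = members[np.argsort(-W[members, g], kind="stable")]
        for j in ordered:
            if p[j] > level:
                break                # remaining genes in this group are not tested
            reject[j] = True
    return reject
```

The design choice is visible in the sketch: if groups track real co-regulated gene sets, the strongest genes in each group come first and are tested at alpha / k rather than alpha / m, where m is the much larger total number of genes.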

22 Contact Information: Paul Fogel, independent consultant, paul.fogel@wanadoo.fr, +33 1 43 26 16 86. Stan Young, National Institute of Statistical Sciences, young@niss.org, 919 685 9328. www.niss.org/irMF: literature and software.