
1
A Factor Graph Model for Minimal Gene Set Enrichment Analysis
Diana Uskat
Computational Biology - Gene Center Munich

2
Diana Uskat - Gene Center Munich

Motivation

Problem outline:
- Single-gene analysis of microarray experiments entails a large multiple testing problem.
- Even after appropriate multiple testing correction, the result is usually a long list of differentially expressed genes.
- Interpreting such a list by hand is difficult.

Possible improvement: gene set enrichment analysis
1. Group genes into biologically meaningful categories (Gene Ontology, KEGG pathways, transcription factor targets).
2. Use a statistical method to find the categories that are enriched for differentially expressed genes.

[Figure: cutout of the Gene Ontology graph, from the Ontologizer by S. Bauer, J. Gagneur, P. N. Robinson (NAR 2010)]
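As a point of reference for step 2, a classical approach tests each category separately for over-representation of differentially expressed genes (a one-sided hypergeometric test) and then corrects for multiple testing, here with Benjamini-Hochberg. The sketch below is this standard single-category approach, not the method of this talk; the genes, categories and the set of differentially expressed genes are hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K in the category, n drawn as DE)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical toy data: 20 measured genes, 5 of them differentially expressed (DE)
genes = [f"g{i}" for i in range(1, 21)]
de_genes = {"g1", "g2", "g3", "g4", "g5"}
categories = {
    "catA": {"g1", "g2", "g3", "g4"},
    "catB": {"g1", "g2", "g10", "g11", "g12"},
    "catC": {"g15", "g16", "g17"},
}

N, n = len(genes), len(de_genes)
pvals = [hypergeom_sf(len(members & de_genes), N, len(members), n)
         for members in categories.values()]
qvals = benjamini_hochberg(pvals)
for name, p, q in zip(categories, pvals, qvals):
    print(f"{name}: p = {p:.3g}, adjusted p = {q:.3g}")
```

catA and catB share g1 and g2, so their tests are not independent; with thousands of overlapping categories this dependence is one source of the difficulties listed under "Drawbacks".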

3
Motivation

Established methods:
- GSEA (Subramanian, Tamayo)
- topGO (Alexa)
- globaltest (Goeman, Mansmann)
- GOstats (Falcon, Gentleman)

Drawbacks:
- There are often thousands of overlapping categories, and genes can belong to multiple categories. This creates a new, difficult multiple testing problem.
- Group testing often returns a large number of significant categories, which makes it difficult to identify the biologically relevant ones.

[Figure: cutout of the Gene Ontology graph, from the Ontologizer by S. Bauer, J. Gagneur, P. N. Robinson (NAR 2010)]

4
Minimal Gene Set Enrichment

Idea (Bauer, Gagneur et al., Nucleic Acids Research 2010):
- Search for a sparse explanation, i.e. a minimal number of categories that explain the data (sufficiently well).
- Use a simple probabilistic graphical model relating categories and genes, and compute the marginal posterior probability of each category by Bayesian inference.

[Figure: two panels, "Correct explanation" and "Correct minimal explanation". Categories T1, T2, T3 are connected to genes E1, E2, E3; an edge means, e.g., "gene E3 is an element of category T3"; coloured nodes are "on".]
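To make "correct" versus "correct minimal" explanation concrete, the sketch below drops all noise and assumes a gene is on exactly when at least one of its categories is on; it then searches by brute force for the smallest category sets that reproduce the observed active genes. The membership table is hypothetical, and this deterministic set-cover view is only an illustration, not the probabilistic model itself, which handles noisy data through the posterior.

```python
from itertools import combinations

def minimal_explanations(categories, active_genes):
    """Return all smallest sets of categories whose switched-on genes are
    exactly `active_genes` (noise-free: gene on <=> some category on)."""
    names = sorted(categories)
    for size in range(len(names) + 1):
        hits = []
        for combo in combinations(names, size):
            switched_on = set().union(*(categories[c] for c in combo))
            if switched_on == active_genes:
                hits.append(set(combo))
        if hits:
            return hits
    return []  # no combination reproduces the data exactly

# Hypothetical membership: category -> genes
categories = {"T1": {"E1", "E2"}, "T2": {"E2", "E3"}, "T3": {"E3"}}

print(minimal_explanations(categories, {"E2", "E3"}))  # [{'T2'}]
print(minimal_explanations(categories, {"E1", "E2", "E3"}))
```

For the active genes {E2, E3}, the set {T2, T3} is also a correct explanation, but only {T2} is minimal. For {E1, E2, E3} two equally small explanations exist ({T1, T2} and {T1, T3}), an ambiguity that a probabilistic treatment can express as posterior probabilities.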

5
Minimal Gene Set Enrichment: the model

[Figure: three layers: categories T1, T2, T3; genes E1, E2, E3; observations (data) D1, D2, D3]

A Bayesian network factorization of the full posterior (posterior ∝ likelihood × prior):

$$\Pr(T, E \mid D) \;\propto\; \Pr(D \mid E)\,\Pr(E \mid T)\,\Pr(T)$$

Main trick: use a prior Pr(T) favoring sparse solutions, i.e. few active categories.
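For a toy graph, this factorization can be evaluated by brute force: enumerate all on/off states of the categories, derive the gene states, and weight each configuration by prior times likelihood. The sketch assumes (i) independent Bernoulli(p) priors on the categories with a small p, one simple way to favor sparse solutions, (ii) genes that are on if and only if at least one parent category is on, and (iii) hypothetical values for Pr(D_i | E_i). None of these numbers come from the talk.

```python
from itertools import product

# Hypothetical toy model (not from the talk)
categories = {"T1": ["E1", "E2"], "T2": ["E2", "E3"], "T3": ["E3"]}
genes = ["E1", "E2", "E3"]
# lik[g] = (Pr(D_g | E_g = 0), Pr(D_g | E_g = 1)), assumed values
lik = {"E1": (0.8, 0.2), "E2": (0.1, 0.9), "E3": (0.2, 0.8)}
p = 0.1  # assumed prior probability that a category is on; small p favors sparsity

def posterior_marginals(categories, genes, lik, p):
    """Marginal posterior Pr(T_j = 1 | D) by full enumeration of T."""
    names = list(categories)
    weights = {}
    for states in product((0, 1), repeat=len(names)):
        on = {c for c, s in zip(names, states) if s}
        w = 1.0
        for s in states:                       # prior Pr(T)
            w *= p if s else 1 - p
        for g in genes:                        # Pr(E | T): deterministic OR
            e = int(any(g in categories[c] for c in on))
            w *= lik[g][e]                     # likelihood Pr(D | E)
        weights[states] = w
    z = sum(weights.values())                  # normalizing constant Pr(D)
    return {c: sum(w for s, w in weights.items() if s[j]) / z
            for j, c in enumerate(names)}

post = posterior_marginals(categories, genes, lik, p)
print({c: round(v, 3) for c, v in post.items()})
# → {'T1': 0.076, 'T2': 0.717, 'T3': 0.159}
```

T2 alone accounts for the likely active genes E2 and E3, so it receives most of the posterior mass; each additional active category costs a factor p / (1 - p) in the prior. Enumeration grows as 2^(number of categories), which is why message passing is used for realistic graphs.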

6
Our method: Factor Graphs

[Figure: the model graph with categories T1, T2, T3, genes E1, E2, E3 and observations D1, D2, D3]

- Graphical model (Kschischang et al., IEEE Trans. Inf. Theory, 2001)
- Bipartite graph with factor nodes and variable nodes
- Each factor node encodes a function of its neighbouring variables
- Efficient computation of marginal distributions with the sum-product algorithm (exact if the factor graph is a tree)
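A factor graph can be stored as a bipartite adjacency structure. The sketch below builds one for a hypothetical version of the toy model (one likelihood factor per gene, one factor per gene linking it to its parent categories, one prior factor per category; the exact factor layout on the slides is not recoverable, so this layout is an assumption) and checks whether the graph is a tree, the case in which sum-product is exact.

```python
from collections import defaultdict

def build_factor_graph(factors):
    """factors: factor name -> list of variable names it touches.
    Returns a bipartite adjacency map; nodes are tagged to avoid name clashes."""
    adj = defaultdict(set)
    for f, variables in factors.items():
        for v in variables:
            adj[("factor", f)].add(("var", v))
            adj[("var", v)].add(("factor", f))
    return adj

def is_tree(adj):
    """A graph is a tree iff it is connected and has |V| - 1 edges."""
    nodes = list(adj)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    if n_edges != len(nodes) - 1:
        return False
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:  # depth-first search for connectivity
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == len(nodes)

# Hypothetical factor layout for the toy model
factors = {
    "f1": ["D1", "E1"], "f2": ["D2", "E2"], "f3": ["D3", "E3"],  # Pr(D_i | E_i)
    "g1": ["E1", "T1"], "g2": ["E2", "T1", "T2"], "g3": ["E3", "T2", "T3"],
    "fT1": ["T1"], "fT2": ["T2"], "fT3": ["T3"],                 # priors
}
print(is_tree(build_factor_graph(factors)))  # True

cyclic = dict(factors, g3=["E3", "T1", "T2", "T3"])  # T1 now also contains E3
print(is_tree(build_factor_graph(cyclic)))   # False
```

Letting T1 also contain E3 closes the cycle T1, g2, T2, g3, T1; overlapping categories like this are common in GO or TF-target data, so real graphs are usually not trees.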

7
Factor Graphs

[Figure: as before, with factor nodes f1, f2, f3 added between each gene E_i and its observation D_i]

- The factors f_i = Pr(D_i | E_i) are given by the dataset

8
Factor Graphs

[Figure: as before, with factor nodes g1 to g6 added between the categories T and the genes E]

- The factors g encode the constraint that a gene E is only active if at least one of its parent categories is active
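Assuming the constraint is written as one deterministic factor per gene, g(E, T_1, ..., T_k) = 1 if E equals the OR of its parents and 0 otherwise (the slides do not show the formula), the sum-product message from that factor to the gene has a closed form: E = 0 requires all parents to be off, and E = 1 takes the remaining mass. This replaces a sum over all 2^k parent configurations by O(k) work; the sketch checks the shortcut against brute force on made-up messages.

```python
from itertools import product
from math import prod

def or_factor_to_child(parent_msgs):
    """Sum-product message from a deterministic OR factor to its child E.
    parent_msgs: list of (m(T=0), m(T=1)) messages arriving from the parents."""
    all_off = prod(m0 for m0, _ in parent_msgs)           # only way to get E = 0
    everything = prod(m0 + m1 for m0, m1 in parent_msgs)  # total mass
    return (all_off, everything - all_off)

def or_factor_brute_force(parent_msgs):
    """Same message by summing over all 2^k parent configurations."""
    out = [0.0, 0.0]
    for states in product((0, 1), repeat=len(parent_msgs)):
        e = int(any(states))
        out[e] += prod(msg[s] for msg, s in zip(parent_msgs, states))
    return tuple(out)

msgs = [(0.9, 0.1), (0.6, 0.4), (0.7, 0.3)]  # hypothetical parent messages
print(or_factor_to_child(msgs))
print(or_factor_brute_force(msgs))
```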

9
Factor Graphs

[Figure: as before, with factor nodes f_T added on the category nodes T]

- The factors f_T on the category nodes encode the prior Pr(T), which favors sparse solutions

10
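Estimation Methods for Factor Graphs

[Figure: complete factor graph with variable nodes T, E, D and factor nodes f1 to f3, g1 to g6, f_T]

Computation of the posterior for T and E with a message-passing algorithm, the sum-product algorithm:
- On a tree-structured graph, one pass of messages from the leaves to a root and back yields the exact marginals.
- There are no guarantees if the graph has cycles (e.g. messages may oscillate); however, this "loopy" variant often works well in practice.

Principle:
- Start at the leaf nodes.
- Message propagation:
  - variable node to factor node: product of the messages from the variable's other neighbouring factors ("product")
  - factor node to variable node: multiply the factor by the incoming messages and sum over all other variables of the factor ("sum")
- Termination: the marginal of each variable node is the normalized product of all its incoming messages.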
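The two message rules can be written as mutually recursive functions. The sketch computes exact marginals for binary variables on a tree-shaped factor graph; each recursion runs away from the requesting node and bottoms out at the leaves. It uses a hypothetical toy model (Bernoulli(p) category priors, deterministic OR factors, assumed likelihoods Pr(D_i | E_i) entered as unary factors because D is observed), so on a tree its category marginals must equal those from full enumeration. On a graph with cycles this recursion would not terminate; loopy belief propagation instead iterates the same updates on a schedule.

```python
from functools import lru_cache
from itertools import product
from math import prod

def sum_product_marginals(variables, factors):
    """Exact marginals Pr(v = 1 | evidence) for binary variables on a tree.
    factors: name -> (tuple of variable names, function of their 0/1 states)."""
    nbrs = {v: [f for f, (vs, _) in factors.items() if v in vs] for v in variables}

    @lru_cache(maxsize=None)
    def var_to_factor(v, f):
        # Product of the messages from all *other* factors of v
        return tuple(prod(factor_to_var(g, v)[x] for g in nbrs[v] if g != f)
                     for x in (0, 1))

    @lru_cache(maxsize=None)
    def factor_to_var(f, v):
        # Sum over the factor's other variables of: factor value * incoming messages
        vs, fn = factors[f]
        others = [u for u in vs if u != v]
        out = [0.0, 0.0]
        for x in (0, 1):
            for states in product((0, 1), repeat=len(others)):
                assign = dict(zip(others, states))
                assign[v] = x
                weight = fn(*(assign[u] for u in vs))
                for u in others:
                    weight *= var_to_factor(u, f)[assign[u]]
                out[x] += weight
        return tuple(out)

    marginals = {}
    for v in variables:
        belief = [prod(factor_to_var(f, v)[x] for f in nbrs[v]) for x in (0, 1)]
        marginals[v] = belief[1] / sum(belief)
    return marginals

# Hypothetical toy model (assumed values, not from the talk)
p = 0.1
lik = {"E1": (0.8, 0.2), "E2": (0.1, 0.9), "E3": (0.2, 0.8)}
parents = {"E1": ("T1",), "E2": ("T1", "T2"), "E3": ("T2", "T3")}

def or_factor(e, *ts):
    return 1.0 if e == int(any(ts)) else 0.0

factors = {}
for t in ("T1", "T2", "T3"):
    factors["fT_" + t] = ((t,), lambda x: p if x else 1 - p)  # prior
for e, pa in parents.items():
    factors["g_" + e] = ((e,) + pa, or_factor)                # E = OR(parents)
    factors["f_" + e] = ((e,), lambda x, e=e: lik[e][x])      # observed D_e

variables = ["T1", "T2", "T3", "E1", "E2", "E3"]
marg = sum_product_marginals(variables, factors)
print({v: round(marg[v], 3) for v in ("T1", "T2", "T3")})
# → {'T1': 0.076, 'T2': 0.717, 'T3': 0.159}
```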

11
Application: Yeast Salt Stress

Categories: transcription factors (with their targets) instead of GO categories

Given:
- a list of transcription factors with their target genes
- a list of genes (with their p-values) from a yeast salt stress experiment

Question: Which transcription factors are active during salt stress?
Task: Find the set of transcription factors that is most likely to be active.

[Figure: transcription factors TF1, TF2 connected to genes g1 to g5; an edge means, e.g., "g2 is a target of TF2"]
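Setting up this application mostly means turning the two input lists into the category structure of the model. A minimal sketch, with hypothetical TF-target pairs and p-values; the 0.05 cutoff is an assumption used only to summarize the input here, since the model itself works with Pr(D | E) rather than a hard threshold.

```python
from collections import defaultdict

# Hypothetical inputs (not the data from the talk)
tf_targets = [("TF1", "g1"), ("TF1", "g2"), ("TF2", "g2"), ("TF2", "g3"),
              ("TF2", "g4"), ("TF2", "g5")]
gene_pvalues = {"g1": 0.30, "g2": 0.001, "g3": 0.004, "g4": 0.02, "g5": 0.60}
alpha = 0.05  # assumed cutoff for calling a gene differentially expressed

def build_categories(pairs, measured):
    """Map each TF to the set of its targets that were measured."""
    cats = defaultdict(set)
    for tf, gene in pairs:
        if gene in measured:  # skip targets absent from the experiment
            cats[tf].add(gene)
    return dict(cats)

categories = build_categories(tf_targets, gene_pvalues)
de = {g for g, pv in gene_pvalues.items() if pv < alpha}
summary = {tf: (len(genes), len(genes & de)) for tf, genes in categories.items()}
print(summary)  # {'TF1': (2, 1), 'TF2': (4, 3)}
```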

13
Results

~2,000 genes, 118 transcription factors
Graph obtained from the re-analysis of the Harbison TF binding data (Nature, 2004) by MacIsaac et al. (BMC Bioinformatics, 2006)

[Figure: result graph with the transcription factors YML081W, DAL81, STB4, HSF1, UME6, SNT2, RGT1, MET28, MSN2, GAL4, SKO1. Annotations: transcription factors previously known to be involved in salt stress (Capaldi et al., Nat. Genet. 2008; Wu and Chen, Bioinform. Biol. Insights 2009); differentially phosphorylated transcription factors (Soufi et al., Mol. BioSyst. 2009)]

14
Summary and Outlook

- Lists of (meaningful) gene sets are easier to interpret than lists of genes.
- The search for biologically meaningful explanations requires a new, minimal model (MGSE) for gene set enrichment analysis.
- We use factor graphs and the sum-product algorithm to estimate the posterior of each category.
- Wide application: GO analysis, TF-target analysis, pathway enrichment.
- To do: scalability and speed.

15
Acknowledgments

Gene Center Munich: Achim Tresch, Theresa Niederberger, Björn Schwalb, Sebastian Dümcke

Collaborating partners:
- Gene Center Munich: Patrick Cramer, Christian Miller, Daniel Schulz, Dietmar Martin, Andreas Mayer
- EMBL Heidelberg: Julien Gagneur (talk Nov. 2009, working group conference of the GMDS "AG Statistische Methoden in der Bioinformatik", Munich)
