Published by Miracle Fears. Modified over 2 years ago.

1
Detecting active subnetworks in molecular interaction networks with missing data
Luke Hunter, Texas A&M University
SHURP 2007 Student

2
Outline of Talk
  Introduction
  Overall Strategy
  Previous Papers
  Graph Construction
  Scoring Function
  Search Approaches
  Experiments
  Future Work

3
Introduction
  Background: Ideker et al. define an ‘active subnetwork’ as a connected set of genes with unexpectedly high levels of differential expression
  Objective: Find active subnetworks of metabolites
  Motivation:
    High-throughput data analysis
    Mechanisms
    Cell state (disease, drug treatment, and environment)

4
Overall Strategy
  1) Build graph
  2) Obtain data (p-values)
  3) Create scoring function
  4) Find high-scoring subsets
  5) Validate results

5
Previous Papers (1): Ideker et al. (2002)
“Discovering regulatory and signalling circuits in molecular interaction networks”
  Goal: find active subnetworks
  Graph
    Galactose utilization (~300 nodes, ~300 links)
    Protein-protein (P-P) and protein-DNA (P-DNA) interactions for yeast (~4000 nodes, ~7500 links)
    Data from perturbations of the GAL pathway
  Scoring
    Aggregate z-score and calibration (more later)
    Scoring over multiple conditions
  Searching
    Simulated annealing
  Results
    Do not contradict the literature
    Breaks up and organizes the data

6
Previous Papers (2): Rajagopalan & Agarwal (2004)
“Inferring pathways from gene lists using a literature-derived network of biological relationships”
  Goal: maximally include query list in minimal subset
  Graph
    Gathered data from 3 sources (~9000 nodes, ~30,000 links)
  Scoring
    Used aggregate z-score and calibration (from Ideker, 2002)
    Modified to consider node degree and node significance
  Searching
    Greedy algorithm with DFS
  Results
    Experiments are not convincing

7
Graph Construction
  KEGG data (Kanehisa et al.)
    Nodes: ligands (i.e., compounds, glycans, and drugs; ~25,000)
    Links: reactions (~29,000)
  Measured data
    Chronic ischemia (304 ligands)
    Glucose tolerance (124 ligands)
    Planned myocardial infarction (107 ligands)
  Problems with measured data
    Ambiguity
    Not in KEGG
    Duplicates
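The graph construction step can be sketched roughly as follows. This is a minimal illustration, not the code used in the talk: it assumes each reaction is available as a pair (substrates, products), links every substrate to every product, and resolves duplicate measurements by keeping the smallest p-value. The function names `build_graph` and `attach_pvalues` and the toy ligand IDs are hypothetical.

```python
from collections import defaultdict


def build_graph(reactions):
    """Return an undirected adjacency map: ligand -> set of neighbouring ligands.

    Assumption: a reaction links each of its substrates to each of its products.
    """
    adj = defaultdict(set)
    for substrates, products in reactions:
        for s in substrates:
            for p in products:
                if s != p:
                    adj[s].add(p)
                    adj[p].add(s)
    return dict(adj)


def attach_pvalues(adj, measurements):
    """Map measured p-values onto graph nodes.

    measurements: iterable of (ligand_id, p_value) pairs.
    - Ligands that are not in the graph are returned as 'unmatched'
      (the "Not in KEGG" problem).
    - Duplicate measurements keep the smallest p-value (an assumption).
    - Unmeasured nodes keep None, which marks them as missing data.
    """
    pvals = {node: None for node in adj}
    unmatched = []
    for ligand, p in measurements:
        if ligand not in pvals:
            unmatched.append(ligand)
        elif pvals[ligand] is None or p < pvals[ligand]:
            pvals[ligand] = p
    return pvals, unmatched


if __name__ == "__main__":
    # Toy reactions with made-up ligand IDs.
    toy_reactions = [(["A", "B"], ["C"]), (["C"], ["D"])]
    graph = build_graph(toy_reactions)
    pvals, unmatched = attach_pvalues(graph, [("A", 0.04), ("A", 0.01), ("X", 0.5)])
    print(graph)
    print(pvals, unmatched)
```

Ambiguous ligand names (one measured name mapping to several KEGG entries) are not handled here; they would need an explicit mapping table before this step.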

8
Scoring Functions (1)
  Naïve
  Ideker et al. (2002)
  Whitlock (2005)
  Rajagopalan & Agarwal (2004)
    Use aggregate z-score of Ideker
    Create “corrected” node score
    Modify for node significance
    Modify for node degree
    Discrepancy with Ideker paper
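As a rough sketch of the aggregate z-score idea: in Ideker et al. (2002), each p-value is converted to a z-score, the z-scores of a subnetwork of size k are summed and divided by sqrt(k), and the result is calibrated against randomly drawn node sets of the same size. Whitlock's weighted Z-method generalizes the sum with per-test weights. The code below is an illustration under those definitions; the function names, the clamping constant, and the Monte Carlo sample count are assumptions, and the Rajagopalan & Agarwal corrections are not reproduced.

```python
import math
import random
from statistics import NormalDist, mean, stdev

_STD_NORMAL = NormalDist()


def z_from_p(p, eps=1e-12):
    """Convert a p-value to a z-score; small p gives a large positive z."""
    p = min(max(p, eps), 1.0 - eps)  # clamp so inv_cdf stays finite
    return _STD_NORMAL.inv_cdf(1.0 - p)


def aggregate_z(z_scores, weights=None):
    """Weighted Z: sum(w*z) / sqrt(sum(w^2)).

    With equal weights this reduces to sum(z) / sqrt(k).
    """
    if weights is None:
        weights = [1.0] * len(z_scores)
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    return numerator / math.sqrt(sum(w * w for w in weights))


def calibrated_score(subset_z, all_z, n_samples=1000, rng=None):
    """Standardize the aggregate score against random node sets of equal size."""
    rng = rng or random.Random(0)
    k = len(subset_z)
    background = [aggregate_z(rng.sample(all_z, k)) for _ in range(n_samples)]
    return (aggregate_z(subset_z) - mean(background)) / stdev(background)


if __name__ == "__main__":
    zs = [z_from_p(p) for p in (0.001, 0.01, 0.2, 0.5, 0.9)]
    print(aggregate_z(zs[:2]))
    print(calibrated_score(zs[:2], zs, n_samples=200))
```

The calibration step matters because the raw aggregate score is only comparable across subnetwork sizes if the node z-scores really behave like independent standard normals.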

9
Scoring Functions (2)
  Significance vs. Strength
  Geometric Mean
  Piecewise Function
  Weighted Geometric Mean
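The formulas from this slide did not survive in the transcript, so the following only illustrates what a weighted geometric mean is in general, computed in log space. What the talk actually combined (for example, a significance term and a strength term) and which weights it used are not recoverable from the text; the function name and arguments are hypothetical.

```python
import math


def weighted_geometric_mean(values, weights=None):
    """Return exp(sum(w * ln(v)) / sum(w)) for strictly positive values.

    With equal weights this is the ordinary geometric mean.
    """
    if weights is None:
        weights = [1.0] * len(values)
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires strictly positive values")
    total_weight = sum(weights)
    log_sum = sum(w * math.log(v) for v, w in zip(values, weights))
    return math.exp(log_sum / total_weight)


if __name__ == "__main__":
    print(weighted_geometric_mean([4.0, 9.0]))
    print(weighted_geometric_mean([1.0, 16.0], weights=[3.0, 1.0]))
```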

10
Scoring Functions (3)
  Establish Significance of Scores
    1) Scramble
    2) Search
    3) Obtain distribution
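The scramble, search, obtain-distribution procedure is essentially a permutation test, which might look like the sketch below. It assumes that "scramble" means shuffling node scores across the fixed graph and that `search` is any callable returning the best subnetwork score for a given node-score assignment; both are assumptions, as are the function name and the add-one p-value estimate.

```python
import random


def empirical_p_value(node_scores, search, n_permutations=200, rng=None):
    """Estimate how unusual the observed best score is under scrambled data.

    node_scores: dict mapping node -> score.
    search: callable(dict) -> best subnetwork score (hypothetical interface).
    Returns (observed score, list of null scores, empirical p-value).
    """
    rng = rng or random.Random(0)
    observed = search(node_scores)
    nodes = list(node_scores)
    values = [node_scores[n] for n in nodes]
    null_scores = []
    for _ in range(n_permutations):
        shuffled = values[:]
        rng.shuffle(shuffled)                                   # 1) scramble
        null_scores.append(search(dict(zip(nodes, shuffled))))  # 2) search
    # 3) compare the observed score with the null distribution
    exceed = sum(1 for s in null_scores if s >= observed)
    p_value = (exceed + 1) / (n_permutations + 1)
    return observed, null_scores, p_value


if __name__ == "__main__":
    edges = [("a", "b"), ("b", "c"), ("c", "d")]
    scores = {"a": 3.0, "b": 3.0, "c": -1.0, "d": -1.0}
    # Toy search: best sum over any single edge.
    best_edge = lambda s: max(s[u] + s[v] for u, v in edges)
    print(empirical_p_value(scores, best_edge, n_permutations=100)[2])
```

Because the search is rerun on every permutation, the cost of this step is the number of permutations times the cost of one search.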

11
Search Approaches (1): Simulated Annealing Ideker et al. 2002
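A simplified sketch of the simulated annealing search in the spirit of Ideker et al. (2002): every node is either active or inactive, one randomly chosen node is toggled per iteration, and the state is scored by its highest-scoring connected component of active nodes. Worse moves are accepted with probability exp(delta / T). The geometric cooling schedule, the default parameters, and the function names are assumptions, not details taken from the talk.

```python
import math
import random


def best_component(adj, active, score):
    """Return (score, nodes) of the highest-scoring connected active component."""
    seen, best = set(), (float("-inf"), [])
    for start in active:
        if start in seen:
            continue
        seen.add(start)
        component, stack = [], [start]
        while stack:  # depth-first traversal restricted to active nodes
            node = stack.pop()
            component.append(node)
            for neighbour in adj.get(node, ()):
                if neighbour in active and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        value = score(component)
        if value > best[0]:
            best = (value, component)
    return best


def anneal(adj, score, n_iter=2000, t_start=1.0, t_end=0.01, rng=None):
    """Toggle one node per step; keep the best component ever seen."""
    rng = rng or random.Random(0)
    nodes = sorted(adj)
    active = {n for n in nodes if rng.random() < 0.5}
    current, comp = best_component(adj, active, score)
    best_score, best_nodes = current, set(comp)
    for i in range(n_iter):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        temp = t_start * (t_end / t_start) ** (i / max(1, n_iter - 1))
        node = rng.choice(nodes)
        active ^= {node}  # toggle the node's state
        candidate, comp = best_component(adj, active, score)
        delta = candidate - current
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = candidate
            if candidate > best_score:
                best_score, best_nodes = candidate, set(comp)
        else:
            active ^= {node}  # reject the move: undo the toggle
    return best_score, best_nodes


if __name__ == "__main__":
    path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
    z = {"a": 3.0, "b": 3.0, "c": -2.0, "d": 0.5, "e": -1.0}
    agg = lambda comp: sum(z[n] for n in comp) / math.sqrt(len(comp))
    print(anneal(path, agg))
```

Recomputing all components after every toggle is the simplest correct choice but is slow on graphs of KEGG size; an incremental update would be needed in practice.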

12
Search Approaches (2): Greedy Algorithm w/ DFS
  1) Build graph and calculate corrected node scores
  2) Use BFS to group nodes with positive corrected scores
  3) For each connected component, do a limited DFS and try to merge with nearby connected components if the merge would increase the overall score
  4) Prune nodes with small z-scores (so long as connectivity is maintained)
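Step 2 of the greedy algorithm, grouping positively scored nodes with BFS, could be sketched as below. The corrected node scores are taken as given, since their formula is not spelled out in the transcript, and the merge and prune steps (3 and 4) are left out. The function name `positive_components` is hypothetical.

```python
from collections import deque


def positive_components(adj, corrected_scores):
    """Group nodes with positive corrected scores into connected components.

    adj: dict mapping node -> iterable of neighbours.
    corrected_scores: dict mapping node -> corrected score.
    Returns a list of sets, one per component, seeded in sorted node order.
    """
    positive = {n for n, s in corrected_scores.items() if s > 0}
    seen, components = set(), []
    for start in sorted(positive):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:  # breadth-first search restricted to positive nodes
            node = queue.popleft()
            component.add(node)
            for neighbour in adj.get(node, ()):
                if neighbour in positive and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


if __name__ == "__main__":
    path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
    scores = {"a": 1.0, "b": 2.0, "c": -1.0, "d": 3.0, "e": -0.5}
    print(positive_components(path, scores))
```

In this toy path graph the negative node `c` separates the two seeds, which is exactly the situation the limited DFS in step 3 is meant to bridge when a merge would raise the overall score.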

13
Algorithm Test

14
Future Goals
  Remove “distant” unknown nodes?
  Evaluate scoring functions
  Evaluate search strategies
  Implement Google MapReduce
  Apply to more data sets
  Use Cytoscape software

15
Acknowledgements
  NSF REU Program
  Fritz Gabriel
  Everyone else

16
References
  Ideker, T., Ozier, O., Schwikowski, B., & Siegel, A.F. (2002). Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics 18, S233–S240.
  Rajagopalan, D., & Agarwal, P. (2005). Inferring pathways from gene lists using a literature-derived network of biological relationships. Bioinformatics 21, 788–793.
  Whitlock, M. (2005). Combining probability from independent tests: the weighted Z-method is superior to Fisher’s approach. J. Evol. Biol. 18, 1368–1373.
  Kanehisa, M., Goto, S., Hattori, M., Aoki-Kinoshita, K.F., Itoh, M., Kawashima, S., Katayama, T., Araki, M., & Hirakawa, M. (2006). From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res. 34, D354–D357.
  Dean, J., & Ghemawat, S. (2004). MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004.

17
Questions?
