1 Statistical Inference Using Graphs for Protein Complex Identification Denise Scholtens Robert Gentleman Marc Vidal Workshop on Statistical Inference, Computing, and Visualization for Graphs Stanford University August 1-2, 2003

2 Graphic from: U.S. Department of Energy Human Genome Program http://www.ornl.gov/hgmis

3 High-throughput Protein Complex Identification Gavin, et al. (Nature, 2002) –TAP: Tandem Affinity Purification Ho, et al. (Nature, 2002) –HMS-PCI: High-throughput Mass Spectrometric Protein Complex Identification

4 Protein Complex Identification Using TAP Data Spoke Model Matrix Model Bader, et al. (Nature Biotechnology, 2002)
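The two models named on this slide differ in which protein pairs a single purification is taken to support. A minimal sketch, assuming the usual reading (spoke: the bait is paired with each hit; matrix: every pair of proteins in the purification is paired); the function names and the toy purification are hypothetical and not taken from the slides:

```python
from itertools import combinations


def spoke_edges(bait, hits):
    """Spoke model: pair the bait with each hit; hits are not paired with each other."""
    return {frozenset((bait, hit)) for hit in hits}


def matrix_edges(bait, hits):
    """Matrix model: pair every two proteins found in the same purification."""
    return {frozenset(pair) for pair in combinations([bait, *hits], 2)}


# Hypothetical purification: one bait that pulls down three distinct hits.
spoke = spoke_edges("Bait", ["Hit 1", "Hit 2", "Hit 3"])
matrix = matrix_edges("Bait", ["Hit 1", "Hit 2", "Hit 3"])

print(len(spoke), len(matrix))  # 3 6
```

Every spoke edge is also a matrix edge, so the matrix model can only add hit-hit pairs on top of the bait-hit pairs.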

5 Protein-Complex Affiliation Network Incidence Matrix
A =
       C1  C2  C3  C4  C5  …  Cm
P1      1   0   1   1   1  …
P2      1   0   0   1   1  …
P3      1   0   0   0   1  …
P4      0   0   1   1   1  …
P5      0   1   0   0   0  …
P6      0   1   0   0   0  …
P7      0   1   0   0   0  …
…       …   …   …   …   …
Pn      0   0   0   0   0  …

6 Cohesive vs. Dynamic Protein Complexes Cohesive Complex: a complex of invariable composition whose proteins are associated only with that complex and its particular function

7 Cohesive Complex Affiliation Network Incidence Matrix
A =
        C1
Bait     1
Hit 1    1
Hit 2    1
Hit 3    1
Hit 4    1
Hit 5    1

8 Cohesive vs. Dynamic Protein Complexes Dynamic Complex: complex composed of proteins that may also be involved in other complexes

9 Dynamic Complex Affiliation Network Incidence Matrices
A =
        C1  C2  C3  C4  C5
Bait     1   1   1   1   1
Hit 1    1   0   0   0   0
Hit 2    0   1   0   0   0
Hit 3    0   0   1   0   0
Hit 4    0   0   0   1   0
Hit 5    0   0   0   0   1
A =
        C1  C2
Bait     1   1
Hit 1    1   0
Hit 2    0   1
Hit 3    1   0
Hit 4    0   1
Hit 5    1   0
A =
        C1  C2
Bait     1   1
Hit 1    1   1
Hit 2    1   1
Hit 3    0   1
Hit 4    0   1
Hit 5    0   1

10 All 5 complexes above would yield the same TAP Data:
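The claim can be checked against the incidence matrices on slides 7 and 9. This is a rough sketch, assuming a simplified, hypothetical model in which a bait pull-down recovers every protein that shares at least one complex with the bait; `co_members` and `make` are illustrative helpers, not part of the talk:

```python
PROTEINS = ["Bait", "Hit 1", "Hit 2", "Hit 3", "Hit 4", "Hit 5"]


def make(rows):
    """Attach protein labels to the rows of an incidence matrix A."""
    return dict(zip(PROTEINS, rows))


def co_members(A, bait="Bait"):
    """Proteins sharing at least one complex with the bait (assumed pull-down model)."""
    bait_row = A[bait]
    return {
        protein
        for protein, row in A.items()
        if protein != bait and any(b and r for b, r in zip(bait_row, row))
    }


# Incidence matrices transcribed from slide 7 (cohesive) and slide 9 (dynamic).
cohesive = make([[1], [1], [1], [1], [1], [1]])
dynamic_a = make([[1, 1, 1, 1, 1], [1, 0, 0, 0, 0], [0, 1, 0, 0, 0],
                  [0, 0, 1, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]])
dynamic_b = make([[1, 1], [1, 0], [0, 1], [1, 0], [0, 1], [1, 0]])
dynamic_c = make([[1, 1], [1, 1], [1, 1], [0, 1], [0, 1], [0, 1]])

observed = [co_members(A) for A in (cohesive, dynamic_a, dynamic_b, dynamic_c)]
print(all(hits == observed[0] for hits in observed))  # True
```

Under this assumed model each matrix yields the same hit list, Hit 1 through Hit 5, even though the underlying complex structures differ.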

11 Statistical Inference Problem What is A? A captures the cohesive/dynamic distinction. At best, we observe all but the main diagonal of X = AA^T. Current analyses focus on X, not on A.
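Written out entrywise, and reading X as the product of A with its transpose (which matches the worked matrices on slides 19 and 20), the relationship is as follows. The indices n and m follow the labels P1…Pn and C1…Cm on slide 5; the entrywise notation is an addition for illustration:

```latex
X = A A^{\top}, \qquad
x_{ij} = \sum_{k=1}^{m} a_{ik}\, a_{jk}, \qquad i, j = 1, \dots, n,
```

where a_{ik} = 1 if protein i belongs to complex k and 0 otherwise. For i ≠ j, x_{ij} counts the complexes that contain both proteins i and j. The diagonal entry x_{ii} counts the complexes that contain protein i, a quantity that pairwise co-purification data cannot supply.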

12 Protein Complex Data as a Directed Graph ?

13 Cohesive Complex described in Gavin, et al.

14 Dynamic Complex described in Gavin, et al.

15 Largest Connected Component in Gavin, et al. using Bait Proteins Only, Colored by Outdegree
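The quantities named on this slide can be computed from a bait-to-hit table. A minimal sketch, assuming edges point from bait to hit and that "connected component" means a weakly connected component (edge direction ignored); the toy data and function names are hypothetical, not the Gavin, et al. data:

```python
from collections import deque


def bait_digraph(purifications):
    """Keep only bait-to-bait edges: bait -> hit, where the hit was itself used as a bait."""
    baits = set(purifications)
    return {b: {h for h in hits if h in baits and h != b}
            for b, hits in purifications.items()}


def outdegree(graph):
    """Number of distinct bait proteins each bait pulls down."""
    return {node: len(targets) for node, targets in graph.items()}


def largest_weak_component(graph):
    """Largest set of nodes connected when edge direction is ignored."""
    undirected = {node: set() for node in graph}
    for src, targets in graph.items():
        for dst in targets:
            undirected[src].add(dst)
            undirected[dst].add(src)
    seen, best = set(), set()
    for start in undirected:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:  # breadth-first search from an unvisited node
            node = queue.popleft()
            for nxt in undirected[node] - component:
                component.add(nxt)
                queue.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Hypothetical purifications: bait -> proteins it pulled down ("P9" was never a bait).
toy = {"B1": ["B2", "B3", "P9"], "B2": ["B1"], "B3": [], "B4": ["B5"], "B5": []}
graph = bait_digraph(toy)
print(outdegree(graph))
print(sorted(largest_weak_component(graph)))  # ['B1', 'B2', 'B3']
```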

16 Gavin Data; Ho Data

17 Subgraph of Bait Proteins from Previous Graphs with Outdegree 7: Gavin Data; Ho Data

18 Examples of Distinct Complexes Identified by Gavin, et al.

19 Back to Affiliation Networks: One Three-Way Conversation
A =
      C1
B1     1
B2     1
B3     1
X = AA^T =
      B1  B2  B3
B1     1   1   1
B2     1   1   1
B3     1   1   1

20 Affiliation Networks: Three Two-Way Conversations
A =
      C1  C2  C3
B1     1   1   0
B2     1   0   1
B3     0   1   1
X = AA^T =
      B1  B2  B3
B1     2   1   1
B2     1   2   1
B3     1   1   2
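The two examples on slides 19 and 20 can be checked numerically. The sketch below computes X as the product of A with its transpose for both incidence matrices and compares only the off-diagonal entries, the part described on slide 11 as observable at best; `gram` and `off_diagonal` are illustrative helper names:

```python
def gram(A):
    """Return X = A A^T for a 0/1 incidence matrix given as a list of rows."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in A] for row_i in A]


def off_diagonal(X):
    """Drop the main diagonal, keeping the remaining entries of each row."""
    return [[x for j, x in enumerate(row) if j != i] for i, row in enumerate(X)]


A_one_three_way = [[1], [1], [1]]                     # slide 19
A_three_two_way = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]   # slide 20

X_one = gram(A_one_three_way)
X_three = gram(A_three_two_way)

print(X_one)    # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(X_three)  # [[2, 1, 1], [1, 2, 1], [1, 1, 2]]
print(off_diagonal(X_one) == off_diagonal(X_three))  # True
```

The two matrices differ only on the diagonal, so pairwise data alone cannot separate one three-way complex from three two-way complexes.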

21 Statistical Inference Problem Which A is correct? –A uniquely defines X, but X does not uniquely define the observable part of A. Extra information and directed graph model for the TAP data –Cellular Component Data –Gene Expression Data –Hit Data

22 Possible Use of Hit Data to Help Estimate A

23 Conclusions In the protein complex setting, directed graphs are useful for EDA, as well as framing the correct questions for statistical inference. Statistical inference problem for cohesive and dynamic protein complex identification should focus on A, not X. Digraph model of the TAP data better reflects what we actually observe, and is informative for estimating A.

