
1 Rich Probabilistic Models for Gene Expression Eran Segal (Stanford) Ben Taskar (Stanford) Audrey Gasch (Berkeley) Nir Friedman (Hebrew University) Daphne Koller (Stanford)

2 Our Goals
- Find patterns in gene expression data

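3 Data Organization
[Figure: expression matrix with axes Genes and Experiments; cells are marked Induced or Repressed.]
$A_{ij}$: mRNA level of gene $i$ in experiment $j$

4 Standard Clustering Organization
[Figure: expression matrix with axes Genes and Experiments.]

5 Bi-Clustering Organization
[Figure: expression matrix with axes Genes and Experiments; one region is marked Undetected Similarity.]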

6 Desired Organization
- Detect similarities over subsets of genes and experiments
Note: rows and columns no longer correspond to genes and experiments.

7 Incorporate Heterogeneous Data
[Figure: data sources: clinical information, experimental details, annotations (GO, MIPS, YPD), and sequence data (ACGCCTA).]
- Find correlations directly
- Focus on novel discoveries

8 Our Approach
[Figure: clinical information, experimental details, annotations (GO, MIPS, YPD), and sequence data (ACGCCTA) feed a LEARNER that outputs hypotheses. Attributes shown in the model: Level, Gene Cluster, Lipid, HSF, Endoplasmatic, GCN4, Exp. cluster, Exp. type.]

9 Probabilistic Relational Models (Koller & Pfeffer 98; Friedman, Getoor, Koller & Pfeffer 99)
[Figure: schema with Gene (Gene Cluster), Experiment (Exp. cluster), and Expression (Level).]

10 Resulting Bayesian Network
[Figure: the schema (Gene: Gene Cluster; Experiment: Exp. cluster; Expression: Level) combined with the data yields a ground network over Gene Cluster 1 to 3, Exp. Cluster 1 and 2, and Level i,j for every gene i and experiment j.]
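
The unrolling shown on this slide can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function name `ground_network` and the node-naming scheme are hypothetical, and the only structure assumed is the one the slide labels suggest, namely that each Level i,j has Gene Cluster i and Exp. Cluster j as parents.

```python
from itertools import product


def ground_network(n_genes, n_experiments):
    """Return {node: [parents]} for the unrolled (ground) Bayesian network.

    Hypothetical helper; node names mirror the slide labels.
    """
    parents = {}
    # One hidden cluster variable per gene and per experiment (no parents).
    for i in range(1, n_genes + 1):
        parents[f"GeneCluster_{i}"] = []
    for j in range(1, n_experiments + 1):
        parents[f"ExpCluster_{j}"] = []
    # One Level variable per (gene, experiment) pair.
    for i, j in product(range(1, n_genes + 1), range(1, n_experiments + 1)):
        parents[f"Level_{i},{j}"] = [f"GeneCluster_{i}", f"ExpCluster_{j}"]
    return parents


# Three genes and two experiments, as on the slide.
network = ground_network(n_genes=3, n_experiments=2)
for node, node_parents in network.items():
    print(node, "<-", node_parents)
```

The same schema produces a larger network for more genes or experiments without any change to the model description, which is the point of the relational representation.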

11 Probabilistic Relational Models
CPD for Level:

G Cluster  E Cluster  μ     σ
1          1          0.8   1.2
1          2          -0.7  0.6
…

[Figure: schema with Gene (Gene Cluster), Experiment (Exp. cluster), and Expression (Level); two plots of P(Level) against Level, one centered at 0.8 and one at -0.7.]
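
A small Python sketch of how such a CPD could be evaluated. It assumes that the two numeric columns of the table are the mean and standard deviation of a Gaussian over Level (the column symbols did not survive in the transcript, so this reading is an assumption), and the names `CPD` and `level_density` are hypothetical.

```python
import math

# (gene cluster, experiment cluster) -> (mean, std): the two rows shown on the slide.
# Interpreting the columns as mean and standard deviation is an assumption.
CPD = {
    (1, 1): (0.8, 1.2),
    (1, 2): (-0.7, 0.6),
}


def level_density(level, gene_cluster, exp_cluster):
    """Gaussian density of Level given the two cluster assignments."""
    mean, std = CPD[(gene_cluster, exp_cluster)]
    z = (level - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# The same observed level is far more likely under cluster pair (1, 1) than (1, 2).
print(level_density(0.8, gene_cluster=1, exp_cluster=1))
print(level_density(0.8, gene_cluster=1, exp_cluster=2))
```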

12 Adding Heterogeneous Data
- Annotations: Lipid, Endoplasmatic
- Binding sites: HSF, GCN4
- Experimental details: Exp. type
[Figure: schema with Gene (Gene Cluster, Lipid, HSF, Endoplasmatic, GCN4), Experiment (Exp. cluster, Exp. type), and Expression (Level).]

13 Resulting Bayesian Network
[Figure: the extended schema (Gene: Gene Cluster, Lipid, HSF, Endoplasmatic, GCN4; Experiment: Exp. cluster, Exp. type; Expression: Level) combined with experimental details, annotations (GO, MIPS, YPD), and sequence data (ACGCCTA) yields a ground network over Gene Cluster, Lipid, HSF, Endoplasmatic, and GCN4 for genes 1 to 3, Exp. cluster and Exp. type for experiments 1 and 2, and Level i,j for every gene i and experiment j.]

14 Problem: Exponential Blowup
[Figure: schema with Gene (Gene Cluster, Lipid, HSF, Endoplasmatic, GCN4), Experiment (Exp. cluster, Exp. type), and Expression (Level).]

GC  LP  END  HSF  EC  TYP  μ    σ
1   No  No   No   1   1    0.8  1.2
1   No  No   No   1   2    0.7  0.6
…

- 6 parents: $2^6$ cases
- k parents: $2^k$ cases!
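
The growth can be checked by enumerating the rows of a full table CPD. This is a minimal sketch that assumes binary (No/Yes) parents, as in the slide's $2^k$ count; the helper name `table_rows` is hypothetical.

```python
from itertools import product


def table_rows(n_parents, values=("No", "Yes")):
    """Enumerate every joint parent assignment of a full table CPD."""
    return list(product(values, repeat=n_parents))


# Each added parent doubles the number of rows that need parameters.
for k in (1, 2, 6, 10):
    print(k, "parents ->", len(table_rows(k)), "rows")
```

Every row needs its own distribution parameters, so the amount of data available per row shrinks as quickly as the table grows.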

15 Solution: Context Specificity
[Figure: Level (Expression) with parents DNA repair (Gene) and UV Light (Experiment); the CPD is drawn as a full table over UV = No / UV = Yes and Repair = Yes / Repair = No. Biological chain: ultraviolet light, DNA damage, DNA repair genes transcribed.]

16 Solution: Context Specificity
[Figure: same model and biological chain; the CPD table over UV = No and UV = Yes.]

17 Solution: Context Specificity
[Figure: same model and biological chain; the CPD is drawn as a tree with tests UV = Yes (true/false) and Repair = Yes (true/false).]

18 Modeling Context Specificity
- Grouping = a leaf in the tree
[Figure: schema with Gene (Gene Cluster, Lipid, HSF, Endoplasmatic, GCN4), Experiment (Exp. cluster, Exp. type), and Expression (Level). The CPD for Level is a tree with tests Exp. Cluster = 2, HSF = Yes, Lipid = Yes, and GCN4 = Yes, each with true and false branches; the leaves are distributions P(Level) centered at -3, 2, 3, and 0.]
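
A tree CPD of this kind can be sketched in Python as nested tuples. The test names and leaf centers are taken from the slide, but the layout of the tree and the pairing of leaves with branches are hypothetical, since the transcript does not preserve them; `TREE`, `find_leaf`, and `count_leaves` are illustrative names.

```python
# Node forms: ("test", attribute, value, true_branch, false_branch) or ("leaf", mean).
# The layout below is hypothetical; only the test names and the leaf means
# (-3, 2, 3, 0) come from the slide.
TREE = (
    "test", "exp_cluster", 2,
    ("test", "hsf", "Yes",
        ("leaf", -3.0),
        ("test", "gcn4", "Yes", ("leaf", 2.0), ("leaf", 3.0))),
    ("leaf", 0.0),
)


def find_leaf(tree, context):
    """Follow the tests for one (gene, experiment) context; return the leaf mean."""
    node = tree
    while node[0] == "test":
        _, attribute, value, if_true, if_false = node
        node = if_true if context[attribute] == value else if_false
    return node[1]


def count_leaves(tree):
    """Number of groupings, that is, leaves of the tree."""
    if tree[0] == "leaf":
        return 1
    return count_leaves(tree[3]) + count_leaves(tree[4])


# Outside experiment cluster 2, HSF and GCN4 are never consulted.
print(find_leaf(TREE, {"exp_cluster": 1}))
print(find_leaf(TREE, {"exp_cluster": 2, "hsf": "No", "gcn4": "Yes"}))
print(count_leaves(TREE), "leaves instead of", 2 ** 3, "table rows")
```

Because a branch only asks about the attributes that matter in its context, the number of parameters follows the number of leaves rather than the number of joint parent assignments.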

19 How do I learn these models?

20 Learning the Models
[Figure: experimental details, annotations (GO, MIPS, YPD), and sequence data (ACGCCTA) feed the LEARNER. Its output is the schema with Gene (Gene Cluster, Lipid, HSF, Endoplasmatic, GCN4), Experiment (Exp. cluster, Exp. type), and Expression (Level); a tree with tests Exp. Cluster = 2, HSF = Yes, Lipid = Yes, and GCN4 = Yes; and a parameter table.]

GC  EC  μ     σ
1   1   0.8   1.2
1   2   -0.7  0.6
2   1   0.8   1.2
2   2   -0.7  0.6
…

21 Learning Algorithm: Automatic Induction
- Structure learning:
  - Dependency structure
  - Tree structure
- Missing data:
  - Gene cluster and experiment cluster are never observed
- Bayesian score
- Heuristic search
- Expectation Maximization (EM)
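
To illustrate how EM copes with cluster variables that are never observed, here is a deliberately simplified stand-in: EM for a one-dimensional mixture of two Gaussians. It is not the learning algorithm of the talk (there is no structure search, no Bayesian score, and no relational model), the data are synthetic, and the function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def em_two_clusters(values, n_iter=50):
    """EM for a 1-D mixture of two Gaussians with hidden cluster labels."""
    means = [min(values), max(values)]  # crude initialization
    stds = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each value.
        resp = []
        for x in values:
            p = [weights[k] * normal_pdf(x, means[k], stds[k]) for k in range(2)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pk / total for pk in p])
        # M-step: re-estimate parameters from the soft assignments.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty cluster unchanged
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            stds[k] = max(math.sqrt(var), 1e-3)
            weights[k] = nk / len(values)
    return means, stds, weights


# Synthetic data: two well-separated groups of expression-like values.
random.seed(0)
data = [random.gauss(-2.0, 0.5) for _ in range(200)]
data += [random.gauss(2.0, 0.5) for _ in range(200)]
means, stds, weights = em_two_clusters(data)
print(means, stds, weights)
```

The E-step fills in the unobserved cluster assignments with their posterior probabilities and the M-step re-estimates the parameters as if those soft assignments were data; the slide applies the same alternation to gene clusters and experiment clusters.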

22 Learning Process
[Figure: schema with Gene (Gene Cluster, Lipid, HSF, Endoplasmatic, GCN4), Experiment (Exp. cluster, Exp. type), and Expression (Level).]

23 Learning Process
- Experiment similarity
[Figure: same schema; the tree for Level tests Exp. Cluster = 2.]

24 Learning Process
- Gene similarity
[Figure: same schema; the tree for Level tests Exp. Cluster = 2 and Gene Cluster = Yes.]

25 Learning Process
- Separability by binding site
[Figure: same schema; the tree for Level tests Exp. Cluster = 2, Gene Cluster = Yes, and HSF = Yes.]

26 Learning Process
- Attribute dependencies: induce cluster changes
[Figure: same schema with HSF highlighted; the tree for Level tests Exp. Cluster = 2, Gene Cluster = Yes, and HSF = Yes.]

27 Learning Process
- Achieved desired clustering
[Figure: same schema with HSF highlighted; the tree for Level tests Exp. Cluster = 2, HSF = Yes, GCN4 = Yes, and Gene Cluster = Yes.]

28 Yeast Stress Data (Gasch et al. 2001)
- Measured response to stress conditions
- 92 arrays
- We selected ~900 genes
- Added data: TRANSFAC, MIPS
Results:
- 15 significant TFs
- 7 significant function categories
- 793 groupings

29 Context Specific Groupings
- Metabolism of amino acids
- Transporter genes
- Down in nitrogen depletion

30 Context Specific Groupings
- Metabolism of nitrogen
- Transporter genes
- Up in starvation, nitrogen depletion, and DTT

31 Example Biological Finding
- Discovered grouping of 17 genes
  - All induced in diauxic shift
  - All have ≥ 2 binding sites for the MIG1 transcription factor
  - Many not known to be regulated by MIG1
- Context-sensitive groupings were key to finding the cluster

32 Compendium Data (Hughes et al. 2000)
- 300 samples of yeast deletion mutants
[Figure: schema with Gene (GCluster, Lipid, HSF, Endoplasmatic, GCN4), Array/Mutated Gene (ACluster, Lipid (of mutated gene), GCluster (of mutated gene)), and Expression (Level).]

33 Resulting Bayesian Network
[Figure: ground network over Gene 1 to Gene 4 (Gene Cluster and HSF for each gene, plus Lipid 1 and Lipid 3), the Gene 1 mutant and Gene 3 mutant arrays (Array cluster 1 and Array cluster 3), and the corresponding Level variables.]

34 Experimental Setup
- Example: predicting the effect of mutating gene 4
- Available information:
  - Attributes of gene 4
  - Gene Cluster of gene 4 as a gene
- Goal: predict the effect of mutating specific genes without performing the experiment (!)
[Figure: the Gene 4 mutant array with unknown (?) Array cluster and Level values; Lipid 4, HSF 4, and Gene Cluster 4 are given.]

35 Experimental Setup
[Figure: the ground network of slide 33 extended with a Gene 4 mutant array; its Array cluster and Level values are unknown (?), while Lipid 4, HSF 4, and Gene Cluster 4 are available.]

36 Results
- Training set: 180 mutants
- Test set: 20 mutants
- 44 arrays predicted at 99% confidence and 95% accuracy
- Relational model is key to prediction
[Figure: bar chart of accuracy (%) on a 0 to 100 axis; PRMs reach 95% accuracy. Inset: schema with Gene (Gene Cluster, Lipid, HSF, Endoplasmatic, GCN4), Experiment (Exp. cluster, Exp. type), and Expression (Level).]

37 Conclusions
- Presented a unified probabilistic framework:
  - Models complex biological domains
  - Expressive data organization
  - Incorporates heterogeneous data
- Future directions:
  - Incorporate DNA and protein sequence data
  - Discover regulatory networks
- Paper: http://www.cs.stanford.edu/~eran
- Software (soon): http://dags.stanford.edu/bio
- Contact: eran@cs.stanford.edu
Thank You!

