
From Sequence to Expression: A Probabilistic Framework Eran Segal (Stanford) Joint work with: Yoseph Barash (Hebrew U.) Itamar Simon (Whitehead Inst.)


1 From Sequence to Expression: A Probabilistic Framework Eran Segal (Stanford) Joint work with: Yoseph Barash (Hebrew U.) Itamar Simon (Whitehead Inst.) Nir Friedman (Hebrew U.) Daphne Koller (Stanford)

2 Understanding Cellular Processes
• Complex biological processes (e.g., the cell cycle)
  – Coordination of multiple events
  – Each event requires different modules
[Figure: cell-cycle phases G1, S, G2, M]
Can we recover the regulatory circuits that control such processes?

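3 Gene Structure
[Figure: a gene with a promoter region (e.g., CTAGTAGATATCGATCAG) and a coding region; the coding region is transcribed into mRNA, which is translated into protein]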

4 Gene Regulation
[Figure: genes 1 to 5; several promoters contain sequence motif A (AGACTTCAGA); mRNA]

5 Gene Regulation
[Figure: transcription factor Swi5 recognizes motif A in the promoters of several of the genes]

6 Gene Regulation
[Figure: activated Swi5 binds motif A; its target genes produce more mRNA (higher expression)]

7 Gene Regulation
[Figure: a second motif B (AGTTGA) appears in several promoters alongside motif A]

8 Gene Regulation
[Figure: Swi5 (motif A) and a second factor, Ndd1 (motif B); activated Ndd1 leads to more mRNA]

9 Goal
[Figure: promoter sequences of many genes, most sharing the motif GACTGC, together with two regulators: t1 with its motif, active in G1 (R(t1)), and t2 with its motif, active in G2 (R(t2))]

10 Model of Gene Regulation
Probabilistic Relational Models (PRMs): Pfeffer and Koller (1998); Friedman et al. (1999); Segal et al. (2001)
[Figure: classes Gene, Experiment, Expression, and Sequence; the model combines promoter sequences, regulation by transcription factors, expression measurements, and experimental context (cluster)]

11 Regulation to Expression Level
[Figure: Gene attributes R(t1), R(t2); Experiment attributes Exp. type, Exp. cluster; Expression attribute Level]
• R(t1) = yes ⇒ t1 regulates the gene
• R(t1) = no ⇒ t1 does not regulate the gene

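12 Regulation to Expression Level
[Figure: Level depends on R(t1), R(t2), Exp. type, and Exp. cluster]
CPD: one Gaussian P(Level) per parent assignment
R(t1) | R(t2) | E type | μ | σ
0 | 0 | I | -0.7 | 1.2
0 | 1 | II | 0.8 | 0.6
… | … | … | … | …
[Figure: the two Gaussians P(Level), centered at -0.7 and 0.8]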
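A table CPD like the one on this slide can be sketched as a lookup from the parent assignment (R(t1), R(t2), Exp. type) to a Gaussian over Level. This is a minimal illustration under that assumption, not the authors' code: only the two rows shown on the slide are filled in, and the names `CPD`, `gaussian_log_pdf`, and `level_log_prob` are hypothetical.

```python
import math

# Hypothetical table CPD: (R(t1), R(t2), Exp. type) -> (mu, sigma) of P(Level).
# Only the two rows visible on the slide are included; the rest are elided.
CPD = {
    (0, 0, "I"): (-0.7, 1.2),
    (0, 1, "II"): (0.8, 0.6),
}

def gaussian_log_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def level_log_prob(level, r_t1, r_t2, exp_type):
    """log P(Level = level | R(t1), R(t2), Exp. type) under the table CPD."""
    mu, sigma = CPD[(r_t1, r_t2, exp_type)]
    return gaussian_log_pdf(level, mu, sigma)

print(level_log_prob(0.8, 0, 1, "II"))
```

Because every joint parent assignment gets its own row, a full table grows exponentially with the number of regulators, which is one reason a more compact, context-specific representation is attractive.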

13 Modeling Context Specificity
[Figure: tree-structured CPD for Level that splits on Exp. type = G1, R(t1) = yes, and R(t2) = yes, with a Gaussian P(Level) at each leaf]
• Gaussian decision tree
• t1 only relevant in G1
• t2 only relevant in G2
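A Gaussian decision tree can be sketched as internal nodes that test the parent assignment and leaves that hold a Gaussian over Level. The tree below is hypothetical: it simplifies the experiment types to "G1" versus everything else, and its leaf parameters are illustrative values, not ones read off the slide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

@dataclass
class Leaf:
    mu: float      # mean of the Gaussian P(Level) at this leaf
    sigma: float   # standard deviation

@dataclass
class Split:
    test: Callable[[Dict], bool]  # predicate on the parent assignment
    if_true: "Node"
    if_false: "Node"

Node = Union[Leaf, Split]

def lookup(node: Node, parents: Dict) -> Leaf:
    """Walk the tree until a leaf (a Gaussian over Level) is reached."""
    while isinstance(node, Split):
        node = node.if_true if node.test(parents) else node.if_false
    return node

# Hypothetical tree: in G1 only R(t1) is consulted; otherwise only R(t2).
# Leaf parameters are illustrative.
tree = Split(
    test=lambda p: p["exp_type"] == "G1",
    if_true=Split(lambda p: p["R_t1"], Leaf(3.0, 1.0), Leaf(0.0, 1.0)),
    if_false=Split(lambda p: p["R_t2"], Leaf(2.0, 1.0), Leaf(0.0, 1.0)),
)

print(lookup(tree, {"exp_type": "G1", "R_t1": True, "R_t2": False}))
# -> Leaf(mu=3.0, sigma=1.0)
```

In this sketch, changing R(t1) has no effect outside G1, which is exactly the context specificity a full table cannot express compactly.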

14 Sequence Model
[Figure: the regulation model (Level, R(t1), R(t2), Exp. type, Exp. cluster) with a Sequence component added]
Assumptions:
  – Binding site is of length k
  – Binding may occur at any k-mer
  – TF regulates gene if binding occurs anywhere

15 From Sequence to Regulation
• Assumptions:
  – Binding site is of length k
  – Binding may occur at any k-mer
  – TF regulates gene if binding occurs anywhere
• PSSM:
  – Background distribution
  – Motif distribution
  – Discriminative training
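A PSSM built from a motif distribution and a background distribution can be sketched as per-position log-odds weights, which are then summed over each k-mer of the promoter. This sketch uses fixed, hypothetical distributions (a uniform background and a 2-position motif preferring "GA"); it only shows scoring and does not implement the discriminative training mentioned on the slide.

```python
import math
from typing import Dict, List

def pssm_log_odds(motif: List[Dict[str, float]], background: Dict[str, float]) -> List[Dict[str, float]]:
    """Per-position weights log(P_motif(base) / P_background(base))."""
    return [{b: math.log(col[b] / background[b]) for b in "ACGT"} for col in motif]

def kmer_scores(seq: str, weights: List[Dict[str, float]]) -> List[float]:
    """Log-odds score of every k-mer in seq, where k is the motif length."""
    k = len(weights)
    return [sum(weights[j][seq[i + j]] for j in range(k)) for i in range(len(seq) - k + 1)]

background = {b: 0.25 for b in "ACGT"}  # hypothetical uniform background
motif = [                               # hypothetical motif preferring "GA"
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]
w = pssm_log_odds(motif, background)
scores = kmer_scores("CTGAC", w)  # promoter of length 5, k = 2 -> 4 k-mers
print([round(s, 2) for s in scores])  # -> [-1.83, -1.83, 2.06, -1.83]
```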

16 From Sequence to Regulation
• Model for one gene g, with a promoter region of length 5 and k = 2
[Figure: sequence residues S1 ... S5; k-mer binding events m[1].B ... m[4].B, each depending on two adjacent residues through a logistic function of the motif model; g.R(t), the variable for "t regulates g", depends on the binding events]
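Combining the slide's pieces, P(g.R(t) = true | sequence) can be sketched from per-k-mer motif scores. Assumptions, not taken from the slides: each binding event m[i].B fires with probability logistic(score_i + bias), where `bias` is a hypothetical offset; binding events are independent given the sequence; and "regulates if binding occurs anywhere" is read as a deterministic OR over the binding events.

```python
import math
from typing import List

def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def p_regulates(kmer_scores: List[float], bias: float = -3.0) -> float:
    """P(g.R(t) = true | sequence) under the assumptions:
    - P(m[i].B = true) = logistic(score_i + bias)  (bias is hypothetical);
    - binding events are independent given the sequence;
    - t regulates g iff at least one binding event occurs (OR)."""
    p_no_binding = 1.0
    for s in kmer_scores:
        p_no_binding *= 1.0 - logistic(s + bias)
    return 1.0 - p_no_binding

# Four k-mer scores, as for a length-5 promoter with k = 2.
print(p_regulates([-1.83, -1.83, 2.06, -1.83]))
```

The OR form means a single strong site can drive P(R) toward 1, while many weak sites each contribute a little.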

17 Joint Probabilistic Model
[Figure: k-mer residues s1 ... sk determine binding B(t1), B(t2), which determine R(t1), R(t2); these, together with Exp. type and Exp. cluster, determine Level]
Discriminative model: maximizes P(Expr. | Seq.)
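Written out, the discriminative objective can be sketched as below. This assumes genes are treated independently (in the full model they are coupled through the shared experiment-cluster variables) and relies on the model's structure, in which sequence influences expression only through the regulation variables. Here $\mathbf{s}_g$ is the promoter sequence of gene $g$, $\mathbf{R}_g$ its regulation variables, and $\mathbf{e}_g$ its expression levels; this notation is introduced for illustration.

```latex
\max_{\theta} \sum_{g} \log P_\theta(\mathbf{e}_g \mid \mathbf{s}_g)
  = \max_{\theta} \sum_{g} \log \sum_{\mathbf{r}}
      P_\theta(\mathbf{R}_g = \mathbf{r} \mid \mathbf{s}_g)\,
      P_\theta(\mathbf{e}_g \mid \mathbf{R}_g = \mathbf{r})
```

Unlike a generative objective, this does not spend modeling capacity on P(Seq.) itself.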

18 Localization Assay

19 [Figure: Swi5 and DNA]
• Induce TF protein level

20 Localization Assay
[Figure: Swi5 on DNA; one gene bound, another not bound]
• Induce TF protein level
  – TF binds to targets

21 Localization Assay
[Figure: Swi5 on DNA; one gene bound, another not bound]
• Induce TF protein level
  – TF binds to targets
• Measure TF binding to the promoter of every gene
  – Assign a confidence to each binding

22 Localization Assay (Simon et al. 2001)
• Localization data: measure TF binding to the promoter of each gene (assign binding confidence)

23 Is Regulation Observed?
• Not quite…
• Localization is measured for specific conditions
• Localization is measured for large DNA regions
• Localization is noisy

24 Incorporating Localization
[Figure: the model extended with observed localization variables L(t1), L(t2), attached to the regulation variables R(t1), R(t2)]
• Localization p-value is a noisy sensor of actual regulation
  – If regulation occurs, the p-value is likely to be low
  – If there is no regulation, the p-value is likely to be high

25 Localization Model
[Figure: Gene with regulation variable R(t1) and observed localization L(t1)]
• Localization p-value is a noisy sensor of actual regulation
  – If regulation occurs, the p-value is likely to be low
  – If there is no regulation, the p-value is likely to be high
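The "noisy sensor" idea can be sketched with Bayes' rule: choose a density for the p-value under each regulation state, then compute the posterior of R(t) given the observed p-value. The specific densities are hypothetical choices, not the model's actual parameterization: Uniform(0, 1) when there is no regulation and Beta(a, 1) with a < 1 (mass concentrated near 0) when there is. The prior of 0.1 and a = 0.1 are likewise illustrative, and p must lie in (0, 1].

```python
def p_value_density(p: float, regulated: bool, a: float = 0.1) -> float:
    """Hypothetical sensor model for the localization p-value L(t):
    - no regulation: p ~ Uniform(0, 1), density 1;
    - regulation:    p ~ Beta(a, 1), density a * p**(a - 1), which piles up near 0."""
    return a * p ** (a - 1) if regulated else 1.0

def posterior_regulated(p: float, prior: float = 0.1, a: float = 0.1) -> float:
    """P(R(t) = true | L(t) = p) by Bayes' rule (prior is illustrative)."""
    on = prior * p_value_density(p, True, a)
    off = (1 - prior) * p_value_density(p, False, a)
    return on / (on + off)

print(round(posterior_regulated(0.001), 2))  # low p-value  -> 0.85
print(round(posterior_regulated(0.9), 2))    # high p-value -> 0.01
```

Treating L(t) as evidence rather than as ground truth lets sequence and expression override a misleading localization measurement.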

26 Joint Probabilistic Model
[Figure: the full model: promoter residues s1 ... sk, regulation R(t1), R(t2), observed localization L(t1), L(t2), expression Level, Exp. type, and Exp. cluster]

27 Learning the Models
[Figure: promoter sequences (e.g., ACGCCTAACGCCTA), localization data, and experimental details feed a LEARNER, which outputs the model structure, a decision tree splitting on Exp. Phase = IV, R(t1) = yes, and R(t2) = yes, and a table of Gaussian parameters (μ, σ) per (R(t1), R(t2), E phase) row]

28 Learning the Models
[Figure: the same learning schematic: sequences, localization data, and experimental details feed a learner]
• Ndd1 activates Ace2 and Swi5 in G1, which together activate in S
• Mcm1 activates the DNA repair pathway in S

29 Model Learning
• Structure learning:
  – Tree structure
• Missing data:
  – Experiment cluster
  – Regulation variables
• Motif model:
  – Parameter estimation
Techniques:
• Expectation Maximization
• Bayesian score
• Heuristic search
• Discriminative training (conjugate gradient)

30 Model Learning
[Figure: the model (Gene, Experiment, Expression, R(t1), R(t2), L(t1), Exp. type, Exp. cluster, Level, promoter s1 ... sk) combined with experimental details, localization data, and promoter sequences such as ACGCCTAACGCCTA]

31 Resulting Bayesian Network
[Figure: the ground network for 3 genes and 2 experiments: per-gene nodes R(t1), R(t2), L(t1), L(t2) and promoter residues s1 ... sk; per-experiment Exp. type and Exp. cluster; one Level node for each gene-experiment pair]

32 Model Learning: E-Step
[Figure: the same ground Bayesian network]
• Loopy belief propagation

33 Model Learning: M-Step
[Figure: the same ground Bayesian network]
• Standard ML estimation
• Conjugate gradient
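For a Gaussian leaf of the expression CPD, "standard ML estimation" in the M-step can be sketched as a weighted mean and variance, where each weight is the E-step posterior that an observation falls in that leaf's context (e.g., that the relevant regulation variable is true). This is an illustrative sketch with a hypothetical function name; the conjugate-gradient update for the motif parameters is not shown.

```python
from typing import Sequence, Tuple

def weighted_gaussian_mle(levels: Sequence[float], weights: Sequence[float]) -> Tuple[float, float]:
    """M-step for one Gaussian leaf: ML mean and variance from expected (soft) counts.
    weights[i] is the E-step posterior that observation i belongs to this leaf."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("leaf received no expected counts")
    mu = sum(w * x for w, x in zip(weights, levels)) / total
    var = sum(w * (x - mu) ** 2 for w, x in zip(weights, levels)) / total
    return mu, var

print(weighted_gaussian_mle([1.0, 2.0, 3.0], [0.5, 0.5, 0.0]))  # -> (1.5, 0.25)
```

With hard 0/1 weights this reduces to the ordinary sample mean and (biased) sample variance.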

34 Experimental Results: Yeast
• Cell-cycle expression data (Spellman et al.)
• Localization data for 9 TFs (Simon et al.)
• Yeast genome (promoters)

35 Generalization
[Figure: model with Gene, Experiment, Expression, R(t1), R(t2), Level, and Exp. cluster]
Gene log-likelihood:
• Clustering genes: -112.24

36 Generalization
[Figure: model with observed localization L(t1), L(t2), Level, and Exp. type]
Gene log-likelihood:
• Clustering genes: -112.24
• Localization: -121.48

37 Generalization
[Figure: model with R(t1), R(t2), localization variables, Level, Exp. type, and Exp. cluster]
Gene log-likelihood:
• Clustering genes: -112.24
• Localization: -121.48
• Localization + exp. cluster: -103.76

38 Generalization
[Figure: model with promoter residues s1 ... sk, R(t1), R(t2), localization variables, Level, Exp. type, and Exp. cluster]
Gene log-likelihood:
• Clustering genes: -112.24
• Localization: -121.48
• Localization + exp. cluster: -103.76
• + Sequence: -94.59
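Since these are log-likelihoods (higher is better), each model's gain over the gene-clustering baseline is a simple difference. The snippet only makes that arithmetic explicit, using the numbers reported on the slide.

```python
baseline = -112.24  # clustering genes
results = {
    "Localization": -121.48,
    "Localization + exp. cluster": -103.76,
    "+ Sequence": -94.59,
}
for name, ll in results.items():
    # Positive difference = better generalization than clustering genes.
    print(f"{name}: {ll - baseline:+.2f} relative to clustering genes")
```

It prints differences of -9.24, +8.48, and +17.65: localization alone does worse than clustering, adding the experiment cluster recovers, and adding sequence roughly doubles the gain.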

39 Generating Hypotheses
Example: genes regulated by Swi6, but not by Mcm1 or Fkh2, exhibit a unique expression pattern in the G1 phase of the cell cycle.
Gene functions: DNA repair (P = 3e-09), DNA synthesis (P = 7e-05)

40 Expression vs Regulation
[Figure: expression (y-axis -0.5 to 1) across the alpha, cdc15, cdc28, and elu time courses, with cell-cycle phase marked; curves for Swi5-regulated genes and for Swi5 expression]
Genes predicted to be regulated by Swi5 are probably real Swi5 targets.

41 Combinatorial Effects
[Figure: expression across the alpha, cdc15, cdc28, and elu time courses, by phase, for genes co-regulated by Fkh2 & Swi4 and by Fkh2 & Ndd1]

42 Combinatorial Effects
[Figure: expression across the alpha, cdc15, cdc28, and elu time courses, by phase, for genes co-regulated by Mcm1 & Ndd1, Mcm1 & Ace2, and Mcm1 & Swi5]

43 Localization Assignment Changes

44 Motifs Found
• Ndd1
[Figure: Ndd1 motif across the Simon et al. targets, the expanded set, and the remaining genes (values shown: 17, 1, 28)]
The expanded set identified additional genes regulated by Ndd1.

45 TFSimonExpandedRestP-Value Ace210911.4e-6 Fkh1292584.4e-10 Fkh229 105.4e-11 Mbp1665681.9e-45 Mcm1282424.2e-18 Ndd1172811.9e-24 Swi4413756.4e-26 Swi5282324.9e-15 Swi6505262.3e-48

46 Induced Interaction Network
• TF pairs whose regulation predicts expression of the same gene cluster
[Figure: network over Ace2, Swi5, Ndd1, Fkh2, Fkh1, Swi4, Swi6, Mcm1, and Mbp1, with edges labeled by cell-cycle phase (G1, S, G2, M, M/G1)]

47 Conclusions
• Unified probabilistic model explaining gene regulation using sequence, localization, and expression data
• Models complex interactions between regulators
• Discriminative model maximizing P(Expr. | Seq.)
• Sequence data helps explain expression patterns

48 Big Picture
• Goal: a unified probabilistic framework that
  – models complex biological domains
  – incorporates heterogeneous data
• The framework explicitly incorporates basic biological building blocks within the model:
  – Genes, TFs, proteins, patients, cells, species, …
• Much closer connection between biology and model
  – Can read biology directly from the model
  – Can incorporate prior knowledge easily
• Can explicitly represent and learn biological models

