1 Outline. Who regulates whom and when? Model. Learning algorithm. Evaluation. Wet lab experiments. Perspective: why does it work?

2 Gene Regulation: Simple Example. Figure: an activator and a repressor act on a regulated gene in three regulatory states (State 1, State 2, State 3), each shown with DNA microarray readouts of the regulators and the regulated gene.

3 Regulation Tree. Figure: a regulation program drawn as a decision tree whose questions "Activator?" and "Repressor?" (true/false branches) lead to State 1, State 2 and State 3, shown next to activator expression, repressor expression and module gene expression. Genes in the same module share the same regulation program.
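A regulation program of this kind can be held as a small binary tree. The Python sketch below is an illustration, not the authors' code: it assumes each question has the form "is this regulator's expression above a threshold?" (the form used in the assignment at the end of this deck) and that each leaf holds a Gaussian over module-gene expression. The tree shape, thresholds and leaf values in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    """A regulation state: module-gene expression ~ Normal(mean, std)."""
    name: str
    mean: float
    std: float


@dataclass
class Split:
    """Internal node asking: is this regulator's expression > threshold?"""
    regulator: str
    threshold: float
    if_true: "Node"
    if_false: "Node"


Node = Union[Leaf, Split]


def route(node: Node, regulator_levels: Dict[str, float]) -> Leaf:
    """Follow the questions for one experiment until a leaf (state) is reached."""
    while isinstance(node, Split):
        if regulator_levels[node.regulator] > node.threshold:
            node = node.if_true
        else:
            node = node.if_false
    return node


# Hypothetical program: ask "Activator?" first, then "Repressor?"; values are made up.
program = Split(
    "Activator", 0.0,
    if_true=Split("Repressor", 0.0,
                  if_true=Leaf("State 2", mean=-0.5, std=0.3),
                  if_false=Leaf("State 3", mean=1.5, std=0.3)),
    if_false=Leaf("State 1", mean=-1.0, std=0.3),
)

print(route(program, {"Activator": 1.2, "Repressor": -0.4}).name)  # State 3
```

Every gene assigned to the module is scored against the same tree, which is what "genes in the same module share the same regulation program" amounts to.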

4 Module Networks. Goal: discover regulatory modules and their regulators. Module genes: a set of genes that are similarly controlled. Regulation program: expression as a function of the regulators. Figure: modules, each with a regulation tree whose true/false questions test HAP4 and CMK1.

5 Module Network Probabilistic Model. The expression level in each module is a function of the expression of the regulators. Figure: for each experiment and gene, a Module variable answers "What module does gene g belong to?", Regulator 1, Regulator 2 and Regulator 3 give each regulator's expression level in that experiment, and the gene's Level is drawn from P(Level | Module, Regulators); example regulation trees test BMH1 and GIC2, and HAP4 and CMK1.
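Assuming Gaussian leaves, the model's log-likelihood is a sum over genes and experiments of log P(Level | Module, Regulators). In this sketch a regulation program is any function from one experiment's regulator levels to the (mean, std) of the leaf that experiment reaches; the names `Program` and `log_likelihood` and the example numbers are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

# A regulation program maps one experiment's regulator levels to the
# (mean, std) of the Gaussian at the leaf that the experiment reaches.
Program = Callable[[Dict[str, float]], Tuple[float, float]]


def gaussian_logpdf(x: float, mean: float, std: float) -> float:
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def log_likelihood(expression: Dict[str, List[float]],
                   regulators: List[Dict[str, float]],
                   assignment: Dict[str, int],
                   programs: Dict[int, Program]) -> float:
    """Sum over genes g and experiments e of log P(Level[g][e] | Module(g), Regulators[e])."""
    total = 0.0
    for gene, levels in expression.items():
        program = programs[assignment[gene]]      # the module gene g belongs to
        for e, level in enumerate(levels):
            mean, std = program(regulators[e])    # leaf reached in experiment e
            total += gaussian_logpdf(level, mean, std)
    return total


# Hypothetical one-split program for module 0: HAP4 high -> mean 1, else mean -1.
programs = {0: lambda r: (1.0, 0.5) if r["HAP4"] > 0 else (-1.0, 0.5)}
regulators = [{"HAP4": 0.8}, {"HAP4": -0.3}]
expression = {"g1": [1.0, -1.0]}
print(log_likelihood(expression, regulators, {"g1": 0}, programs))
```

Because the module's parameters are indexed by module rather than by gene, every gene in a module is evaluated against the same leaf Gaussians.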

6 Outline (recap). Next section: Learning algorithm.

7 Learning Problem. Goal: find the gene module assignments and the tree structures that maximize P(M|D). Figure: the module network model (Experiment, Gene Expression, Module, Regulator 1, Regulator 2, Regulator 3, Level) with a regulation tree testing HAP4 and CMK1. The problem is hard: there are 5000 to 10000 genes and ~500 candidate regulators.
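For a sense of scale, here is a back-of-the-envelope count of the gene-to-module assignments alone, assuming K = 50 modules (the number used in the evaluation and in the assignment) and N = 5000 genes. The tree structures, where each split can test any of ~500 regulators, multiply this further.

$$
K^{N} = 50^{5000} = 10^{5000 \log_{10} 50} \approx 10^{5000 \times 1.699} \approx 10^{8495}
$$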

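8 Learning Algorithm Overview. Gene module assignments are initialized by clustering. The algorithm then alternates between learning the regulation program of each regulatory module (for example, a tree testing HAP4 and CMK1) and relearning the assignment of genes to modules.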
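The "relearn gene assignments" step can be sketched as a hard reassignment. This is a simplification under stated assumptions: each module's current regulation program is summarized by the (mean, std) of the leaf that each experiment reaches, and genes are moved by maximum likelihood, whereas the slides score models with P(M|D). All names and numbers are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# module_params[m][e] = (mean, std) of the leaf that experiment e reaches
# in module m's current regulation program.
ModuleParams = Dict[int, List[Tuple[float, float]]]


def gaussian_logpdf(x: float, mean: float, std: float) -> float:
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def module_score(levels: List[float], params: List[Tuple[float, float]]) -> float:
    """Log-likelihood of one gene's expression profile under one module's program."""
    return sum(gaussian_logpdf(x, mu, sd) for x, (mu, sd) in zip(levels, params))


def reassign_genes(expression: Dict[str, List[float]],
                   module_params: ModuleParams) -> Dict[str, int]:
    """Hard reassignment: move each gene to the module whose program explains it best."""
    return {gene: max(module_params, key=lambda m: module_score(levels, module_params[m]))
            for gene, levels in expression.items()}


# Hypothetical modules over three experiments.
module_params = {0: [(1.0, 0.5), (1.0, 0.5), (-1.0, 0.5)],
                 1: [(-1.0, 0.5), (-1.0, 0.5), (1.0, 0.5)]}
expression = {"a": [0.9, 1.1, -0.8], "b": [-1.2, -0.7, 1.0]}
print(reassign_genes(expression, module_params))  # {'a': 0, 'b': 1}
```

In a full loop this step would alternate with relearning each module's regulation program, for a fixed number of iterations or until assignments stabilize (both stopping rules are assumptions here).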

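9 Learning Regulation Programs. Figure: the module genes across experiments, first with the experiments in their original order and then sorted by Hap4 expression. Before any split, all experiments share one Gaussian:
$\log P(M \mid D) \approx \log P(D \mid \mu, \sigma) + \log P(\mu, \sigma)$
Splitting on HAP4 divides the experiments into HAP4-up (↑) and HAP4-down (↓) sets, each with its own Gaussian:
$\log P(M \mid D) \approx \log P(D_{\mathrm{HAP4}\uparrow} \mid \mu_{\mathrm{HAP4}\uparrow}, \sigma_{\mathrm{HAP4}\uparrow}) + \log P(D_{\mathrm{HAP4}\downarrow} \mid \mu_{\mathrm{HAP4}\downarrow}, \sigma_{\mathrm{HAP4}\downarrow}) + \log P(\mu_{\mathrm{HAP4}\uparrow}, \sigma_{\mathrm{HAP4}\uparrow}, \mu_{\mathrm{HAP4}\downarrow}, \sigma_{\mathrm{HAP4}\downarrow})$
A split on another candidate regulator, such as SIP4, is scored the same way with SIP4 in place of HAP4, and the best-scoring split is kept. The procedure then recurses; for example, one HAP4 branch is split further on CMK1:
$\log P(M \mid D) \approx \log P(D_{\mathrm{HAP4}} \mid \mu_{\mathrm{HAP4}}, \sigma_{\mathrm{HAP4}}) + \log P(D_{\mathrm{CMK1}\uparrow} \mid \mu_{\mathrm{CMK1}\uparrow}, \sigma_{\mathrm{CMK1}\uparrow}) + \log P(D_{\mathrm{CMK1}\downarrow} \mid \mu_{\mathrm{CMK1}\downarrow}, \sigma_{\mathrm{CMK1}\downarrow}) + \dots$
where the first term is the HAP4 branch that remains a leaf. Figure labels: module genes, Hap4 expression, regulator.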
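The split search can be sketched as follows: for each candidate regulator, sort the experiments by its expression, try every threshold between consecutive values, and score the two resulting leaves. As assumptions, this sketch scores each leaf by its maximum-likelihood Gaussian log-likelihood (it drops the prior term over the leaf mean and standard deviation), pools all module genes within a leaf, and requires at least five experiments per side, following Note 1 of the assignment. The data are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple


def leaf_loglik(values: List[float], min_var: float = 1e-6) -> float:
    """Maximum-likelihood Gaussian log-likelihood of all values in one leaf."""
    n = len(values)
    mean = sum(values) / n
    var = max(sum((v - mean) ** 2 for v in values) / n, min_var)  # floor avoids log(0)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def best_split(module_expr: List[List[float]],            # genes x experiments
               regulator_expr: Dict[str, List[float]],    # regulator -> level per experiment
               min_leaf: int = 5) -> Optional[Tuple[str, float, float]]:
    """Return (regulator, threshold, score) of the best single split, or None."""
    n_exp = len(module_expr[0])
    columns = [[row[e] for row in module_expr] for e in range(n_exp)]
    best = None
    for reg, levels in regulator_expr.items():
        order = sorted(range(n_exp), key=lambda e: levels[e])
        for k in range(min_leaf, n_exp - min_leaf + 1):
            low, high = order[:k], order[k:]
            if levels[low[-1]] == levels[high[0]]:
                continue  # tied regulator values cannot be separated by a threshold
            threshold = (levels[low[-1]] + levels[high[0]]) / 2
            score = (leaf_loglik([v for e in low for v in columns[e]]) +
                     leaf_loglik([v for e in high for v in columns[e]]))
            if best is None or score > best[2]:
                best = (reg, threshold, score)
    return best


# Hypothetical module whose genes switch on when HAP4 is high.
hap4 = [float(i) for i in range(12)]
sip4 = [3.0, 7.0, 1.0, 10.0, 0.0, 5.0, 11.0, 2.0, 8.0, 4.0, 9.0, 6.0]
module_expr = [[-1.0] * 6 + [1.0] * 6,
               [-0.9] * 6 + [1.1] * 6]
reg, thr, score = best_split(module_expr, {"HAP4": hap4, "SIP4": sip4})
print(reg, thr)  # HAP4 5.5
```

Recursing on each side with the same function would grow the tree one split at a time.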

10 Learning Algorithm Performance. Figure: left, Bayesian score (average per gene) vs. algorithm iterations; right, gene module assignment changes (% of total) vs. algorithm iterations. The score improves significantly across learning iterations, and many genes (50%) change module assignment during learning.

11 Outline (recap). Next section: Evaluation.

12 Yeast Stress Data. Genes: 2355 genes selected because they showed activity. Experiments: 173, covering diverse environmental stress conditions (heat shock, nitrogen depletion,…).

13 Comparison to Bayesian Networks. Bayesian network [Friedman et al. ’00; Hartemink et al. ’01]: the expression level of each gene is a function of the expression of its regulators. Figure: fragment of a learned Bayesian network (Cmk1, Hap4, Mig1, Ste12, Yap1, Gic1) over 2355 variables (genes) and 173 instances (experiments). Problems: robustness and interpretability.

14 Comparison to Bayesian Networks (continued). Module network [SPRKF ’03 (UAI)]: robustness is addressed by sharing parameters, and interpretability by a module-level model in which each module's Level depends on Regulator 1, Regulator 2 and Regulator 3. Figure: the Bayesian network fragment (Cmk1, Hap4, Mig1, Ste12, Yap1, Gic1) next to the module network model.

15 Comparison to Bayesian Networks (continued). Figure: test-data log-likelihood (gain per instance, relative to Bayesian network performance) vs. number of modules (0 to 500). The module network learns which parameters are shared by learning which genes are in the same module.

16 From Model to Regulatory Modules. Figure: the module network model with a regulation tree testing HAP4 and CMK1. Are the learned modules biologically relevant?

17 Respiration Module. Figure: the module's regulation program, module genes and predicted regulator. Are the module genes functionally coherent? Yes: energy production (oxidative phosphorylation, 26/55 genes, P < 10^-30). Are the module genes known targets of the predicted regulators? Yes: Hap4 and Msn4 are known to regulate module genes.

18 Energy, Osmolarity & cAMP Signaling. Tpk1: regulation by non-TFs (Tpk1 is a catalytic subunit of cAMP-dependent protein kinase). The module contains known Tpk1 targets (e.g. Tps1). Tpk1-mediated STRE motif (50/64 genes; p < 3×10^-11).

19 EM: Biological Improvement

20 Figure: global view of the inferred regulation. Regulators (drawn as transcription factors or signaling molecules): Hap4, Xbp1, Yer184c, Yap6, Gat1, Ime4, Lsg1, Msn4, Gac1, Gis1, Ypl230w, Not3, Sip2, Kin82, Cmk1, Tpk1, Ppt1, Tpk2, Pph3, Bmh1, Gcn20. Numbered modules are grouped into amino acid metabolism; energy and cAMP signaling; DNA and RNA processing; nuclear. Modules are annotated with enriched cis-regulatory motifs such as STRE, HAP234, REPCAR, CAT8, ADR1, HSF, HAC1, XBP1, MCM1, ABF_C, GATA, GCN4, CBF1_B, GCR1 and MIG1. Legend: regulation supported in literature; regulator (signaling molecule); regulator (transcription factor); inferred regulation; module (number); experimentally tested regulator; enriched cis-regulatory motif.

21 Biological Evaluation Summary. Are the module genes functionally coherent? 46/50 modules. Are some module genes known targets of the predicted regulators? 30/50 modules. Functionally coherent: module genes enriched for GO annotations with hypergeometric p-value < 0.01 (corrected for multiple hypotheses). Known targets: supported by direct biological experiments reported in the literature.
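The hypergeometric test behind "functionally coherent" can be sketched with the standard library. The slide does not say which multiple-hypothesis correction was used, so the Bonferroni helper here is an assumption, and the toy numbers are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: N genes in total, K of them carry the
    annotation, n genes in the module, k annotated genes observed in the module."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def bonferroni(p: float, num_tests: int) -> float:
    """One common multiple-hypothesis correction (assumed, not stated on the slide)."""
    return min(1.0, p * num_tests)


# Toy numbers (hypothetical): 20 genes, 5 annotated, a 5-gene module containing all 5.
p = hypergeom_pvalue(5, N=20, K=5, n=5)   # equals 1 / comb(20, 5) = 1 / 15504
print(p, bonferroni(p, num_tests=50))
```

Applied per module and per GO term, a module would count as functionally coherent if some corrected p-value falls below 0.01.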

22 Outline (recap). Next section: Wet lab experiments.

23 From Model to Detailed Predictions. Prediction: regulator ‘X’ regulates process ‘Y’. Experiment: knock out ‘X’ and repeat the experiment. Figure: a regulation tree testing HAP4 and Ypl230w, with Ypl230w as the candidate ‘X’.

24 Does ‘X’ Regulate Predicted Genes? Experiment: knock out Ypl230w (stationary phase). 1334 genes are regulated, changing more than 4x between wild type and mutant (312 expected by chance). Modules are ranked by their regulated genes. Modules predicted to be regulated by Ypl230w:
Protein folding: P < 0.0001
Cell differentiation: P < 0.02
Glycolysis and folding: P < 0.04
Mitochondrial and protein fate: P < 0.04
Conclusion: Ypl230w regulates computationally predicted genes.
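Calling regulated genes and ranking modules can be sketched as below. The >4x cutoff comes from the slide; everything else is an assumption: one positive, linear-scale measurement per gene in each strain, ranking by the fraction of a module's genes that are regulated, and hypothetical toy data. The sketch does not reproduce the "312 expected by chance" figure or the per-module P-values; those would need a significance test such as the hypergeometric test used in the evaluation.

```python
import math
from typing import Dict, List, Set, Tuple


def regulated_genes(wild_type: Dict[str, float], mutant: Dict[str, float],
                    fold: float = 4.0) -> Set[str]:
    """Genes whose mutant/wild-type ratio exceeds `fold` in either direction.
    Assumes one positive (linear-scale) measurement per gene and condition."""
    cutoff = math.log2(fold)
    return {g for g in wild_type
            if abs(math.log2(mutant[g] / wild_type[g])) > cutoff}


def rank_modules(modules: Dict[str, Set[str]],
                 regulated: Set[str]) -> List[Tuple[str, int, int]]:
    """Rank modules by the fraction of their genes that are regulated.
    Returns (module, regulated genes in module, module size)."""
    rows = [(name, len(genes & regulated), len(genes)) for name, genes in modules.items()]
    return sorted(rows, key=lambda row: row[1] / row[2], reverse=True)


# Hypothetical toy data.
wt = {"A": 100.0, "B": 100.0, "C": 100.0, "D": 100.0}
ko = {"A": 500.0, "B": 20.0, "C": 150.0, "D": 400.0}   # D changes exactly 4x: not > 4x
hits = regulated_genes(wt, ko)
print(sorted(hits))  # ['A', 'B']
print(rank_modules({"m1": {"A", "B", "C"}, "m2": {"C", "D"}}, hits))  # [('m1', 2, 3), ('m2', 0, 2)]
```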

25 Does ‘X’ Regulate Predicted Genes? (continued) Experiments: Ppt1 knockout (hypo-osmotic stress) and Kin82 knockout (heat shock). Figure panels compare wild type and mutant, with 1014 and 1034 regulated genes. Modules predicted to be regulated by Kin82:
Energy and osmotic stress: P < 0.0001
Energy, osmolarity & cAMP signaling: P < 0.006
mRNA, rRNA and tRNA processing: P < 0.02
Modules predicted to be regulated by Ppt1:
Ribosomal and phosphate metabolism: P < 0.009
Amino acid and purine metabolism: P < 0.01
mRNA, rRNA and tRNA processing: P < 0.02
Protein folding: P < 0.02
Cell cycle: P < 0.02

26 Wet Lab Experiments Summary. 3/3 regulators regulate computationally predicted genes. New yeast biology suggested: Ypl230w activates protein-folding, cell wall and ATP-binding genes; Ppt1 represses phosphate metabolism and rRNA processing; Kin82 activates energy and osmotic stress genes.

27 Outline (recap). Next section: Perspective (why does it work?).

28 Why does it work? Underlying assumption: regulators are transcriptionally regulated. Regulators are part of regulatory structures in which they are themselves regulated*, so statistical methods can detect associations between regulators and their targets. * Shen-Orr et al. ’02 find many such structures.

29 Regulator Chain. Respiration module: a chain from Phd1 (TF) through Hap4 (TF) to the targets Cox4, Cox6 and Atp17. Figure: mRNA expression level and active protein level over time for Phd1, Hap4 and the targets. Legend: black, regulators that cannot be detected; red, correctly predicted regulator; blue, targets.

30 Auto Regulation. Snf kinase regulated processes module: Yap6 (TF); targets Vid24, Tor1, Gut2. Legend: black, regulators that cannot be detected; red, correctly predicted regulator; blue, targets.

31 Positive Signaling Loop. Sporulation and cAMP pathway module: Sip2 (signaling molecule, SM), Msn4 (TF); targets Vid24, Tor1, Gut2. Legend: black, regulators that cannot be detected; red, correctly predicted regulator; blue, targets.

32 Negative Signaling Loop. Energy and osmotic stress module: Tpk1 (SM), Msn4 (TF); targets Nth1, Tps1, Glo1. Legend: black, regulators that cannot be detected; red, correctly predicted regulator; blue, targets.

33 Why Does it Work? Feed-forward and feedback loops: some transcription factors and signal transduction molecules have a detectable expression signature. Module networks infer their regulatory relationships.

34 Assignment
Download the yeast stress expression dataset and the list of transcription factor regulators.
Randomly partition the dataset in a 5-fold cross-validation scheme.
For k = 50:
Create a hard-clustering model (use the code from the earlier exercise). At each array, this model has a separate Gaussian distribution for each of the 50 values of the cluster variable.
Using the assignment of genes to clusters learned by the hard clustering, learn for each cluster a decision tree with at most (1) one split, (2) two splits, (3) three splits.
Note 1: allow only splits with at least 5 arrays on each side of the split.
Note 2: the split question is whether the expression level of a transcription factor is greater than some value.

35 Assignment (continued)
Note 3: at each leaf of the resulting model, a single Gaussian distribution is used for all arrays that map to that leaf.
Compute the log-likelihood of the test data for each model (the hard-clustering model and each of the three regulation models).
Plot the average and standard deviation of the test log-likelihood for each model.
For the model with two splits on each cluster, use the Gaussian distribution at each array to sample a new expression dataset with exactly the same number of genes and arrays: for each original gene and array, sample from the Gaussian distribution associated with that gene and that array.
Learn a model with two splits for each cluster from the sampled dataset.
Plot the number of regulation-tree splits that are identical between the model that generated the sampled data and the newly learned model.
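Several mechanical steps of the assignment can be sketched in Python. This is a hedged sketch with hypothetical helper names. The slides do not say whether arrays or genes are partitioned, so the fold helper just splits indices. The position labels in the tree representation are made up, and "identical" is taken to mean the same regulator at the same position in the same cluster, with thresholds ignored. Clusters are matched by index because both models reuse the same gene-to-cluster assignment.

```python
import random
import statistics
from typing import Dict, List, Tuple


def kfold_partition(n_items: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Randomly split item indices into k folds; return (train, test) index lists per fold."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    folds = [sorted(order[i::k]) for i in range(k)]
    return [(sorted(j for f in folds[:i] + folds[i + 1:] for j in f), folds[i])
            for i in range(k)]


def summarize(fold_logliks: List[float]) -> Tuple[float, float]:
    """Average and standard deviation of per-fold test log-likelihoods."""
    return statistics.mean(fold_logliks), statistics.stdev(fold_logliks)


def sample_dataset(params: List[List[Tuple[float, float]]], seed: int = 0) -> List[List[float]]:
    """params[g][a] = (mean, std) of the Gaussian the two-split model uses for
    gene g on array a; draw one value per gene and array."""
    rng = random.Random(seed)
    return [[rng.gauss(mean, std) for mean, std in row] for row in params]


# A tree is {position: regulator}, e.g. {"root": "HAP4", "root/true": "CMK1"}.
Trees = Dict[int, Dict[str, str]]


def identical_splits(generating: Trees, relearned: Trees) -> int:
    """Count splits that test the same regulator at the same tree position of the
    same cluster in both models (thresholds ignored: an assumption)."""
    return sum(1 for cluster, splits in generating.items()
               for position, regulator in splits.items()
               if relearned.get(cluster, {}).get(position) == regulator)


# Example usage with hypothetical values.
splits = kfold_partition(173)   # e.g. 173 arrays
print([len(test) for _, test in splits])  # [35, 35, 35, 34, 34]
print(summarize([-130.2, -129.8, -130.5, -129.9, -130.1]))
```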

