Presentation is loading. Please wait.

Presentation is loading. Please wait.

Understanding Gene Regulation: From Networks to Mechanisms Daphne Koller Stanford University.

Similar presentations


Presentation on theme: "Understanding Gene Regulation: From Networks to Mechanisms Daphne Koller Stanford University."— Presentation transcript:

1 Understanding Gene Regulation: From Networks to Mechanisms Daphne Koller Stanford University

2 Gene Regulatory Networks http://en.wikipedia.org/wiki/Gene_regulatory_network Controlled by diverse mechanisms Modified by endogenous and exogenous perturbations

3 Goals Infer regulatory network and mechanisms that control gene expression Identify effect of perturbations on network Understand effect of gene regulation on phenotype

4 Outline Regulatory networks for gene expression Individual genetic variation and gene regulation Cell differentiation and gene regulation Expression changes underlying phenotype

5 Regulatory Network I mRNA level of regulator can indicate its activity level Target expression is predicted by expression of its regulators Use expression of regulatory genes as regulators PHO5 PHM6 SPL2 PHO3 PHO84 VTC3 GIT1 PHO2 TEC1 GPA1 ECM18 UTH1 MEC3 MFA1 SAS5 SEC59 SGS1 PHO4 ASG7 RIM15 HAP1 PHO2 GPA1 MFA1 SAS5 PHO4 RIM15 Segal et al., Nature Genetics 2003; Lee et al., PNAS 2006 Transcription factors, signal transduction proteins, mRNA processing factors, …

6 Co-regulated genes have similar regulation program Exploit modularity and predict expression of entire module Allows uncovering complex regulatory programs Regulatory Network II module PHO5 PHM6 SPL2 PHO3 PHO84 VTC3 GIT1 PHO2 TEC1 GPA1 ECM18 UTH1 MEC3 MFA1 SAS5 SEC59 SGS1 PHO4 ASG7 RIM15 HAP1 PHO2 GPA1 MFA1 SAS5 PHO4 RIM15 Targets Segal et al., Nature Genetics 2003; Lee et al., PNAS 2006 “Regulatory Program” ?

7 Module Networks* Learning quickly runs out of statistical power Poor regulator selection lower in the tree Many correct regulators not selected Arbitrary choice among correlated regulators Combinatorial search Multiple local optima repressor  true repressor expression false context B module genes regulation program true context C activator  activator expression context A false target gene expression induced repressed ActivatorRepressor Genes in module Gene1Gene2Gene3 * Segal et al., Nature Genetics 2003

8 Regulation as Linear Regression minimize w (Σw i x i - E Targets ) 2 But we often have hundreds or thousands of regulators … and linear regression gives them all nonzero weight! xNxN … x1x1 x2x2 w1w1 w2w2 wNwN E Targets Problem: This objective learns too many regulators parameters w1w1 w2w2 wNwN E Targets = w 1 x 1 +…+w N x N + ε = MFA1 Module GPA1 -3 x + 0.5 x

9 Lasso* (L 1 ) Regression minimize w (w 1 x 1 + … w N x N - E Targets ) 2 +  C |w i | Induces sparsity in the solution w (many w i ‘s set to zero) Provably selects “right” features when many features are irrelevant Convex optimization problem Unique global optimum Efficient optimization But, arbitrary choice among correlated regulators xNxN … x1x1 x2x2 w1w1 w2w2 wNwN E Targets parameters w1w1 w2w2 x1x1 x2x2 * Tibshirani, 1996 L2L2 L1L1

10 Elastic Net* Regression minimize w (w 1 x 1 + … w N x N - E Targets ) 2 +  C |w i | +  D w i 2 Induces sparsity But avoids arbitrary choices among relevant features Convex optimization problem Unique global optimum Efficient optimization algorithms Lee et al., PLOS Genetics 2009 * Zhou & Hastie, 2005 xNxN … x1x1 x2x2 w1w1 w2w2 wNwN E Targets w1w1 w2w2 x1x1 x2x2 L2L2 L1L1

11 Cluster genes into modules Learn a regulatory program for each module Learning Regulatory Network PHO5 PHM6 SPL2 PHO3 PHO84 VTC3 GIT1 PHO2 TEC1 GPA1 ECM18 UTH1 MEC3 MFA1 SAS5 SEC59 SGS1 PHO4 ASG7 RIM15 HAP1 PHO2 GPA1 MFA1 SAS5 PHO4 RIM15 Lee et al., PLoS Genet 2009 = MFA1 Module GPA1 -3 x + 0.5 x This is a Bayesian network But multiple genes share same program Dependency model is linear regression

12 Outline Regulatory networks for gene expression Individual genetic variation and gene regulation Effect of genotype on expression Regulatory potential Cell differentiation and gene regulation Expression changes underlying phenotype

13 Genotype  phenotype …ACTCGGTTGGCCTAAATTCGGCCCGG… …ACCCGGTAGGCCTTAATTCGGCCCGG… : …ACTCGGTAGGCCTATATTCGGCCGGG… Different sequences Different phenotypes ? ? Perturbations to regulatory network

14 Genotype  Regulation …ACTCGGTTGGCCTAAATTCGGCCCGG… …ACCCGGTAGGCCTTAATTCGGCCCGG… : …ACTCGGTAGGCCTATATTCGGCCGGG… Different sequences ? ? Perturbations to regulatory network Goals: Infer regulatory network that controls gene expression Identify mechanisms by which genetic variation affects gene expression

15 eQTL Data [Brem et al. (2002) Science] BY × RM : 112 progeny Markers Expression data 112 individuals 6000 genes 0101100100…011 1011110100…001 0010110000…010 : 0000010100…101 0010000000…100 Genotype data 3000 markers 112 individuals 

16 Expression quantitative trait loci (eQTL) mapping For each gene, find the marker that is most predictive of its expression level [Yvert G et al. (2003) Nat Gen]. Traditional Approach: Single Marker genes Genotype data Expression data markers individuals 0101100100100…011 1011110111100…001 0010110001000…010 : 0000010110100…101 1110000110000…100 Gene i Marker j induced repressed 1 2 3 4 5 … Marker mRNA Gene

17 LirNet Regulatory network E-regulators: Activity (expression) of regulatory genes G-regulators: Genotype of genes Measured as values of chromosomal markers Lee et al., PNAS 2006; Lee et al., PLoS Genetics 2009 PHO5 PHM6 SPL2 PHO3 PHO84 VTC3 GIT1 PHO2 TEC1 GPA1 ECM18 UTH1 MEC3 MFA1 SAS5 SEC59 SGS1 PHO4 ASG7 RIM15 HAP1 PHO2 GPA1 MFA1 SAS5 PHO4 RIM15 M1 M120 M22 M1011 M321 marker

18 The Telomere Module Includes Rif2 – control telomere length establishes telomeric silencing 6 coding & 8 promoter SNPs Binds to Rap1p C-terminus Enrichment for Rap1p targets (29/42; p < 10 -15 ) Chr XII: 1056097 40/42 genes in telomeres Enriched for telomere maintenance (p < 10 -11 ) & helicase activity (p < 10 -18 ) Lee et al., PNAS 2006

19 Some Chromatin Modules Locus containing Sir1 4 coding SNPs 4/5 consecutive genes Locus containing uncharacterized Sir1 homologue 87(!) coding SNPs 5/7 consecutive genes Known Sir1 targets Chr XI: 643655 Chr X: 22213 Lee et al., PNAS 2006

20 Chromatin as Mechanism 23 modules (out of 165) with “chromosomal features” 16 have “chromatin regulators” (p < 10 -7 ) Chromatin modification explains significant part of variation in gene expression between strains Lee et al., PNAS 2006 Mechanism I “Evolutionary strategy” to make coordinated changes in gene expression by modifying small number of hubs

21 The Puf3 Module +1 0 - 1 112 segregants BYRM HAP4 TOP2 KEM1 GCN1 GCN20 DHH1 weight PUF3 expression genotype 147/153 genes (P ~ 10 -130 ) are pulldown targets of mRNA binding protein Puf3 P-body components PUF family: Sequence specific mRNA binding proteins (3’ UTR) Regulate degradation of mRNA and/or repress translation Lee et al., PLOS Genetics 2009

22 P-Bodies Dhh1 regulates mRNA decapping in p-bodies mRNAs stored in P-bodies are translationally repressed Translation mRNAs can exit P-bodies and resume translation ** Beliakova-Bethell, et al. (2006) RNA 12:94 GFP of Dhh1** RNA degradation RNA stored in P-bodies can be degraded (via decapping) * Sheth and Parker (2003) Science 300:805 cell P-body Puf3 protein ? But what regulates sequence-specific localization of mRNAs to P-bodies?

23 Microscopy experiment Fluorescent microscopy: Puf3 localizes to P-body Supports hypothesis of Puf3 involvement in regulating mRNA degradation by P-bodies Dhh1 Puf3 cell P-body Puf3 protein ? Dhh1 Joint work with David Drubin and Pam Silver Lee et al., PLOS Genetics 2009

24 What Regulates the P-bodies? A marker that covers a large region in Chr 14. Region contains ~30 genes and ~318 SNPs. Experiments for all 30 genes not feasible! ChrXIV:449639 DHH1 BY RM GCN20 GCN1 KEM1 BLM3 Lee et al., PLOS Genetics 2009

25 Outline Regulatory networks for gene expression Individual genetic variation and gene regulation Effect of genotype on expression Regulatory potential Cell differentiation and gene regulation Expression changes underlying phenotype

26 Motivation Not all SNPs are equally likely to be causal. Gene SNPs TACGTAGGAACCTGTACCA … GGAAAATATCAAATCCAACGACGTTAGCCAATGCGATCGAATGGGAACGTA ChrXIV: 449,639-502,316 SNP 1: Conserved residue in a gene involved in RNA degradation SNP 2: In nonconserved intergenic region “Regulatory features” F 1. Gene region? 2. Protein coding region? 3. Nonsynonymous? 4. Create a stop codon? 5. Strong conservation? : Idea: Prioritize SNPs that have “good” regulatory features But how do we weight different features? Lee et al., PLOS Genetics 2009

27 Bayesian L 1 -Regularization w mk ymym Module m x mk Regulator k ~ P(y m |x;w) = N (  k w mk x mk,ε 2 ) ~ P(w) = Laplacian(0,C) higher prior variance  weight can more easily deviate from 0  regulator more likely to be selected wiwi wiwi Prior variance Lee et al., PLOS Genetics 2009

28 Metaprior Model (Hierarchical Bayes) ymym x mk w mk ~ Laplacian (0,C mk ) = g(ß T f mk ) : ~ P(y m |x;w) = N (  k w mk x mk,ε 2 ) C mk f mk Regulatory prior ß Module m Regulator k Module 1 Module M xNxN … x1x1 x2x2 x1x1 x2x2 E module 1 Potential regulators xNxN … x1x1 x2x2 x2x2 E module M xNxN Regulatory features Inside a gene? Protein coding region? Strong conservation? TF binds to module genes : “Regulatory potential” = ß 1 x Inside a gene? + ß 2 x Protein coding region? + ß 3 x Conserved? … YES NO : YES NO : NO : w 11 w 12 w 1N w N1 w N2 w MN Lee et al., PLOS Genetics 2009

29 Metaprior Method Regulatory potentials … x1x1 x2x2 xNxN EiEi … x1x1 x2x2 EiEi xNxN Module i+2 … x1x1 x2x2 EiEi xNxN Module i+1 … x1x1 x2x2 EiEi xNxN Module i Regulatory programs 1. Learn regulatory programs 3. Compute regulatory potential of SNPs Regulatory weights ß Non-synonymous Conservation AA small  large Cell cycle 0 0.1 0.2 0.3 BY(lab) MVLT ELVQ VSDASKQLWDI RM(wild) MVLT ELVQ VSDASKQLWDI  1  0 LDLD  1  0  1 : 0.3 0.9 == Regulatory features F AGAG 2. Learn ß Lee et al., PLOS Genetics 2009 Maximize P(E,ß,W|X) Empirical hierarchical Bayes Use point estimate of model parameters Learn priors from data to maximize joint posterior

30 Transfer Learning What do regulatory potentials do? They do not change selection of “strong” regulators – those where prediction of targets is clear They only help disambiguate between weak ones Strong regulators help teach us what to look for in other regulators Transfer of knowledge between different prediction tasks

31 Learned regulatory weights Human regulatory weights Location AA property change Gene function Pairwise feature Regulatory features Lee et al., PLOS Genetics 2009 Yeast regulatory weights

32 Statistical Evaluation # of genes with PGV > X % PGV: Percent genetic variation explained by the predicted regulatory program for each gene 100 90 80 70 60 50 40 30 20 10 0 500 1000 1500 2000 2500 PGV (%) 1450 genes (Lirnet without regulatory prior) 250 genes (Brem & Kruglyak) 850 genes (Geronemo*) 1650 genes (Lirnet) * Lee et al. PNAS 2006 Lee et al., PLOS Genetics 2009

33 How many predicted interactions have support in other data? Deletion/ over-expression microarrays [Hughes et al. 2000; Chua et al. 2006] ChIP-chip binding experiments [Harbison et al. 2004] Transcription factor binding sites [Maclsaac et al. 2006] mRNA binding pull-down experiments [Gerber et al. 2004] Literature-curated signaling interactions Biological evaluation I %interactions %modules Reg Module Via cascade Lirnet without regulatory features % Supported interactions TF Lee et al., PLOS Genetics 2009

34 Biological Evaluation II Lee et al., PLOS Genetics 2009 Lirnet Zhu et al (Nature Genet 2008) Random significance of support

35 What Regulates the P-Bodies? Regulatory potential 0.7 0.6 0.5 0.4 ChrXIV:415,000-495,000 Saccharomyces Genome Database (SGD) High-scoring regulatory features Conservation0.230Mass (Da)0.066 Cis-regulation0.208pK10.060 Same GO proc0.204Polar0.057 Non-syn0.158pI0.050 Translation regulation0.046 MKT1 The regulatory potential over all 318 SNPs in the region ChrXIV:449639 Lee et al., PLOS Genetics 2009

36 XPG-NPbp1 InteractionXPG-I * Mkt1 Mkt1 binds to mRNAs at 3’ UTR BY has SNP at conserved residue in nuclease domain MKT1 Puf3 module P-body module BY RM mkt1  in RM Lee et al., PLOS Genetics 2009

37 Predicting Causal Regulators RegionZhu et al [Nat Genet 08] Lirnet (top 3 are considered) 1None SEC18RDH54SPT7 2 TBS1, TOS1, ARA1, CSH1, SUP45, CNS1, AMN1 AMN1CNS1TOS1 3None TRS20ABD1PRP5 4LEU2, ILV6, NFS1, CIT2, MATALPHA1 LEU2PGS1ILV6 5MATALPHA1 MATALPHA2RBK1 6URA3 NPP2PAC2 7GPA1 STP2GPA1NEM1 8HAP1 NEJ1GSY2 9YRF1-4, YRF1-5, YLR464W SIR3HMG2ECM7 10None ARG81 TAF13CAC2 11SAL1, TOP2 MKT1TOP2MSK1 12PHM7 ATG19BRX1 13None ADE2ORT1CAT5 Finding causal regulators for 13 “chromosomal hotspots” 8 validated regulators in 7 regions 14 validated regulators in 11 regions Lee et al., PLOS Genetics 2009

38 Learning Regulatory Priors Learns regulatory potentials that are specific to organism and even data set Can use any set of regulatory features: Sequence features Functional features for relevant gene Features of regulator/target(s) pairs Applicable to any organism, including ones where functional data may not be readily available Lee et al., PLOS Genetics 2009

39 Conclusion Framework for modeling gene regulation Use machine learning to identify regulatory program Hierarchical Bayesian techniques to capture regularities in effects of perturbations on network Uncovers diverse regulatory mechanisms Chromatin remodeling mRNA degradation

40 Acknowledgements National Science Foundation Nevan Krogan, UCSF Pam Silver, David Drubin, Harvard Medical School Su-In Lee University of Washington Dana Pe’er Columbia University Aimee Dudley Institute for System Biology


Download ppt "Understanding Gene Regulation: From Networks to Mechanisms Daphne Koller Stanford University."

Similar presentations


Ads by Google