Presentation is loading. Please wait.

Presentation is loading. Please wait.

Understanding Gene Regulation: From Networks to Mechanisms

Similar presentations


Presentation on theme: "Understanding Gene Regulation: From Networks to Mechanisms"— Presentation transcript:

1 Understanding Gene Regulation: From Networks to Mechanisms
Daphne Koller Stanford University

2 Gene Regulatory Networks
Controlled by diverse mechanisms Modified by endogenous and exogenous perturbations

3 Goals Infer regulatory network and mechanisms that control gene expression Identify effect of perturbations on network Understand effect of gene regulation on phenotype

4 Outline Regulatory networks for gene expression
Individual genetic variation and gene regulation Expression changes underlying phenotype

5 Regulatory Network I mRNA level of regulator can indicate its activity level Target expression is predicted by expression of its regulators Use expression of regulatory genes as regulators Transcription factors, signal transduction proteins, mRNA processing factors, … PHO5 PHM6 SPL2 PHO3 PHO84 VTC3 GIT1 PHO2 TEC1 GPA1 ECM18 UTH1 MEC3 MFA1 SAS5 SEC59 SGS1 PHO4 ASG7 RIM15 HAP1 PHO2 GPA1 MFA1 SAS5 PHO4 RIM15 Segal et al., Nature Genetics 2003; Lee et al., PNAS 2006 5

6 Regulatory Network II Co-regulated genes have similar regulation program Exploit modularity and predict expression of entire module Allows uncovering complex regulatory programs module PHO5 PHM6 SPL2 PHO3 PHO84 VTC3 GIT1 PHO2 TEC1 GPA1 ECM18 UTH1 MEC3 MFA1 SAS5 SEC59 SGS1 PHO4 ASG7 RIM15 HAP1 PHO2 GPA1 MFA1 SAS5 PHO4 RIM15 “Regulatory Program” ? Targets Segal et al., Nature Genetics 2003; Lee et al., PNAS 2006 6

7 target gene expression
Module Networks* Learning quickly runs out of statistical power Poor regulator selection lower in the tree Many correct regulators not selected Arbitrary choice among correlated regulators Combinatorial search Multiple local optima Activator Repressor Genes in module Gene1 Gene2 Gene3 repressor true repressor expression false context B module genes regulation program context C activator activator expression context A target gene expression induced repressed * Segal et al., Nature Genetics 2003

8 Regulation as Linear Regression
minimizew (Σwixi - ETargets)2 But we often have hundreds or thousands of regulators … and linear regression gives them all nonzero weight! x1 x2 xN = MFA1 Module GPA1 -3 x + 0.5 x w1 w2 wN w2 w1 wN parameters ETargets ETargets= w1 x1+…+wN xN+ε Problem: This objective learns too many regulators

9 L1 (Lasso) Regression minimizew (w1x1 + … wNxN - ETargets)2+  C |wi| Induces sparsity in the solution w (many wi‘s set to zero) Provably selects “right” features when many features are irrelevant Convex optimization problem Unique global optimum Efficient optimization But, arbitrary choice among correlated regulators x1 x2 x1 x2 xN L2 L1 w1 w2 w2 w1 wN parameters ETargets * Tibshirani, 1996

10 L1/L2 (Elastic Net*) Regression
minimizew (w1x1 + … wNxN - ETargets)2+  C |wi| +  D wi2 Induces sparsity But avoids arbitrary choices among relevant features Convex optimization problem Unique global optimum Efficient optimization algorithms x1 x2 x1 x2 xN L2 L1 w1 w2 w2 w1 wN ETargets * Zhou & Hastie, 2005 Lee et al., PLOS Genetics 2009

11 Learning Regulatory Network
Cluster genes into modules Learn a regulatory program for each module = MFA1 Module GPA1 -3 x + 0.5 x ECM18 ASG7 MEC3 UTH1 GPA1 GPA1 MFA1 MFA1 This is a Bayesian network But multiple genes share same program Dependency model is linear regression TEC1 HAP1 PHO3 Make sure to point out that genes in rows, columns are individuals “such that in these individuals (point to target genes on the left) the target genes go up and in these individuals (point to target genes on the right) the target genes go down.” “The method also constructs a predicted regulatory program by identifying either genetic polymorphisms or gene expression patterns that are significantly correlated or anti-correlated with the expression levels of target genes… It then iterates over two steps: learning a regulatory program for each module and reassigning each gene to the module whose regulation program provides the best prediction for the gene’s expression profile. PHO5 PHO84 SGS1 RIM15 RIM15 PHM6 PHO2 PHO2 PHO4 PHO4 SEC59 SAS5 SAS5 SPL2 GIT1 VTC3 Lee et al., PLoS Genet 2009 11

12 Outline Regulatory networks for gene expression
Individual genetic variation and gene regulation Effect of genotype on expression Regulatory potential Cell differentiation and gene regulation Expression changes underlying phenotype

13 Genotype  phenotype ? ? Different sequences Perturbations to
regulatory network Different phenotypes …ACTCGGTTGGCCTAAATTCGGCCCGG… …ACCCGGTAGGCCTTAATTCGGCCCGG… : …ACTCGGTAGGCCTATATTCGGCCGGG… ?

14 Genotype  Regulation ? ? Goals:
Different sequences ? Perturbations to regulatory network …ACTCGGTTGGCCTAAATTCGGCCCGG… …ACCCGGTAGGCCTTAATTCGGCCCGG… : …ACTCGGTAGGCCTATATTCGGCCGGG… ? Goals: Infer regulatory network that controls gene expression Identify mechanisms by which genetic variation affects gene expression

15 eQTL Data [Brem et al. (2002) Science]
BY RM : 112 progeny Markers Genotype data Expression data × 112 individuals 112 individuals …011 …001 …010 : …101 …100 데이터가 어떻게 만들어지나 3000 markers 6000 genes

16 Traditional Approach: Single Marker
Expression quantitative trait loci (eQTL) mapping For each gene, find the marker that is most predictive of its expression level [Yvert G et al. (2003) Nat Gen]. Marker mRNA Gene individuals …011 …001 …010 : …101 …100 markers genes Gene i Marker j induced repressed Genotype data Expression data

17 LirNet Regulatory network
E-regulators: Activity (expression) of regulatory genes G-regulators: Genotype of genes Measured as values of chromosomal markers M1 M120 M22 M1011 M321 marker ECM18 ASG7 MEC3 PHO2 GPA1 MFA1 SAS5 PHO4 RIM15 UTH1 GPA1 MFA1 TEC1 HAP1 PHO3 PHO5 PHO84 SGS1 RIM15 PHM6 PHO2 PHO4 SEC59 SAS5 SPL2 GIT1 VTC3 Lee et al., PNAS 2006; Lee et al., PLoS Genetics 2009 17

18 The Telomere Module 40/42 genes in telomeres
Enriched for telomere maintenance (p < 10-11) & helicase activity (p < 10-18) Includes Rif2 – control telomere length establishes telomeric silencing 6 coding & 8 promoter SNPs Binds to Rap1p C-terminus Chr XII: SNPs in Rif2?? Enrichment for Rap1p targets (29/42; p < 10-15) Lee et al., PNAS 2006

19 Some Chromatin Modules
Locus containing Sir1 4 coding SNPs Chr XI: 4/5 consecutive genes Known Sir1 targets Locus containing uncharacterized Sir1 homologue 87(!) coding SNPs Chr X: 22213 5/7 consecutive genes Lee et al., PNAS 2006

20 Chromatin as Mechanism
Mechanism I Chromatin as Mechanism 23 modules (out of 165) with “chromosomal features” 16 have “chromatin regulators” (p < 10-7) Chromatin modification explains significant part of variation in gene expression between strains “Evolutionary strategy” to make coordinated changes in gene expression by modifying small number of hubs Out of 165 modules Chromosome features – Summary of chromosomal modules. Shown is a list of all modules containing chromosomal characteristics or that have chromatin modifiers as trans-E regulators. The columns in the table (in order) are as follows: module, module number; #genes, number of genes in the module; runs, whether the module contains multiple or long runs of genes along the chromosome (blue); dom, whether the module exhibits enrichment for some chromosomal domain (light cyan, telomeres; dark cyan, Ty (transposable element)/LTR); target enrichment, list of chromatin modification complexes such that the module is enriched for DEGs of some gene in the complex (sorted in order of P value); chrom, whether the module was characterized as chromosomal (purple); reg-E, chromatin modifiers that are trans-E module regulators; reg-G, chromatin modifiers with SNPs that are in the region of trans-G module regulators. The strong overlap between different chromosomal characteristics (runs, domains, and chromatin DEGs) supports our definition of a chromosomal module as one that contains two of three characteristics. There is significant overlap between chromosomal modules and modules predicted by our analysis to have a chromatin modifier as a regulator Lee et al., PNAS 2006 20

21 The Puf3 Module 147/153 genes (P ~ 10-130) are pulldown targets of
P-body components weight HAP4 TOP2 KEM1 GCN1 GCN20 DHH1 PUF family: Sequence specific mRNA binding proteins (3’ UTR) Regulate degradation of mRNA and/or repress translation BY RM 112 segregants 147/153 genes (P ~ ) are pulldown targets of mRNA binding protein Puf3 +1 - 1 PUF3 expression genotype Lee et al., PLOS Genetics 2009

22 P-Bodies Dhh1 regulates mRNA decapping in p-bodies
GFP of Dhh1** mRNAs stored in P-bodies are translationally repressed Translation mRNAs can exit P-bodies and resume translation RNA degradation RNA stored in P-bodies can be degraded (via decapping) Dhh1 regulates mRNA decapping in p-bodies But what regulates sequence-specific localization of mRNAs to P-bodies? P-body Puf3 protein ? cell * Sheth and Parker (2003) Science 300:805 ** Beliakova-Bethell , et al. (2006) RNA 12:94

23 Microscopy experiment
Fluorescent microscopy: Puf3 localizes to P-body Supports hypothesis of Puf3 involvement in regulating mRNA degradation by P-bodies Puf3 protein P-body ? Dhh1 cell Dhh1 Puf3 Joint work with David Drubin and Pam Silver Lee et al., PLOS Genetics 2009

24 What Regulates the P-bodies?
A marker that covers a large region in Chr 14. Region contains ~30 genes and ~318 SNPs. Experiments for all 30 genes not feasible! RM BY ChrXIV:449639 DHH1 BLM3 GCN20 KEM1 GCN1 Lee et al., PLOS Genetics 2009

25 Outline Regulatory networks for gene expression
Individual genetic variation and gene regulation Effect of genotype on expression Regulatory potential Expression changes underlying phenotype

26 Motivation Not all SNPs are equally likely to be causal.
ChrXIV: 449, ,316 Gene SNPs TACGTAGGAACCTGTACCA … GGAAAATATCAAATCCAACGACGTTAGCCAATGCGATCGAATGGGAACGTA “Regulatory features” F 1. Gene region? 2. Protein coding region? 3. Nonsynonymous? 4. Create a stop codon? 5. Strong conservation? : SNP 1: Conserved residue in a gene involved in RNA degradation SNP 2: In nonconserved intergenic region The problem is exacerbated by Idea: Prioritize SNPs that have “good” regulatory features But how do we weight different features? Lee et al., PLOS Genetics 2009

27 Bayesian L1-Regularization
wi Prior variance ym Module m higher prior variance weight can more easily deviate from 0 regulator more likely to be selected xmk Regulator k wmk ~ P(w) = Laplacian(0,C) ~ P(ym|x;w) = N (k wmkxmk,ε2) Lee et al., PLOS Genetics 2009

28 Metaprior Model (Hierarchical Bayes)
NO : Module 1 Potential regulators Regulatory features Inside a gene? Protein coding region? Strong conservation? TF binds to module genes : x1 x1 x2 x2 xN Module m w12 w11 w1N Regulatoryprior ß Regulator k Cmk fmk Emodule 1 “Regulatory potential” = ß1 x Inside a gene? + ß2 x Protein coding region? + ß3 x Conserved? … = g(ßTfmk) xmk : wmk Module M ~ Laplacian (0,Cmk) x1 x2 x2 xN xN ym wN2 ~ P(ym|x;w) = N (k wmkxmk,ε2) wN1 wMN Emodule M Lee et al., PLOS Genetics 2009

29 3. Compute regulatory potential of SNPs Regulatory potentials
Metaprior Method BY(lab) MVLT ELVQ VSDASKQLWDI RM(wild) MVLT ELVQ VSDASKQLWDI  1  0 L D : : = Regulatory features F A G Regulatory weights ß Non-synonymous Conservation AA small  large Cell cycle 3. Compute regulatory potential of SNPs Empirical hierarchical Bayes Use point estimate of model parameters Learn priors from data to maximize joint posterior 2. Learn ß Regulatory potentials x1 x2 xN Ei 1. Learn regulatory programs Maximize P(E,ß,W|X) x1 x2 Ei xN Module i+2 Module i+1 Module i Regulatory programs Animate it Maximize P(E,ß,W|X) Lee et al., PLOS Genetics 2009

30 between different prediction tasks
Transfer Learning What do regulatory potentials do? They do not change selection of “strong” regulators – those where prediction of targets is clear They only help disambiguate between weak ones Strong regulators help teach us what to look for in other regulators Transfer of knowledge between different prediction tasks

31 Learned regulatory weights
Yeast regulatory weights Human regulatory weights Regulatory features Location AA property change Show human? Gene function Pairwise feature Lee et al., PLOS Genetics 2009

32 Statistical Evaluation
PGV: Percent genetic variation explained by the predicted regulatory program for each gene 1650 genes (Lirnet) # of genes with PGV > X % 1450 genes (Lirnet without regulatory prior) Show the distribution 850 genes (Geronemo*) 250 genes (Brem & Kruglyak) PGV (%) Lee et al., PLOS Genetics 2009 * Lee et al. PNAS 2006

33 Biological evaluation I
How many predicted interactions have support in other data? Deletion/ over-expression microarrays [Hughes et al. 2000; Chua et al. 2006] ChIP-chip binding experiments [Harbison et al. 2004] Transcription factor binding sites [Maclsaac et al. 2006] mRNA binding pull-down experiments [Gerber et al. 2004] Literature-curated signaling interactions Lirnet without regulatory features % Supported interactions Reg TF Module %interactions %modules %interactions %modules Lee et al., PLOS Genetics 2009 Via cascade

34 Biological Evaluation II
Lirnet Zhu et al (Nature Genet 2008) Random significance of support Lee et al., PLOS Genetics 2009

35 What Regulates the P-Bodies?
The regulatory potential over all 318 SNPs in the region ChrXIV:415, ,000 MKT1 0.7 High-scoring regulatory features Regulatory potential Conservation 0.230 Mass (Da) 0.066 Cis-regulation 0.208 pK1 0.060 Same GO proc 0.204 Polar 0.057 Non-syn 0.158 pI 0.050 Translation regulation 0.046 0.6 0.5 0.4 ChrXIV:449639 Lee et al., PLOS Genetics 2009 Saccharomyces Genome Database (SGD)

36 Mkt1 Mkt1 binds to mRNAs at 3’ UTR
BY has SNP at conserved residue in nuclease domain XPG-N Pbp1 Interaction XPG-I * mkt1D in RM MKT1 Puf3 module P-body module RM BY Lee et al., PLOS Genetics 2009

37 Predicting Causal Regulators
8 validated regulators in 7 regions 14 validated regulators in 11 regions Finding causal regulators for 13 “chromosomal hotspots” Region Zhu et al [Nat Genet 08] Lirnet (top 3 are considered) 1 None SEC18 RDH54 SPT7 2 TBS1, TOS1, ARA1, CSH1, SUP45, CNS1, AMN1 AMN1 CNS1 TOS1 3 TRS20 ABD1 PRP5 4 LEU2, ILV6, NFS1, CIT2, MATALPHA1 LEU2 PGS1 ILV6 5 MATALPHA1 MATALPHA2 RBK1 6 URA3 NPP2 PAC2 7 GPA1 STP2 NEM1 8 HAP1 NEJ1 GSY2 9 YRF1-4, YRF1-5, YLR464W SIR3 HMG2 ECM7 10 ARG81 TAF13 CAC2 11 SAL1, TOP2 MKT1 TOP2 MSK1 12 PHM7 ATG19 BRX1 13 ADE2 ORT1 CAT5 Lee et al., PLOS Genetics 2009

38 Learning Regulatory Priors
Learns regulatory potentials that are specific to organism and even data set Can use any set of regulatory features: Sequence features Functional features for relevant gene Features of regulator/target(s) pairs Applicable to any organism, including ones where functional data may not be readily available Lee et al., PLOS Genetics 2009

39 Outline Regulatory networks for gene expression
Individual genetic variation and gene regulation Cell differentiation and gene regulation Expression changes underlying phenotype Cancer survival Genotype → Expression → Phenotype

40 Regulation  Phenotype
? Perturbations to regulatory network Different phenotypes ? Goals: Understand expression changes that affect phenotype Understand regulatory mechanisms that underlie them

41 Transformation of FL to DLBCL
Occurs in 40-60% of patients Dramatically worse prognosis Diverse mechanisms seem to drive transformation Survival (nontransforming) (transforming) Time to transformation Year Probability Gentles, Alizadeh et al., Blood (to appear)

42 Module Network for Transformation
88 FL/DLBCL arrays* 30 DLBCL 40 FL-t (transformed) 18 FL-nt (nontransformed) 12743 array probes Gene-based analysis inconclusive Constructed regulatory network 2846 putative regulators We ignore the ones with ambiguous histology; range found is genes (56 median); median of 6 regulators per module (p<0.01) – range is 1:12 * Glas et al. “Gene expression profiling in follicular lymphoma to assess clinical aggressiveness and to guide the choice of treatment” Blood 2005 Gentles, Alizadeh et al., Blood (to appear)

43 Example Module – Cell Cycle
Cdc6 Regulates DNA replication Nek2 Regulates mitosis Implicated in chromosomal instability FL-nt FL-t DLBCL CDC6 (which regulates DNA replication) and NEK2 (which regulates mitosis and has been implicated in chromosomal instability) Plausible cell cycle regulators DLBCL exhibits increased cell proliferation Gentles, Alizadeh et al., Blood (to appear)

44 Key Modules in Transformation
Represent each module as “metagene” expression profile average expression of genes Use machine learning* to identify modules distinguishing FL-t (pre-transformation) from transformed DLBCL * PAM (Tibshirani et al, PNAS 2002) Gentles, Alizadeh et al., Blood (to appear)

45 Transformation Network
Module1 → Module2: Regulator in Module1 regulates Module2 ESC: Embryonic Stem Cell HSC: Hematopoietic Stem Cell SSS: Stromal Stem Cell Gentles, Alizadeh et al., Blood (to appear)

46 Predicting Survival Construct multivariate linear regression model for FL survival using “stemness” modules as predictors LPS = 1.14*ESC *ESC2 – 1.35*TGF-beta “Core” of HT network Enhances transformation Counteracts transformation Gentles, Alizadeh et al., Blood (to appear)

47 Therapeutic Implications?
Connectivity Map analysis Protease-inhibitor Bortezamib under clinical trials Has no effect in relapsed de novo DLBCL Sensitizes ABC de novo DLBCL but not GCB DLBCL to chemotherapy Transformed DLBCL similar (phenotype & expresion) to GCB Bortezamib Dunleavey et al. (2009). “Differential efficacy of bortezomib plus chemotherapy within molecular subtypes of diffuse large B-cell lymphoma”, Blood 113: Gentles, Alizadeh et al., Blood (to appear)

48 Therapeutic Implications?
Connectivity Map analysis HDAC inhibitor Trichostatin-A Inhibits eukaryotic cell cycle Primary use as antifungal Also used as anti-cancer drug Possible mechanisms: Promotes apoptosis Induces cell differentiation Trichostatin-A Trichostatin A is an organic compound that serves as an antifungal antibiotic and selectively inhibits the class I and II mammalian histone deacetylase (HDAC) families of enzymes, but not class III HDACs (i.e., Sirtuins)[1]. TSA inhibits the eukaryotic cell cycle during the beginning of the growth stage. TSA can be used to alter gene expression by interfering with the removal of acetyl groups from histones (histone deacetylases, HDAC) and therefore altering the ability of DNA transcription factors to access the DNA molecules inside chromatin. It is a member of a larger class of histone deacetylase inhibitors (HDIs or HDACIs) that have a broad spectrum of epigenetic activities. Thus, TSA has some uses as an anti-cancer drug[2]. One suggested mechanism is that TSA promotes the expression of apoptosis-related genes, leading to cancerous cells surviving at lower rates, thus slowing the progression of cancer[3]. Other mechanisms may include the activity of HDIs to induce cell differentiation, thus acting to "mature" some of the de-differentiated cells found in tumors. HDIs have multiple effects on non-histone effector molecules, so the anti-cancer mechanisms are truly not understood at this time. Drummond et al. (2005). “Clinical development of histone deacetylase inhibitors as anticancer agents” Annu Rev Pharmacol Toxicol 45: 495–528. Gentles, Alizadeh et al., Blood (to appear)

49 Conclusion Framework for modeling gene regulation
Use machine learning to identify regulatory program Hierarchical Bayesian techniques to capture regularities in effects of perturbations on network Uncovers diverse regulatory mechanisms Chromatin remodeling mRNA degradation Using network for understanding effect of gene regulation on phenotype

50 Future Directions Other evidence regarding regulation
chromatin IP Other types of perturbations Structural genomic variations Environmental conditions Pathology Small molecules Diverse phenotypes Response to treatment

51 Acknowledgements Yeast eQTL Lymphoma transformation Su-In Lee
Nevan Krogan Pam Silver, David Drubin Lymphoma transformation Su-In Lee, June H. Myklebust Catherine Shachaf, Babak Shahbaba Ronald Levy Su-In Lee Andrew Gentles Ash Alizadeh Sylvia Plevritis Aimee Dudley Dana Pe’er National Science Foundation, National Cancer Institute


Download ppt "Understanding Gene Regulation: From Networks to Mechanisms"

Similar presentations


Ads by Google