Understanding Gene Regulation: From Networks to Mechanisms Daphne Koller Stanford University.

Presentation transcript:

Understanding Gene Regulation: From Networks to Mechanisms Daphne Koller Stanford University

Gene Regulatory Networks: controlled by diverse mechanisms; modified by endogenous and exogenous perturbations.

Goals: infer the regulatory network and the mechanisms that control gene expression; identify the effect of perturbations on the network; understand the effect of gene regulation on phenotype.

Outline: regulatory networks for gene expression; individual genetic variation and gene regulation; cell differentiation and gene regulation; expression changes underlying phenotype.

Regulatory Network I: the mRNA level of a regulator can indicate its activity level; target expression is predicted by the expression of its regulators; use the expression of regulatory genes (transcription factors, signal transduction proteins, mRNA processing factors, …) as regulators. [Figure: network of regulators and target genes; genes shown include PHO5, PHM6, SPL2, PHO3, PHO84, VTC3, GIT1, PHO2, TEC1, GPA1, ECM18, UTH1, MEC3, MFA1, SAS5, SEC59, SGS1, PHO4, ASG7, RIM15, HAP1.] Segal et al., Nature Genetics 2003; Lee et al., PNAS 2006.

Regulatory Network II: co-regulated genes have a similar regulation program; exploit modularity and predict the expression of an entire module; this allows uncovering complex regulatory programs. [Figure: the same gene network, with target genes grouped into a module that shares one “regulatory program”.] Segal et al., Nature Genetics 2003; Lee et al., PNAS 2006.

Module Networks*: limitations. Learning quickly runs out of statistical power: poor regulator selection lower in the tree; many correct regulators not selected; arbitrary choice among correlated regulators. Combinatorial search: multiple local optima. [Figure: a regulation program drawn as a tree that tests activator expression and repressor expression (true/false), giving contexts A, B and C in which the module genes (Gene1, Gene2, Gene3) are induced or repressed.] * Segal et al., Nature Genetics 2003.

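Regulation as Linear Regression: model target expression as $E_{\text{Targets}} = w_1 x_1 + \dots + w_N x_N + \varepsilon$, where $x_1, \dots, x_N$ are the regulators and $w_1, \dots, w_N$ are the parameters, and fit by $\min_w \left(\sum_i w_i x_i - E_{\text{Targets}}\right)^2$. But we often have hundreds or thousands of regulators, and linear regression gives them all nonzero weight. Problem: this objective learns too many regulators. [Figure: example program writing a module's expression as a weighted sum of GPA1 and MFA1; one weight shown is -3.]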
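The problem stated on this slide can be seen in a few lines of NumPy. The sketch below is illustrative only: the data are synthetic, and the sizes, weights, and the helper name `fit_least_squares` are assumptions, not taken from the talk. It fits the unpenalized squared-error objective and counts how many regulators receive a nonzero weight.

```python
import numpy as np

def fit_least_squares(X, y):
    """Return w minimizing sum((X @ w - y) ** 2) (ordinary least squares)."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# Synthetic toy data (hypothetical): 200 samples, 20 candidate regulators,
# of which only the first two actually drive the target expression.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
true_w = np.zeros(20)
true_w[:2] = [-3.0, 2.0]
y = X @ true_w + 0.1 * rng.normal(size=200)

w_ols = fit_least_squares(X, y)
n_nonzero = int(np.count_nonzero(w_ols))
print(n_nonzero, "of", w_ols.size, "weights are nonzero")
```

Because the noise is continuous, every irrelevant regulator ends up with a small but nonzero weight, which is the motivation for the sparsity-inducing penalties that follow.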

Lasso* ($L_1$) Regression: $\min_w \left(w_1 x_1 + \dots + w_N x_N - E_{\text{Targets}}\right)^2 + \sum_i C\,|w_i|$. Induces sparsity in the solution $w$ (many $w_i$'s set to zero). Provably selects the “right” features when many features are irrelevant. Convex optimization problem: unique global optimum; efficient optimization. But: arbitrary choice among correlated regulators. [Figure: $L_1$ versus $L_2$ constraint regions in the $(w_1, w_2)$ plane.] * Tibshirani, 1996.
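One standard way to minimize the penalized objective on this slide is cyclic coordinate descent with soft-thresholding. The sketch below is a minimal illustration and not the solver used in the cited work; the function names, the synthetic data, and the value of `C` are assumptions. For the objective exactly as written (no factor of 1/2 on the squared error), the threshold for each coordinate is `C / 2`.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 when |z| <= t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, C, n_iter=200):
    """Minimize sum((X @ w - y)**2) + C * sum(|w|) by coordinate descent."""
    w = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            # Residual with regulator j's current contribution added back.
            r_j = y - X @ w + X[:, j] * w[j]
            w[j] = soft_threshold(X[:, j] @ r_j, C / 2.0) / col_sq[j]
    return w

# Synthetic toy data (hypothetical): only regulators 0 and 1 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
true_w = np.zeros(20)
true_w[:2] = [-3.0, 2.0]
y = X @ true_w + 0.1 * rng.normal(size=200)

w_lasso = lasso_cd(X, y, C=40.0)
selected = np.flatnonzero(w_lasso)
print("selected regulators:", selected)
```

The selected weights are shrunk slightly toward zero relative to the true values, which is the usual price of the $L_1$ penalty.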

Elastic Net* Regression: $\min_w \left(w_1 x_1 + \dots + w_N x_N - E_{\text{Targets}}\right)^2 + \sum_i C\,|w_i| + \sum_i D\,w_i^2$. Induces sparsity, but avoids arbitrary choices among relevant features. Convex optimization problem: unique global optimum; efficient optimization algorithms. [Figure: $L_1$ versus $L_2$ constraint regions in the $(w_1, w_2)$ plane.] Lee et al., PLOS Genetics 2009. * Zou & Hastie, 2005.
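The claim about correlated regulators can be illustrated on a toy case with two identical regulator columns. The sketch below is an assumption-laden illustration (synthetic data, hypothetical function names, arbitrary `C` and `D`), using coordinate descent on the objective as written above. With `D = 0` it reduces to the $L_1$-only objective, where the split between the two copies depends on update order; with `D > 0` the objective is strictly convex and the two copies receive equal weight.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 when |z| <= t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def elastic_net_cd(X, y, C, D, n_iter=300):
    """Minimize sum((X @ w - y)**2) + C*sum(|w|) + D*sum(w**2)."""
    w = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            r_j = y - X @ w + X[:, j] * w[j]
            # The quadratic penalty adds D to the denominator.
            w[j] = soft_threshold(X[:, j] @ r_j, C / 2.0) / (col_sq[j] + D)
    return w

# Synthetic toy data (hypothetical): columns 0 and 1 are the same regulator.
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
X = np.column_stack([x0, x0, rng.normal(size=(200, 3))])
y = 2.0 * x0 + 0.1 * rng.normal(size=200)

w_l1 = elastic_net_cd(X, y, C=10.0, D=0.0)   # L1 penalty only
w_en = elastic_net_cd(X, y, C=10.0, D=50.0)  # L1 + L2 penalties
print("L1 only    :", np.round(w_l1[:2], 3))
print("elastic net:", np.round(w_en[:2], 3))
```

In the `D = 0` run the first copy takes essentially all the weight simply because it is updated first, which is one concrete form of the "arbitrary choice" the slides refer to.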

Learning Regulatory Network: cluster genes into modules; learn a regulatory program for each module. This is a Bayesian network, but multiple genes share the same program, and the dependency model is linear regression. [Figure: gene network with a module whose expression is written as a weighted sum of GPA1 and MFA1; one weight shown is -3.] Lee et al., PLoS Genet 2009.

Outline: regulatory networks for gene expression; individual genetic variation and gene regulation (effect of genotype on expression; regulatory potential); cell differentiation and gene regulation; expression changes underlying phenotype.

Genotype → phenotype: different sequences (…ACTCGGTTGGCCTAAATTCGGCCCGG…; …ACCCGGTAGGCCTTAATTCGGCCCGG…; …ACTCGGTAGGCCTATATTCGGCCGGG…) give different phenotypes, but how? Perturbations to the regulatory network.

Genotype → regulation: different sequences (…ACTCGGTTGGCCTAAATTCGGCCCGG…; …ACCCGGTAGGCCTTAATTCGGCCCGG…; …ACTCGGTAGGCCTATATTCGGCCGGG…) act as perturbations to the regulatory network. Goals: infer the regulatory network that controls gene expression; identify mechanisms by which genetic variation affects gene expression.

eQTL Data [Brem et al. (2002) Science]: BY × RM cross, 112 progeny. Expression data: 6000 genes × 112 individuals. Genotype data: 3000 markers × 112 individuals, with binary marker values (e.g. …010, …100).

Traditional Approach: Single Marker. Expression quantitative trait loci (eQTL) mapping: for each gene, find the marker that is most predictive of its expression level [Yvert G et al. (2003) Nat Gen]. [Figure: genotype data (markers × individuals) and expression data (genes × individuals); gene i is induced or repressed depending on marker j.]
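The single-marker approach can be sketched as a correlation scan. This is a minimal illustration on synthetic data, assuming absolute Pearson correlation as the measure of "most predictive" (the cited study may use a different statistic); the function name, the number of markers and genes, and the effect sizes are all hypothetical, and the toy markers are independent, unlike real linked markers.

```python
import numpy as np

def best_single_marker(genotypes, expression):
    """For each gene, return the index of the marker with the largest
    absolute Pearson correlation to its expression, and that correlation.

    genotypes:  (individuals, markers) array of 0/1 marker values
    expression: (individuals, genes) array of expression levels
    """
    G = genotypes - genotypes.mean(axis=0)
    E = expression - expression.mean(axis=0)
    G = G / np.linalg.norm(G, axis=0)
    E = E / np.linalg.norm(E, axis=0)
    corr = E.T @ G                      # shape: (genes, markers)
    best = np.argmax(np.abs(corr), axis=1)
    return best, corr[np.arange(corr.shape[0]), best]

# Synthetic toy data (hypothetical): 112 individuals, 30 markers, 5 genes,
# each gene driven by one known marker plus noise.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 2, size=(112, 30)).astype(float)
causal = np.array([0, 7, 14, 21, 28])
expression = 2.0 * genotypes[:, causal] + 0.5 * rng.normal(size=(112, 5))

best, r = best_single_marker(genotypes, expression)
print("best marker per gene:", best)
```

A scan like this treats every gene independently and considers one marker at a time, which is the limitation the module-based approach on the following slides is meant to address.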

LirNet Regulatory Network: E-regulators are the activity (expression) of regulatory genes; G-regulators are the genotype of genes, measured as values of chromosomal markers. [Figure: the gene network extended with marker nodes M1, M22, M120, M321 and M1011.] Lee et al., PNAS 2006; Lee et al., PLoS Genetics 2009.

The Telomere Module: the linked locus on Chr XII includes Rif2, which controls telomere length and establishes telomeric silencing, carries 6 coding and 8 promoter SNPs, and binds to the Rap1p C-terminus. Module properties: enrichment for Rap1p targets (29/42; p < [value missing]); [count missing]/42 genes in telomeres; enriched for telomere maintenance (p < [value missing]) and helicase activity (p < [value missing]). Lee et al., PNAS 2006.

Some Chromatin Modules: (1) locus containing Sir1: 4 coding SNPs; 4/5 consecutive genes. (2) Locus containing an uncharacterized Sir1 homologue: 87(!) coding SNPs; 5/7 consecutive genes. Known Sir1 targets. [Figure: the two loci, on Chr XI and Chr X.] Lee et al., PNAS 2006.

Chromatin as Mechanism: 23 modules (out of 165) have “chromosomal features”; 16 have “chromatin regulators” (p < [value missing]). Chromatin modification explains a significant part of the variation in gene expression between strains. Mechanism I: an “evolutionary strategy” to make coordinated changes in gene expression by modifying a small number of hubs. Lee et al., PNAS 2006.

The Puf3 Module: 147/153 genes (P ~ [value missing]) are pulldown targets of the mRNA binding protein Puf3. PUF family: sequence-specific mRNA binding proteins (3’ UTR) that regulate degradation of mRNA and/or repress translation. [Figure: module expression across segregants with BY/RM genotype and regulator weights; labels include HAP4, TOP2, KEM1, GCN1, GCN20, DHH1, PUF3, and the annotation “P-body components”.] Lee et al., PLOS Genetics 2009.

P-Bodies: Dhh1 regulates mRNA decapping in P-bodies. mRNAs stored in P-bodies are translationally repressed. Translation: mRNAs can exit P-bodies and resume translation. RNA degradation: RNA stored in P-bodies can be degraded (via decapping). But what regulates sequence-specific localization of mRNAs to P-bodies? [Figure: GFP of Dhh1**; schematic of a cell with a P-body and Puf3 protein marked “?”.] * Sheth and Parker (2003) Science 300:805. ** Beliakova-Bethell et al. (2006) RNA 12:94.

Microscopy Experiment: fluorescence microscopy shows that Puf3 localizes to the P-body. This supports the hypothesis that Puf3 is involved in regulating mRNA degradation by P-bodies. [Figure: Dhh1 and Puf3 images; schematic of a cell with a P-body and Puf3 protein.] Joint work with David Drubin and Pam Silver. Lee et al., PLOS Genetics 2009.

What Regulates the P-bodies? A marker that covers a large region in Chr 14. The region contains ~30 genes and ~318 SNPs. Experiments for all 30 genes are not feasible! [Figure: the ChrXIV region with BY and RM genotypes; module labels include DHH1, GCN20, GCN1, KEM1, BLM3.] Lee et al., PLOS Genetics 2009.

Outline: regulatory networks for gene expression; individual genetic variation and gene regulation (effect of genotype on expression; regulatory potential); cell differentiation and gene regulation; expression changes underlying phenotype.

Motivation: not all SNPs are equally likely to be causal. SNP 1: conserved residue in a gene involved in RNA degradation. SNP 2: in a nonconserved intergenic region. “Regulatory features” F: 1. Gene region? 2. Protein coding region? 3. Nonsynonymous? 4. Creates a stop codon? 5. Strong conservation? … Idea: prioritize SNPs that have “good” regulatory features. But how do we weight different features? [Figure: SNPs along a gene and its flanking sequence in a region of ChrXIV.] Lee et al., PLOS Genetics 2009.

Bayesian $L_1$-Regularization: for module $m$ with regulators $k$, $P(y_m \mid x; w) = \mathcal{N}\left(\sum_k w_{mk} x_{mk},\ \varepsilon^2\right)$ and $P(w) = \text{Laplacian}(0, C)$. Higher prior variance → the weight can more easily deviate from 0 → the regulator is more likely to be selected. [Figure: Laplacian priors over $w_i$ with different prior variance.] Lee et al., PLOS Genetics 2009.
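The link between this prior and an $L_1$ penalty can be made explicit by writing out the negative log posterior. The derivation below is a sketch for a single observation of module $m$, and it assumes that $C$ is the scale parameter of the Laplace density (so the prior variance is $2C^2$); the slide does not state which parametrization is meant.

```latex
\begin{aligned}
p(y_m \mid x; w) &= \mathcal{N}\!\Big(y_m;\ \sum_k w_{mk} x_{mk},\ \varepsilon^2\Big),
\qquad
p(w_{mk}) = \frac{1}{2C}\exp\!\Big(-\frac{|w_{mk}|}{C}\Big) \\[4pt]
-\log p(w \mid y_m, x) &= \frac{1}{2\varepsilon^2}\Big(y_m - \sum_k w_{mk} x_{mk}\Big)^2
 + \frac{1}{C}\sum_k |w_{mk}| + \text{const}
\end{aligned}
```

Multiplying through by $2\varepsilon^2$ gives a squared error plus an $L_1$ penalty with coefficient $2\varepsilon^2 / C$. Under this assumed parametrization, a larger prior scale means a weaker penalty, which matches the slide's statement that a higher prior variance makes a regulator more likely to be selected.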

Metaprior Model (Hierarchical Bayes): for module $m$ and regulator $k$, $P(y_m \mid x; w) = \mathcal{N}\left(\sum_k w_{mk} x_{mk},\ \varepsilon^2\right)$ and $w_{mk} \sim \text{Laplacian}(0, C_{mk})$ with $C_{mk} = g(\beta^T f_{mk})$, where $f_{mk}$ are the regulatory features and $\beta$ is the regulatory prior shared across modules 1, …, M. Regulatory features: inside a gene? Protein coding region? Strong conservation? TF binds to module genes? … “Regulatory potential” = $\beta_1 \times$ [inside a gene?] + $\beta_2 \times$ [protein coding region?] + $\beta_3 \times$ [conserved?] + … [Figure: potential regulators $x_1, \dots, x_N$ feeding modules 1 to M with weights $w_{11}, \dots, w_{MN}$, and a YES/NO feature table for each regulator.] Lee et al., PLOS Genetics 2009.
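The mapping from regulatory features to a per-regulator prior scale can be sketched as follows. The slide does not specify the link function $g$, so the logistic function used here is an assumption, as are the feature names, the values of `beta`, and the three example SNPs.

```python
import numpy as np

def regulatory_potential(features, beta):
    """Map feature vectors f to prior scales C = g(beta^T f).

    g is taken to be the logistic function; this choice is an assumption,
    since the slide leaves g unspecified.
    """
    return 1.0 / (1.0 + np.exp(-(features @ beta)))

# Hypothetical binary features: inside a gene, protein coding, conserved.
beta = np.array([0.5, 1.0, 2.0])          # hypothetical learned weights
features = np.array([
    [1, 1, 1],   # SNP at a conserved coding residue
    [0, 0, 0],   # SNP in a nonconserved intergenic region
    [1, 0, 0],   # SNP inside a gene but outside the coding region
], dtype=float)

C = regulatory_potential(features, beta)
ranking = np.argsort(-C)                  # highest potential first
print("prior scales:", np.round(C, 3))
print("ranking:", ranking)
```

Under this sketch, a SNP with more "good" features gets a larger prior scale, so its weight is penalized less and it is more easily selected as a regulator.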

Metaprior Method: 1. Learn regulatory programs (regulatory weights for modules i, i+1, i+2, …). 2. Learn $\beta$. 3. Compute the regulatory potential of SNPs. Objective: maximize $P(E, \beta, W \mid X)$. Empirical hierarchical Bayes: use point estimates of model parameters; learn priors from data to maximize the joint posterior. [Figure: regulatory features F (non-synonymous, conservation, AA small → large, cell cycle) computed from aligned BY (lab) and RM (wild) sequences, combined with regulatory weights $\beta$ into regulatory potentials.] Lee et al., PLOS Genetics 2009.

Transfer Learning: what do regulatory potentials do? They do not change the selection of “strong” regulators, those where prediction of targets is clear; they only help disambiguate between weak ones. Strong regulators help teach us what to look for in other regulators: a transfer of knowledge between different prediction tasks.

Learned Regulatory Weights. [Figure: yeast and human regulatory weights over the regulatory features, grouped as location, AA property change, gene function, and pairwise feature.] Lee et al., PLOS Genetics 2009.

Statistical Evaluation. PGV: percent genetic variation explained by the predicted regulatory program for each gene. [Figure: number of genes with PGV > X% plotted against PGV (%); curve labels: 250 genes (Brem & Kruglyak), 850 genes (Geronemo*), 1450 genes (Lirnet without regulatory prior), 1650 genes (Lirnet).] * Lee et al., PNAS 2006. Lee et al., PLOS Genetics 2009.

Biological Evaluation I: how many predicted interactions have support in other data? Sources of support: deletion/over-expression microarrays [Hughes et al. 2000; Chua et al. 2006]; ChIP-chip binding experiments [Harbison et al. 2004]; transcription factor binding sites [MacIsaac et al. 2006]; mRNA binding pull-down experiments [Gerber et al. 2004]; literature-curated signaling interactions. [Figure: % supported interactions and % modules; labels include “Lirnet without regulatory features”, “Reg”, “Module”, “TF”, and “Via cascade”.] Lee et al., PLOS Genetics 2009.

Biological Evaluation II. [Figure: significance of support for Lirnet, Zhu et al. (Nature Genet 2008), and random.] Lee et al., PLOS Genetics 2009.

What Regulates the P-Bodies? The regulatory potential over all 318 SNPs in the ChrXIV region; labelled gene: MKT1. High-scoring regulatory features: Conservation 0.230; Cis-regulation 0.208; Same GO proc 0.204; Non-syn 0.158; Mass (Da) 0.066; pK [value missing]; Polar 0.057; pI 0.050; Translation regulation 0.046. [Figure: regulatory potential along ChrXIV with gene annotation from the Saccharomyces Genome Database (SGD).] Lee et al., PLOS Genetics 2009.

Mkt1: Mkt1 binds to mRNAs at the 3’ UTR. BY has a SNP at a conserved residue in the nuclease domain. [Figure: Mkt1 domain diagram (XPG-N, Pbp1 interaction, XPG-I) with the SNP marked *; MKT1 linked to the Puf3 module and the P-body module; expression in BY, RM, and mkt1Δ in RM.] Lee et al., PLOS Genetics 2009.

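Predicting Causal Regulators: finding causal regulators for 13 “chromosomal hotspots”. Columns: Region; Zhu et al. [Nat Genet 08]; Lirnet (top 3 are considered). Lee et al., PLOS Genetics 2009.
Region 1: Zhu et al.: None. Lirnet: SEC18, RDH54, SPT7.
Region 2: Zhu et al.: TBS1, TOS1, ARA1, CSH1, SUP45, CNS1, AMN1. Lirnet: AMN1, CNS1, TOS1.
Region 3: Zhu et al.: None. Lirnet: TRS20, ABD1, PRP5.
Region 4: Zhu et al.: LEU2, ILV6, NFS1, CIT2, MATALPHA1. Lirnet: LEU2, PGS1, ILV6.
Region 5: Zhu et al.: MATALPHA1. Lirnet: MATALPHA2, RBK1.
Region 6: Zhu et al.: URA3. Lirnet: NPP2, PAC2.
Region 7: Zhu et al.: GPA1. Lirnet: STP2, GPA1, NEM1.
Region 8: Zhu et al.: HAP1. Lirnet: NEJ1, GSY2.
Region 9: Zhu et al.: YRF1-4, YRF1-5, YLR464W. Lirnet: SIR3, HMG2, ECM7.
Region 10: Zhu et al.: None. Lirnet: ARG81, TAF13, CAC2.
Region 11: Zhu et al.: SAL1, TOP2. Lirnet: MKT1, TOP2, MSK1.
Region 12: Zhu et al.: PHM7. Lirnet: ATG19, BRX1.
Region 13: Zhu et al.: None. Lirnet: ADE2, ORT1, CAT5.
Totals, listed in the same order as the two method columns: 8 validated regulators in 7 regions; 14 validated regulators in 11 regions.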

Learning Regulatory Priors: learns regulatory potentials that are specific to the organism and even to the data set. Can use any set of regulatory features: sequence features; functional features for the relevant gene; features of regulator/target(s) pairs. Applicable to any organism, including ones where functional data may not be readily available. Lee et al., PLOS Genetics 2009.

Conclusion: a framework for modeling gene regulation. Use machine learning to identify the regulatory program. Use hierarchical Bayesian techniques to capture regularities in the effects of perturbations on the network. Uncovers diverse regulatory mechanisms: chromatin remodeling; mRNA degradation.

Acknowledgements: National Science Foundation; Nevan Krogan, UCSF; Pam Silver and David Drubin, Harvard Medical School; Su-In Lee, University of Washington; Dana Pe’er, Columbia University; Aimee Dudley, Institute for Systems Biology.