Learning gene regulatory networks in Arabidopsis thaliana


Learning gene regulatory networks in Arabidopsis thaliana
Chris Needham, Andy Bulpitt (School of Computing)
Iain Manfield, Phil Gilmartin (Institute of Integrative and Comparative Biology)
David Westhead (Institute of Molecular and Cellular Biology)

Gene Regulatory Networks
GRNs govern the functional development and biological processes of cells in all organisms.
A GRN is a representation that encapsulates all information about gene regulation, incorporating time, conditions and development.
We aim to learn transcription networks for components of Arabidopsis thaliana from gene expression microarray data.

Gene Expression Microarrays
[Figure: DNA to mRNA (transcription) to protein (translation); microarray data arranged as genes by experiments.]

Arabidopsis thaliana
Plants are important.
Arabidopsis:
is the best annotated plant (though poorly annotated relative to yeast)
has an excellent, large, uniform microarray dataset
has a large genome of ~30000 genes with many large gene families (duplications)
has many mutants, but their analysis is often not very successful
has many transcription factors (TFs): what do they do? Even well-characterised TFs are not fully characterised.

Arabidopsis GATA Factor genes
[Figure: bar charts of expression in light (L) versus dark (D) with fold changes, and time courses of relative expression against ZT (4 to 56 h).]
GATA 21 and GATA 22: light up-regulated (fold changes 7.0x and 75.3x); night-phased clock regulation, peak at ZT 20h.
GATA 9 and GATA 12: fold changes 0.7x and 0.5x; day-phased clock regulation, peak at ZT 28h.
GATA 2 and GATA 4: light down-regulated (fold changes 0.2x and 0.3x); clock regulation of GATA2 and GATA4 is inconsistent between experiments.

Biological approach
The experimental biological work involved in discovering regulatory networks is hard and expensive: mutants in TFs, microarray experiments, time course experiments.
How do poorly-characterised genes fit into well-characterised networks, such as light up-regulation, light down-regulation, the clock and abiotic stress?
What can we get from the existing data?

Informatics approaches
Ordinary differential equations: dynamical systems.
Boolean networks: logical relations between genes.
Bayesian networks: modelling a stochastic system.
Friedman. Inferring cellular networks using probabilistic graphical models. Science 303(5659). 2004. Review article.
Imoto et al. Combining microarrays and biological knowledge for estimating gene networks via Bayesian networks. CSB 2003. Incorporates prior knowledge from protein-protein interactions, protein-DNA interactions, gene networks and literature. Analyses Saccharomyces cerevisiae gene expression data newly obtained by disrupting 100 genes, mainly transcription factors.
Sachs et al. Causal protein signalling networks derived from multi-parameter single-cell data. Science 308(5721). 2005.

Meaningful gene regulatory networks can be learned from microarray data without interventions, using the large datasets that are publicly available. This gives a starting point for experimental design before extra data collection.

Data: Arabidopsis thaliana
2466 microarrays (NASC), ~25,000 genes.
Filtering: genes with low entropy are removed. A subset of genes can be selected for consideration.
Quantisation: expression signal values are discretised into 2 or 3 classes. Boundaries are chosen to create classes with equal probability masses.
Example: GATA2 = AT2G45050, three classes of 825, 819 and 822 arrays, with boundaries at 21.9 and 48.6.
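
The filtering and quantisation steps can be sketched as follows. This is a minimal illustration, not the authors' code: the slides do not say how entropy was computed, so the sketch assumes it is measured on equal-width bins of the raw signal, and the function names and example values are hypothetical.

```python
import math
from collections import Counter

def equal_frequency_boundaries(values, n_classes=3):
    """Cut points that split the sorted values into classes of (nearly) equal size."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_classes] for k in range(1, n_classes)]

def discretise(values, boundaries):
    """Class index of each value: the number of boundaries it reaches or exceeds."""
    return [sum(v >= b for b in boundaries) for v in values]

def equal_width_labels(values, n_bins=3):
    """Assumed binning for the entropy filter: bins of equal width over the range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def entropy(labels):
    """Shannon entropy (bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Hypothetical expression signals for two genes across nine arrays.
signal = [5.0, 12.0, 20.0, 25.0, 30.0, 47.0, 50.0, 80.0, 95.0]
flat = [10.0] * 8 + [100.0]          # almost constant: a candidate for removal

cuts = equal_frequency_boundaries(signal)   # two cut points give three classes
classes = discretise(signal, cuts)
varied_entropy = entropy(equal_width_labels(signal))
flat_entropy = entropy(equal_width_labels(flat))
```

With equal-frequency boundaries every retained gene ends up with roughly the same number of arrays in each class, as in the GATA2 example, which is why the entropy filter in this sketch is applied before quantisation rather than after it.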

Bayesian networks
[Figure: example network over the nodes x1 to x7.]
BNs are a framework for explaining causal relationships, consisting of a set of variables connected by a set of directed edges.
Probability calculus is used to describe the probabilistic relationship of each variable with its parents.
The joint probability distribution over all the variables can be written as a product of conditional probability distributions:
$p(x_1, \dots, x_n) = \prod_{i=1}^{n} p(x_i \mid \mathrm{pa}_i)$, where $\mathrm{pa}_i$ are the parents of $x_i$.
For the example network:
$p(x_1, \dots, x_7) = p(x_1)\, p(x_2)\, p(x_3)\, p(x_4 \mid x_1, x_2, x_3)\, p(x_5 \mid x_1, x_3)\, p(x_6 \mid x_4)\, p(x_7 \mid x_4, x_5)$
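
As a rough check of the factorisation, the sketch below evaluates the joint distribution of the seven-node example as a product of one conditional per node. The parent sets are read off the slide; the variables are assumed binary and the probability values in `p_on` are invented purely for illustration.

```python
from itertools import product

# Parent sets taken from the factorisation of the example network.
PARENTS = {
    "x1": (), "x2": (), "x3": (),
    "x4": ("x1", "x2", "x3"),
    "x5": ("x1", "x3"),
    "x6": ("x4",),
    "x7": ("x4", "x5"),
}

def p_on(parent_values):
    """Hypothetical CPT: probability that a binary node is 1 given its parents."""
    if not parent_values:
        return 0.5
    return 0.2 + 0.6 * sum(parent_values) / len(parent_values)

def joint(assignment):
    """p(x1, ..., x7) as the product of p(x_i | pa_i) over all nodes."""
    prob = 1.0
    for node, parents in PARENTS.items():
        p1 = p_on(tuple(assignment[p] for p in parents))
        prob *= p1 if assignment[node] == 1 else 1.0 - p1
    return prob

nodes = list(PARENTS)
# Summing the joint over all 2^7 assignments should give 1.
total = sum(joint(dict(zip(nodes, values)))
            for values in product((0, 1), repeat=len(nodes)))
all_off = joint({node: 0 for node in nodes})
```

Only seven small tables are needed here instead of one table with 2^7 entries, which is the practical benefit of the factorisation.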

Conditional Probability Distributions
[Figure: network fragment over GATA2, GATA4 and PIF3, showing the conditional probability tables $p(x_i \mid \mathrm{pa}_i)$ for GATA4 and the marginal probabilities for GATA4.]
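
A conditional probability table of this kind can be estimated from discretised data by counting. The sketch below is illustrative only: the observations are invented, and treating GATA2 and PIF3 as the parents of GATA4 is an assumption, since the transcript does not record the direction of the edges in the figure.

```python
from collections import Counter, defaultdict

# Hypothetical discretised observations (0 = low, 1 = high), one tuple per array,
# in the order (GATA2, PIF3, GATA4). The parent assignment is an assumption.
observations = [
    (0, 0, 0), (0, 0, 0), (0, 1, 1), (0, 1, 0),
    (1, 0, 1), (1, 0, 1), (1, 1, 1), (1, 1, 1),
]

def estimate_cpt(rows):
    """Maximum-likelihood CPT p(child | parents) from relative frequencies."""
    counts = defaultdict(Counter)
    for *parents, child in rows:
        counts[tuple(parents)][child] += 1
    return {pa: {state: n / sum(c.values()) for state, n in c.items()}
            for pa, c in counts.items()}

def marginal(rows):
    """Marginal distribution of the child, ignoring the parents."""
    counts = Counter(child for *_, child in rows)
    return {state: n / len(rows) for state, n in counts.items()}

cpt = estimate_cpt(observations)           # one row per parent configuration
gata4_marginal = marginal(observations)    # p(GATA4) over all arrays
```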

Structure Learning
[Figure: two candidate network structures over the nodes x1 to x7.]
The aim is to find the model (network structure) that has the maximum likelihood for a given set of genes (nodes).
For a given set of genes, the likelihood $L = p(D \mid S, \theta_S)$ is the probability of the data $D$ being generated by the model.
To search for a good model structure, a greedy learning algorithm is used. From an initial network, edges are added, reversed or deleted until a (local) optimum is reached.
Learned structure: $\hat{S} = \arg\max_S \{ \ln p(D \mid S, \theta_S) - \tfrac{1}{2} d \ln N \}$
The BIC score combines a measure of how well the model fits the data with a penalty term that penalises model complexity. $\theta_S$ is an estimate of the model parameters for the structure $S$, $d$ is the number of model parameters, and $N$ is the size of the dataset.
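
The search described above can be sketched as a small hill climber. The score follows the BIC expression on the slide; everything else is an assumption made for illustration: multinomial CPTs with maximum-likelihood estimates, an empty starting network, taking the best-scoring single edge change at each step, and all function names and the toy data.

```python
import math
from collections import Counter
from itertools import permutations

def family_bic(data, child, parents, arity):
    """BIC term for one node: ln-likelihood minus (d / 2) ln N."""
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    totals = Counter(tuple(r[p] for p in parents) for r in data)
    loglik = sum(c * math.log(c / totals[pa]) for (pa, _), c in joint.items())
    d = (arity - 1) * arity ** len(parents)      # free parameters in the CPT
    return loglik - 0.5 * d * math.log(len(data))

def bic(data, structure, arity):
    return sum(family_bic(data, v, sorted(pa), arity) for v, pa in structure.items())

def is_acyclic(structure):
    """Repeatedly remove nodes that have no remaining parents."""
    remaining = {v: set(pa) for v, pa in structure.items()}
    while remaining:
        roots = [v for v, pa in remaining.items() if not pa]
        if not roots:
            return False
        for r in roots:
            del remaining[r]
        for pa in remaining.values():
            pa.difference_update(roots)
    return True

def neighbours(structure):
    """Structures one edge addition, deletion or reversal away."""
    for a, b in permutations(structure, 2):
        new = {v: set(pa) for v, pa in structure.items()}
        if a in new[b]:                          # edge a -> b exists
            new[b].discard(a)
            yield new                            # deletion
            rev = {v: set(pa) for v, pa in new.items()}
            rev[a].add(b)
            yield rev                            # reversal
        elif b not in new[a]:                    # no edge between a and b
            new[b].add(a)
            yield new                            # addition

def greedy_search(data, nodes, arity=2):
    current = {v: set() for v in nodes}          # assumed start: empty network
    best = bic(data, current, arity)
    while True:
        scored = [(bic(data, s, arity), s)
                  for s in neighbours(current) if is_acyclic(s)]
        top_score, top = max(scored, key=lambda t: t[0], default=(best, current))
        if top_score <= best + 1e-9:             # no real improvement: stop
            return current, best
        current, best = top, top_score

# Toy data: B mostly copies A, C is unrelated to both.
rows = [{"A": i % 2, "B": (i % 2) if i % 10 else 1 - (i % 2), "C": (i // 2) % 2}
        for i in range(40)]
empty_score = bic(rows, {v: set() for v in "ABC"}, 2)
structure, score = greedy_search(rows, ["A", "B", "C"])
```

On this toy data the search links A and B and leaves C unconnected. Which way the A-B edge points is arbitrary, because both directions receive the same score.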

Conditional Independence
The different structures encode the conditional independences between the genes.
Causality: the directionality of the arrows can be determined when they lead into a v-structure, where the gene at the v depends on all of its parents. Otherwise, the direction of the causal relation between genes cannot be discovered from data alone.
Interventions can be used, i.e. testing mutants in the respective genes to see which gene is mis-regulated (at the transcript level) in which mutant.
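
The point that direction cannot be recovered from observational data alone can be illustrated with two genes. In the sketch below, which uses invented data and a hypothetical helper, the maximised log-likelihood of X -> Y equals that of Y -> X, so a likelihood-based score cannot prefer either direction.

```python
import math
from collections import Counter

def loglik(pairs, parent, child):
    """Maximised ln-likelihood of the two-node network parent -> child.
    `parent` and `child` are column indices into each observation."""
    n = len(pairs)
    parent_counts = Counter(row[parent] for row in pairs)
    joint_counts = Counter((row[parent], row[child]) for row in pairs)
    ll = sum(c * math.log(c / n) for c in parent_counts.values())
    ll += sum(c * math.log(c / parent_counts[p])
              for (p, _), c in joint_counts.items())
    return ll

# Hypothetical discretised observations of two genes (X, Y).
pairs = [(0, 0)] * 12 + [(0, 1)] * 3 + [(1, 0)] * 2 + [(1, 1)] * 13

x_to_y = loglik(pairs, parent=0, child=1)
y_to_x = loglik(pairs, parent=1, child=0)
```

Both orderings are just different ways of factorising the same empirical joint distribution, and for variables with the same number of states they also have the same number of parameters, so the BIC penalty does not separate them either.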

Method
An initial set of key genes of interest is chosen (e.g. circadian clock regulated genes) and a network structure is inferred.
To this model a number of genes may be added: either all genes, or a selection.
Genes are added separately. The structure learning algorithm is applied to each set of genes (the current model plus one candidate), finding the GRN which is most likely to have generated the data.
The best network structure is chosen, and the corresponding gene is added to the model.
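
The incremental procedure can be outlined as a loop over candidate genes. This is a sketch under assumptions: `learn_structure` stands for any structure learner that returns a network and its score, the slides do not specify how candidate networks were ranked beyond choosing the best one, and the toy learner and its scores are invented so that the example runs on its own.

```python
def extend_network(core_genes, candidates, learn_structure, n_add=1):
    """Grow a gene set by repeatedly adding the candidate whose network scores best.

    learn_structure(genes) -> (structure, score) is supplied by the caller
    (a hypothetical interface, not taken from the presentation).
    """
    model_genes = list(core_genes)
    remaining = [g for g in candidates if g not in model_genes]
    structure, score = learn_structure(model_genes)
    for _ in range(min(n_add, len(remaining))):
        # Learn one network per candidate: current genes plus that candidate.
        trials = {g: learn_structure(model_genes + [g]) for g in remaining}
        best_gene = max(trials, key=lambda g: trials[g][1])
        structure, score = trials[best_gene]
        model_genes.append(best_gene)
        remaining.remove(best_gene)
    return model_genes, structure, score

# Toy stand-in for a real learner: invented per-gene scores, no edges learned.
toy_fit = {"CCA1": 0.0, "LHY": 0.0, "GATA2": -3.0, "GATA4": -1.5, "GATA9": -4.0}

def toy_learner(genes):
    return {g: set() for g in genes}, sum(toy_fit[g] for g in genes)

genes, structure, score = extend_network(
    ["CCA1", "LHY"], ["GATA2", "GATA4", "GATA9"], toy_learner, n_add=2)
```

Because every candidate needs its own structure search, the cost of one round grows with the number of candidates, which is one reason to restrict the candidates to a selection.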

Results
Meaningful gene regulatory networks can be learned from microarray data without interventions, using the large datasets that are publicly available. This gives a starting point for experimental design before extra data collection.
[Figure: learned network, with groups of genes annotated as: components of the “central oscillator” or close to it; cold/salt-stress response genes; down-regulated by light.]

Predictive models
Figure 1. Bayesian network of the transcription network for forty genes identified in light/clock regulation of selected GATAs from the literature.
Figure 2. Given information about the state of a gene’s expression level (or set of genes), the marginal probability of any other gene (or set of genes) being in a particular state may be calculated. Fixing the value of a gene (in this case through growing a specific mutant) allows predictions about the likely values of other genes to be made and tested experimentally to verify the predictive model of the GRN. This figure shows the change in marginal probability of each gene (y-axis) in Figure 1 when one other gene’s value is fixed (x-axis), based on real data and the learned network in Figure 1. Dark values show the greatest expected change in expression levels, whereas white values show little observable change.
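
The calculation behind Figure 2 can be sketched on a toy network. The three-gene chain, its probabilities and the function names below are all hypothetical, and the marginals are obtained by brute-force enumeration, which is only practical for very small networks. Fixing a gene is modelled here as conditioning on its value, which matches an intervention only when the fixed gene has no parents.

```python
from itertools import product

# Hypothetical chain A -> B -> C over binary (low = 0, high = 1) genes.
PARENTS = {"A": (), "B": ("A",), "C": ("B",)}
# Invented CPTs: probability that a node is high for each parent configuration.
P_HIGH = {
    "A": {(): 0.5},
    "B": {(0,): 0.1, (1,): 0.8},
    "C": {(0,): 0.3, (1,): 0.9},
}

def joint(assign):
    """Joint probability of a full assignment, as a product of CPT entries."""
    prob = 1.0
    for node, parents in PARENTS.items():
        p1 = P_HIGH[node][tuple(assign[p] for p in parents)]
        prob *= p1 if assign[node] else 1.0 - p1
    return prob

def marginal_high(target, evidence=None):
    """p(target = 1 | evidence), summing the joint over all assignments."""
    evidence = evidence or {}
    nodes = list(PARENTS)
    num = den = 0.0
    for values in product((0, 1), repeat=len(nodes)):
        assign = dict(zip(nodes, values))
        if any(assign[k] != v for k, v in evidence.items()):
            continue
        p = joint(assign)
        den += p
        num += p * assign[target]
    return num / den

baseline = marginal_high("C")
with_a_high = marginal_high("C", {"A": 1})
# Matrix of changes: rows are the fixed gene, columns the gene being predicted.
change = {fixed: {t: abs(marginal_high(t, {fixed: 1}) - marginal_high(t))
                  for t in PARENTS if t != fixed}
          for fixed in PARENTS}
```

In this toy example the entry for fixing B and predicting C is larger than the entry for fixing A and predicting C, reflecting that B is the direct parent of C.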

Future Computation
New structure learning algorithms
Strength of connections
Selecting relevant experiments
Effect of discretisation
Sensitivity to noise

Future Biology
We wish to learn GRNs in order to form hypotheses about possible roles of a gene and likely redundant genes.
The main aim is to reduce the number of related genes to be screened for experimental verification of findings.
Look for mis-regulation of genes predicted to be downstream of e.g. well characterised regulators.
Make mutants of poorly characterised genes and look for mis-regulation of gene expression or other phenotype.
Carry these predictions from this model organism to a crop plant, e.g. rice, where many of the regulatory components are conserved.