FUNCTIONAL ANNOTATION OF REGULATORY PATHWAYS


Jayesh Pandey, Mehmet Koyuturk, Wojciech Szpankowski, and Ananth Grama. Purdue University, Department of Computer Science. Supported by the National Institutes of Health.
Hello and welcome. I will talk about my thesis work today, which is about comparative analysis of molecular interaction networks.

GENE REGULATION
Gene expression is the process of synthesizing a functional protein coded by the corresponding gene. Genes (and their products) regulate the extent of each other's expression. Any step of gene expression can be modulated: transcription, translation, post-transcriptional modification, RNA transport, mRNA degradation, and so on.
Figure: Ligand-independent transcriptional regulation at the chromatin level.
I will start with a brief overview of molecular interaction networks. I will talk about where the data comes from, how it is modeled, and present some biological observations that motivate comparative network analysis. Then, I will talk about how we address various algorithmic and analytical problems on interaction networks. Finally, I will briefly discuss problems I am currently working on and what I am planning to explore in the near future.

GENE REGULATORY NETWORKS
Gene regulatory networks model the organization of regulatory interactions in the cell. Genes are nodes; regulatory interactions are directed edges. Boolean network model: edges are signed, indicating up-regulation (promotion) or down-regulation (suppression).
Figure: Flowering time in Arabidopsis (legend: gene, up-regulation, down-regulation).

MOLECULAR ANNOTATION
Similar systems involve different molecules (genes, proteins) in different species. Functional annotation of genes provides a unified understanding of the underlying principles. Molecular function: what is the role of a gene? Biological process: in which processes is a gene involved? Cellular component: where is a gene's product localized? Gene Ontology provides a library of molecular annotation. We refer to each annotation class as a functional attribute.

FROM MOLECULES TO SYSTEMS
Networks are species-specific, and annotation is at the molecular level. The idea is to map networks from gene space to function space, which can generate a library of annotated "modular (sub-)networks".
Figure: Network of Gene Ontology terms based on the significance of pairwise interactions in the yeast synthetic gene array (SGA) network (Tong et al., Science, 2004).

INDIRECT REGULATION
Assessment of pairwise interactions is simple, but not adequate.
Figure: example gene networks over genes g1 to g5.

FUNCTIONAL ATTRIBUTE NETWORKS
Multigraph model: a gene is associated with multiple functional attributes, and a functional attribute is associated with multiple genes. Functional attributes are represented by nodes. Genes are represented by ports, reflecting context.
Figure: a gene network over genes g1 to g6 and the corresponding functional attribute network.
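The mapping from gene space to function space can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `build_attribute_multigraph`, the dictionary layout, and the toy genes and attributes are all assumptions made for the example. Each gene edge contributes one parallel edge per pair of attributes, and the gene pair is kept as the "port" that records the context.

```python
from collections import defaultdict


def build_attribute_multigraph(gene_edges, annotations):
    """Map a directed gene network to a functional attribute multigraph.

    gene_edges:  iterable of (regulator, target) gene pairs
    annotations: dict mapping a gene to its set of functional attributes
    Returns a dict mapping (attr_u, attr_v) to the set of gene pairs (ports)
    that realize that attribute-level edge.
    """
    multigraph = defaultdict(set)
    for regulator, target in gene_edges:
        for attr_u in annotations.get(regulator, ()):
            for attr_v in annotations.get(target, ()):
                multigraph[(attr_u, attr_v)].add((regulator, target))
    return dict(multigraph)


# Toy data (hypothetical): g2 carries two attributes, so its edges fan out.
gene_edges = [("g1", "g2"), ("g3", "g2"), ("g2", "g4")]
annotations = {"g1": {"A"}, "g2": {"B", "C"}, "g3": {"A"}, "g4": {"D"}}
multigraph = build_attribute_multigraph(gene_edges, annotations)
```

In this toy network the attribute edge A to B is backed by two ports, (g1, g2) and (g3, g2), while no edge from A to D exists because no gene annotated with A directly regulates a gene annotated with D.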

FREQUENCY OF A MULTIPATH
A pathway of functional attributes occurs in various contexts in the gene network. Each such pathway corresponds to a multipath in the functional attribute network.
Figure: two examples; the frequency of the multipath is 4 on the left and 0 on the right.
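One way to make the notion of frequency concrete is to count the gene-level paths whose genes carry the attributes of the pathway, in order. The sketch below assumes that definition (the slide does not spell it out), counts walks rather than simple paths, and uses a hypothetical function name and two toy networks that are not the slide's figure. The toy networks are chosen so that the counts come out as 4 and 0.

```python
def multipath_frequency(gene_edges, annotations, attribute_path):
    """Count gene walks g1 -> ... -> gk where gene gi carries attribute_path[i].

    Assumption: frequency is the number of such gene-level occurrences.
    """
    successors = {}
    for regulator, target in gene_edges:
        successors.setdefault(regulator, set()).add(target)

    # counts[g] = number of matching gene walks that currently end at gene g
    counts = {g: 1 for g, attrs in annotations.items()
              if attribute_path[0] in attrs}
    for attr in attribute_path[1:]:
        extended = {}
        for gene, count in counts.items():
            for nxt in successors.get(gene, ()):
                if attr in annotations.get(nxt, ()):
                    extended[nxt] = extended.get(nxt, 0) + count
        counts = extended
    return sum(counts.values())


annotations = {"g1": {"A"}, "g2": {"A"}, "g3": {"B"},
               "g4": {"C"}, "g5": {"C"}, "g6": {"B"}}

# Both A genes regulate g3, which regulates both C genes.
connected = [("g1", "g3"), ("g2", "g3"), ("g3", "g4"), ("g3", "g5")]
# Same pairwise attribute edges (A to B, B to C), but through different B genes.
disconnected = [("g1", "g3"), ("g2", "g3"), ("g6", "g4"), ("g6", "g5")]

freq_connected = multipath_frequency(connected, annotations, ["A", "B", "C"])
freq_disconnected = multipath_frequency(disconnected, annotations, ["A", "B", "C"])
```

The second network contains both pairwise interactions at the attribute level, yet no gene path realizes the full pathway, which is the reason pairwise assessment alone is not adequate.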

SIGNIFICANCE OF A PATHWAY
We want to identify multipaths with unusual frequency; these might correspond to modular pathways. Frequency alone is not a good measure of statistical significance: the distribution of functional attributes among genes is highly skewed, and the degree distribution in the gene network is highly skewed. Pathways that contain common functional attributes have high frequency, but they are not necessarily interesting.

STATISTICAL INTERPRETABILITY
An additional positive observation should increase significance; an additional negative observation should decrease it. In the slide's example, P(B) < P(A) and P(B') > P(A). Frequency is not statistically interpretable!

MONOTONICITY
Frequency is a monotonic measure: if a pathway is frequent, then all of its sub-paths are frequent. This gives an algorithmic advantage, commonly exploited in traditional data mining applications: enumerate all frequent patterns in a bottom-up fashion. Statistically interpretable measures are not monotonic! Statistical significance fluctuates in the search space, so existing data mining algorithms do not apply. The significance of pathways is non-monotonic in two dimensions.

GO HIERARCHY
Functional attributes are organized in a hierarchical manner: "regulation of steroid biosynthetic process" is a "regulation of steroid metabolic process" and is part of "steroid biosynthetic process". Interpretable statistical measures are not monotonic with respect to the GO hierarchy.
Figure: example over genes g1 to g5 with a chain of inequalities between the significance values P of three patterns (pattern drawings omitted).

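PATHWAY LENGTH
Figure: two comparisons of significance values P between pathways of different length, one with P greater and one with P smaller (pattern drawings omitted).
Open problems: How can we effectively search the pathway space, where significance fluctuates? How can we find the optimal resolution in functional attribute space?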

STATISTICAL MODEL
Emphasize the modularity of pathways: condition on the frequency of the building blocks! We denote each frequency random variable by $N$ and its realization by $n$. The significance of pathway $\pi_{123}$ is
$p_{123} = P(N_{123} \ge n_{123} \mid N_{12} = n_{12}, N_{23} = n_{23}, N_1 = n_1, N_2 = n_2, N_3 = n_3)$.
Figure: pathway $\pi_{123}$ annotated with $N_1$, $N_2$, $N_3$, $N_{12}$, $N_{23}$, $N_{123}$.

SIGNIFICANCE OF A PATHWAY
Assume that regulatory interactions are independent. There are $n_{12} n_{23}$ pairs of occurrences of $\pi_{12}$ and $\pi_{23}$. The probability that the two edges in a pair go through the same gene is $1/n_2$. The probability that at least $n_{123}$ of the $n_{12} n_{23}$ pairs of edges go through the same gene can be bounded by
$p_{123} \le \exp(n_{12} n_{23} H_q(t))$, where $q = 1/n_2$ and $t = n_{123} / (n_{12} n_{23})$.
Here $H_q(t) = t \log(q/t) + (1-t) \log((1-q)/(1-t))$ is the weighted entropy of $t$ with respect to $q$. The bound can be generalized to pathways of arbitrary length.
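The bound is straightforward to evaluate numerically. The following sketch uses hypothetical function names and assumes natural logarithms. It also assumes that the bound is only informative when t > q, as for a Chernoff-type upper-tail bound, and returns the trivial value 1 otherwise. Note that H_q(t) as defined above is the negative of the Kullback-Leibler divergence between Bernoulli(t) and Bernoulli(q), so it is never positive.

```python
import math


def weighted_entropy(t, q):
    """H_q(t) = t*log(q/t) + (1-t)*log((1-q)/(1-t)), with 0*log(.) taken as 0."""
    if not 0.0 < q < 1.0 or not 0.0 <= t <= 1.0:
        raise ValueError("need 0 < q < 1 and 0 <= t <= 1")
    first = t * math.log(q / t) if t > 0.0 else 0.0
    second = (1.0 - t) * math.log((1.0 - q) / (1.0 - t)) if t < 1.0 else 0.0
    return first + second


def pathway_pvalue_bound(n123, n12, n23, n2):
    """Upper bound exp(n12*n23*H_q(t)) on p123, with q = 1/n2, t = n123/(n12*n23)."""
    pairs = n12 * n23
    if pairs == 0:
        return 1.0
    q = 1.0 / n2
    t = n123 / pairs
    if t <= q:
        # Assumption: below the expected rate the bound is vacuous.
        return 1.0
    return min(1.0, math.exp(pairs * weighted_entropy(t, q)))


# Hypothetical counts: 4 and 5 occurrences of the two edges, 10 genes with
# the middle attribute, 5 observed occurrences of the full pathway.
bound = pathway_pvalue_bound(n123=5, n12=4, n23=5, n2=10)
```

With these made-up counts, t = 0.25 exceeds q = 0.1 and the bound is roughly 0.16. Raising the observed count n123 while holding the building-block frequencies fixed makes the bound smaller, which is the interpretable behaviour that raw frequency lacks.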

SIGNIFICANCE OF AN EDGE
A single regulatory interaction is the shortest pathway. Its statistical significance is evaluated with respect to a baseline model: the number of edges leaving and entering each functional attribute is specified, and edges are assumed to be independent. Under this model, the frequency of a regulatory interaction is a hypergeometric random variable, and a similar bound can be derived for the p-value of a single regulatory interaction.
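For small networks the hypergeometric tail can be computed exactly instead of bounded. The slide does not give the parametrization, so the sketch below assumes one: out of `total_edges` edges in the network, `out_edges` leave the source attribute and `in_edges` enter the target attribute, and the number of edges doing both is hypergeometric. The function name and the numbers are hypothetical.

```python
from math import comb


def edge_pvalue(observed, out_edges, in_edges, total_edges):
    """Exact tail P(X >= observed) for X ~ Hypergeometric.

    Assumed parametrization: draw in_edges edges out of total_edges,
    of which out_edges count as "successes".
    """
    upper = min(out_edges, in_edges)
    denominator = comb(total_edges, in_edges)
    tail = sum(
        comb(out_edges, x) * comb(total_edges - out_edges, in_edges - x)
        for x in range(max(observed, 0), upper + 1)
    )
    return tail / denominator


# Hypothetical counts: 10 edges in total, 3 leave attribute A, 4 enter
# attribute B, and all 3 edges leaving A are observed to enter B.
p_edge = edge_pvalue(observed=3, out_edges=3, in_edges=4, total_edges=10)
```

The exact sum is practical only when the counts are modest; for large networks a tail bound of the kind mentioned on the slide avoids evaluating large binomial coefficients.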

ALGORITHMIC ISSUES
Significance is not monotonic. Do we need to enumerate all pathways?
Strongly significant pathways: a pathway is strongly significant if all its building blocks are significant (defined recursively). This allows pruning the search space effectively.
Shortcutting common functional attributes: transcription factors, DNA binding genes, etc. are responsible for mediating regulation. Shortcut these terms and consider the regulatory effect of different processes on each other directly.
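The pruning enabled by strong significance can be sketched as a level-wise search. This is an illustration under stated assumptions, not NARADA's algorithm: the building blocks of a pathway are taken to be its prefix and suffix sub-pathways (one node shorter), the significance test is passed in as a caller-supplied predicate, and the p-values in the example are invented.

```python
def strongly_significant_paths(attribute_edges, is_significant, max_nodes):
    """Enumerate strongly significant pathways of up to max_nodes attributes.

    attribute_edges: iterable of (attr_u, attr_v) tuples
    is_significant:  predicate on a pathway given as a tuple of attributes
    A longer pathway is tested only if both its prefix and its suffix
    are already strongly significant (assumed notion of building blocks).
    """
    level = {edge for edge in attribute_edges if is_significant(edge)}
    found = set(level)
    for _ in range(max_nodes - 2):
        # Join pathways whose overlap matches: (a, ..., y) + (..., y, z).
        candidates = {p + (q[-1],) for p in level for q in level
                      if p[1:] == q[:-1]}
        level = {c for c in candidates if is_significant(c)}
        if not level:
            break
        found |= level
    return found


# Invented p-values; anything not listed is treated as not significant.
toy_pvalues = {
    ("A", "B"): 0.01, ("B", "C"): 0.02, ("C", "D"): 0.5,
    ("A", "B", "C"): 0.001, ("B", "C", "D"): 0.0001,
}


def toy_test(path):
    return toy_pvalues.get(path, 1.0) <= 0.05


result = strongly_significant_paths(
    [("A", "B"), ("B", "C"), ("C", "D")], toy_test, max_nodes=4)
```

In the toy data the pathway B, C, D has a very small p-value but is never tested, because its building block C to D is not significant. That is the trade-off of this pruning: it keeps the search tractable at the cost of skipping pathways that are significant without being strongly significant.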

NARADA
http://www.cs.purdue.edu/homes/jpandey/narada/
A software tool for identifying significant pathways. Queries:
Given functional attribute T, find all significant pathways that originate at T.
Given functional attribute T, find all significant pathways that terminate at T.
Given a sequence of functional attributes T1, T2, …, Tk, find all occurrences of the corresponding pathway.
Identified pathways are displayed as a tree. The user can explore back and forth between the gene network and the functional attribute network.

RESULTS
E. coli transcription network obtained from RegulonDB: 3159 regulatory interactions between 1364 genes. Using Gene Ontology, 881 of these genes are mapped to 318 processes.
Number of pathways by pathway length (values in the order listed on the slide):
Pathway length: 2, 3, 4, 5
All: 427, 580, 1401, 942
Strongly significant: 208, 183, 142
Common terms shortcut: 184, 119, 1

MOLYBDATE ION TRANSPORT
Figure: significant regulatory pathways that originate at molybdate ion transport, and their occurrences in the gene network.

WHAT IS SIGNIFICANT?
Molybdate ion transport regulates various processes directly: Mo-molybdopterin cofactor biosynthesis, oligopeptide transport, cytochrome complex assembly. It regulates various other processes indirectly, through DNA-dependent regulation of transcription, two-component signal transduction system, and nitrate assimilation. Direct regulation of these mediator processes is not significant. NARADA captures the modularity of indirect regulation!

CONCLUSION
Mapping gene regulatory networks to functional attribute space demonstrates great potential: an abstract, unified understanding of regulatory systems. Algorithmically, it raises a wide range of new challenges: How can we bound interpretable statistical measures? How can we handle hierarchy in functional attribute space? Discovering new information: How can we project identified "canonical" patterns onto other species to discover new regulatory relationships?