Genome-wide prediction and characterization of interactions between transcription factors in S. cerevisiae Speaker: Chunhui Cai.



Review – data sources
• Numerous technological advances make it possible to understand combinatorial transcriptional control.
• Complete genome sequencing provides the DNA sequences of promoter regions;
• Large-scale expression profiling provides a global perspective on gene expression;
• Yeast two-hybrid experiments provide protein-protein interactions;
• Chromatin immunoprecipitation (ChIP) provides protein-DNA interactions.

Review – literature
• Various in silico approaches have previously been used to study combinatorial gene regulation. The main focus has been on examining the relationship between gene expression profiles and transcriptional control.
• Synergistic relationships between TFs are inferred when their common target genes show highly correlated expression patterns;
• Non-linear models and Bayesian approaches have also been used to identify the relationship between gene expression and interacting motifs;
• In another approach, cooperative TFs are predicted from protein-protein interaction networks, based on the hypothesis that proteins that are close to each other in the interaction network are more likely to be co-regulated by the same set of TFs.

Idea in this paper
• The authors propose a new method for identifying interactions between transcription factors (TFs) that relies on the relationship between their binding sites in the upstream promoters. The algorithm is sequence-based and can be applied to genes without expression data or previously determined binding motifs.
• Taking groups of genes whose upstream sequences are known to be bound by two TFs (from ChIP-on-chip data and literature evidence), the authors predicted the corresponding TF binding sites and examined the relationship between the two sites on the promoter sequences. The sequence relationships between the binding motifs were examined in terms of preferences in distance and orientation, reflecting possible spatial relationships between the TFs. The authors further analyzed these predicted relationships using gene expression data and found that they are dynamic and condition-dependent.

Detecting interacting motif pairs
• Interacting motif pairs are defined as those that
  – co-occur in the input promoters more often than expected (over-represented co-occurrence);
  – have distances (in bp) between the two motifs that differ significantly from the random expectation.
• Motif-PIE, a C++ program, was developed to identify interacting TF binding motif pairs.
• The program first calculates the most over-represented single motifs (5-7mers) in the input promoter sequences. It then enumerates all possible pair combinations between the top 10 motifs. Each motif pair is then evaluated for significance (P-value). If the most significant motif pair has a P-value below a threshold, the authors predict that the TFs binding the two motifs interact with each other.
• The P-value of a motif pair should reflect two contributions: one from motif pair co-occurrence and the other from the distance constraint. P = P_occ · P_d
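The enumeration step described above can be sketched in a few lines of Python. This is only an illustration, not Motif-PIE itself (which is a C++ program): the function names `top_motifs` and `candidate_pairs` are hypothetical, the toy promoter sequences are made up, and ranking k-mers by the raw number of promoters containing them is a simplifying stand-in for a real over-representation score against a genomic background.

```python
from collections import Counter
from itertools import combinations


def top_motifs(promoters, k_min=5, k_max=7, top_n=10):
    """Rank k-mers (5-7mers by default) by how many promoters contain them.

    Assumption: raw promoter counts stand in for the over-representation
    score, which the slides do not spell out.
    """
    counts = Counter()
    for seq in promoters:
        seen = set()
        for k in range(k_min, k_max + 1):
            for i in range(len(seq) - k + 1):
                seen.add(seq[i:i + k])
        counts.update(seen)  # each promoter counts a given k-mer at most once
    return [kmer for kmer, _ in counts.most_common(top_n)]


def candidate_pairs(motifs):
    """All unordered pairs of distinct motifs."""
    return list(combinations(motifs, 2))


# Toy input, purely illustrative.
promoters = ["ACGTACGTGGCATTA", "TTACGTACGGCATAC", "GGACGTACCGGCATT"]
motifs = top_motifs(promoters)
pairs = candidate_pairs(motifs)
print(len(motifs), "motifs,", len(pairs), "candidate pairs")
```

With ten top motifs there are 45 unordered pairs to score, which keeps the pairwise evaluation cheap for each TF pair.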

• n: the number of input promoters; N: the total number of yeast promoters; g: the number of occurrences of the motif pair in the input promoters; G: the overall number of occurrences of the motif pair in the promoters of the entire yeast genome.
• The equation for P_occ gives the chance probability of observing the motif pair g or more times in the n input promoters.
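The equation itself did not survive in this transcript. The four quantities n, N, g and G are the usual ingredients of a hypergeometric tail probability, so the sketch below assumes that form; this is an assumption, not a confirmed reproduction of the authors' formula, and the function name `p_occ` is hypothetical.

```python
from math import comb


def p_occ(g, n, G, N):
    """Chance of seeing the motif pair in g or more of the n input promoters.

    Assumed model: drawing n promoters without replacement from N genome
    promoters, G of which contain the motif pair (hypergeometric tail).
    """
    upper = min(n, G)
    total = comb(N, n)
    tail = sum(comb(G, i) * comb(N - G, n - i) for i in range(g, upper + 1))
    return tail / total


# Small worked case: 10 promoters genome-wide, 3 contain the pair,
# 2 input promoters, both contain the pair.
print(p_occ(g=2, n=2, G=3, N=10))
```

In the small case the only favourable draw is both input promoters coming from the 3 that contain the pair, i.e. C(3,2)/C(10,2) = 3/45.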

• The contribution of the distance constraint between two motifs in promoter sequences, P_d, is calculated by comparing the observed distance distribution with a background distribution using the Kolmogorov-Smirnov (KS) test.
• The background distribution is taken to come from motifs that do not interact with each other. The chance of observing distance d at promoter length L can be normalized using the following quantities:
  – L: length of one promoter sequence; d: motif pair distance; wf, wd: the widths of the two motifs; L-wf-wd-d+1: the number of all possible arrangements for the motif pair.
• Given the length distribution F(L) of promoter sequences in yeast, the random distribution of the motif distances d, f_d, can be derived. P_d is then calculated by comparing the observed distribution with f_d.
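The two background equations are missing from this transcript, so the following is a sketch under stated assumptions rather than the authors' exact formulas. It assumes that, for a promoter of length L, the probability of distance d is proportional to the number of arrangements L-wf-wd-d+1, that f_d is the F(L)-weighted mixture of those per-length distributions, and that P_d can be approximated by the asymptotic one-sample KS p-value. All function names are hypothetical.

```python
from math import exp, sqrt


def distance_given_length(L, wf, wd):
    """Assumed p(d | L): proportional to the L - wf - wd - d + 1 placements."""
    max_d = L - wf - wd
    weights = [max_d - d + 1 for d in range(max_d + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def background_distances(length_dist, wf, wd):
    """Assumed f_d: mixture of p(d | L) weighted by F(L) (dict L -> F(L))."""
    f = [0.0] * (max(length_dist) - wf - wd + 1)
    for L, weight in length_dist.items():
        if L - wf - wd < 0:
            continue  # promoter too short to hold both motifs
        for d, p in enumerate(distance_given_length(L, wf, wd)):
            f[d] += weight * p
    s = sum(f)
    return [x / s for x in f]


def ks_statistic(observed, f):
    """Largest gap between the observed and background CDFs over d."""
    n = len(observed)
    counts = [0] * len(f)
    for d in observed:
        counts[d] += 1
    gap, cdf_obs, cdf_bg = 0.0, 0.0, 0.0
    for d in range(len(f)):
        cdf_obs += counts[d] / n
        cdf_bg += f[d]
        gap = max(gap, abs(cdf_obs - cdf_bg))
    return gap


def ks_pvalue(D, n, terms=100):
    """Asymptotic Kolmogorov p-value (an approximation for discrete data)."""
    lam = sqrt(n) * D
    if lam < 0.2:
        return 1.0  # the series is numerically 1 in this range
    q = 2 * sum((-1) ** (k - 1) * exp(-2 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, q))


f_d = background_distances({10: 1.0}, wf=2, wd=2)
D = ks_statistic([0] * 20, f_d)  # 20 observed pairs, all adjacent (d = 0)
print(D, ks_pvalue(D, 20))
```

In the toy case every observed pair sits at distance 0 while the background puts only a quarter of its mass there, so the KS gap is large and the p-value is tiny.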

Detecting interacting motif pairs
• An initial set of known target genes for 152 TFs was collected (from ChIP data and literature evidence).
• From all possible TF pair combinations, Motif-PIE found that 300 have significant interactions (their most significant P-values are smaller than a certain threshold), involving 77 TFs.
• Homotypic TF-TF interactions (interaction of a single TF with itself) were also examined, and 45% (69/152) of the TFs were predicted to have homotypic interactions.
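The counts on this slide can be cross-checked with a little arithmetic. The number of candidate pairs is not given in this transcript; the sketch below assumes that every unordered pair of distinct TFs among the 152 was a candidate, which is an assumption rather than a figure taken from the slides.

```python
from math import comb

n_tfs = 152
heterotypic_hits = 300   # significant TF pairs reported on the slide
homotypic_hits = 69      # TFs predicted to interact with themselves

# Assumption: all unordered pairs of distinct TFs were tested.
candidate_pairs = comb(n_tfs, 2)
print(candidate_pairs)                                      # 11476
print(round(100 * heterotypic_hits / candidate_pairs, 1))   # 2.6 (% of pairs)
print(round(100 * homotypic_hits / n_tfs, 1))               # 45.4 (% of TFs)
print(heterotypic_hits + homotypic_hits)                    # 369
```

The sum of 369 agrees with the total number of significant interactions quoted on the Discussion slide.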

• The average similarities between predicted and known motif sequences for the heterotypic and homotypic TF pairs are and respectively.

Distance constraint for interacting motif pairs
• Interacting TF pairs may need to satisfy certain spatial requirements to interact functionally: their binding motifs may show characteristic distance relationships in promoter sequences.
• The authors define a threshold of -log(P_d) = 4.93. According to this threshold, 154 of the 300 detected motif pairs have one or more characteristic distances.
• 75% of the characteristic distances are smaller than 166 bp. The finding that some pairs have large characteristic distances may reflect secondary structure such as DNA looping, or indirect interaction through complex formation.
• Gene groups targeted by a motif pair with a characteristic distance are significantly more co-expressed than those targeted by a motif pair without characteristic distances.

Orientation constraint for interacting motif pairs
• Orientation is defined as the relative directions of a motif pair on the genome sequence: one divergent (opposite directions, pointing away from each other), one convergent (opposite directions, pointing toward each other) and two tandem orientations.
• P_o: the fraction of the most dominant orientation. A large P_o indicates a strong orientation preference.
• For motif pairs without characteristic distances, the distribution of P_o is only slightly shifted toward larger values compared with a random distribution.
• If only the target genes with characteristic distances are considered, the orientation preference increases dramatically.
• Further analysis of the 154 motif pairs with characteristic distances revealed that 35.1% have more than one characteristic distance. The authors found that different distances often correspond to different preferred orientations. These multiple characteristic distances may reflect various interaction configurations.
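As a minimal sketch of how P_o could be computed, the code below classifies each occurrence of a motif pair into one of the four orientations and reports the fraction taken by the most common one. The strand encoding ("+"/"-", with the first element belonging to the upstream motif), the labels for the two tandem cases and the function names are assumptions made for illustration.

```python
from collections import Counter


def orientation(strand_first, strand_second):
    """Classify a motif pair by the strands of its upstream and downstream motif."""
    if strand_first == "+" and strand_second == "-":
        return "convergent"      # motifs point toward each other
    if strand_first == "-" and strand_second == "+":
        return "divergent"       # motifs point away from each other
    return "tandem_forward" if strand_first == "+" else "tandem_reverse"


def orientation_preference(occurrences):
    """Return the dominant orientation and P_o, its share of all occurrences."""
    counts = Counter(orientation(a, b) for a, b in occurrences)
    label, top = counts.most_common(1)[0]
    return label, top / len(occurrences)


# Ten hypothetical occurrences of one motif pair across target promoters.
occurrences = [("+", "-")] * 6 + [("+", "+")] * 2 + [("-", "+"), ("-", "-")]
print(orientation_preference(occurrences))
```

With four possible orientations, P_o cannot fall below 0.25, so values well above that floor are what signal a preference.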

Dynamic effect of TF pair interactions on target genes
• For each TF pair, the authors performed a genome-wide search of the upstream promoter regions, and the target genes were defined as those whose promoter sequences contain the corresponding motif pair.
• If the target genes show significant co-expression under a condition, the TF pair is considered active under that condition; otherwise the TF pair is likely to be inactive. (The combined expression database consisted of 82 experiments covering six conditions: cell cycle, elutriation, heat shock, DNA damage, sporulation and drug treatment.)
• Some TF pairs (16%) show an effect under all conditions, but many pairs (40%) are active only under certain specific conditions. Under different conditions, some TFs interact with different partners. The function and activity of a TF may be better described in relation to its interacting partners than by the factor alone.
• The dynamics of TFs are linked to gene essentiality. TF pairs involved in more conditions are more likely to include essential genes, which render cells non-viable when knocked out. The fraction of essential TFs shows an almost monotonic increase with the number of active conditions: TFs active under more conditions are more likely to be essential than those active only under specific conditions.
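The slides do not say how "significant co-expression" was tested. One plausible reading, sketched below purely as an assumption, scores a target gene set by its mean pairwise Pearson correlation within one condition and compares that score with randomly drawn gene sets of the same size. The function names, the random-set test and the toy expression data are all hypothetical.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def mean_coexpression(profiles):
    """Average Pearson correlation over all pairs of profiles."""
    pairs = list(combinations(profiles, 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


def is_active(targets, expression, n_random=200, alpha=0.05, seed=0):
    """Assumed test: empirical p-value against random gene sets of equal size."""
    rng = random.Random(seed)
    observed = mean_coexpression([expression[g] for g in targets])
    genes = list(expression)
    hits = 0
    for _ in range(n_random):
        sample = rng.sample(genes, len(targets))
        if mean_coexpression([expression[g] for g in sample]) >= observed:
            hits += 1
    p_value = (hits + 1) / (n_random + 1)
    return p_value < alpha, p_value


# Toy condition: three targets share one trend, twenty other genes are noise.
noise = random.Random(1)
expression = {f"bg{i}": [noise.gauss(0, 1) for _ in range(6)] for i in range(20)}
trend = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0]
for i, scale in enumerate([1.0, 2.0, 0.5]):
    expression[f"target{i}"] = [scale * v for v in trend]

active, p_value = is_active(["target0", "target1", "target2"], expression)
print(active, p_value)
```

Running the same test separately on the experiments of each condition would give a per-condition active/inactive call for every TF pair, which is the kind of table the percentages on this slide summarize.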

Discussion
• The authors presented a novel sequence-based algorithm to identify transcription factor interactions. The algorithm can be used more generally, including in cases where gene expression experiments are not available. 369 significant interactions are predicted, including both homotypic and heterotypic interactions.
• Short distances between binding sites are preferred over long distances. TFs do not seem to interact arbitrarily at any distance, but to have specific preferences: characteristic distances and particular orientations.
• The same TF pair may behave differently under different conditions, with different characteristic distances and different orientations.
• The algorithm has some limitations:
  – two overlapping binding sites: the two TFs compete for the same binding site;
  – a short inter-motif distance may be attributable to dimeric TFs, meaning that the two apparently independent binding sites might be two segments of one TF binding sequence.
• Future direction: a "super motif" that includes not only the sequence of the binding sites but also multiple sites at set distances from each other in a particular orientation. This can greatly increase prediction specificity for genomic scans.