Cis-regulatory module 10/24/07. TFs often work synergistically (Harbison 2004)


Cis-regulatory module 10/24/07

TFs often work synergistically (Harbison 2004)

Combinatorial control

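[Figure: lysogenic growth vs. lytic growth of λ phage in E. coli (source: Gary Kaiser)]

[Figure: the λ operon, with the operator O_R between the cI and cro genes]

[Figure: λ operon with cI on and cro off: lysogenic growth]

[Figure: λ operon with cI off and cro on: lytic growth; operator sites O_R1, O_R2, O_R3]

[Figure: λ operon showing cI, cro and Pol II in the lysogenic and lytic states]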

Cis-regulatory module (CRM) “A CRM is a DNA segment, typically a few hundred base pairs in length containing multiple binding sites, that recruits several cooperating factors to a particular genomic location” –Ji and Wong (2006)

Statistical Methods
Predict modules when the motifs are known (simpler).
–LRA, by Wasserman and Fickett (1998)
Predict modules when the motifs also need to be discovered (more difficult).
–CisModule, by Zhou and Wong (2004)
–EMCModule, by Gupta and Liu (2005)

LRA (logistic regression analysis)

Cooperative motifs
Basic idea: true regulatory regions are likely to contain multiple motif sites. The model outputs the probability that a region is regulatory.

LRA
Training data contain a set of known regulatory regions and a set of control regions.
The covariates are the highest motif matching scores within a given sequence, each weighted by a regression coefficient. The response is the probability of being a regulatory region.
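The terms named on this slide match the usual logistic regression form. A sketch of that form follows; the symbols p, x_i, β_0 and β_i are chosen here for illustration and are not taken from the slides:

```latex
\log\frac{p}{1-p} \;=\; \beta_0 + \sum_{i} \beta_i x_i
\qquad\Longleftrightarrow\qquad
p \;=\; \frac{1}{1 + \exp\!\left(-\beta_0 - \sum_{i} \beta_i x_i\right)}
```

Here x_i would be the highest matching score of motif i within the sequence, β_i its regression coefficient, and p the probability that the sequence is a regulatory region.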

Application: skeletal-muscle gene regulation
5 muscle-specific TFs are known:
–Mef-2, Myf, SRF, Tef, Sp-1
29 regulatory regions are known.
Can we predict the regulatory regions just from sequence motif information?

Computational Procedure
Motif matrices are identified by Gibbs sampling using sequence information from the 29 regulatory regions.
For some TFs, motifs cannot be found by the de novo approach. Use literature motifs instead.
The top two matching scores for each TF are included as covariates.
Apply the LRA model.
Use leave-one-out cross-validation to evaluate model performance.
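The covariate step of this procedure can be sketched as follows. This is a minimal illustration, not the code of Wasserman and Fickett: the log-odds scoring, the toy matrix, the single-strand scan and the coefficient values are all assumptions made for the example, and the function names are hypothetical.

```python
import math

def site_score(pwm, background, site):
    """Log-odds score of one candidate site against a position weight matrix."""
    return sum(math.log(col[base] / background[base])
               for col, base in zip(pwm, site))

def top_scores(pwm, background, sequence, n_top=2):
    """Return the n_top highest site scores in the sequence (one strand only)."""
    width = len(pwm)
    scores = [site_score(pwm, background, sequence[i:i + width])
              for i in range(len(sequence) - width + 1)]
    return sorted(scores, reverse=True)[:n_top]

def regulatory_probability(covariates, coefficients, intercept):
    """Logistic model: probability that a region is regulatory."""
    z = intercept + sum(b * x for b, x in zip(coefficients, covariates))
    return 1.0 / (1.0 + math.exp(-z))

# Toy 3-column matrix (hypothetical values, not a real Mef-2 or SRF matrix).
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# Top two matching scores for this one TF become two covariates.
covariates = top_scores(PWM, BACKGROUND, "TTAGCTTTAGCAT")
# Coefficients are made up; in LRA they would be fitted to the training regions.
p = regulatory_probability(covariates, coefficients=[0.5, 0.5], intercept=-3.0)
```

With several TFs, the covariate vector would simply concatenate the top two scores of each matrix before the logistic step.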

Results
Single motifs are highly non-specific.
Simple multi-site analysis improves specificity at the cost of reduced sensitivity.
Logistic regression further improves specificity with a smaller loss of sensitivity.

Limitations of LRA
Motifs must be known in advance.
When known regulatory sequences are few, it is difficult to identify motifs by using traditional methods.
Objective: integrate motif discovery and module finding in a single statistical model.

De novo module identification
Two tasks:
–Identify TF motifs
–Identify CRMs

Why the module approach can help motif discovery
Because a single motif has poor specificity, a short sequence pattern can appear enriched simply by chance.
The probability of random matches is much smaller for the co-occurrence of several motifs.
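A back-of-the-envelope calculation illustrates the point. The numbers below (a 6-bp exact word, a uniform background, a 500-bp window) are hypothetical, and the calculation assumes independent positions and independent motifs, which is a simplification:

```python
def prob_at_least_one_match(site_prob, n_positions):
    """P(at least one chance match), assuming independent start positions."""
    return 1.0 - (1.0 - site_prob) ** n_positions

# Hypothetical setting: one exact 6-bp word, uniform background, 500-bp window.
site_prob = 0.25 ** 6
window = 500
single = prob_at_least_one_match(site_prob, window - 6 + 1)

# If two different motifs are treated as independent, both must match by chance.
pair = single * single
```

Under these assumptions a single word turns up by chance in roughly one window in nine, while two such words co-occur by chance in roughly one window in eighty.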

cisModule (Zhou and Wong 2004)
Basic idea: a two-level hierarchical mixture model (HMx).
–Level 1: modules within sequences
–Level 2: motifs within modules

HMx Model as a Stochastic Process (Zhou and Wong 2004)
Treat the HMx model as a stochastic machine that generates sequences.
–Starting from the first sequence position, make a series of random decisions on whether to initiate a module of length l or to generate a letter from the background model.
–Inside a module, if a site for the kth motif is initiated at position n, generate w_k letters from its PWM and place them at [n, n+w_k-1]; otherwise generate a letter from the background.
–After reaching the end of the current module, decide whether to sample from the background or to initiate a new module.
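The generative process described on this slide can be sketched in a few lines. This is a simplified illustration, not the cisModule implementation: the parameter names p_module and p_site, the uniform choice among motifs, and the one-hot toy matrices are assumptions made for the example.

```python
import random

def sample_letter(dist, rng):
    """Draw one nucleotide from a {base: probability} distribution."""
    bases = list(dist)
    return rng.choices(bases, weights=[dist[b] for b in bases])[0]

def generate_sequence(length, module_len, pwms, background,
                      p_module, p_site, rng):
    """Generate one sequence; return it with module and site annotations."""
    seq, modules, sites = [], [], []
    pos = 0
    while pos < length:
        # Outside a module: start a module here or emit one background letter.
        if pos + module_len <= length and rng.random() < p_module:
            end = pos + module_len
            modules.append((pos, end))
            while pos < end:
                k = rng.randrange(len(pwms))  # simplification: pick a motif uniformly
                width = len(pwms[k])
                # Inside a module: start a site for motif k if it still fits.
                if pos + width <= end and rng.random() < p_site:
                    seq.extend(sample_letter(col, rng) for col in pwms[k])
                    sites.append((k, pos))
                    pos += width
                else:
                    seq.append(sample_letter(background, rng))
                    pos += 1
        else:
            seq.append(sample_letter(background, rng))
            pos += 1
    return "".join(seq), modules, sites

def one_hot_pwm(word):
    """Degenerate PWM that always emits the given word (for illustration only)."""
    return [{b: (1.0 if b == c else 0.0) for b in "ACGT"} for c in word]

words = ["ACGTAC", "TTGGCC"]  # hypothetical motifs
pwms = [one_hot_pwm(w) for w in words]
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
seq, modules, sites = generate_sequence(
    200, 50, pwms, background, p_module=0.02, p_site=0.1, rng=random.Random(0))
```

Inference runs this process in reverse: given only the sequences, it asks which module and site annotations, and which matrices, most plausibly generated them.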

Model inference: Gibbs sampling
Alternate between two steps:
–Given the alignment, update the model parameters.
–Given the model parameters, update the module/motif locations.

A numerical experiment
Merge the 29 regulatory regions with a set of sequences randomly selected from ENSEMBL promoters.
Test the ability of cisModule to identify motifs in a “noisy” environment.

Results

Limitations of CisModule
The module length and the number of motifs must be provided externally.
Convergence can be slow. Multiple cycles are needed, each starting from a new seed.
The model assumes that combinations of different motifs are independent.

EMCModule
Gupta and Liu (2005) developed a similar approach called EMCModule.
Main differences:
–They use the collection of literature motifs as initial “seeds” for motif discovery.
–Their method improves the convergence speed.
–Their definition of a CRM is a little different: the number of motifs is fixed within one module, but the order of and distance between different motifs can vary.

Further issues
A comparative genomic approach can also be incorporated into module discovery (Zhou and Wong 2007).
The modules identified by these methods can be viewed as belonging to one “type”. New methods need to be developed to discover multiple module types.
While the module-based approach is helpful for finding cooperative motifs, it may hurt the discovery of single motifs.

(Yuh et al. 1998)

Reading List
Wasserman and Fickett (1998)
–LRA. One of the first works on cis-regulatory modules.
Zhou and Wong (2004)
–cisModule. A statistical method to identify cis-regulatory modules without knowledge of motif information.
Yuh et al. (1998)
–An influential biological paper on how information can be integrated from different modules to regulate gene expression.