Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome ECS289A.

Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome ECS289A Presentation By Hua Chen

Background Knowledge
A key feature of cis-regulatory sites: binding sites for several different transcription factors tend to cluster in one region near the gene, forming a cis-regulatory module (CRM).
Searching for individual cis-regulatory sites yields too many candidate positions, which makes it difficult to tell the true sites from the false ones.
The clustering that characterizes CRMs provides a feasible way to identify cis-regulatory sites in the genome.

One example of a CRM in Drosophila: the eve gene

Targets:
Use the clustering of binding sites within cis-regulatory modules as a method to identify functional motifs;
Test the method on known CRM regions;
Search the genome to discover new CRMs and confirm the results by experiment.
The System Investigated:
The early Drosophila embryo.
Five transcription factors are investigated: Bcd, Cad, Hb, Kr and Kni.

Methods:
Collect transcription factor binding sequences from earlier laboratory studies and align them;
Construct Position Weight Matrices (PWMs) for the conserved motifs;
Test the method on the known CRMs;
Search genome-wide for unknown regulatory regions;
Use RNA in situ hybridization and microarray hybridization to test whether the predicted regions lie near genes regulated by the transcription factors;
One special case: the giant gene, investigated further with transgenic and mutant embryos.

Step 1: Collection and Alignment of TF Binding Sites
Bcd, Cad, Hb, Kr and Kni binding sequences are determined by in vitro DNase protection assays;
The sequences are aligned with MEME.
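The aligned binding sites are the raw material for the PWMs used in the next step. As a rough illustration only, the sketch below turns a gap-free alignment into a log-odds matrix and scores a candidate site. The function names, the uniform background, the pseudocount and the toy sequences are all assumptions made for this example; they are not the presenter's data or the matrices from the paper.

```python
import math
from collections import Counter

BASES = "ACGT"

def build_pwm(aligned_sites, background=None, pseudocount=1.0):
    """Return one {base: log2-odds} dict per alignment column.

    Assumes gap-free, equal-length, upper-case sites (hypothetical input format).
    """
    if background is None:
        # Assumption: uniform background; a real genome would need measured frequencies.
        background = {b: 0.25 for b in BASES}
    width = len(aligned_sites[0])
    if any(len(site) != width for site in aligned_sites):
        raise ValueError("all aligned sites must have the same length")
    total = len(aligned_sites) + pseudocount * len(BASES)
    pwm = []
    for col in range(width):
        counts = Counter(site[col] for site in aligned_sites)
        column = {}
        for base in BASES:
            freq = (counts.get(base, 0) + pseudocount) / total
            column[base] = math.log2(freq / background[base])
        pwm.append(column)
    return pwm

def score_site(pwm, site):
    """Sum the per-column log-odds scores for a candidate site of matching width."""
    return sum(column[base] for column, base in zip(pwm, site))

# Invented toy alignment, not real footprinting data.
toy_sites = ["TTTATG", "TTTATG", "TTTACG", "ATTATG"]
pwm = build_pwm(toy_sites)
consensus = "".join(max(column, key=column.get) for column in pwm)
```

A site that resembles the alignment scores higher than an unrelated one, which is the property a genome-wide scan relies on.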

Step 2: Construction of PWMs and Searching:
Patser is used to construct the Position Weight Matrices;
Cis-Analyst is used to identify potential binding sites in the Drosophila genome that match the PWMs:
A user-defined cutoff parameter (site_p) eliminates predicted low-affinity sites;
The sequence is searched with a window of specified length;
Windows that contain at least min_sites binding sites are retained;
All overlapping windows are merged into a cluster.
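The filter, window, retain and merge steps above can be sketched in a few lines. This is not Cis-Analyst's code: the function name and input format are hypothetical, and it assumes site_p behaves like a p-value threshold (smaller means a stronger match) and that a window is anchored at each retained site.

```python
def find_clusters(sites, site_p, window, min_sites):
    """Return merged (start, end) clusters from (position, p_value) site predictions.

    Assumptions: positions are integer coordinates on one sequence, and a site
    is kept when its p_value is at or below site_p.
    """
    # 1. Drop predicted low-affinity sites.
    positions = sorted(pos for pos, p_value in sites if p_value <= site_p)

    # 2-3. Anchor a window at each retained site; keep it if it holds enough sites.
    windows = []
    j = 0
    for i, start in enumerate(positions):
        j = max(j, i)
        while j < len(positions) and positions[j] < start + window:
            j += 1
        if j - i >= min_sites:
            windows.append((start, start + window))

    # 4. Merge overlapping windows into clusters.
    clusters = []
    for start, end in windows:
        if clusters and start <= clusters[-1][1]:
            clusters[-1] = (clusters[-1][0], max(clusters[-1][1], end))
        else:
            clusters.append((start, end))
    return clusters

# Invented coordinates for illustration only.
demo_sites = [(100, 1e-4), (150, 1e-4), (300, 1e-4), (320, 0.5), (900, 1e-4), (5000, 1e-4)]
demo_clusters = find_clusters(demo_sites, site_p=1e-3, window=700, min_sites=3)
```

Raising min_sites or shrinking the window makes the search stricter; the weak site at position 320 in the demo never counts because it fails the site_p cutoff.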

Binding Site Sequence for Cad:

Binding Sites:

Step 3: Collection of Known CRMs:

Result: 14 of the 19 known CRMs are recovered with the search criteria window size = 700 bp and at least 13 predicted sites.
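The 14/19 figure corresponds to a recovery rate of roughly 74%. One way such a tally could be computed is sketched below; the slides do not say how a known CRM was judged to be recovered, so the overlap criterion, the function name and the toy intervals are assumptions for illustration.

```python
def recovered_fraction(known_crms, clusters):
    """Count known CRMs that overlap at least one predicted cluster.

    Both arguments are lists of (start, end) intervals on the same sequence.
    Assumption: any overlap counts as recovery.
    """
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    hits = sum(any(overlaps(crm, c) for c in clusters) for crm in known_crms)
    return hits, len(known_crms), hits / len(known_crms)

# Invented intervals, only to exercise the function.
toy_result = recovered_fraction([(0, 10), (20, 30), (50, 60)], [(5, 25)])

# The rate reported on the slide: 14 of 19 known CRMs.
reported_rate = round(14 / 19, 3)
```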

Step 4: Genome-wide Searching:
28 clusters identified;
23 of the 28 fall in regions between genes;
5 fall in introns;
49 genes lie in the nearby regions.

Step 5: Examine the expression pattern of the 49 genes by RNA in situ hybridization and microarray hybridization:
The 49 genes are examined by hybridization to see whether their expression patterns are consistent with regulation by the TFs;
10 of the 28 clusters lie near at least one gene that shows an anterior-posterior expression pattern (consistent with regulation by the five TFs).

Step 6: The special case: the giant gene
Its posterior expression is regulated by Cad, Hb and Kr;
Its cis-regulatory sites were still unknown;
The predicted CRM nearest to the giant gene is cloned upstream of a lacZ reporter gene.
The lacZ reporter shows an expression pattern similar to that of giant mRNA.
Figure panels: +/+ and Kr/Kr embryos.

Conclusions:
Binding site clustering is an effective method to identify cis-regulatory modules;
A major obstacle is the paucity of binding data for most transcription factors, which calls for systematic work;
Real CRM structure is more complex, so the method needs to incorporate more complex rules.

Reference
Berman, B.P., Nibu, Y. et al. Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. P. N. A. S. 99: