
Searching for structured motifs in the upstream regions of hsp70 genes in Tetrahymena thermophila
Roberto Marangoni^, Antonietta La Terza*, Nadia Pisanti^, Sabrina Barchetta*, Cristina Miceli*
^ Dipartimento di Informatica, Università di Pisa
* Dipartimento di Biologia M.C.A., Università di Camerino

1 - BIOLOGICAL BACKGROUND

Gene regulation and structured motifs
Structured genetic motifs, functioning as regulatory elements, are short DNA sequences that determine the timing, location and level of gene expression. Although often only 5 to 20 bp long, they are critical to understanding gene regulation. Experimental procedures for discovering regulatory elements, such as electrophoretic mobility shift assays or in vivo analyses such as DNA transformation with reporter genes, are lengthy and typically verify one element at a time. Computational methods have therefore been developed to predict regulatory elements and their locations in a high-throughput manner.

Tetrahymena thermophila and heat shock protein 70 genes
Genes induced by stress are excellent models for identifying new genetic elements involved in the control of gene expression. Our attention is mainly focused on genes of the heat shock protein family. The expression of heat shock genes is known to be regulated mainly at the transcriptional level. The inducibility of heat shock genes in response to various environmental stresses depends on the activation of heat shock factors (HSF). HSF bind to highly evolutionarily conserved heat shock regulatory elements (HSE), which are composed of at least three adjacent inverted repeats of the motif 5'-nGAAn-3'. One inducible hsp70 gene was cloned from Tetrahymena thermophila and its promoter was characterized. It was shown to contain several HSE motifs with canonical and non-canonical sequences, as well as a new genetic element with repetitive GATA sequences that resembles the element specific for GATA binding factors (Fig. 1).
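To make the inverted-repeat structure of the HSE concrete, the following minimal sketch scans one strand of a DNA string for the canonical HSE nGAAnnTTCnnGAAn reported in Fig. 1, translating each n into the character class [ACGT]. It is a hypothetical illustration, not part of SMILE or of the authors' analysis: the function names are our own, only exact canonical sites are reported (non-canonical HSEs are missed), and the opposite strand is not scanned. It also shows that the middle unit nTTCn is the reverse complement of nGAAn.

```python
import re

# Canonical HSE (Fig. 1): three adjacent inverted nGAAn units, nGAAnnTTCnnGAAn,
# where 'n' is any nucleotide. The lookahead lets overlapping sites be reported.
HSE_CANONICAL = re.compile(r"(?=([ACGT]GAA[ACGT]{2}TTC[ACGT]{2}GAA[ACGT]))")

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_canonical_hse(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, site) for every canonical HSE on the given strand."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in HSE_CANONICAL.finditer(seq)]


# Usage (hypothetical input):
#   find_canonical_hse("ccAGAACTTTCTAGAAGcc")  -> [(2, "AGAACTTTCTAGAAG")]
#   reverse_complement("AGAAT")                -> "ATTCT" (an nTTCn unit)
```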
Electrophoretic mobility shift assays and mutational analysis followed by in vivo assays with a reporter gene revealed that the canonical HSE plays a determinant role in the induction of hsp70 gene transcription and that the repetitive GATA sequences are necessary for hsp70 expression. By searching the entire Tetrahymena genome, which has recently been completely sequenced, other genes of the same family (as well as other stress genes) were identified. Their promoter sequences are the data we analyzed using SMILE.

3 - RESULTS

a) HSE motif and other similar motifs
The following table summarizes the results obtained by searching the hsp70 genes of Tetrahymena thermophila for three-box structured motifs.

Motif                Score
ACA_TGT_ACA          1.01
ATG_CAT_ATG          0.70
GTT_AAC_GTT          0.70
ATC_GAT_ATC          0.55
TGA_TCA_TGA          0.44
CTA_TAG_CTA          0.38
TAG_CTA_TAG          0.34
TTG_CAA_TTG          0.25
CAA_TTG_CAA          0.22
AGA_TCT_AGA          0.22
TCT_AGA_TCT          0.21
GAA_TTC_GAA (HSE)    0.12
TTC_GAA_TTC          0.11
CTT_AAG_CTT          0.10

The score indicates the deviation from randomness; a score > 0 indicates that the pattern is statistically significant. GAA_TTC_GAA is the HSE pattern, which has been experimentally shown to be involved in gene regulation; no experimental evidence is yet available for the other motifs. Fig. 2 gives a schematic representation of the locations of the most significant motifs found, including the HSE motif. A preliminary correlation analysis gave no indication of possible cooperation among these motifs in gene regulation, but more work is needed to address this question.

b) GATA motif
The GATA motif is very frequent in the searched genes and highly repeated along the upstream sequences. This results in a low (but significant) score, and its abundance makes it very difficult to represent in a graph like Fig. 2. Correlation studies are in progress to investigate the possible association of several GATA boxes into a single functional motif.

Fig. 2: Schematic localization of the most significant motifs found (ACA_TGT_ACA, GTT_AAC_GTT, ATG_CAT_ATG, ATC_GAT_ATC, and GAA_TTC_GAA, the HSE motif) in the upstream regions of hsp70 div1, hsp70 div2, hsp70 div3, hsp70 div4 and hsp70*. (*) Gene characterized by experimental analysis.

Fig. 1: Diagrammatic representation of the T. thermophila hsp70 promoter region (5' to 3', showing the initiation of transcription and the ATG initiation of translation), including, among others, the HSE and GATA regulatory motifs involved in stress gene activation as shown by experimental analysis. The canonical sequence of each motif is reported above the corresponding box: nGAAnnTTCnnGAAn (HSE) and WGATAR (GATA) (n: any nucleotide; W: A/T; R: A/G).

2 - SMILE AND THE SEARCH STRATEGY USED
To study structured motifs we used a software tool called SMILE (Structured Motifs Inference, Localization and Evaluation), which is based on an algorithm introduced by Marsan and Sagot (Marsan, L. and Sagot, M.-F., Algorithms for extracting structured motifs using a suffix tree with application to promoter and regulatory site consensus identification, J. Comput. Biol. 7, 345-362). It works on an index (a suffix tree) of the set of sequences rather than directly on the sequences. SMILE takes as input a set of unaligned biological sequences and a list of parameters, which correspond to the properties that the sought patterns must satisfy, and outputs all motifs in the input sequences that have those properties. The motifs SMILE can handle are complex, as they may be composed of any specified number of parts, or sub-motifs, which we call the boxes of the motif. The occurrences of the boxes of a motif are assumed to always appear in the same relative order in the sequences. Each box of the structured motif has its own user-defined characteristics, while other parameters describe characteristics of the whole motif.
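The notion of a structured motif (ordered boxes separated by spacers of bounded length, required to occur in every input sequence) can be illustrated with a naive scan. This is a hedged sketch, not SMILE's method: SMILE searches a suffix tree, whereas the code below simply tries every start position, which gives the same occurrences but far less efficiently. The function names, and the reading of a box "distance" as the number of bases between the end of one box and the start of the next, are our own assumptions.

```python
from typing import Sequence


def occurs_at(seq: str, start: int, boxes: Sequence[str],
              gaps: Sequence[tuple[int, int]]) -> bool:
    """True if boxes[0] matches exactly at `start` and every following box
    matches after a spacer whose length lies in the corresponding range.
    gaps[k] = (min, max) spacer length between boxes[k] and boxes[k + 1]."""
    first = boxes[0]
    if seq[start:start + len(first)] != first:
        return False
    if len(boxes) == 1:
        return True
    lo, hi = gaps[0]
    end = start + len(first)
    # Try every allowed spacer length before the next box.
    return any(occurs_at(seq, end + d, boxes[1:], gaps[1:])
               for d in range(lo, hi + 1))


def occurrences(seq: str, boxes: Sequence[str],
                gaps: Sequence[tuple[int, int]]) -> list[int]:
    """All 0-based start positions of the structured motif in seq."""
    return [i for i in range(len(seq)) if occurs_at(seq, i, boxes, gaps)]


def occurs_in_all(seqs: Sequence[str], boxes: Sequence[str],
                  gaps: Sequence[tuple[int, int]]) -> bool:
    """Quorum constraint: the motif must occur in every input sequence."""
    return all(occurrences(s, boxes, gaps) for s in seqs)
```

For example, with boxes ("ACA", "TGT", "ACA") and spacers of 2 to 14 bases, `occurrences("ACAGGTGTCCACA", ...)` returns [0], while "ACATGTACA" has no occurrence because its spacers are shorter than 2.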
We are mainly interested in the HSE structured motif, whose structure is very particular: it is composed of three boxes, the first and last of which are identical, while the middle box is the reverse complement of the other two. This strongly suggests a particular conformation of the DNA segment, which may be responsible for its regulatory function. SMILE allows parameters to be set so as to focus the search on all possible three-box motifs arranged in the general pattern XYZ_Z*Y*X*_XYZ, where X, Y and Z represent any DNA base and X*, Y* and Z* their complements (for example, X = G, Y = A, Z = A gives the HSE motif GAA_TTC_GAA). We also investigated possible spatial correlations between the patterns found, and ran searches for the GATA motif in order to assess its statistical relevance. For the whole motif, our queries required that it occur exactly (with no mismatches) in all the input sequences and that it be composed of all three boxes. Each box was required to have length 3 and to be separated from the next box by a distance ranging from 2 to 14. All motifs extracted according to these structural parameters are ranked by statistical significance. SMILE offers two ways of performing this evaluation. We used the one that compares the number of occurrences of the motifs in the original sequences (and their distribution across the input sequences) with their occurrences in another set of related sequences that are not expected to contain the motif. These control sequences are obtained by randomly shuffling the original sequences while preserving the distribution of fragments of length 3 (this length was chosen because it equals the length of the boxes).
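The search space and the shuffle-based comparison described above can be sketched as follows. This is a hypothetical illustration under stated assumptions, not SMILE's implementation or scoring formula. `hse_like_family` enumerates the 64 motifs of the form XYZ_Z*Y*X*_XYZ (every motif in the results table belongs to this family). `shuffle_preserving_3mers` reads "preserving the distribution of fragments of length 3" as keeping the counts of overlapping 3-mers: it builds a graph whose nodes are 2-mers and whose edges are the 3-mers, then follows a randomized Eulerian path (Hierholzer's algorithm). This preserves the 3-mer counts exactly but, unlike Altschul-Erickson-style shufflers, does not sample uniformly. `observed_vs_shuffled` only reports the observed count next to the mean shuffled count; the spacer convention is the same assumption as before.

```python
import itertools
import random
from collections import defaultdict

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def hse_like_family():
    """All 64 motifs XYZ_Z*Y*X*_XYZ: identical outer boxes, middle box equal
    to the reverse complement of the outer box."""
    family = []
    for x, y, z in itertools.product("ACGT", repeat=3):
        outer = x + y + z
        middle = COMPLEMENT[z] + COMPLEMENT[y] + COMPLEMENT[x]
        family.append((outer, middle, outer))
    return family


def count_occurrences(seq, boxes, min_gap=2, max_gap=14):
    """Count start positions where all boxes occur exactly and in order.
    Assumption: the spacer is the number of bases between consecutive boxes."""
    def match(pos, k):
        box = boxes[k]
        if seq[pos:pos + len(box)] != box:
            return False
        if k == len(boxes) - 1:
            return True
        end = pos + len(box)
        return any(match(end + d, k + 1) for d in range(min_gap, max_gap + 1))

    return sum(1 for i in range(len(seq)) if match(i, 0))


def shuffle_preserving_3mers(seq, rng):
    """Rearrange seq keeping exactly the same overlapping 3-mer counts
    (randomized Eulerian path over 2-mer nodes; not a uniform sampler)."""
    if len(seq) < 3:
        return seq
    graph = defaultdict(list)
    for i in range(len(seq) - 2):
        graph[seq[i:i + 2]].append(seq[i + 1:i + 3])
    for successors in graph.values():
        rng.shuffle(successors)
    # Hierholzer's algorithm, starting from the original first 2-mer.
    stack, path = [seq[:2]], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    return path[0] + "".join(node[1] for node in path[1:])


def observed_vs_shuffled(seqs, boxes, n_shuffles=20, seed=0):
    """Return (observed total count, mean count over shuffled copies).
    A hypothetical comparison, NOT SMILE's actual score."""
    rng = random.Random(seed)
    observed = sum(count_occurrences(s, boxes) for s in seqs)
    shuffled_total = 0
    for _ in range(n_shuffles):
        shuffled_total += sum(
            count_occurrences(shuffle_preserving_3mers(s, rng), boxes)
            for s in seqs)
    return observed, shuffled_total / n_shuffles
```

For instance, `observed_vs_shuffled(promoters, ("GAA", "TTC", "GAA"))`, where `promoters` stands for a hypothetical list of upstream sequences, would contrast the HSE pattern's count with its count in 3-mer-preserving shuffles.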