Analysis of gene regulatory regions by means of DNA composition

Nora Pierstorff 1, Bernhard Haubold 2, Thomas Wiehe 1
1 Dept. of Genetics, University of Cologne, Germany
2 Dept. of Biotechnology and Bioinformatics, Univ. of Applied Sciences, Weihenstephan, Germany

Figure 1: Over-representation of neighbours in the even-skipped region of Drosophila melanogaster. Annotated enhancers are marked grey. The CDS is marked blue.

Figure 2: Over-representation of neighbours in the fushi-tarazu region of Drosophila melanogaster. Annotated enhancers are marked grey. The CDS is marked blue.

Abstract: We developed a suffix-tree-based software tool, termed “shustring”, for intra-genomic and intra-specific analysis of DNA sequences. The program determines the lengths of shortest unique substrings and the number of their close variants in a genome. Comparing expected and observed length and neighbour distributions reveals characteristic properties of intergenic and promoter regions. We investigate the statistical properties of shustrings in intra-genomic (intrinsic) as well as in inter-genomic analyses and present results of the method for several well-studied examples of regulatory regions of developmental genes in Drosophila.

Introduction: There are three basic approaches to the prediction of regulatory elements. First, ab initio (intrinsic) methods assume that regulatory regions contain over-represented strings. A second approach, often using position weight matrices, searches for consensus sequences of binding sites of known transcription factors. Finally, a third approach relies on the hypothesis that functional elements are more conserved than nonfunctional elements (phylogenetic footprinting). However, it is known [1] that binding sites are often not conserved even among closely related species and may be subject to rapid evolutionary turnover.
With the “shustring” approach, we are able to analyze an arbitrary number of sequences in a single run. The results are pointers to those stretches of the query sequence that are unusual with respect to the length of their unique substrings and the size of their neighbourhood.

Ab initio method: The ab initio approach to the prediction of regulatory elements is to recognise regions that contain highly over-represented patterns. Shustring returns, for each position, the length of the shortest unique substring and the number and positions of its neighbours. Neighbours are Hamming-1 neighbours that differ from the shustring at exactly the last position. To avoid dependency on the length of the query sequence, we performed a sliding window analysis (window size 1000 bp). Based on an analytically derived probability distribution, we calculate the p-value of the number of observed neighbours, taking into account the length of the shustring and the GC content of the sequence. Shustrings with a p-value < 0.05 are recorded. The relative frequency of the recorded shustrings in a window of 200 bp (step size 1 bp) is calculated and plotted in Figures 1 and 2.

Sequence comparison: Dermitzakis and Clark [1] noted that regulatory elements may evolve very rapidly. Hence, sequence comparison alone is often not sufficient to detect regulatory elements. The shustring method allows one to determine exceptionally long unique substrings (indicative of sequence conservation) and exceptionally short ones at the same time. As an example, we analyzed orthologous regions in Drosophila melanogaster and Drosophila virilis, which diverged about 40 Myr ago and show an average sequence identity of about 66.8%. In contrast to the ab initio analysis above, we record here shustrings with extreme lengths (p-value < 0.05) and plot their relative frequency in a sliding window of length 200 bp in Figures 3 and 4.
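The definitions given under "Ab initio method" can be illustrated with a small brute-force sketch. It is not the suffix-tree implementation of the shustring program: it scans a single strand with plain string search, which is only practical for short toy sequences. The function names (occurrences, shustring_at, window_fraction) are hypothetical and chosen for illustration. The analytically derived p-value is not reproduced here, so window_fraction simply takes a list of flags marking which positions were recorded.

```python
def occurrences(seq, word):
    """Start positions of all (possibly overlapping) occurrences of word in seq."""
    hits = []
    start = seq.find(word)
    while start != -1:
        hits.append(start)
        start = seq.find(word, start + 1)
    return hits


def shustring_at(seq, i):
    """Shortest unique substring starting at position i.

    Returns (length, neighbour_starts), or None if every substring that
    starts at i also occurs elsewhere in seq. A neighbour shares all but
    the last character with the shustring and differs at the last one.
    """
    for length in range(1, len(seq) - i + 1):
        word = seq[i:i + length]
        if len(occurrences(seq, word)) == 1:
            prefix, last = word[:-1], word[-1]
            neighbours = [
                p for p in occurrences(seq, prefix)
                if p + length - 1 < len(seq) and seq[p + length - 1] != last
            ]
            return length, neighbours
    return None


def window_fraction(flags, window):
    """Fraction of flagged positions in each sliding window (step size 1)."""
    return [
        sum(flags[k:k + window]) / window
        for k in range(len(flags) - window + 1)
    ]


# Toy example (assumed input, not data from the poster).
seq = "ACGTACGA"
results = [shustring_at(seq, i) for i in range(len(seq))]
lengths = [r[0] if r else None for r in results]
print(lengths)  # [4, 3, 2, 1, 4, 3, 2, None]
```

In the toy sequence, the shustring at position 0 is ACGT, and its only neighbour is ACGA at position 4. The last position has no shustring because the single character A occurs elsewhere and cannot be extended.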
Discussion: Applied to a single sequence, the shustring method finds shortest unique substrings and their Hamming-1 neighbours, which differ at the last position. Regions that contain many shustrings with an over-represented neighbourhood are candidates for regulatory regions. The results of our program are comparable to those of other methods that predict regulatory regions based on over-represented strings [2]. To improve the prediction, we added information obtained from sequence comparison with orthologous sequences from other species. Fast-evolving as well as conserved regions may be detected at the same time, based on extreme shustring lengths. The examples from Drosophila melanogaster and Drosophila virilis indicate that results improve compared with the ab initio approach. Our approach is also clearly different from traditional alignment methods and may complement these, as shown in the lower panels of Figures 3 and 4.

Figure 3: Comparison of even-skipped regions of Drosophila melanogaster and Drosophila virilis. Upper panel: shustrings of extreme lengths. Middle panel: average conservation. Lower panel: Ahab prediction [3]. The colour scheme is as in Fig. 1.

Figure 4: As in Figure 3, but for the fushi-tarazu region.

Panel labels (Figures 3 and 4): Ahab prediction: binding site prediction based on PWMs. Alignment score: based on blastz alignment.

References:
[1] Dermitzakis E., Clark A. (2002). Evolution of Transcription Factor Binding Sites in Mammalian Gene Regulatory Regions: Conservation and Turnover. Mol. Biol. Evol. 19(7).
[2] Nazina A., Papatsenko D. (2003). Statistical extraction of Drosophila cis-regulatory modules using exhaustive assessment of local word frequency. BMC Bioinformatics 4:65.
[3] Rajewsky N., Vergassola M., Gaul U., Siggia E. D. (2002). Computational detection of genomic cis-regulatory modules, applied to body patterning in the early Drosophila embryo. BMC Bioinformatics 3:30.