Genome Wide Searches for RNA Secondary Structure Motifs
Russell S. Hamilton, Davis Lab, Wellcome Trust Centre for Cell Biology
[Image label: Drosophila melanogaster]


Slide 2. Introduction: RNA Localization
RNA localization is a mode of targeting various proteins to their site of function.
Cis-acting signals in the mRNA are recognised by trans-acting factors bound to the dynein motor.
Translation of the mRNA into protein is blocked during transport.
The mRNA is anchored at the site of function before being translated into protein.
(Delanoue & Davis, 2005, Cell, in press)
[Figure labels: microtubules (+ and - ends), mRNA, cis-acting signal, trans-acting factors, dynein]

Slide 3. Introduction: gurken
gurken encodes a TGFα homologue.
gurken is localized to the dorso/anterior corner, forming a cap around the oocyte nucleus, and establishes the dorso/ventral axis.
gurken localization has been shown to be dynein dependent (MacDougal et al, 2003, Dev. Cell, 4).
The gurken localization signal has been mapped to 64 nt that are necessary and sufficient for localization (Van De Bor & Davis, 2004, Curr. Opin. Cell Biol. 16).
gurken also localizes in the embryo.
[Figure: localizing mRNAs in the oocyte (osk, bcd, grk), with the D, V, A and P axes marked]

Slide 4. Introduction: I Factor
I Factor is a retrotransposon (or transposable element), which inserts itself into the genome of an organism.
I Factor has been found to localize in a similar manner to gurken (Van De Bor, Hartswood, Jones, Finnegan & Davis).
The localization signal has been mapped to a 58 nt signal that is necessary and sufficient for localization.
[Figure labels: localized I Factor, nucleus. Credit: Van De Bor]

Slide 5. Introduction: gurken and I Factor
Sequence similarity: %ID = 34%

gurken  AAGTAATTTTCGTGCTCTCAACAATTGTCGCCGTCACAGATTGTTGTTCGAGCCGAATCTTACT 64
Ifactor ---TGCACACCTCCCTCGTCACTCTTGATTTT-TCAAGAGCCTTCGATCGAGTAGGTGTGCA-- 58
           *      *   ***   **  ***      ***       * * *****  *      *

Structural similarity (V. Van De Bor, D. Finnegan, E. Hartswood and C. Jones)
[Figure: gurken 64 nt stem loop and I Factor 58 nt stem loop, each labelled H, St, I1, B and I2]
Are there more examples in the Drosophila genome that use a similar mechanism of localization?
Search by secondary structure, not sequence.
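The 34% identity quoted on this slide can be checked from the alignment itself. The Python sketch below counts the columns where both sequences carry the same nucleotide and divides by the alignment length. The choice of denominator (all 64 alignment columns, gaps included) is an assumption, picked because it reproduces the value on the slide; the function name is illustrative.

```python
# Aligned sequences copied from the slide ("-" marks a gap).
GURKEN = "AAGTAATTTTCGTGCTCTCAACAATTGTCGCCGTCACAGATTGTTGTTCGAGCCGAATCTTACT"
IFACTOR = "---TGCACACCTCCCTCGTCACTCTTGATTTT-TCAAGAGCCTTCGATCGAGTAGGTGTGCA--"


def percent_identity(a, b):
    """Return (identical columns, percent identity over the alignment length)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    # A column counts as identical only if both rows hold the same nucleotide.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches, 100.0 * matches / len(a)


matches, pid = percent_identity(GURKEN, IFACTOR)
print(f"{matches} identical columns out of {len(GURKEN)}: {pid:.1f}% identity")
```

With this denominator the two localization signals share 22 of 64 columns, which rounds to the 34% on the slide. Dividing by the length of the shorter sequence instead would give a higher figure.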

Slide 6. Method Outline
[Diagram labels: Genome sequences; Database; Folded genome sequences; Comparison with grk & I Factor structures]

Slide 7. Method: RNALfold
RNALfold folds large genomic sequences, outputting stable structures of a given size (Hofacker et al (2004) Bioinformatics 20).
It is similar to mfold, but optimised for folding on a genome-wide scale.
The window length is user defined. Use 64 and 58 (grk & I Factor LEs).
[Diagram labels: 2L chromosome arm genomic sequence; RNALfold; stable structures]
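A windowed folding run of this kind might be scripted as in the Python sketch below. It is not the pipeline used in the talk. It assumes that the ViennaRNA `RNALfold` program is installed, that its `-L` option sets the maximum base-pair span (used here as the window length), and that each locally stable structure is printed as a bracket string, an energy in parentheses and a start position. The helper names and the sample line are hypothetical.

```python
import re
import subprocess

# Assumed hit line layout: "<structure> (<energy>) <start position>".
HIT = re.compile(r"^([.()]+)\s+\(\s*(-?\d+\.\d+)\)\s+(\d+)")


def parse_hits(text):
    """Extract (structure, energy, start) tuples from RNALfold-style output."""
    hits = []
    for line in text.splitlines():
        m = HIT.match(line.strip())
        if m:  # lines holding the sequence or the total energy do not match
            hits.append((m.group(1), float(m.group(2)), int(m.group(3))))
    return hits


def fold_windows(fasta_path, window):
    """Run RNALfold on a FASTA file with the given window length."""
    with open(fasta_path) as handle:
        result = subprocess.run(
            ["RNALfold", "-L", str(window)],
            stdin=handle, capture_output=True, text=True, check=True,
        )
    return parse_hits(result.stdout)


# Hypothetical output line, used only to exercise the parser.
sample = ".(((((.....))))). ( -3.20)   41"
print(parse_hits(sample))
```

Calling `fold_windows` once with a window of 64 and once with 58 would give one candidate set per localization element length.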

Slide 8. Method: RNAdistance & RNAforester
Structures are represented in bracket format, a minimal representation that maintains all structural characteristics.
Structures are then aligned (by structure, not by sequence) with the query structure, e.g. the gurken LE.
Scores can be weighted by sequence length and total number of base pairs.
Example alignment:

  ..(((((.....))))).
  .-(-(((-....))))-.

Matches = + score. Mismatches = - score.
Legend: ( = base pair, . = unpaired base, - = gap.
RNAdistance: global structure comparison. Hofacker (1994) Monatsh. Chem. 125.
RNAforester: local structure comparison. Hochsmann (2003) Proc. Comp. Sys. Bioinf. (CSB 2003).
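To make the bracket notation and the match/mismatch idea concrete, the Python sketch below reads the base pairs out of a bracket string and scores the two aligned structures from this slide column by column. The column-wise score (+1 per matching column, -1 otherwise) is a deliberately simple stand-in chosen for illustration; it is not the scoring scheme implemented by RNAdistance or RNAforester, and the function names are made up here.

```python
def base_pairs(structure):
    """Return the (open, close) index pairs encoded by a bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return sorted(pairs)


def column_score(a, b, match=1, mismatch=-1):
    """Toy score: matching columns add, mismatching or gapped columns subtract."""
    if len(a) != len(b):
        raise ValueError("aligned structures must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    mismatches = len(a) - matches
    return matches, mismatches, matches * match + mismatches * mismatch


QUERY = "..(((((.....)))))."    # first row of the slide's example
ALIGNED = ".-(-(((-....))))-."  # second row, with "-" marking gaps

print(base_pairs(QUERY))
print(column_score(QUERY, ALIGNED))
```

For the example rows, 14 of the 18 columns agree and the 4 gapped columns count against the score.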

Slide 9. Method: RNAMotif
RNAMotif is a flexible secondary structure definition and searching algorithm (Macke, T.J. et al (2001) Nucl. Acids Res. 29).
It is a two step process:
  Step 1. Create a structure description.
  Step 2. Use the description to find matching structures in a sequence database.
It uses Mfold (and pknots) for secondary structure predictions.
Output can be ranked by thermodynamic stability.
User-defined scoring is based on if/then/else statements, e.g. if loop has 6-8 bases then score += 10 else score -= 10.
Algorithm summary:
  The description is converted to a tree structure.
  The secondary structure of the sequence being matched is converted to a tree structure.
  The matching can then occur.
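The scoring rule quoted on this slide ("if loop has 6-8 bases then score += 10 else score -= 10") can be paraphrased in Python as follows. This is only an illustration of the rule's logic applied to a bracket string; it does not use RNAMotif's own descriptor language, and the function names and default values other than 6, 8 and 10 are assumptions.

```python
import re


def hairpin_loop_sizes(structure):
    """Sizes of hairpin loops: runs of '.' directly enclosed by '(' and ')'."""
    return [len(m.group(1)) for m in re.finditer(r"\((\.+)\)", structure)]


def score_loops(structure, low=6, high=8, reward=10, penalty=10):
    """Apply the slide's rule to every hairpin loop in the structure."""
    score = 0
    for size in hairpin_loop_sizes(structure):
        if low <= size <= high:
            score += reward   # loop length inside the preferred range
        else:
            score -= penalty  # loop too short or too long
    return score


print(score_loops("((((......))))"))      # one 6-base loop
print(score_loops("..(((((.....)))))."))  # one 5-base loop
```

A structure whose only hairpin loop has six bases is rewarded, while the five-base loop in the second example is penalised.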

Slide 10. Method: RNAMotif
A very flexible method of describing secondary structures:
Define the base pairings allowed (in addition to Watson-Crick).
Define stems, loops and bulges, including the number of nucleotides. Setting a range of 0-N means the element can be either present or absent.
Sequence constraints can also be added, including tolerated mismatches.
Pseudoknots, triplexes and quadruplexes can be searched for.
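The kinds of constraint listed on this slide (a length range per element, where a lower bound of 0 makes the element optional, plus a sequence pattern with tolerated mismatches) can be modelled with a small data structure. The Python sketch below is a hypothetical model for illustration only; the class, its fields and the example pattern are not taken from RNAMotif.

```python
from dataclasses import dataclass


def contains_with_mismatches(segment, pattern, max_mismatches):
    """True if pattern occurs somewhere in segment with at most max_mismatches."""
    for start in range(len(segment) - len(pattern) + 1):
        window = segment[start:start + len(pattern)]
        if sum(a != b for a, b in zip(window, pattern)) <= max_mismatches:
            return True
    return False


@dataclass
class ElementConstraint:
    kind: str                # e.g. "stem", "loop" or "bulge"
    min_len: int             # 0 means the element may be absent
    max_len: int
    pattern: str = ""        # optional sequence constraint
    max_mismatches: int = 0  # mismatches tolerated in the pattern

    def accepts(self, segment):
        """Check a candidate segment against the length and sequence constraints."""
        if not (self.min_len <= len(segment) <= self.max_len):
            return False
        if not self.pattern:
            return True
        return contains_with_mismatches(segment, self.pattern, self.max_mismatches)


optional_bulge = ElementConstraint("bulge", 0, 3)
loop = ElementConstraint("loop", 4, 8, pattern="GAAA", max_mismatches=1)

print(optional_bulge.accepts(""), optional_bulge.accepts("AUUGC"))
print(loop.accepts("GCAA"), loop.accepts("CCCC"))
```

The optional bulge accepts an empty segment because its range starts at 0, and the loop accepts "GCAA" because it differs from the pattern at a single position.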

Slide 11. Method: RNAMotif
4 description files so far:
1. Basic: 2900 hits. Matches both gurken and I factor LEs.
2. Basic + score: 2900 hits. Structures nearer gurken score positive; structures nearer I factor score negative.
3. Basic + score + sequence constraint UU: 394 hits. UU in the bulge is present in both gurken and I factor.
4. Basic + score + sequence constraint UU + CAA/AAC: 151+ hits. CAA/AAC in stem 1 is present in both gurken and I factor.

Slide 12. Method: Overview
Take all available sequence databases.
Predict all stable secondary structures.
Calculate the similarity between grk/I factor and the stable structures.
Pattern match the structures against an RNAMotif description.
Results are put in a database and accessed via a web interface.

Slide 13. Computational Infrastructure
Computational requirements are beyond desktop PCs.
The main requirements are processing power and enough storage space for the sequences being searched and the database of matching structures.
Processing: 6 processing nodes (Pentium 4 HT, 1GB RAM).
Data storage: RAID array, file server, tape backup robot.
Web server: linked to the database.
Development platform.

Slide 14. Web Interface: Searching
To stop your browser crashing, you can limit the number of hits displayed.
Filter by the percentage of the sequence deemed to have low complexity.
Select the RNAMotif structure description used in the searches.
Narrow down the search by CG, TE, CR or individual identifiers.

Slide 15. Web Interface: Search Results
RNAMotif raw output shows how the sequence matches the structure description.
The results indicate whether the sequence has regions of low complexity or repeat regions (with an option to filter these out).
RNAdistance scores are displayed.
A custom RNAMotif score is displayed.

Slide 16. Web Interface: Gene Mapping

Slide 17. Web Interface: Conservation Assessment

Slide 18. Results: Candidate Injections
We are currently injecting candidates from the database into oocytes and embryos to determine whether the RNA is localized.
There have been suggestions that up to 20% of Drosophila genes may localize in the oocyte and/or embryo, so we want to show that our method is able to enrich for localizing genes.
Results of candidate injections are stored in the database.

Slide 19. Future Work: Expanding Searches
Depending on the success of the experimental localization assays, expand the searches to:
  Other Drosophilid genomes (12 will be sequenced in the near future).
  Mammalian genomes, particularly human. This will require considerable computational power.
  LINE/SINE elements in human (transposon equivalents).
Develop the web interface to enable real-time searches on genes/genomes of interest. This requires massive computational power.

Slide 20. Future Work: Tertiary Structure
Squid protein:
  gurken mRNA is known to bind Squid protein.
  Homology modelling was used to predict the Squid tertiary structure (~2.5Å) (Hamilton & Soares).
RNA tertiary structure prediction:
  Secondary structure alone may not be sufficient for finding similar structures.
Experimental structure determination:
  RNA + protein: X-ray and/or NMR.
  RNA only: NMR.
[Figure: Squid homology model, with RNA binding sites and flexible linker region labelled]
[Figure: RNA + protein 3D structure, Staufen + RNA (Ramos et al, 2000, EMBO, 19)]

Slide 21. Future Work: Machine Learning
Long term future: Support Vector Machines (SVMs).
Take sequence and structure for localizing and non-localizing matches (plus other data).
The algorithm learns how to separate localizing from non-localizing matches.
The problem is that we don't have enough data at the moment. However, with all the candidate injections we will hopefully generate enough data for localizing and non-localizing genes.

Slide 22. Acknowledgements
Davis Lab: Ilan Davis, Veronique Van De Bor, Georgia Vendra, Hille Tekotte, Renald Delanoue, Carine Meignin, Alejandra Clark, Isabelle Kos, Richard Parton
Finnegan Lab: David Finnegan, Eve Hartswood, Cheryl Jones
Bioinformatics Discussions: Alastair Kerr
Systems Administration: Paul Taylor
Homology Modelling: Dinesh Soares
[Slide headings without captured content: Funding; Software]