Genome Wide Searches for RNA Secondary Structure Motifs Russell S. Hamilton Davis Lab Wellcome Trust Centre for Cell Biology Drosophila melanogaster.

1 Genome Wide Searches for RNA Secondary Structure Motifs
Russell S. Hamilton, Davis Lab, Wellcome Trust Centre for Cell Biology
Drosophila melanogaster

2 Introduction: RNA Localization
[Figure: an mRNA carrying a cis-acting signal, recognised by trans-acting factors bound to the dynein motor, transported along microtubules (+ and - ends marked)]
RNA localization is a way of targeting proteins to their site of function.
Cis-acting signals in the mRNA are recognised by trans-acting factors bound to the dynein motor.
Translation of the mRNA into protein is blocked during transport.
The mRNA is anchored at its site of function before being translated into protein (Delanoue & Davis, 2005, Cell, in press).

3 Introduction: gurken
gurken encodes a TGFα homologue.
gurken mRNA is localized to the dorso-anterior corner of the oocyte, forming a cap around the oocyte nucleus, and establishes the dorsoventral axis.
gurken localization has been shown to be dynein dependent (MacDougall et al., 2003, Dev. Cell 4, 307-19).
The gurken localization signal has been mapped to a 64 nt element that is necessary and sufficient for localization (Van De Bor & Davis, 2004, Curr. Opin. Cell Biol. 16, 300-7).
gurken also localizes in the embryo.
[Figure: localizing mRNAs in the oocyte (osk, bcd, grk), with the dorsal/ventral (D/V) and anterior/posterior (A/P) axes marked]

4 Introduction: I Factor
I Factor is a retrotransposon (a type of transposable element) that inserts itself into the genome of its host organism.
I Factor RNA has been found to localize in a similar manner to gurken (Van De Bor, Hartswood, Jones, Finnegan & Davis).
The localization signal has been mapped to a 58 nt element that is necessary and sufficient for localization.
[Figure: localized I Factor RNA relative to the nucleus (Van De Bor)]

5 Introduction: gurken and I Factor
Sequence similarity (%ID = 34%):
    gurken   AAGTAATTTTCGTGCTCTCAACAATTGTCGCCGTCACAGATTGTTGTTCGAGCCGAATCTTACT  64
    Ifactor  ---TGCACACCTCCCTCGTCACTCTTGATTTT-TCAAGAGCCTTCGATCGAGTAGGTGTGCA--  58
                *      *   ***   **  ***      ***       * * *****  *      *
Structural similarity (V. Van De Bor, D. Finnegan, E. Hartswood and C. Jones):
[Figure: the gurken 64 nt stem loop and the I Factor 58 nt stem loop, with corresponding elements labelled H, St, I1, B and I2 in each]
Are there more examples in the Drosophila genome that use a similar mechanism of localization? Search by secondary structure, not sequence.
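The 34% figure can be checked directly from the alignment above. In the sketch below, the identical columns are counted and divided by the 64 alignment columns (gaps included). This reproduces both the slide's 22 starred columns and 34% (22/64). Dividing by the 58 nt I Factor length instead would give about 38%, so the convention matters. The function names are illustrative.

```python
def percent_identity(a: str, b: str) -> tuple[int, float]:
    """Count identical, non-gap columns of a pairwise alignment.

    Identity is reported relative to the full alignment length (gaps included),
    one of several common conventions.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches, 100.0 * matches / len(a)


def match_line(a: str, b: str) -> str:
    """Put a '*' under each identical column, as in the slide's alignment."""
    return "".join("*" if x == y and x != "-" else " " for x, y in zip(a, b)).rstrip()


gurken = "AAGTAATTTTCGTGCTCTCAACAATTGTCGCCGTCACAGATTGTTGTTCGAGCCGAATCTTACT"
ifactor = "---TGCACACCTCCCTCGTCACTCTTGATTTT-TCAAGAGCCTTCGATCGAGTAGGTGTGCA--"

matches, pid = percent_identity(gurken, ifactor)
print(matches, round(pid))  # 22 34
print(match_line(gurken, ifactor))
```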

6 Method Outline
[Figure: flowchart: genome sequences → folded genome sequences → comparison with the grk & I Factor structures → database]

7 Method: RNALfold
RNALfold folds large genomic sequences and outputs the locally stable structures up to a given size (Hofacker et al., 2004, Bioinformatics 20, 191-198).
It is similar to mfold, but optimised for folding on a genome-wide scale.
The window length (the maximum distance between paired bases) is user defined; we use 64 and 58 nt, the lengths of the grk and I Factor localization elements (LEs).
[Figure: 2L chromosome arm genomic sequence → RNALfold → stable structures]
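The sketch below illustrates folding a genome in fixed-size windows. It slides a window along a sequence and folds each window. It rests on several simplifying assumptions. RNALfold uses the nearest-neighbour free-energy model and finds locally optimal structures in a single pass. This sketch instead substitutes the much simpler Nussinov base-pair maximisation and uses the number of base pairs as a crude proxy for stability. `nussinov`, `scan_windows` and `min_pairs` are hypothetical names.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # minimum number of unpaired bases inside a hairpin loop


def nussinov(seq: str) -> str:
    """Return one maximum-base-pair structure for seq in dot-bracket notation.

    Nussinov base-pair maximisation: a crude stand-in for the free-energy
    minimisation that RNALfold actually performs.
    """
    n = len(seq)
    dp = [[0] * n for _ in range(n)]  # dp[i][j]: max base pairs within seq[i..j]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: j is unpaired
            for k in range(i, j - MIN_LOOP):  # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best

    # Trace back one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - MIN_LOOP):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if dp[i][j] == left + 1 + dp[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(structure)


def scan_windows(genome: str, window: int, step: int, min_pairs: int):
    """Fold each window of the genome (DNA converted to RNA) and yield
    (start, rna, structure) when the structure has at least min_pairs pairs."""
    for start in range(0, len(genome) - window + 1, step):
        rna = genome[start:start + window].upper().replace("T", "U")
        structure = nussinov(rna)
        if structure.count("(") >= min_pairs:
            yield start, rna, structure


for hit in scan_windows("AAAAGGGGAAAACCCCAAAA", window=12, step=4, min_pairs=3):
    print(hit)  # (4, 'GGGGAAAACCCC', '((((....))))')
```

With a window of 64 or 58 nt the inner dynamic programme stays small. The real cost lies in the number of windows along a chromosome arm.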

8 Method: RNAdistance & RNAforester
Structures are represented in dot-bracket format, a minimal representation that keeps all structural characteristics.
The structures are then aligned by structure (not by sequence) with the query structure, e.g. the gurken LE.
Scores can be weighted by sequence length and by the total number of base pairs.
Example alignment (matches add to the score, mismatches subtract from it):
    ..(((((.....))))).
    .-(-(((-....))))-.
Key: ( and ) = paired base; . = unpaired base; - = gap
RNAdistance: global structure comparison (Hofacker et al., 1994, Monatsh. Chem. 125, 167-188)
RNAforester: local structure comparison (Höchsmann et al., 2003, Proc. IEEE Computational Systems Bioinformatics Conference, CSB 2003)

9 Method: RNAMotif
A flexible secondary structure definition and search algorithm (Macke, T.J. et al., 2001, Nucleic Acids Res. 29, 4724-4735).
Two-step process:
Step 1. Create a structure description.
Step 2. Use the description to find matching structures in a sequence database.
Uses mfold (and pknots) for secondary structure predictions; the output can be ranked by thermodynamic stability.
User-defined scoring based on if/then/else statements, e.g. if the loop has 6-8 bases then score += 10, else score -= 10.
Algorithm summary: the description is converted into a tree structure; the secondary structure of the sequence being matched is also converted into a tree structure; the two trees are then matched.
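The slide's example scoring rule can be written as ordinary code. The sketch below does not use RNAMotif's own score-section syntax; it is a Python illustration with hypothetical function names. It finds each hairpin loop in a dot-bracket structure and applies "if the loop has 6-8 bases then score += 10 else score -= 10".

```python
import re


def hairpin_loop_lengths(structure: str) -> list[int]:
    """Lengths of hairpin loops: runs of '.' closed directly by '(' ... ')'."""
    return [len(m.group(1)) for m in re.finditer(r"\((\.*)\)", structure)]


def score_hit(structure: str) -> int:
    """Hypothetical if/then/else scoring rule applied to every hairpin loop."""
    score = 0
    for loop_len in hairpin_loop_lengths(structure):
        if 6 <= loop_len <= 8:
            score += 10
        else:
            score -= 10
    return score


print(score_hit("((((......))))"))      # 10: one 6-base loop
print(score_hit("..(((((.....)))))."))  # -10: one 5-base loop
```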

10 Method: RNAMotif
Define the base pairings allowed (in addition to Watson-Crick).
Define stems, loops and bulges, including their length in nucleotides; setting a length range of 0-N means an element may be either present or absent.
Sequence constraints can also be specified, including the number of mismatches tolerated.
Pseudoknots, triplexes and quadruplexes can also be searched for.
A very flexible method of describing secondary structures.
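Below is a toy version of such a description, covering only a single stem closed by a hairpin loop. `HairpinDescriptor` and `find_hairpins` are hypothetical names and do not reproduce RNAMotif's descriptor language. The sketch shows three features of the slide: length ranges for stems and loops, an extended set of allowed pairs (G-U wobble on top of Watson-Crick), and a loop sequence constraint with tolerated mismatches. Matches are found by brute-force enumeration.

```python
from dataclasses import dataclass

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


@dataclass
class HairpinDescriptor:
    """Hypothetical descriptor: one stem plus a hairpin loop."""
    stem_min: int
    stem_max: int
    loop_min: int
    loop_max: int
    allow_gu: bool = True       # extra pairing rule beyond Watson-Crick
    loop_motif: str = ""        # optional sequence constraint at the 5' end of the loop
    max_mismatches: int = 0     # mismatches tolerated against loop_motif


def find_hairpins(seq: str, d: HairpinDescriptor):
    """Yield (start, stem_len, loop_len) for every placement matching d."""
    seq = seq.upper().replace("T", "U")
    pairs = WATSON_CRICK | (WOBBLE if d.allow_gu else set())
    n = len(seq)
    for start in range(n):
        for stem in range(d.stem_min, d.stem_max + 1):
            for loop in range(d.loop_min, d.loop_max + 1):
                end = start + 2 * stem + loop  # exclusive end of the hairpin
                if end > n:
                    break  # longer loops will not fit either
                # Each 5' stem base must pair with its mirror-image 3' partner.
                if not all((seq[start + k], seq[end - 1 - k]) in pairs for k in range(stem)):
                    continue
                loop_seq = seq[start + stem:start + stem + loop]
                if d.loop_motif:
                    if len(loop_seq) < len(d.loop_motif):
                        continue
                    mismatches = sum(x != y for x, y in zip(loop_seq, d.loop_motif))
                    if mismatches > d.max_mismatches:
                        continue
                yield start, stem, loop


d = HairpinDescriptor(stem_min=4, stem_max=4, loop_min=4, loop_max=4, allow_gu=False)
print(list(find_hairpins("AAGGGGAAAACCCCAA", d)))  # [(2, 4, 4)]
```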

11 Method: RNAMotif
Four description files so far:
1. Basic: 2900 hits. Matches both the gurken and I Factor LEs.
2. Basic + score: 2900 hits. Structures nearer gurken score positive; structures nearer I Factor score negative.
3. Basic + score + sequence constraint UU: 394 hits. A UU in the bulge is present in both gurken and I Factor.
4. Basic + score + sequence constraints UU + CAA/AAC: 151+ hits. CAA/AAC in stem 1 is present in both gurken and I Factor.

12 Method: Overview
1. Take all available sequence databases.
2. Predict all stable secondary structures.
3. Calculate the similarity between the grk/I Factor structures and the stable structures.
4. Pattern-match the structures against an RNAMotif description.
5. Store the results in a database, accessed via a web interface.
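Step 3 needs a similarity measure between each stable structure and the grk/I Factor query. The sketch below uses the base-pair distance, which counts the base pairs found in one structure but not the other, and ranks candidates by it. It assumes both are dot-bracket strings of the same length, which the fixed 64 nt and 58 nt windows make natural. This is an illustrative choice, not necessarily the metric used in this work. `rank_by_similarity` and the candidate IDs are hypothetical.

```python
def base_pairs(structure: str) -> set[tuple[int, int]]:
    """Parse dot-bracket notation into a set of (i, j) base pairs."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {j}")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced '('")
    return pairs


def bp_distance(s1: str, s2: str) -> int:
    """Number of base pairs present in exactly one of two equal-length structures."""
    if len(s1) != len(s2):
        raise ValueError("structures must have the same length")
    return len(base_pairs(s1) ^ base_pairs(s2))


def rank_by_similarity(query: str, candidates: dict[str, str]) -> list[tuple[str, int]]:
    """Rank candidates (id -> dot-bracket) from most to least similar to query,
    skipping any whose length differs from the query."""
    scored = [(cid, bp_distance(query, s)) for cid, s in candidates.items() if len(s) == len(query)]
    return sorted(scored, key=lambda item: (item[1], item[0]))


candidates = {"a": "............", "b": ".(((....))).", "c": "((((....))))"}
print(rank_by_similarity("((((....))))", candidates))  # [('c', 0), ('b', 1), ('a', 4)]
```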

13 Computational Infrastructure
Processing: 6 processing nodes (Pentium 4 HT, 1 GB RAM)
Data storage: RAID array file server and tape backup robot
Web server linked to the database; development platform
The computational requirements are beyond desktop PCs. The main requirements are processing power and enough storage space for the sequences being searched and for the database of matching structures.
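Spreading a genome-wide fold across the 6 processing nodes raises a practical detail. If a chromosome arm is simply cut into chunks, any structure spanning a cut is missed. The sketch below assumes a fixed window length (64 or 58 nt) and one chunk per node. It overlaps neighbouring chunks by window - 1 nt so that every window lies wholly inside at least one chunk. `split_with_overlap` is a hypothetical helper; the slides do not describe how jobs were actually distributed.

```python
def split_with_overlap(length: int, n_chunks: int, window: int) -> list[tuple[int, int]]:
    """Split [0, length) into up to n_chunks half-open ranges overlapping by window - 1.

    Every window of `window` nt lies entirely within at least one range.
    """
    if n_chunks < 1 or window < 1:
        raise ValueError("n_chunks and window must be positive")
    core = -(-length // n_chunks)  # ceiling division: core size per chunk
    chunks = []
    for k in range(n_chunks):
        start = k * core
        if start >= length:
            break
        end = min(length, start + core + window - 1)  # extend into the next core
        chunks.append((start, end))
    return chunks


print(split_with_overlap(1000, 6, 64))
```

Windows that fall inside an overlap are folded twice, so hits should be de-duplicated by genomic start position when results are merged.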

14 Web Interface: Searching
Limit the number of hits displayed, to stop your browser crashing.
Filter by the percentage of the sequence deemed to have low complexity.
Select the RNAMotif structure description used in the searches.
Narrow down the search by CG, TE or CR identifiers, or by individual identifiers.
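Behind a search form like this, each option typically becomes one condition in a parameterised database query. The sketch below assumes a hypothetical `hits` table with columns `gene_id`, `start`, `description`, `low_complexity_pct` and `motif_score`, held in an in-memory SQLite database with made-up demo rows. The slides do not describe the real schema or database engine.

```python
import sqlite3


def search_hits(conn, *, limit=100, max_low_complexity=None, description=None,
                id_class=None, gene_id=None):
    """Return (gene_id, start, motif_score) rows matching the chosen filters."""
    sql = "SELECT gene_id, start, motif_score FROM hits WHERE 1 = 1"
    params = []
    if max_low_complexity is not None:  # % of the sequence flagged as low complexity
        sql += " AND low_complexity_pct <= ?"
        params.append(max_low_complexity)
    if description is not None:  # which RNAMotif description produced the hit
        sql += " AND description = ?"
        params.append(description)
    if id_class is not None:  # identifier class, e.g. "CG", "TE" or "CR"
        sql += " AND gene_id LIKE ?"
        params.append(id_class + "%")
    if gene_id is not None:  # a single, exact identifier
        sql += " AND gene_id = ?"
        params.append(gene_id)
    sql += " ORDER BY motif_score DESC LIMIT ?"  # cap the number of rows displayed
    params.append(limit)
    return conn.execute(sql, params).fetchall()


# Demo database: hypothetical schema and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits (gene_id TEXT, start INTEGER, description TEXT, "
    "low_complexity_pct REAL, motif_score REAL)"
)
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?, ?, ?)",
    [
        ("CG_demo_1", 100, "basic", 5.0, 12.0),
        ("CG_demo_2", 250, "basic_uu", 40.0, 30.0),
        ("TE_demo_1", 10, "basic", 0.0, 25.0),
        ("CR_demo_1", 42, "basic_uu", 10.0, 8.0),
    ],
)
print(search_hits(conn, id_class="CG"))
```

Using `?` placeholders rather than pasting form values into the SQL string keeps user input from altering the query.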

15 Web Interface: Search Results
Raw RNAMotif output, showing how the sequence matches the structure description.
Indicates whether the sequence has low-complexity or repeat regions (with an option to filter these out).
RNAdistance scores are displayed, along with the custom RNAMotif score.
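The sketch below shows what a "percentage of low complexity" value might be computed from. It marks every base covered by a short window whose Shannon entropy falls below a threshold, then reports the masked fraction. This is an assumed stand-in, since the slides do not say which tool flagged low-complexity and repeat regions. Dedicated programs such as DUST or RepeatMasker use different, more elaborate criteria. The 12 nt window and 1.0 bit threshold are arbitrary.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy (bits per base) of the base composition of s."""
    counts = Counter(s)
    total = len(s)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def low_complexity_pct(seq: str, window: int = 12, threshold: float = 1.0) -> float:
    """Percentage of positions covered by a window whose entropy is below threshold."""
    seq = seq.upper()
    if len(seq) < window:
        return 0.0
    masked = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for k in range(start, start + window):
                masked[k] = True
    return 100.0 * sum(masked) / len(seq)


print(low_complexity_pct("A" * 24))      # 100.0: a homopolymer run
print(low_complexity_pct("ACGT" * 6))    # 0.0: every window has 2 bits of entropy
```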

16 Web Interface: Gene Mapping

17 Web Interface: Conservation Assessment

18 Results: Candidate Injections
We are currently injecting candidates from the database into oocytes and embryos to determine whether the RNA is localized.
It has been suggested that up to 20% of Drosophila genes may localize in the oocyte and/or embryo, so we want to show that our method enriches for localizing genes, i.e. that the fraction of candidates that localize exceeds this background rate.
The results of the candidate injections are stored in the database.
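The sketch below is one way to quantify "enrichment", assuming the up-to-20% figure is taken as the background rate. It compares the fraction of injected candidates that localize against 20% and asks, under a binomial model, how likely at least that many localizers would be by chance. Because 20% is the upper end of the suggested range, this makes the test conservative. The counts in the usage line are hypothetical placeholders, not results.

```python
from math import comb


def enrichment(localizing: int, injected: int, background: float = 0.20):
    """Return (fold enrichment over background, one-sided binomial P(X >= localizing))."""
    if injected <= 0:
        raise ValueError("need at least one injected candidate")
    if not 0 <= localizing <= injected:
        raise ValueError("need 0 <= localizing <= injected")
    fold = (localizing / injected) / background
    p_value = sum(
        comb(injected, k) * background**k * (1 - background) ** (injected - k)
        for k in range(localizing, injected + 1)
    )
    return fold, p_value


# Hypothetical counts for illustration only.
fold, p = enrichment(localizing=10, injected=20)
print(f"fold enrichment {fold:.1f}, P(X >= 10) = {p:.4f}")
```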

19 Future Work: Expanding Searches
Depending on the success of the experimental localization assays, expand the searches to:
other Drosophilid genomes (12 will be sequenced in the near future);
mammalian genomes, particularly human, which will require considerable computational power;
LINE/SINE elements in human (human transposable elements, comparable to I Factor).
Develop the web interface so that real-time searches can be performed on genes or genomes of interest; this requires massive computational power.

20 Future Work: Tertiary Structure
Squid protein: gurken mRNA is known to bind the Squid protein. We used homology modelling to predict the Squid tertiary structure (~2.5 Å) (Hamilton & Soares).
RNA tertiary structure prediction: secondary structure alone may not be sufficient for finding similar structures.
Experimental structure determination: RNA + protein by X-ray crystallography and/or NMR; RNA alone by NMR.
[Figure: Squid homology model showing the RNA-binding sites and the flexible linker region; 3D structure of an RNA + protein complex, Staufen + RNA (Ramos et al., 2000, EMBO J. 19, 997-1009)]

21 Future Work: Machine Learning
Long-term future: Support Vector Machines (SVMs).
Take the sequence and structure of localizing and non-localizing matches (plus other data); the algorithm learns how to separate localizing from non-localizing matches.
The problem is that we do not have enough data at the moment. However, the candidate injections should generate enough data on localizing and non-localizing genes.
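Once injection results accumulate, the idea can be prototyped with a linear SVM. The sketch below trains one with the Pegasos stochastic sub-gradient algorithm in plain Python. The two features per match (for example an RNAdistance score and an RNAMotif score) and the toy training points are hypothetical. A real study would more likely use an established SVM library with kernels and cross-validation.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Train a linear SVM with the Pegasos stochastic sub-gradient method.

    X: feature vectors; y: labels in {+1, -1}. A constant 1.0 feature is
    appended so the bias is learned (and regularised) like any other weight.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink: L2 regulariser
            if margin < 1:  # hinge loss active: step towards this example
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (localizing) or -1 (non-localizing) for feature vector x."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical two-feature toy data: +1 = localizing, -1 = non-localizing.
X = [[2.0, 2.0], [3.0, 2.5], [2.5, 3.0], [-2.0, -1.5], [-3.0, -2.0], [-2.5, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```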

22 Acknowledgements
Davis Lab: Ilan Davis, Veronique Van De Bor, Georgia Vendra, Hille Tekotte, Renald Delanoue, Carine Meignin, Alejandra Clark, Isabelle Kos, Richard Parton
Finnegan Lab: David Finnegan, Eve Hartswood, Cheryl Jones
Bioinformatics discussions: Alastair Kerr
Systems administration: Paul Taylor
Homology modelling: Dinesh Soares
[Figure: funding and software logos]
