Structural Alignment of Pseudoknotted RNAs Banu Dost, Buhm Han, Shaojie Zhang, Vineet Bafna.

Slides:



Advertisements
Similar presentations
RNA Secondary Structure Prediction
Advertisements

Finding regulatory modules from local alignment - Department of Computer Science & Helsinki Institute of Information Technology HIIT University of Helsinki.
Chapter 7 Dynamic Programming.
6 - 1 Chapter 6 The Secondary Structure Prediction of RNA.
Basics of Comparative Genomics Dr G. P. S. Raghava.
Rapid Global Alignments How to align genomic sequences in (more or less) linear time.
Comp 122, Fall 2004 Dynamic Programming. dynprog - 2 Lin / Devi Comp 122, Spring 2004 Longest Common Subsequence  Problem: Given 2 sequences, X =  x.
Profiles for Sequences
درس بیوانفورماتیک December 2013 مدل ‌ مخفی مارکوف و تعمیم ‌ های آن به نام خدا.
RNA Structure Prediction
Gene Prediction: Similarity-Based Approaches (selected from Jones/Pevzner lecture notes)
6 -1 Chapter 6 The Secondary Structure Prediction of RNA.
March 2006Vineet Bafna Designing Spaced Seeds March 2006Vineet Bafna Project/Exam deadlines May 2 – Send to me with a title of your project May.
Predicting RNA Structure and Function. Non coding DNA (98.5% human genome) Intergenic Repetitive elements Promoters Introns mRNA untranslated region (UTR)
Predicting RNA Structure and Function
Non-coding RNA William Liu CS374: Algorithms in Biology November 23, 2004.
Searching genomes for noncoding RNA CS374 Leticia Britos 10/03/06.
Pattern Discovery in RNA Secondary Structure Using Affix Trees (when computer scientists meet real molecules) Giulio Pavesi& Giancarlo Mauri Dept. of Computer.
Designing Multiple Simultaneous Seeds for DNA Similarity Search Yanni Sun, Jeremy Buhler Washington University in Saint Louis.
7 -1 Chapter 7 Dynamic Programming Fibonacci Sequence Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, … F i = i if i  1 F i = F i-1 + F i-2 if.
Similar Sequence Similar Function Charles Yan Spring 2006.
BNFO 235 Lecture 5 Usman Roshan. What we have done to date Basic Perl –Data types: numbers, strings, arrays, and hashes –Control structures: If-else,
Finding Common RNA Pseudoknot Structures in Polynomial Time Patricia Evans University of New Brunswick.
Predicting RNA Structure and Function
An Investigation into Selection Constraints in RNA Genes Naila Mimouni, Rune Lyngsoe and Jotun Hein Department of Statistics, Oxford University Aim A robust.
1 Efficient Discovery of Conserved Patterns Using a Pattern Graph Inge Jonassen Pattern Discovery Arwa Zabian 13/07/2015.
Class 2: Basic Sequence Alignment
Predicting RNA Structure and Function. Nobel prize 1989 Nobel prize 2009 Ribozyme Ribosome.
Dynamic Programming (cont’d) CS 466 Saurabh Sinha.
Sequence comparison: Local alignment
Whole genome alignments Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas
Systematic Analysis of Interactome: A New Trend in Bioinformatics KOCSEA Technical Symposium 2010 Young-Rae Cho, Ph.D. Assistant Professor Department of.
Sequence Alignment and Phylogenetic Prediction using Map Reduce Programming Model in Hadoop DFS Presented by C. Geetha Jini (07MW03) D. Komagal Meenakshi.
Non-coding RNA gene finding problems. Outline Introduction RNA secondary structure prediction RNA sequence-structure alignment.
Space-Efficient Sequence Alignment Space-Efficient Sequence Alignment Bioinformatics 202 University of California, San Diego Lecture Notes No. 7 Dr. Pavel.
1 Bio + Informatics AAACTGCTGACCGGTAACTGAGGCCTGCCTGCAATTGCTTAACTTGGC An Overview پرتال پرتال بيوانفورماتيك ايرانيان.
Some Independent Study on Sequence Alignment — Lan Lin prepared for theory group meeting on July 16, 2003.
QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University.
RNA Secondary Structure Prediction Spring Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?
Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.
7 -1 Chapter 7 Dynamic Programming Fibonacci sequence Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, … F i = i if i  1 F i = F i-1 + F i-2 if.
Sequence Analysis CSC 487/687 Introduction to computing for Bioinformatics.
From Structure to Function. Given a protein structure can we predict the function of a protein when we do not have a known homolog in the database ?
Hugh E. Williams and Justin Zobel IEEE Transactions on knowledge and data engineering Vol. 14, No. 1, January/February 2002 Presented by Jitimon Keinduangjun.
MicroRNA identification based on sequence and structure alignment Presented by - Neeta Jain Xiaowo Wang†, Jing Zhang†, Fei Li, Jin Gu, Tao He, Xuegong.
ReferencesReferences AcknowledgementsAcknowledgements TORQUE server DefinitionsDefinitions MethodsMethods IntroductionIntroduction Experiments & Results.
1 Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine Chenghai Xue, Fei Li, Tao He,
© Wiley Publishing All Rights Reserved. RNA Analysis.
Introduction to Bioinformatics Dr. Rybarczyk, PhD University of North Carolina-Chapel Hill
Exploiting Conserved Structure for Faster Annotation of Non-Coding RNAs without loss of Accuracy Zasha Weinberg, and Walter L. Ruzzo Presented by: Jeff.
Questions?. Novel ncRNAs are abundant: Ex: miRNAs miRNAs were the second major story in 2001 (after the genome). Subsequently, many other non-coding genes.
Gene Prediction: Similarity-Based Methods (Lecture for CS498-CXZ Algorithms in Bioinformatics) Sept. 15, 2005 ChengXiang Zhai Department of Computer Science.
Structural Alignment of Pseudo-knotted RNA
Motif Search and RNA Structure Prediction Lesson 9.
Tracking down ncRNAs in the genomes. How to find ncRNA gene The stability of ncRNA secondary structure is not sufficiently different from the predicted.
Hidden Markov Model and Its Application in Bioinformatics Liqing Department of Computer Science.
Splicing Exons: A Eukaryotic Challenge to Gene Prediction Ian McCoy.
(H)MMs in gene prediction and similarity searches.
Local Exact Pattern Matching for Non-fixed RNA Structures Mika Amit, Rolf Backofen, Steffen Heyne, Gad M. Landau, Mathias Mohl, Christina Schmiedl, Sebastian.
4.2 - Algorithms Sébastien Lemieux Elitra Canada Ltd.
RNA sequence-structure alignment
Sequence comparison: Local alignment
RNA Secondary Structure Prediction
SPIRE Normalized Similarity of RNA Sequences
Comparative RNA Structural Analysis
SPIRE Normalized Similarity of RNA Sequences
CSE 589 Applied Algorithms Spring 1999
Homology Modeling.
Dynamic Programming II DP over Intervals
Computational Genomics of Noncoding RNA Genes
Presentation transcript:

Structural Alignment of Pseudoknotted RNAs Banu Dost, Buhm Han, Shaojie Zhang, Vineet Bafna

Non-coding RNAs are mostly undetected [Nature 2001] Modern RNA world hypothesis: There are many undetected functional ncRNAs. [Eddy Nature Reviews (2001)] There may be many ncRNAs behind many unexplained phenomenon.[Storz Science 2002] Non-coding RNAs (ncRNAs) might be playing a significant role in many cellular functions. ncRNA --- RNA acts as functional molecule, and is not translated into protein.

How can we discover ncRNA genes? Low-energy Stability Approach: Are they the substrings that fold into stable low-energy structures? No. The stability of ncRNA secondary structure is not sufficiently different from the predicted stability of a random sequence. [Rivas and Eddy Bioinformatics (2000)]. Comparative Approach: Are they the substrings that are similar to known ncRNAs in sequence and structure?

ncRNA Discovery: Comparative Approach RNA Local Alignment Problem: Given a non-coding RNA as query, can you find all subsequences in the genomic database that are similar to the query in both sequence and secondary structure? AUCGAA-GAUGGGGCAACCCAACUAAGGUUAACUAAGGUU

ncRNA Discovery: Previous Work RSEARCH [Klein and Eddy BMC Bioinformatics (2003)] FASTR [Bafna and Zhang CSB (2004)] The query ncRNA with known secondary structure is compared to every subsequence in a database. …. Database Query ncRNA

Problem: Can not handle pseudo-knotted structures. RNA alignment problem has been solved for RNAs with a regular structure, i.e. non-pseudo-knotted structures. Regular Structure All of the base pairs are non-crossing. Pseudo-knotted Structure Some of the base pairs are crossing.

Objective Extend the Bafna and Zhang’s algorithm to solve the problem for also the pseudo-knotted structures. Dynamic programming technique used to align subsequences. Challenge: Design a substructure for the suboptimal solutions valid for the pseudo-knotted structures.

Definition: Simple Pseudo-knot All base pairs non-crossing and horizontal when rotated to form 2 loops.

Substructure for Sub-optimal Solutions of a Simple Pseudoknot Regular structure: continuous subintervals as substructure of recursion. Simple Pseudo-knot: can not use this substructure due to interweaving base pairs.

Substructure for Simple Pseudo-knots subpseudoknot P(i, j, k) as the union of two subintervals P(i, j, k) = [i 0, i] U [j, k] frontier (i.j.k)

Naive Approach target query Compute B[i, j, k, i’, j’, k’]  O(m 3 n 3 ) scores. (m:query, n:target) Instead of all triplets in the query, consider only the valid sub-pseudo-knots that will represent the simple pseudo- knot.

Use a chain of sub-pseudoknots to represent Simple Pseudo-knot P(13, 14, 39) P(13, 14, 38) P(13, 14, 37) P(13, 14, 36) P(13, 15, 35) P(12, 15, 35) P(11, 16, 35) P(10, 16, 35) ……..

Why Chaining? DP: use sub-optimal solution of the child sub-structure to compute optimal score at each step. compute B[i,j,k, i’,j’, k’] => O(mn 3 ) scores (m:query, n:target) P(13, 14, 39) P(13, 14, 38) P(13, 14, 37) P(13, 14, 36) P(13, 15, 35) P(12, 15, 35) P(11, 16, 35) P(10, 16, 35) ……..

Alignment Algorithm Recursions: (i,j) is a base pair case B[i, j, k, i’, j’, k’] = max {MATCH, INSERT, DELETE} target query MATCH: (i,j) and (i’, j’) are corresponding pairs DELETE: i is deleted j is deleted i and j are deleted INSERT: i’ is inserted j’ is inserted i’ and j’ are inserted

Alignment Algorithm Recursions: (i,j) is a base pair case B[i, j, k, i’, j’, k’] = max {MATCH, INSERT, DELETE} (i,j) &(i’, j’) are pairs j deleted i deleted i & j deleted i’ inserted j’ inserted i’&j’ inserted

m: query length, n: target length #sub-pseudoknots in query: O(m) #sub-pseudoknots in target (i 0,k 0 ) : O(n 3 ) Time to align (i 0,k 0 ) to a simple pseudoknot Do alignment for all subintervals (i,k 0 ) = O(n) x O(mn 3 ) = O(mn 4 ) Time Complexity: to align to a simple pseudo-knot

Simple Pseudo-knot in a Regular Structure: S in R Use a binary tree to represent RNA Solid circular nodes correspond to the actual base pairs. Empty circular nodes correspond to unpaired bases. Rectangular node correspond to subtree representing pseudo-knotted region

Simple Pseudo-knot in a Simple Pseudo-knot: Recursive Simple Pseudo-knot S in S R in S

Which structures can we handle? Time complexity increases with the number of pseudo-knotted region! R: regular structure, S: simple pseudo-knot R: O(mn 3 ) S: O(mn 4 ) S in R: O(mn 4 ) R in S: O(mn 5 ) R in S in R: O(mn 5 ) = S in S in R: O(mn 5 ). R in S in R in S in R = O(mn 5 ). …….

Can we handle simple pseudo-knots with higher degree: standard pseudo-knots?

Yes! By revising the sub-pseudoknot structure and the recursion cases accordingly. targetquery

Can we handle recursive standard pseudoknots? Yes! Same reasoning with recursive simple pseudoknots.

What is left? What can we NOT handle? We can handle the class of pseudoknots defined by Akutsu which is the second largest class currently defined. We can additionally handle standard and recursive standard pseudoknots which are defined by us. A&U <= A&U U {standard/recursive standard pseudoknots} <= R&E The largest class is defined by Rivas and Eddy. An example from this class we can not handle: We can handle this! (Standard pseudo-knot of degree 4) We can NOT handle this!

Implementation: PAL C++ implementation of our algorithm. input: a query sequence with known structure (R/S/S in R) a target sequence output: all high scoring local alignments in the target sequence

Testing Test Data: RFAM database, 6 RNA families with simple pseudo- knotted structures. (simple pseudo-knots in regular structure) UPSK Antizyme Corona FSE Corona pk3 Parecho CRE IFN gamma

Test 1: Structure Prediction How good is PAL in inferring structure of the target sequence? Pick 2 seed members of an RNA family as query and target. Align them. Compare the inferred structure of target with annotated structure in Rfam.

Test 1: Structure Prediction Results TP, FP, FN, Sensitivity, Specificity Specificity = TP/(TP+FP) Sensitivity = TP/(TP+FN) Both measure is >= 0.95 PAL is a strong predictor of structure

Test 2: Homologue Search How well is PAL in finding the homologues of an RNA sequence? Generate a random genome. Insert the members of an RNA family. Pick one of the members as a query. Search for the homologues of the query. Can we locate the members?

Test 2: Homologue Search Results

Novel Homologues Search Searched whole Viral genomes for homologues of 2 pseudo-knotted RNA families: Corona FSE : 11 novel members Corona pk3 : 20 novel members Searched mouse, rat and gerbil genomes for homologues of IFN-gamma RNA family.

Conclusion PAL is a viable tool in finding novel homologues and inferring structure. We hope PAL will help to understand and explore the impact of pseudo-knotted RNAs in cellular function.