Internal loops within the RNA secondary structure can be worked out in almost quadratic time (stRNAgology, Haifa, 2006)



M. Roytberg (Institute of Mathematical Problems in Biology, Russian Academy of Sciences); A. Ogurtsov, S. Shabalina, A. Kondrashov (National Center for Biotechnology Information, National Library of Medicine, NIH, USA)

An example: tRNA (figure from Paul Higgs)

RNA: pseudoknots (figure from Durbin et al. (1998), Biological Sequence Analysis)

Pseudoknot-free secondary structures only! (Figure after J. de Ridder; example sequence: AGCT A CGGAGCGATCTCCGAGCTTTCGAGAAAGCCTCTAT T AGC)
Motivation:
- Algorithms: restricting to pseudoknot-free structures allows the problem to be divided into independent parts.
- Biology: the pseudoknot-free structure is a "skeleton" of the RNA structure; pseudoknots can be predicted on top of it.

Pseudoknot-free secondary structure prediction
- Search for the optimal structure [Tinoco et al. (1971, 1973); Nussinov and Jacobson (1980); Zuker (1989); ...]
- Computation of base-pairing probabilities [McCaskill (1990); Hofacker et al. (1994); ...]
- Folding modeling [Mironov et al. (1985, 2005); ...]
- Search for multi-branch-free structures [Eppstein et al. (1992); Larmore and Schieber (1991)]

Pseudoknot-free secondary structure prediction
- Search for the optimal structure [Tinoco et al. (1971, 1973); Nussinov and Jacobson (1980); ...]
- Computation of base-pairing probabilities [McCaskill (1990); Hofacker et al. (1994); ...]
- Folding modeling [Mironov et al. (1985, 2005); ...]
- Search for optimal and sub-optimal multi-branch-free (MBF) structures

What is a multi-branch-free structure? [Nearest neighbor model (NNM) for RNA energy: Jaeger, J.A., Turner, D.H. and Zuker, M. (1989)] Why multi-branch-free structures?

Loop types (figure legend): S: stacking loop (pair); B: bulge; C: 1x0 bulge; M: multi-branched loop (3 branches); H: hairpin loop; I: internal loop (general case); E: 1x1, F: 1x2, G: 2x2 (special internal loops).
Structure energy = sum of loop energies.

Hairpins, multi-branch and internal loops
Opening base pairing: $(x_1, y_1)$; closing base pairing: $(x_2, y_2)$.
X-spacer length: $t_x = x_2 - x_1 - 1$; Y-spacer length: $t_y = y_1 - y_2 - 1$.
Example: $x_1 = 61$, $x_2 = 65$, so $t_x = 3$; $y_1 = 81$, $y_2 = 76$, so $t_y = 4$.
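
The spacer lengths on this slide can be checked with a few lines of Python. This is only a sketch: the function name `spacer_lengths` is hypothetical, and the Y-spacer is counted from $y_2$ up to $y_1$ so that the slide's example values give $t_y = 4$.

```python
def spacer_lengths(opening, closing):
    """Return (t_x, t_y) for the loop between two nested base pairs.

    opening = (x1, y1) is the outer pair and closing = (x2, y2) the inner
    pair, so the positions must satisfy x1 < x2 < y2 < y1.
    """
    (x1, y1), (x2, y2) = opening, closing
    if not (x1 < x2 < y2 < y1):
        raise ValueError("closing pair must be nested inside the opening pair")
    t_x = x2 - x1 - 1  # unpaired bases between x1 and x2
    t_y = y1 - y2 - 1  # unpaired bases between y2 and y1
    return t_x, t_y


print(spacer_lengths((61, 81), (65, 76)))  # (3, 4)
```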

Loop energies
- Stacking pairs: given by a table, e.g. Stack[A,U; A,U].
- "Small non-branched loops" (0x1, 1x1, 1x2 and 2x2 loops): given by a table.
- Bulge 0xn: $B(n)$ {+ dependence on paired bases ...}.
- Hairpin of length $n$: $H(n)$ {+ ...}.
- $k$-branched loop with $n$ unpaired bases: $c_1 k + c_2 n$.
- Internal loop: ...

Internal loop energies:
$f_{Int}(t_x, t_y) = f_{Sum}(t_x + t_y) + f_{Diff}(|t_x - t_y|)$, where $f_{Sum}(s) \sim \log(s)$ and
$f_{Diff}(d) = w_0 - w \cdot (D_0 - d)$ for $d < D_0$; $f_{Diff}(d) = w_0$ for $d \ge D_0$.
NB: $D_0 = 6$ (small!); "multi-branch": $c \cdot (t_x + t_y)$.
Example from the figure: $t_x = 3$, $t_y = 4$.
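
A minimal sketch of this energy function in Python may make the two terms easier to see. Only $D_0 = 6$ comes from the slides; the values of `W`, `W0` and `A_SUM` below are hypothetical placeholders, and the pure logarithm for $f_{Sum}$ is an assumption standing in for "$\sim \log(s)$".

```python
import math

D0 = 6       # asymmetry threshold (from the slide)
W = 0.5      # slope of the asymmetry term (assumed value)
W0 = 3.0     # saturated asymmetry penalty (assumed value)
A_SUM = 1.0  # scale of the length term (assumed value)


def f_sum(s):
    """Length-dependent term, f_Sum(s) ~ log(s); requires s >= 1."""
    return A_SUM * math.log(s)


def f_diff(d):
    """Asymmetry term: linear in d below D0, constant W0 from D0 on."""
    return W0 - W * (D0 - d) if d < D0 else W0


def f_int(t_x, t_y):
    """Internal loop energy as the sum of the length and asymmetry terms."""
    return f_sum(t_x + t_y) + f_diff(abs(t_x - t_y))


# The slide's example loop: t_x = 3, t_y = 4, so s = 7 and d = 1.
print(f_int(3, 4))
```

With these placeholder parameters the asymmetry term never exceeds `W0`, which is the property the later slides exploit.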

What is a multi-branch-free structure? (summary) A structure without multi-branch loops. Internal loops are algorithmically the most difficult loops because of the complex form of their energy function.

Why multi-branch-free structures?
Algorithms:
- The algorithm that processes MBF structures is a part of the algorithm that predicts the optimal secondary structure of general form.
- The (sub-)optimal MBF structures can be found quickly.
- Run-time depends on the number of putative base pairings rather than on the RNA length.
Biology:
- Some RNAs do have MBF structures.
- The set of (sub-)optimal MBF structures can help to predict elements of RNA structure (e.g. unpaired regions).

PROBLEMS TO BE CONSIDERED (given an RNA sequence of length $L$; the number of possible base pairings is $M \le L^2$)
Problem 1. Find the optimal (i.e. minimum-energy) MBF structure.
Problem 1*. Give the sub-algorithm that analyzes internal loops within an algorithm predicting the optimal RNA secondary structure.
Problem 2. Construct the set of conditionally optimal MBF structures, i.e. the set that for every possible pairing $(p, q)$ contains an optimal MBF structure in which nucleotides $p$ and $q$ form a pair.

RESULTS (given an RNA sequence of length $L$; the number of possible base pairings is $M \le L^2$)
All the problems can be solved with time complexity $O(M \log^2 L) \le O(L^2 \log^2 L)$.
Comment: the best previously known algorithm for Problem 1 [Lyngsø et al. (1999)] has $O(L^3)$ run-time. Problem 2 had not been considered before.

Sparse Dynamic Programming (SDP) [D. Eppstein, Z. Galil, R. Giancarlo, G. Italiano (1992)] solves Problem 1 with $O(M \log^2 L)$ run-time. But…

What to improve in SDP - 1: SDP energies $\ne$ NNM energies
For SDP, the energy function $f(t_x, t_y)$ must be a convex function of $t_x + t_y$. However, for NNM:
$f_{NNM}(t_x, t_y) = f_{Sum}(t_x + t_y) + f_{Diff}(|t_x - t_y|)$, where $f_{Sum}(s) \sim \log(s)$ and
$f_{Diff}(d) = w_0 - w \cdot (D_0 - d)$ for $d < D_0$; $f_{Diff}(d) = w_0$ for $d \ge D_0$; $D_0 = 6$.

How to improve SDP - 1: Problem 1. Find the optimal (i.e. minimum-energy) MBF structure with the NNM energy function.

How to adapt SDP to the NNM scoring function?
$f_{NNM}(t_x, t_y) = f_{Sum}(t_x + t_y) + f_{Diff}(|t_x - t_y|)$, where $f_{Sum}(s) \sim \log(s)$ and
$f_{Diff}(d) = w_0 - w \cdot (D_0 - d)$ for $d < D_0$; $f_{Diff}(d) = w_0$ for $d \ge D_0$; $D_0 = 6$ (small!).
Take advantage of the small value of $D_0$.

PROBLEM 1: dot-matrix representation of the set $U$ of putative base pairings
Figure: the set $U$ for the RNA "UACGCACCAGAGUGG" ($L = 15$), with example points (2, 13) = (A, U), (3, 12) = (C, G) and (4, 8) = (G, C). An external (distant) base pair corresponds to an upper-right point.
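
The set $U$ for the slide's example sequence can be enumerated directly. The sketch below makes two assumptions that the slide does not state: only Watson-Crick pairs are allowed (no G-U wobble pairs), and a hairpin must enclose at least three unpaired bases. The names `putative_pairs`, `WATSON_CRICK` and `MIN_HAIRPIN` are hypothetical.

```python
SEQ = "UACGCACCAGAGUGG"  # example RNA from the slide, L = 15
WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
MIN_HAIRPIN = 3  # assumed minimum number of unpaired bases in a hairpin


def putative_pairs(seq, allowed=WATSON_CRICK, min_hairpin=MIN_HAIRPIN):
    """Return all 1-based positions (p, q), p < q, that could form a pair."""
    n = len(seq)
    pairs = []
    for p in range(1, n + 1):
        for q in range(p + min_hairpin + 1, n + 1):
            if (seq[p - 1], seq[q - 1]) in allowed:
                pairs.append((p, q))
    return pairs


U = putative_pairs(SEQ)
M = len(U)  # the quantity M that drives the run-time bounds
print(M, len(SEQ) ** 2)

# The three points highlighted on the slide are members of U.
for pair in [(2, 13), (3, 12), (4, 8)]:
    print(pair, pair in U)
```

Pre-filtering $U$ (for example by requiring a minimum stem length) would reduce $M$ further, which is what makes a run-time that depends on $M$ rather than $L$ attractive.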

$DIAG_r = \{(p, q) \mid (p, q) \in U,\ p + q = r\}$
$STRIP_r = \{(p, q) \in U \mid r - D_0 < p + q < r + D_0\}$ (in the figure, $r = 15$)
If $(A, B) \in DIAG_r$ and $(x, y) \in STRIP_r$, then $f_{Diff}(|(A - x) - (y - B)|) = f_{Diff}(|(B + A) - (y + x)|) < w_0$,
because $f_{Diff}(d) = w_0 - w \cdot (D_0 - d)$ for $d < D_0$ and $f_{Diff}(d) = w_0$ for $d \ge D_0$.
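
As an illustration, the two sets can be computed by filtering $U$ on the sum $p + q$. The sketch applies the definitions literally to the three example pairs from the dot-matrix slide; the function names `diag` and `strip` are hypothetical.

```python
D0 = 6  # asymmetry threshold from the slides


def diag(pairs, r):
    """Pairs lying exactly on the anti-diagonal p + q = r."""
    return sorted((p, q) for (p, q) in pairs if p + q == r)


def strip(pairs, r, d0=D0):
    """Pairs whose anti-diagonal is closer than d0 to the diagonal r."""
    return sorted((p, q) for (p, q) in pairs if r - d0 < p + q < r + d0)


U = {(2, 13), (3, 12), (4, 8)}
print(diag(U, 15))   # [(2, 13), (3, 12)]
print(strip(U, 15))  # [(2, 13), (3, 12), (4, 8)]
```

Every pair in `strip(U, r)` differs from the diagonal `r` by less than `D0`, so for those pairs the asymmetry term stays on its linear branch.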

Problem 1: The algorithm
for B := 1 to L {
    for all (A, B) ∈ U {
        // G(A, B) is the energy of the optimal MBF structure
        // with the base pairing (L-A+1, B)
        G_Main(A, B)  = min{ G(x, y) + f_Len((B-A) - (y-x+2)) | (x, y) ∈ U(A, B) }
        G_Strip(A, B) = min{ G(x, y) + f_Diff(|(A+B) - (x+y)|) + f_Len((B-A) - (y-x+2)) | (x, y) ∈ STRIP_{A+B} }
        G(A, B)       = min{ w_0 + G_Main(A, B), G_Strip(A, B) }
    }
}
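
The split into a "main" and a "strip" minimum can be sanity-checked against a direct minimization over all inner pairs. The sketch below is a naive check of that identity for a single outer pair, not the sparse DP itself. It assumes the asymmetry term $f_{Diff}$ from the earlier slides, a stand-in `f_len(s) = log(1 + s)`, hypothetical values for `W` and `W0`, and made-up energies `G(x, y)` for the inner pairs.

```python
import math

D0, W, W0 = 6, 0.5, 3.0  # D0 from the slides; W and W0 are assumed values


def f_diff(d):
    return W0 - W * (D0 - d) if d < D0 else W0


def f_len(s):
    # Assumed stand-in for f_Len(s) ~ log(s); the +1 keeps s = 0 finite.
    return math.log(1 + s)


def loop_len(A, B, x, y):
    # Number of unpaired bases between outer pair (A, B) and inner pair (x, y).
    return (B - A) - (y - x + 2)


def g_direct(A, B, inner):
    """Minimize the full internal-loop energy over all inner pairs."""
    return min(g + f_diff(abs((A + B) - (x + y))) + f_len(loop_len(A, B, x, y))
               for (x, y), g in inner.items())


def g_split(A, B, inner):
    """Same minimum, computed as min(w0 + G_Main, G_Strip)."""
    g_main = min(g + f_len(loop_len(A, B, x, y)) for (x, y), g in inner.items())
    g_strip = min((g + f_diff(abs((A + B) - (x + y))) + f_len(loop_len(A, B, x, y))
                   for (x, y), g in inner.items()
                   if abs((A + B) - (x + y)) < D0),
                  default=math.inf)
    return min(W0 + g_main, g_strip)


# Hypothetical inner pairs (x, y) nested in (A, B) = (1, 40), with made-up G.
INNER = {(3, 37): -2.0, (5, 30): -4.5, (10, 36): -3.0, (12, 20): -6.0}
print(math.isclose(g_direct(1, 40, INNER), g_split(1, 40, INNER)))  # True
```

The identity holds because $f_{Diff}(d) \le w_0$ everywhere and equals $w_0$ outside the strip, so adding $w_0$ to every candidate in the main term never undercuts a strip candidate.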

Problem 1: Run-time estimation [M = size(U); L= RNA length] G Main (A, B) = min{G(x, y) + f Len ((B–A) – (y–x+2)) | (x, y)  U(A, B)} ~O(M∙log 2 (L)) by SparseDP G Strip (A, B) = min{G A+B (x, y) + f Len ((B–A) – (y–x+2))| (x, y)  STRIP A+B } ~O(M∙D 0 ∙log(L)) using convexity of f Len (s) and partial linearity of f Diff (d) G(A, B) = min{ w 0 + G Main (A, B), G Strip (A, B) } ~O(M) RUN-TIME: ~O(M∙log 2 (L))

Problem 1: Run-time estimation [M = size(U); L= RNA length] G Main (A, B) = min{G(x, y) + f Len ((B–A) – (y–x+2)) | (x, y)  U(A, B)} ~O(M∙log 2 (L)) by SparseDP !!!! Candidate lists perform even better! RNALMax length of a can- didate list Average length of a can- didate list NM_ NM_ NM_ NM_ NM_

What to improve in SDP - 2: no probabilities! A DP algorithm that finds an optimal structure can be transformed into an algorithm that computes the partition function and probabilities. The SDP algorithm does NOT allow this.

DP: distributivity only (figure: a graph with nodes A, B, C, D, E, F, Z, shown once for each case)
Optimization: Score(path) = $W(e_1) + ... + W(e_n)$; BestScore(A) = min{ W(AB) + BestScore(B), W(AC) + BestScore(C), W(AD) + BestScore(D) }, which relies on $\min(a + b, a + c) = a + \min(b, c)$.
Probabilities: Prob(path) = $p(e_1) \times ... \times p(e_n)$; Prob(A) = sum{ p(AB) × Prob(B), p(AC) × Prob(C), p(AD) × Prob(D) }, which relies on $ab + ac = a \cdot (b + c)$.
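
The point of this slide is that the same recursion serves both purposes once the pair of operations is swapped. The sketch below runs one generic DP over a small directed acyclic graph, first with (min, +) and then with (+, ×). The graph, its edge weights and probabilities, and the function name `dag_dp` are all hypothetical; only the node names follow the figure.

```python
from functools import lru_cache
import operator

# Hypothetical edge weights (for scores) and probabilities (for Prob).
WEIGHTS = {
    "A": [("B", 1.0), ("C", 2.0), ("D", 4.0)],
    "B": [("E", 2.0)],
    "C": [("E", 0.5), ("F", 1.0)],
    "D": [("F", 0.5)],
    "E": [("Z", 1.0)],
    "F": [("Z", 3.0)],
    "Z": [],
}
PROBS = {
    "A": [("B", 0.5), ("C", 0.3), ("D", 0.2)],
    "B": [("E", 1.0)],
    "C": [("E", 0.4), ("F", 0.6)],
    "D": [("F", 1.0)],
    "E": [("Z", 1.0)],
    "F": [("Z", 1.0)],
    "Z": [],
}


def dag_dp(edges, source, sink, plus, times, zero, one):
    """Combine all source-to-sink paths: `times` along a path, `plus` across paths."""
    @lru_cache(maxsize=None)
    def value(u):
        if u == sink:
            return one
        total = zero
        for v, w in edges[u]:
            total = plus(total, times(w, value(v)))
        return total
    return value(source)


best = dag_dp(WEIGHTS, "A", "Z", min, operator.add, float("inf"), 0.0)
total = dag_dp(PROBS, "A", "Z", operator.add, operator.mul, 0.0, 1.0)
print(best)   # 3.5, the path A -> C -> E -> Z
print(total)  # sum over all paths of the product of edge probabilities
```

Only distributivity of `times` over `plus` is used here, which is why an ordinary DP carries over from optimal scores to partition functions.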

SDP: "owner paradigm"
"Owner's observation": let $G = \min\{G_B, G_1, G_2, \ldots\}$ and $G_A > G_B$. Then we already know the value $G' = \min\{G_A, G_B, G_1, G_2, \ldots\} = G$.
However, this does not help if we have to compute $S = G_B + G_1 + G_2 + \ldots$ and $S' = G_A + G_B + G_1 + G_2 + \ldots$

“How to improve” SDP - 2: Problem 2. Construct the set of conditionally optimal MBF structures, i.e. the set that for every possible pairing (p, q) contains an optimal MBF structure Opt (p, q) in which nucleotides p and q form a pair.

Problem 2: Preliminary observation 1. SDP-M: for every $(A, B)$, finds $G(A, B)$ and the optimal chain ending in $(A, B)$. Hence, for every putative base pairing $(i, j)$, it finds the optimal MBF structure Ext$(i, j)$ with the external base pairing $(i, j)$. Run-time: $O(M \log^2 L)$.

Problem 2: Preliminary observation 2. SDP-M: for every $(A, B)$, finds $G(A, B)$ and the optimal chain ending in $(A, B)$. Hence, for every putative base pairing $(i, j)$, it finds the optimal MBF structure Int$(i, j)$ with the internal base pairing $(i, j)$. Run-time: $O(M \log^2 L)$.

Problem 2: Solution
- For every putative base pairing $(i, j)$, find the optimal MBF structure Ext$(i, j)$ with the external base pairing $(i, j)$. Run-time: $O(M \log^2 L)$.
- For every putative base pairing $(i, j)$, find the optimal MBF structure Int$(i, j)$ with the internal base pairing $(i, j)$. Run-time: $O(M \log^2 L)$.
- For every putative base pairing $(i, j)$, obtain the desired optimal MBF structure Opt$(i, j)$ as the concatenation of Ext$(i, j)$ and Int$(i, j)$.
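
Because the structure energy is a sum of loop energies, the energy of the concatenated structure can be sketched as the sum of the two parts. The example below assumes that Ext and Int share no loop, so nothing is counted twice; the energies are invented for illustration and the names `g_ext`, `g_int` and `conditional_optima` are hypothetical.

```python
# Hypothetical energies for three putative pairs (not real data).
# g_ext[(i, j)]: energy of Ext(i, j), the part enclosed by the pair (i, j).
# g_int[(i, j)]: energy of Int(i, j), the part outside the pair (i, j).
g_ext = {(2, 13): -3.0, (3, 12): -2.0, (4, 8): 1.5}
g_int = {(2, 13): 0.0, (3, 12): -1.0, (4, 8): -1.0}


def conditional_optima(g_ext, g_int):
    """Energy of Opt(i, j) for every pair, assuming the two parts are additive."""
    return {pair: g_ext[pair] + g_int[pair] for pair in g_ext}


g_opt = conditional_optima(g_ext, g_int)
global_min = min(g_opt.values())

# Pairs whose conditionally optimal structure is within `delta` of the optimum.
delta = 1.0
near_optimal = sorted(pair for pair, g in g_opt.items() if g - global_min <= delta)
print(global_min, near_optimal)
```

A position that appears in none of the near-optimal pairs is a candidate for an unpaired region, which is the use suggested on the biology slide.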

Problem 2: Biology
- The presence of a low-energy putative MBF structure within a genome fragment can serve as a sign of a non-coding RNA gene.
- Information about conditionally optimal MBF structures can be used to predict unpaired RNA regions.
- Accumulating experimental evidence supports the importance of the local secondary structure of mRNA targets and of their accessibility for interaction with antisense oligos or siRNAs.

CONCLUSION
We have proposed algorithms with run-time $O(M \log^2 L)$ that solve the following problems:
Problem 1. Find the optimal (i.e. minimum-energy) MBF structure.
Problem 2. Construct the set of conditionally optimal MBF structures, i.e. the set that for every possible pairing $(p, q)$ contains an optimal MBF structure in which nucleotides $p$ and $q$ form a pair.
The run-time depends mainly on the size $M$ of the set of putative base pairings rather than on the RNA length $L$. This allows the algorithms to be used in combination with pre-filtering of the set of putative base pairings.

A. Yu. Ogurtsov, S. A. Shabalina, A. S. Kondrashov (National Center for Biotechnology Information, National Library of Medicine, NIH, USA). Thanks to: K. Belkin and P. Vlasov.

Thank you! Any questions?