Internal loops within the RNA secondary structure can be worked out in almost quadratic time (stRNAgology, Haifa, 2006)


1 Internal loops within the RNA secondary structure can be worked out in almost quadratic time. stRNAgology, Haifa, 2006

2 M. Roytberg (Institute of Mathematical Problems in Biology, Russian Academy of Sciences); A. Ogurtsov, S. Shabalina, A. Kondrashov (National Center for Biotechnology Information, National Library of Medicine, NIH, USA)

3 An Example: tRNA (figure from Paul Higgs)

4 RNA: Pseudoknots (figure from Durbin et al. (1998), Biological Sequence Analysis)

5 Pseudoknot-free secondary structures only! [Figure: example nucleotide sequence, after J. de Ridder] Motivation. Algorithms: the restriction allows dividing the problem into independent parts. Biology: the pseudoknot-free structure is “a skeleton” of the RNA structure; pseudoknots can be predicted on top of it.

6 Pseudoknot-free secondary structure prediction. Search for the optimal structure [Tinoco et al. (1971, 1973); Nussinov and Jacobson (1980); Zuker (1989); ...]. Computation of probabilities of base pairings [McCaskill (1990); Hofacker et al. (1994); ...]. Folding modeling [Mironov et al. (1985, 2005); ...]. Search for multi-branch free structures [Eppstein et al. (1992); Larmore and Schieber (1991)].

7 Pseudoknot-free secondary structure prediction. Search for the optimal structure [Tinoco et al. (1971, 1973); Nussinov and Jacobson (1980); ...]. Computation of probabilities of base pairings [McCaskill (1990); Hofacker et al. (1994); ...]. Folding modeling [Mironov et al. (1985, 2005); ...]. Search for optimal and sub-optimal multi-branch free (MBF) structures.

8 What is a Multi-Branch Free structure? [Nearest Neighbor Model for RNA energy: Jaeger, J.A., Turner, D.H. and Zuker, M. (1989)] Why Multi-Branch Free structures?

9 S – stacking loop (pair); B – bulge; C – 1x0 bulge; M – multi-branched loop (3 branches); H – hairpin loop; I – internal loop (general case); E: 1x1, F: 1x2, G: 2x2 – special internal loops. Structure energy = sum of loop energies.

10 Hairpins, Multi-branch and Internal Loops. Opening base pairing $(x_1, y_1)$; closing base pairing $(x_2, y_2)$. X-spacer length $t_x = x_2 - x_1 - 1$; Y-spacer length $t_y = y_1 - y_2 - 1$. Example: $x_1 = 61$, $x_2 = 65$, $t_x = 3$; $y_1 = 81$, $y_2 = 76$, $t_y = 4$.
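The spacer lengths on this slide can be checked with a few lines of code. This is a minimal sketch; the function name `spacer_lengths` is hypothetical, and the Y-spacer is computed as y1 - y2 - 1 (outer minus inner coordinate), which is the form that reproduces the slide's values t_x = 3 and t_y = 4.

```python
def spacer_lengths(opening, closing):
    """Return (t_x, t_y) for the loop between two nested base pairs.

    opening = (x1, y1) is the outer pair and closing = (x2, y2) the inner
    pair, so the positions must satisfy x1 < x2 < y2 < y1.
    """
    x1, y1 = opening
    x2, y2 = closing
    if not (x1 < x2 < y2 < y1):
        raise ValueError("closing pair must be nested inside the opening pair")
    t_x = x2 - x1 - 1  # unpaired bases between x1 and x2
    t_y = y1 - y2 - 1  # unpaired bases between y2 and y1
    return t_x, t_y


# Values from the slide: opening pair (61, 81), closing pair (65, 76).
print(spacer_lengths((61, 81), (65, 76)))
```

A stacking pair is the special case in which both spacers are zero.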

11 Loop Energies. Stacking pairs: given by a table, e.g. Stack[A,U; A,U]. “Small non-branched loops” (0x1, 1x1, 1x2, 2x2 loops): given by a table. Bulge 0xn: $B(n)$ {+ dependence on paired bases…}. Hairpin of length $n$: $H(n)$ {+ …}. $k$-branched loop with $n$ unpaired bases: $c_1 k + c_2 n$. Internal loop…

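12 Internal Loop Energies:
$$f_{\mathrm{Int}}(t_x, t_y) = f_{\mathrm{Sum}}(t_x + t_y) + f_{\mathrm{Diff}}(|t_x - t_y|), \qquad f_{\mathrm{Sum}}(s) \sim \log(s),$$
$$f_{\mathrm{Diff}}(d) = \begin{cases} w_0 - w\,(D_0 - d), & d < D_0 \\ w_0, & d \ge D_0 \end{cases}$$
NB: $D_0 = 6$ (small!). For comparison, “multi-branch” loops: $c \cdot (t_x + t_y)$. Example in the figure: $t_x = 3$, $t_y = 4$.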
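The shape of the internal-loop energy can be sketched in code. Only D0 = 6 comes from the slide; the values of w0 and w, the scale of the logarithmic term, and all function names are assumptions chosen purely for illustration, not the Nearest Neighbor Model parameters.

```python
import math

D0 = 6    # threshold from the slide
W0 = 3.0  # plateau of the asymmetry term (hypothetical value)
W = 0.5   # slope of the asymmetry term (hypothetical value)


def f_sum(s):
    """Size term, growing like log(s); the unit scale is an assumption."""
    return math.log(s)


def f_diff(d):
    """Asymmetry term: linear in d below D0, constant w0 from D0 on."""
    return W0 - W * (D0 - d) if d < D0 else W0


def f_int(t_x, t_y):
    """Internal-loop energy as size term plus asymmetry term."""
    return f_sum(t_x + t_y) + f_diff(abs(t_x - t_y))


# The loop from the slide's figure: t_x = 3, t_y = 4.
print(f_int(3, 4))
```

Because f_diff is constant for d ≥ D0, only loops with |t_x − t_y| < D0 need special treatment; that is the property the later slides exploit.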

13 What is a Multi-Branch Free structure? (summary) A structure without multi-branch loops. Internal loops are algorithmically the most difficult loops [because of the complex form of their energy function].

14 Why Multi-Branch Free structures? Algorithms: the algorithm that processes MBF structures is a part of the algorithm predicting the optimal secondary structure of general form; the (sub-)optimal MBF structures can be found quickly; the run-time depends on the number of putative base pairings rather than on the RNA length. Biology: some RNAs do have MBF structures; the set of (sub-)optimal MBF structures can help to predict elements of RNA structures (e.g. unpaired regions).

15 PROBLEMS TO BE CONSIDERED (Given an RNA sequence of length $L$; the number of possible base pairings is $M \le L^2$.) Problem 1. Find the optimal (i.e. having minimal possible energy) MBF structure. Problem 1*. Give the sub-algorithm that analyzes internal loops within the algorithm predicting the optimal RNA secondary structure. Problem 2. Construct the set of conditionally optimal MBF structures, i.e. the set that for every possible pairing $(p, q)$ contains an optimal MBF structure in which nucleotides $p$ and $q$ form a pair.

16 RESULTS (Given an RNA sequence of length $L$; the number of possible base pairings is $M \le L^2$.) All the problems can be solved with time complexity $O(M \log^2 L) \le O(L^2 \log^2 L)$. Comment: the best previously known algorithm for Problem 1 [Lyngsø et al. (1999)] has $O(L^3)$ run-time. Problem 2 was not considered before.

17 Sparse Dynamic Programming (SDP) [D. Eppstein, Z. Galil, R. Giancarlo, G. Italiano (1992)] solves Problem 1 with $O(M \log^2 L)$ run-time. But…

18 What to improve in SDP - 1: SDP energies ≠ NNM energies. SDP requires the energy function $f(t_x, t_y)$ to be a convex function of $t_x + t_y$. However, for NNM:
$$f_{\mathrm{NNM}}(t_x, t_y) = f_{\mathrm{Sum}}(t_x + t_y) + f_{\mathrm{Diff}}(|t_x - t_y|), \qquad f_{\mathrm{Sum}}(s) \sim \log(s),$$
$$f_{\mathrm{Diff}}(d) = \begin{cases} w_0 - w\,(D_0 - d), & d < D_0 \\ w_0, & d \ge D_0 \end{cases} \qquad D_0 = 6$$

19 How to improve SDP - 1 Problem 1. Find the optimal (i.e. having minimal possible energy) MBF structure with NNM energy function

20 How to adapt SDP to the NNM scoring function?
$$f_{\mathrm{NNM}}(t_x, t_y) = f_{\mathrm{Sum}}(t_x + t_y) + f_{\mathrm{Diff}}(|t_x - t_y|), \qquad f_{\mathrm{Sum}}(s) \sim \log(s),$$
$$f_{\mathrm{Diff}}(d) = \begin{cases} w_0 - w\,(D_0 - d), & d < D_0 \\ w_0, & d \ge D_0 \end{cases} \qquad D_0 = 6 \text{ (small!)}$$
Take advantage of the small value of $D_0$.

21 PROBLEM 1: dot-matrix representation of the set $U$ of putative base pairings. [Figure: dot matrix of the set $U$ for the RNA “UACGCACCAGAGUGG” ($L = 15$); marked points include (2, 13) = (A, U), (3, 12) = (C, G) and (4, 8) = (G, C).] An external (distant) base pair corresponds to an upper-right point.
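The set U for the example sequence can be enumerated directly. This is only a sketch under stated assumptions: the slide does not say which pairing rules define U, so the code allows Watson-Crick pairs only and requires at least three unpaired bases between the partners; the function name and the `min_loop` parameter are hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def putative_pairs(seq, min_loop=3):
    """All 1-based pairs (p, q), p < q, whose bases can pair.

    At least `min_loop` bases must lie between p and q (an assumption;
    the slide does not state the rule used to build U).
    """
    n = len(seq)
    return {
        (p, q)
        for p in range(1, n + 1)
        for q in range(p + min_loop + 1, n + 1)
        if (seq[p - 1], seq[q - 1]) in WATSON_CRICK
    }


U = putative_pairs("UACGCACCAGAGUGG")
# The three pairings named on the slide are all members of U.
print(sorted(pair for pair in [(2, 13), (3, 12), (4, 8)] if pair in U))
```

The size of this set is the quantity M that drives the run-time estimates on the later slides.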

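22 Diagonals and strips (example: $r = 15$):
$$\mathrm{DIAG}_r = \{(p, q) \in U \mid p + q = r\}, \qquad \mathrm{STRIP}_r = \{(p, q) \in U \mid r - D_0 < p + q < r + D_0\}$$
$$(A, B) \in \mathrm{DIAG}_r,\ (x, y) \in \mathrm{STRIP}_r \;\Rightarrow\; f_{\mathrm{Diff}}(|(A - x) - (y - B)|) = f_{\mathrm{Diff}}(|(B + A) - (y + x)|) < w_0$$
$$f_{\mathrm{Diff}}(d) = \begin{cases} w_0 - w\,(D_0 - d), & d < D_0 \\ w_0, & d \ge D_0 \end{cases}$$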
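The two sets defined on this slide can be built by bucketing the points of U by the sum p + q. This is an illustrative sketch with hypothetical function names; D0 = 6 is the value given on the earlier slides.

```python
from collections import defaultdict

D0 = 6  # threshold from the slides


def diagonals(U):
    """Group the points of U by r = p + q, so diag[r] is DIAG_r."""
    diag = defaultdict(list)
    for p, q in U:
        diag[p + q].append((p, q))
    return diag


def strip(diag, r, d0=D0):
    """STRIP_r: all points with r - d0 < p + q < r + d0."""
    return [pt for s in range(r - d0 + 1, r + d0) for pt in diag.get(s, [])]


U = [(2, 13), (3, 12), (4, 8), (1, 2)]
diag = diagonals(U)
print(diag[15])         # the two points with p + q = 15
print(strip(diag, 15))  # also contains (4, 8), since |15 - 12| < D0
```

A strip spans at most 2·D0 − 1 diagonals, so with D0 = 6 each point has to be examined against only a constant number of diagonals.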

23 Problem 1: The algorithm
// $G(A, B)$ is the energy of the optimal MBF structure with the base pairing $(L - A + 1, B)$
for B := 1 to L {
  for all $(A, B) \in U$ {
    $G_{\mathrm{Main}}(A, B) = \min\{\, G(x, y) + f_{\mathrm{Len}}((B - A) - (y - x + 2)) \mid (x, y) \in U(A, B) \,\}$
    and
    $G_{\mathrm{Strip}}(A, B) = \min\{\, G(x, y) + f_{\mathrm{Diff}}(|(A + B) - (x + y)|) + f_{\mathrm{Len}}((B - A) - (y - x + 2)) \mid (x, y) \in \mathrm{STRIP}_{A+B} \,\}$
    Then find $G(A, B) = \min\{\, w_0 + G_{\mathrm{Main}}(A, B),\ G_{\mathrm{Strip}}(A, B) \,\}$
  }
}

24 Problem 1: Run-time estimation [$M$ = size($U$); $L$ = RNA length]
$G_{\mathrm{Main}}(A, B) = \min\{\, G(x, y) + f_{\mathrm{Len}}((B - A) - (y - x + 2)) \mid (x, y) \in U(A, B) \,\}$: $\sim O(M \log^2 L)$ by SparseDP
$G_{\mathrm{Strip}}(A, B) = \min\{\, G_{A+B}(x, y) + f_{\mathrm{Len}}((B - A) - (y - x + 2)) \mid (x, y) \in \mathrm{STRIP}_{A+B} \,\}$: $\sim O(M \cdot D_0 \cdot \log L)$ using convexity of $f_{\mathrm{Len}}(s)$ and partial linearity of $f_{\mathrm{Diff}}(d)$
$G(A, B) = \min\{\, w_0 + G_{\mathrm{Main}}(A, B),\ G_{\mathrm{Strip}}(A, B) \,\}$: $\sim O(M)$
RUN-TIME: $\sim O(M \log^2 L)$
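Why the split G = min{w0 + G_Main, G_Strip} gives the same value as a direct minimum over all predecessors can be checked by brute force. The sketch below is not the sparse algorithm itself: it assumes f_Diff(d) ≤ w0 (that is, w > 0), uses hypothetical values for w0 and w and a stand-in for f_Len, and represents each predecessor by a hypothetical triple (g, d, s) holding its energy G(x, y), the diagonal distance |(A+B) − (x+y)| and the argument of f_Len.

```python
import math

D0, W0, W = 6, 3.0, 0.5  # D0 from the slides; W0 and W are hypothetical


def f_diff(d):
    """Asymmetry term: below w0 inside the strip, exactly w0 outside."""
    return W0 - W * (D0 - d) if d < D0 else W0


def f_len(s):
    """Stand-in for the length term (assumed form, for illustration)."""
    return math.log(s + 1)


def direct_min(cands):
    """Minimum over all predecessors with the exact asymmetry term."""
    return min(g + f_diff(d) + f_len(s) for g, d, s in cands)


def split_min(cands):
    """Same minimum, split into a 'main' part and a 'strip' part."""
    # Main part: every predecessor, with f_diff replaced by its bound w0.
    g_main = min(g + f_len(s) for g, d, s in cands)
    # Strip part: only predecessors with d < D0, with the exact f_diff.
    g_strip = min(
        (g + f_diff(d) + f_len(s) for g, d, s in cands if d < D0),
        default=math.inf,
    )
    return min(W0 + g_main, g_strip)


cands = [(-5.0, 2, 4), (-7.0, 9, 10), (-1.0, 0, 1)]
print(direct_min(cands), split_min(cands))
```

Overestimating the strip points in the main part is harmless because the strip part covers them exactly; this is what lets the main part use an energy that depends on the length term only.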

25 Problem 1: Run-time estimation [$M$ = size($U$); $L$ = RNA length]
$G_{\mathrm{Main}}(A, B) = \min\{\, G(x, y) + f_{\mathrm{Len}}((B - A) - (y - x + 2)) \mid (x, y) \in U(A, B) \,\}$: $\sim O(M \log^2 L)$ by SparseDP
!!!! Candidate lists perform even better!

RNA          L       Max length of a candidate list   Average length of a candidate list
NM_207436    1597    26                               2.05
NM_173589    3222    20                               1.97
NM_003622    6076    20                               2.03
NM_032969    9146    14                               2.11
NM_014611    17400   14                               1.99

26 What to improve in SDP - 2: No probabilities!!! A DP algorithm that finds an optimal structure can be transformed into an algorithm that finds the partition function and probabilities. The SDP algorithm does NOT allow this.

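27 DP: distributivity only. [Figure: the same graph on nodes A, B, C, D, E, F, Z drawn twice, once with edge probabilities and once with edge weights.]
Scores: $\mathrm{Score}(\mathrm{path}) = W(e_1) + \ldots + W(e_n)$; $\mathrm{BestScore}(A) = \min\{ W(AB) + \mathrm{BestScore}(B),\ W(AC) + \mathrm{BestScore}(C),\ W(AD) + \mathrm{BestScore}(D) \}$; this relies on $\min(a + b, a + c) = a + \min(b, c)$.
Probabilities: $\mathrm{Prob}(\mathrm{path}) = p(e_1) \times \ldots \times p(e_n)$; $\mathrm{Prob}(A) = \sum\{ p(AB) \times \mathrm{Prob}(B),\ p(AC) \times \mathrm{Prob}(C),\ p(AD) \times \mathrm{Prob}(D) \}$; this relies on $ab + ac = a \cdot (b + c)$.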
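The point of this slide, that ordinary DP needs nothing beyond distributivity, can be illustrated by running one recursion with two different pairs of operations. The graph below is a hypothetical toy example (not the one from the slide's figure), and its edge values are reused both as weights and as probabilities purely for brevity.

```python
import math
from functools import lru_cache

# Hypothetical toy DAG: node -> list of (successor, edge value); "Z" is the sink.
GRAPH = {
    "A": [("B", 0.5), ("C", 0.5)],
    "B": [("Z", 1.0)],
    "C": [("Z", 0.4), ("B", 0.6)],
    "Z": [],
}


def solve(graph, source, sink, combine, extend, one):
    """Generic DP: value(v) = combine over edges (v, u) of extend(w, value(u))."""

    @lru_cache(maxsize=None)
    def value(v):
        if v == sink:
            return one
        return combine(extend(w, value(u)) for u, w in graph[v])

    return value(source)


# (min, +): the cheapest path from A to Z.
best = solve(GRAPH, "A", "Z", min, lambda w, x: w + x, 0.0)
# (+, x): the total probability of all paths from A to Z.
total = solve(GRAPH, "A", "Z", sum, lambda w, x: w * x, 1.0)
print(best, total)
```

Only the two operations change between the calls. A pruning rule that discards candidates which can never become the minimum has no counterpart under summation, where every term contributes.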

28 SDP: “owner paradigm”. “Owner’s observation”: let $G = \min\{G_B, G_1, G_2, \ldots\}$ and $G_A > G_B$. Then we already know the value $G' = \min\{G_A, G_B, G_1, G_2, \ldots\} = G$. However, this does not help if we have to compute the sums $S = G_B + G_1 + G_2 + \ldots$ and $S' = G_A + G_B + G_1 + G_2 + \ldots$

29 “How to improve” SDP - 2: Problem 2. Construct the set of conditionally optimal MBF structures, i.e. the set that for every possible pairing (p, q) contains an optimal MBF structure Opt (p, q) in which nucleotides p and q form a pair.

30 Problem 2: Preliminary observation-1. SDP-M: for every $(A, B)$ finds $G(A, B)$ and the optimal chain ending in $(A, B)$ ⇒ for every putative base pairing $(i, j)$ finds the optimal MBF structure Ext$(i, j)$ with the external base pairing $(i, j)$. Run-time: $O(M \log^2 L)$

31 Problem 2: Preliminary observation-2. SDP-M: for every $(A, B)$ finds $G(A, B)$ and the optimal chain ending in $(A, B)$ ⇒ for every putative base pairing $(i, j)$ finds the optimal MBF structure Int$(i, j)$ with the internal base pairing $(i, j)$. Run-time: $O(M \log^2 L)$

32 Problem 2: Solution. For every putative base pairing $(i, j)$ find the optimal MBF structure Ext$(i, j)$ with the external base pairing $(i, j)$. Run-time: $O(M \log^2 L)$. For every putative base pairing $(i, j)$ find the optimal MBF structure Int$(i, j)$ with the internal base pairing $(i, j)$. Run-time: $O(M \log^2 L)$. For every putative base pairing $(i, j)$ obtain the desired optimal MBF structure Opt$(i, j)$ as the concatenation of Ext$(i, j)$ and Int$(i, j)$.

33 Problem 2: Biology. The presence of a low-energy putative MBF structure within a genome fragment can serve as a sign of a non-coding RNA gene. Information about conditionally optimal MBF structures can be used to predict unpaired RNA regions. Accumulating experimental evidence supports the importance of the local secondary structure of a target mRNA and of its accessibility for interaction with antisense oligos or siRNAs.

34 CONCLUSION. We have proposed algorithms with run-time $O(M \log^2 L)$ solving the following problems. Problem 1. Find the optimal (i.e. having minimal possible energy) MBF structure. Problem 2. Construct the set of conditionally optimal MBF structures, i.e. the set that for every possible pairing $(p, q)$ contains an optimal MBF structure in which nucleotides $p$ and $q$ form a pair. The run-time mainly depends not on the RNA length $L$ but on the size $M$ of the set of putative base pairings. This allows one to use the algorithms in combination with pre-filtering of the set of putative base pairings.

35 A.Yu. Ogurtsov, S. A. Shabalina, A. S. Kondrashov (National Center for Biotechnology Information, National Library of Medicine NIH USA) Thanks to: K. Belkin and P.Vlasov

36 Thank you! Any questions?

