Dynamic Programming (cont’d) CS 466 Saurabh Sinha.


1 Dynamic Programming (cont’d) CS 466 Saurabh Sinha

2 RNA secondary structure prediction

3 RNA RNA is chemically similar to DNA, but it is usually only a single strand, and T (thymine) is replaced by U (uracil). Some forms of RNA can form secondary structures by “pairing up” with themselves, which can change their properties dramatically. tRNA linear and 3D view: http://www.cgl.ucsf.edu/home/glasfeld/tutorial/trna/trna.gif

4 RNA There’s more to RNA than mRNA. RNA can adopt interesting non-linear structures and catalyze reactions. tRNAs (transfer RNAs) are the “adapters” that implement translation.

5 Secondary structure Several interesting RNAs have a conserved secondary structure (resulting from base-pairing interactions). Sometimes the sequence itself need not be conserved for the function to be retained. It is important to be able to predict the secondary structure, for homology detection.

6 Conserved secondary structure [Figure: consensus binding site for R17 phage coat protein, drawn as a stem of N-N’ pairs with unpaired A, R and Y positions.] N = A/C/G/U, N’ is a base complementary to N, Y is C/U, R is A/G. Source: DEKM

7 Basics of secondary structure G-C pairing: three bonds (strong) A-U pairing: two bonds (weaker) Base pairs are approximately coplanar

8 Basics of secondary structure

9 G-C pairing: three bonds (strong) A-U pairing: two bonds (weaker) Base pairs are approximately coplanar Base pairs are stacked onto other base pairs (arranged side by side): “stems”

10 Secondary structure elements Loop: single-stranded subsequence bounded by base pairs. Hairpin loop (stem loop): a loop at the end of a stem. Bulge: single-stranded bases within a stem, only on one side of the stem. Interior loop: single-stranded bases within a stem, on both sides of the stem.

11 Non-canonical base pairs G-C and A-U are the canonical base pairs G-U is also possible, almost as stable

12 Nesting Base pairs almost always occur in a nested fashion. If positions i and j are paired, and positions i’ and j’ are paired, then these two base pairings are said to be nested if i < i’ < j’ < j or i’ < i < j < j’. Base pairs that cross instead (i < i’ < j < j’) form a pseudoknot.

13 Pseudoknot [Figure: base pairs (2, 11) and (9, 18) cross, since 2 < 9 < 11 < 18.] NOT NESTED
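The nesting condition can be checked mechanically. The sketch below is illustrative only: the function names `is_nested` and `crosses` are hypothetical, and pairs are given as (i, j) tuples with i < j, as on the slides.

```python
def is_nested(p, q):
    """True if one base pair lies strictly inside the other (the slide's definition)."""
    (i, j), (i2, j2) = p, q
    return i < i2 < j2 < j or i2 < i < j < j2


def crosses(p, q):
    """True if the two base pairs interleave, i.e. they form a pseudoknot."""
    # Sort so that the pair with the smaller left end comes first.
    (i, j), (i2, j2) = sorted([p, q])
    return i < i2 < j < j2


# The pseudoknot example from the slide: (2, 11) and (9, 18).
print(is_nested((2, 11), (9, 18)))  # False
print(crosses((2, 11), (9, 18)))    # True
# A nested example: (4, 9) lies inside (2, 11).
print(is_nested((2, 11), (4, 9)))   # True
```

Two pairs that sit side by side, such as (1, 4) and (6, 9), are neither nested nor crossing under these definitions; they are still compatible with the algorithms that follow.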

14 Pseudoknot problems Pseudoknots are not handled by the algorithms we shall see Pseudoknots do occur in many important RNAs But the total number of pseudoknotted base pairs is typically relatively small

15 Secondary structure prediction Find the secondary structure with most base pairs. Nussinov’s algorithm Recursive: finds best structure for small subsequences, and works its way outwards to larger subsequences

16 Nussinov’s algorithm: idea There are only four possible ways of getting the best structure for subsequence (i,j) from the best structures of the smaller subsequences: (1) Add unpaired position i onto the best structure for subsequence (i+1,j).

17 Nussinov’s algorithm: idea There are only four possible ways of getting the best structure for subsequence (i,j) from the best structures of the smaller subsequences: (2) Add unpaired position j onto the best structure for subsequence (i,j-1).

18 Nussinov’s algorithm: idea There are only four possible ways of getting the best structure for subsequence (i,j) from the best structures of the smaller subsequences: (3) Add the (i,j) pair onto the best structure for subsequence (i+1,j-1).

19 Nussinov’s algorithm: idea There are only four possible ways of getting the best structure for subsequence (i,j) from the best structures of the smaller subsequences: (4) Combine two optimal substructures (i,k) and (k+1,j).

20 Nussinov RNA folding algorithm Given a sequence $s$ of length $L$ with symbols $s_1 \dots s_L$. Let $\delta(i,j) = 1$ if $s_i$ and $s_j$ are a complementary base pair, and 0 otherwise. We recursively calculate scores $g(i,j)$, the maximal number of base pairs that can be formed for subsequence $s_i \dots s_j$, by dynamic programming.
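The pairing indicator can be written as a small function. This is a minimal sketch under stated assumptions: the name `delta`, the `allow_gu` switch for the G-U pair mentioned earlier, and the use of 0-based Python indices (the slides count from 1) are all choices made here for illustration.

```python
# Canonical (Watson-Crick) pairs and the non-canonical G-U pair.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def delta(s, i, j, allow_gu=False):
    """Return 1 if s[i] and s[j] can pair, else 0 (0-based indices)."""
    allowed = (WATSON_CRICK | WOBBLE) if allow_gu else WATSON_CRICK
    return 1 if (s[i], s[j]) in allowed else 0


seq = "GGGAAAUCC"
print(delta(seq, 0, 8))                 # G with C
print(delta(seq, 0, 6))                 # G with U, canonical pairs only
print(delta(seq, 0, 6, allow_gu=True))  # G with U, G-U allowed
```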

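21 Recursion Initialization: $g(i,i-1) = 0$ and $g(i,i) = 0$. Then, starting with all subsequences of length 2 and working up to length $L$:

$$g(i,j) = \max \begin{cases} g(i+1,j) \\ g(i,j-1) \\ g(i+1,j-1) + \delta(i,j) \\ \max_{i<k<j} \left[ g(i,k) + g(k+1,j) \right] \end{cases}$$

Running time: $O(n^2)$? No: $O(n^3)$, because each of the $O(n^2)$ table entries takes a maximum over $O(n)$ split points $k$.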
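The fill stage can be sketched in Python as follows. This is an illustration under assumptions, not the course's reference code: the function name `nussinov_fill` is hypothetical, indices are 0-based, only canonical pairs count, and, as in the recursion on the slide, no minimum loop length is enforced.

```python
def nussinov_fill(seq):
    """Fill the table g, where g[i][j] is the maximum number of base pairs in seq[i..j]."""
    pairs = {"AU", "UA", "GC", "CG"}  # canonical pairs only (assumption)
    n = len(seq)
    # Every cell starts at 0, which covers g(i,i) = 0 and g(i,i-1) = 0.
    g = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):          # subsequence length, 2 up to n
        for i in range(n - length + 1):
            j = i + length - 1
            best = max(g[i + 1][j],         # (1) position i unpaired
                       g[i][j - 1])         # (2) position j unpaired
            d = 1 if seq[i] + seq[j] in pairs else 0
            best = max(best, g[i + 1][j - 1] + d)   # (3) i pairs with j
            for k in range(i + 1, j):               # (4) bifurcation
                best = max(best, g[i][k] + g[k + 1][j])
            g[i][j] = best
    return g


g = nussinov_fill("GGGAAAUCC")
print(g[0][-1])  # 3
```

The three nested loops over `length`, `i` and `k` make the cubic running time visible directly in the code.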

22 Traceback Same as in sequence alignment? Not quite. An optimal sequence alignment is a single linear path in the dynamic programming table, whereas an optimal secondary structure can have “bifurcations”, so the traceback uses a pushdown stack.

23 Traceback
Push (1,L) onto stack
Repeat until stack is empty:
    pop (i,j)
    if i >= j: continue
    else if g(i+1,j) = g(i,j): push (i+1,j)
    else if g(i,j-1) = g(i,j): push (i,j-1)
    else if g(i+1,j-1) + δ(i,j) = g(i,j): record (i,j) base pair; push (i+1,j-1)
    else for k = i+1 to j-1:
        if g(i,k) + g(k+1,j) = g(i,j): push (k+1,j); push (i,k); break (for loop)
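A stack-based traceback along these lines can be sketched in Python. The names `fill`, `traceback` and `dot_bracket` are hypothetical, indices are 0-based, and only canonical pairs are counted; the fill step is repeated in compact form so that the example runs on its own.

```python
PAIRS = {"AU", "UA", "GC", "CG"}  # canonical pairs only (assumption)


def fill(seq):
    """Compact fill step: g[i][j] = maximum number of base pairs in seq[i..j]."""
    n = len(seq)
    g = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            d = 1 if seq[i] + seq[j] in PAIRS else 0
            best = max(g[i + 1][j], g[i][j - 1], g[i + 1][j - 1] + d)
            for k in range(i + 1, j):
                best = max(best, g[i][k] + g[k + 1][j])
            g[i][j] = best
    return g


def traceback(seq, g):
    """Recover one optimal set of base pairs using an explicit stack."""
    found = []
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        d = 1 if seq[i] + seq[j] in PAIRS else 0
        if g[i + 1][j] == g[i][j]:              # i unpaired
            stack.append((i + 1, j))
        elif g[i][j - 1] == g[i][j]:            # j unpaired
            stack.append((i, j - 1))
        elif g[i + 1][j - 1] + d == g[i][j]:    # i pairs with j
            found.append((i, j))
            stack.append((i + 1, j - 1))
        else:                                   # bifurcation at some k
            for k in range(i + 1, j):
                if g[i][k] + g[k + 1][j] == g[i][j]:
                    stack.append((k + 1, j))
                    stack.append((i, k))
                    break
    return found


def dot_bracket(n, found):
    """Render base pairs as a dot-bracket string of length n."""
    out = ["."] * n
    for i, j in found:
        out[i], out[j] = "(", ")"
    return "".join(out)


seq = "GCAU"
found = traceback(seq, fill(seq))
print(sorted(found))                 # [(0, 1), (2, 3)]
print(dot_bracket(len(seq), found))  # ()()
```

The sequence GCAU exercises the bifurcation branch: neither end can be dropped or paired with the other without losing a pair, so the traceback splits into (0,1) and (2,3) and handles both halves from the stack. When several structures are optimal, this procedure returns only one of them.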

