CSCI 256 Data Structures and Algorithm Analysis Lecture 16 Some slides by Kevin Wayne copyright 2005, Pearson Addison Wesley all rights reserved, and some.


1 CSCI 256 Data Structures and Algorithm Analysis Lecture 16 Some slides by Kevin Wayne copyright 2005, Pearson Addison Wesley all rights reserved, and some by Iker Gondra

2 Dynamic Programming Review
Recipe
– Characterize structure of problem
– Recursively define value of optimal solution
– Compute value of optimal solution
– Construct optimal solution from computed information
Dynamic programming techniques
– Binary choice: weighted interval scheduling
– Multi-way choice: segmented least squares
– Adding a new variable: knapsack
– Dynamic programming over intervals: RNA secondary structure

3 RNA Secondary Structure
RNA: String B = b_1 b_2 … b_n over alphabet { A, C, G, U }
Secondary structure: RNA is single-stranded, so it tends to loop back and form base pairs with itself. This structure is essential for understanding the behavior of the molecule.
Ex: GUCGAUUGAGCGAAUGUAACAACGUGGCUACGGCGAGA
Complementary base pairs: A-U, C-G
[Figure: secondary structure of the example molecule]

4 RNA Secondary Structure
Secondary structure: A set of pairs S = { (b_i, b_j) } that satisfy
– [Watson-Crick] S is a matching and each pair in S is a Watson-Crick complement: A-U, U-A, C-G, or G-C
– [No sharp turns] The ends of each pair are separated by at least 4 intervening bases. If (b_i, b_j) ∈ S, then i < j - 4
– [Non-crossing] If (b_i, b_j) and (b_k, b_l) are two pairs in S, then we cannot have i < k < j < l
[Figure: three example structures, labeled "ok", "sharp turn", and "crossing"]
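The three conditions above can be checked mechanically. The following is a minimal sketch, not code from the lecture: the function name `is_valid_structure` and the representation of S as a list of 1-based index pairs (i, j) with i < j are assumptions made for illustration.

```python
# Allowed Watson-Crick pairs, in both orders.
COMPLEMENTS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def is_valid_structure(rna, pairs):
    """Check the three slide conditions for a candidate structure.

    `rna` is a string over {A, C, G, U}; `pairs` is a list of 1-based
    (i, j) index pairs with i < j (an assumed representation of S).
    """
    used = set()
    for i, j in pairs:
        if not (1 <= i < j <= len(rna)):
            return False
        # [Watson-Crick] each pair must be complementary ...
        if (rna[i - 1], rna[j - 1]) not in COMPLEMENTS:
            return False
        # ... and S must be a matching: no base appears in two pairs.
        if i in used or j in used:
            return False
        used.update((i, j))
        # [No sharp turns] at least 4 intervening bases: i < j - 4.
        if not i < j - 4:
            return False
    # [Non-crossing] no two pairs (i, j), (k, l) with i < k < j < l.
    for i, j in pairs:
        for k, l in pairs:
            if i < k < j < l:
                return False
    return True
```

For the string ACCGGUAGU used later in the lecture, the nested pairs (1,9) and (2,8) pass all three checks, while (2,5) fails the no-sharp-turns condition.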

5 RNA Secondary Structure
Out of all the secondary structures that are possible for a single RNA molecule, which are the ones that are likely to arise?
– Free energy: The usual hypothesis is that an RNA molecule will form the secondary structure with the optimum total free energy (approximated here by the number of base pairs)
– Goal: Given an RNA molecule B = b_1 b_2 … b_n, find a secondary structure S that maximizes the number of base pairs
http://www.genebee.msu.su/services/rna2_reduced.html

6 RNA Secondary Structure: Subproblems
First attempt: OPT(j) = maximum number of base pairs in a secondary structure on substring b_1 b_2 … b_j. Either
– j is not involved in a pair: find the optimal secondary structure in b_1 b_2 … b_{j-1}, with value OPT(j-1)
– j pairs with t for some t < j - 4
[Figure: positions 1, t, and j marked on the string]

7 RNA Secondary Structure: Subproblems
If j pairs with some t where t < j - 4, then the non-crossing rule tells us that we can't have a base pair (k, l) where k < t < l < j; that is, we can't have (k, l) where 1 ≤ k ≤ t-1 and t+1 ≤ l ≤ j-1.
This means that any other pair (k, l) in an optimal structure lies either in b_1, b_2, …, b_{t-1} or in b_{t+1}, …, b_{j-1}.
So we must look at two subproblems, which are decoupled by the non-crossing constraint:
– Find the optimal secondary structure in b_1 b_2 … b_{t-1}
– Find the optimal secondary structure in b_{t+1} b_{t+2} … b_{j-1}

8 RNA Secondary Structure: Subproblems What is different here???? The second subproblem is not on our list of subproblems, because it does not begin with b 1. We need more subproblems! We need to be able to work with subproblems that do not begin with b 1.

9 Dynamic Programming Over Intervals
Notation: OPT(i, j) = maximum number of base pairs in a secondary structure of the substring b_i b_{i+1} … b_j
– Case 1: i ≥ j - 4
  OPT(i, j) = 0 by the no-sharp-turns condition
– Case 2: i < j - 4
  If base b_j is not involved in a pair: OPT(i, j) = OPT(i, j-1)
  If base b_j pairs with b_t for some t with i ≤ t < j - 4, the non-crossing constraint means that: OPT(i, j) = 1 + max_t { OPT(i, t-1) + OPT(t+1, j-1) }, where the max is taken over t such that i ≤ t < j - 4 and b_t and b_j are Watson-Crick complements

10 Hence if i < j - 4, OPT(i, j) is the maximum of the two values:
OPT(i, j) = max( OPT(i, j-1), 1 + max_t { OPT(i, t-1) + OPT(t+1, j-1) } )
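The recurrence can be transcribed almost literally as a memoized recursive function. This is a sketch under stated assumptions rather than the lecture's code: the names `opt_pairs` and `opt` are hypothetical, the string is padded so that indices are 1-based as on the slides, and when no valid t exists the second term is simply skipped.

```python
from functools import lru_cache

# Allowed Watson-Crick pairs, in both orders.
COMPLEMENTS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def opt_pairs(rna):
    """Maximum number of base pairs for `rna`, via the OPT(i, j) recurrence."""
    b = " " + rna  # pad so that b[1] is the first base, as on the slides

    @lru_cache(maxsize=None)
    def opt(i, j):
        # Case 1: i >= j - 4, no pair fits (this also covers i > j).
        if i >= j - 4:
            return 0
        # Case 2a: b_j is not involved in a pair.
        best = opt(i, j - 1)
        # Case 2b: b_j pairs with b_t for some i <= t < j - 4.
        for t in range(i, j - 4):
            if (b[t], b[j]) in COMPLEMENTS:
                best = max(best, 1 + opt(i, t - 1) + opt(t + 1, j - 1))
        return best

    return opt(1, len(rna))
```

Memoization means each (i, j) is evaluated once, so the order of evaluation takes care of itself; the recursion depth grows with the string length, so this form is best suited to short inputs.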

11 Dynamic Programming Over Intervals
What order to solve the subproblems?
– Looking at the recurrence relation, we see that we are invoking solutions to subproblems on shorter intervals
– Need to evaluate Opt for the shortest intervals first; this is different from the subset sum (and knapsack) strategy of going row by row
– To achieve this, fix an auxiliary variable k and use values of i and j that keep j - i = k
– As k gets larger, the interval for the subproblem b_i, b_{i+1}, …, b_j grows

12 Dynamic Programming Over Intervals
– Running time: O(n^3). Why???
RNA(b_1, …, b_n) {
   Initialize Opt[i, j] = 0 whenever i ≥ j - 4 (i.e., i + 4 ≥ j)
   for k = 5, 6, …, n-1
      for i = 1, 2, …, n-k
         set j = i + k
         Compute Opt[i, j] using recurrence
   return Opt[1, n]
}
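The pseudocode above might be realized as follows. This is a sketch, not the authors' implementation: the function name `rna_table` is hypothetical, and the table carries an extra row and column (an assumption made here) so that lookups such as Opt[i, i-1] read a stored 0 instead of needing a special case.

```python
# Allowed Watson-Crick pairs, in both orders.
COMPLEMENTS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def rna_table(rna):
    """Fill Opt[i][j] bottom-up, shortest intervals first; indices are 1-based."""
    n = len(rna)
    b = " " + rna  # pad so that b[1] is the first base
    # Rows 0..n+1 and columns 0..n, all zero. This covers every entry
    # with i >= j - 4, including the i > j entries the recurrence reads.
    opt = [[0] * (n + 1) for _ in range(n + 2)]
    for k in range(5, n):               # k = j - i = 5, 6, ..., n-1
        for i in range(1, n - k + 1):   # i = 1, 2, ..., n-k
            j = i + k
            best = opt[i][j - 1]        # b_j unpaired
            for t in range(i, j - 4):   # b_j paired with b_t, i <= t < j-4
                if (b[t], b[j]) in COMPLEMENTS:
                    best = max(best, 1 + opt[i][t - 1] + opt[t + 1][j - 1])
            opt[i][j] = best
    return opt
```

With `table = rna_table("ACCGGUAGU")`, the final answer is `table[1][9]`, and individual entries such as `table[1][6]` can be compared against values computed by hand.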

13 Running time analysis: There are O(n^2) subproblems to solve, and evaluating the recurrence for each takes O(n) time (because we have to find the max over the t's such that b_t and b_j are allowable pairs). So the running time is O(n^3).
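The bound can be made slightly more explicit by summing over the interval length k = j - i used in the pseudocode. For a fixed k there are n - k intervals, and each has k - 4 candidate values of t (those with i ≤ t < j - 4), so the total number of candidates examined satisfies:

```latex
\sum_{k=5}^{n-1} (n-k)(k-4) \;\le\; \sum_{k=5}^{n-1} n \cdot n \;\le\; n^{3},
\qquad\text{so the running time is } O(n^{3}).
```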

14 Example: ACCGGUAGU Recall: base pairs allowed: AU, UA, CG, GC What is the basic array that we need to fill?? (here n= 9)

15 Example: ACCGGUAGU
Note: if i > j, let Opt(i, j) = 0 (Why??)
Need two dimensions to present the array M of values for Opt(i, j): one for the left endpoint of the interval being considered, and one for the right endpoint
Some initial values are 0: whenever i ≥ j - 4 (Why??)
Begin with k = 5; loop over the i's from 1 to 4 (= 9 - 5)
– Opt(1,6): t = 1 is the only t with 1 ≤ t < 6 - 4, and b_1 b_6 is AU, an allowable base pair, so Opt(1,6) = max( Opt(1,5), 1 + Opt(1,0) + Opt(2,5) ) = max( 0, 1 + 0 + 0 ) = 1
– Opt(2,7): t = 2 is the only t with 2 ≤ t < 7 - 4, but b_2 b_7 is CA, not an allowable base pair, so no t satisfies the conditions and Opt(2,7) = Opt(2,6) = 0
Next value??

16 Example: ACCGGUAGU
Opt(3,8): t = 3 is the only possible t, and b_3 b_8 is CG, an allowable base pair, so Opt(3,8) = max( Opt(3,7), 1 + Opt(3,2) + Opt(4,7) ) = max( 0, 1 + 0 + 0 ) = 1
Next value to calculate??

17 Example: ACCGGUAGU
It is Opt(4,9).
Now let k = 6 and do Opt(1,7), then Opt(2,8), then Opt(3,9), and so on.
Note that for Opt(1,7) both t = 1 and t = 2 satisfy the inequality i ≤ t < j - 4 (i = 1 and j = 7); are both base pairs allowable?
This is a fully worked example in the text; check out more values to make sure you are following the algorithm.

