Chapter 6 Dynamic Programming

Chapter 6 Dynamic Programming. Slides by Kevin Wayne. Copyright © 2005 Pearson-Addison Wesley. All rights reserved.


Algorithmic Paradigms
Greedy. Build up a solution incrementally, myopically optimizing some local criterion.
Divide-and-conquer. Break up a problem into two sub-problems, solve each sub-problem independently, and combine the solutions to the sub-problems to form a solution to the original problem.
Dynamic programming. Break up a problem into a series of overlapping sub-problems, and build up solutions to larger and larger sub-problems.
(Overlapping sub-problem = a sub-problem whose result can be reused several times.)
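The "overlapping sub-problems" idea can be seen in miniature with Fibonacci numbers (an illustrative sketch, not part of the slides): a memoized recursion computes each sub-problem once and reuses the stored result, turning an exponential recursion into a linear one.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Each sub-problem fib(k) is computed once and then reused
    # every time the recursion asks for it again.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(40))  # 102334155
```

Without the cache, fib(40) would recompute the same sub-problems billions of times; with it, only 41 distinct sub-problems are ever solved.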

Dynamic Programming History Bellman. Pioneered the systematic study of dynamic programming in the 1950s. Etymology. Dynamic programming = planning over time. Secretary of Defense was hostile to mathematical research. Bellman sought an impressive name to avoid confrontation. "it's impossible to use dynamic in a pejorative sense" "something not even a Congressman could object to" Reference: Bellman, R. E. Eye of the Hurricane, An Autobiography.

Dynamic Programming Applications Areas Bioinformatics. Control theory. Information theory. Operations research (e.g. knapsack problem) Computer science: theory, graphics, AI, systems, …. Some famous dynamic programming algorithms Viterbi’s algorithm for hidden Markov models. Unix diff for comparing two files. Smith-Waterman for sequence alignment. Bellman-Ford for shortest path routing in networks. Cocke-Kasami-Younger for parsing context free grammars.

6.4 Knapsack Problem

Knapsack Problem
Knapsack problem. Given n objects and a "knapsack."
Item i weighs wi > 0 kilograms and has value vi > 0.
Knapsack has capacity of W kilograms.
Goal: fill the knapsack so as to maximize total value.

Item  Value  Weight
1     1      1
2     6      2
3     18     5
4     22     6
5     28     7

W = 11

Ex: { 3, 4 } has value 40.
Greedy: repeatedly add the item with maximum ratio vi / wi.
Ex: { 5, 2, 1 } achieves only value = 35, so greedy is not optimal.

Dynamic Programming: False Start
Def. OPT(i) = max profit subset of items 1, …, i.
Case 1: OPT does not select item i. OPT selects the best of { 1, 2, …, i-1 }.
Case 2: OPT selects item i. Accepting item i does not immediately imply that we must reject other items; and without knowing which other items were selected before i, we don't even know whether we have enough room for i.
Conclusion. Need more sub-problems!

Dynamic Programming: Adding a New Variable
Def. OPT(i, w) = max profit subset of items 1, …, i with weight limit w.
Case 1: OPT does not select item i. OPT selects the best of { 1, 2, …, i-1 } using weight limit w.
Case 2: OPT selects item i. The new weight limit is w – wi, and OPT selects the best of { 1, 2, …, i-1 } using this new weight limit.

Knapsack Problem: Bottom-Up
Knapsack. Fill up an (n+1)-by-(W+1) array.

Input: n, W, w1,…,wn, v1,…,vn

for w = 0 to W
   M[0, w] = 0
for i = 1 to n
   for w = 1 to W
      if (wi > w)
         M[i, w] = M[i-1, w]
      else
         M[i, w] = max(M[i-1, w], vi + M[i-1, w-wi])
return M[n, W]
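The pseudocode above translates directly into Python. This sketch uses the item list from the example slide (values and weights as given, W = 11):

```python
def knapsack(values, weights, W):
    """Bottom-up 0-1 knapsack; mirrors the M[i, w] table in the pseudocode."""
    n = len(values)
    M = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(1, W + 1):
            if weights[i - 1] > w:          # item i does not fit
                M[i][w] = M[i - 1][w]
            else:                           # best of skipping vs. taking item i
                M[i][w] = max(M[i - 1][w],
                              values[i - 1] + M[i - 1][w - weights[i - 1]])
    return M[n][W]

# Example instance from the slide: OPT = { 3, 4 } with value 18 + 22 = 40.
print(knapsack([1, 6, 18, 22, 28], [1, 2, 5, 6, 7], 11))  # 40
```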

Knapsack Algorithm
(The slide fills in the (n+1)-by-(W+1) table M for the example instance, with rows ∅, {1}, {1, 2}, {1, 2, 3}, {1, 2, 3, 4}, {1, 2, 3, 4, 5} and columns w = 0, …, 11; the table figure is omitted here.)
The final entry is M[5, 11] = 40, and tracing back through the table gives OPT = { 4, 3 } with value 22 + 18 = 40. (W = 11.)

Knapsack Problem: Running Time
Running time. O(nW).
How much storage is needed? The table used above has size O(nW). For example, if W = 1,000,000 and n = 100, we need a total of 400 Mbytes (10^6 × 100 × 4 bytes). This is too high.
The space requirement can be reduced: once row i has been computed, we no longer need rows i – 1 and below. Space reduces to about 2W entries, i.e., 8 Mbytes in the above case.
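The row-reuse observation can be pushed slightly further than the two rows mentioned above: iterating the weights downward lets a single array of W + 1 entries suffice. This is a standard refinement, sketched here on the same example instance:

```python
def knapsack_1d(values, weights, W):
    """Same recurrence as the bottom-up table, but only one row is kept."""
    M = [0] * (W + 1)
    for v, wt in zip(values, weights):
        # Iterate w downward so M[w - wt] still holds the previous row's value
        # when we read it (upward iteration would let an item be taken twice).
        for w in range(W, wt - 1, -1):
            M[w] = max(M[w], v + M[w - wt])
    return M[W]

print(knapsack_1d([1, 6, 18, 22, 28], [1, 2, 5, 6, 7], 11))  # 40
```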

6.6 Sequence Alignment


String Similarity
How similar are two strings? ocurrance vs. occurrence
Key concept used by search engines to correct spelling errors; you get the response: “Did you mean xxxxx?”

o c u r r a n c e -
o c c u r r e n c e    6 mismatches, 1 gap

o c - u r r a n c e
o c c u r r e n c e    1 mismatch, 1 gap

o c - u r r - a n c e
o c c u r r e - n c e    0 mismatches, 3 gaps

Google search for “allgorythm”
When a word is not in the dictionary, the search engine searches through all the words in its database, finds the one that is closest to “allgorythm”, and asks whether that is what you meant.

Edit Distance
Applications.
Basis for Unix diff: determine how similar two files are.
Plagiarism detection: “document similarity” is a good indicator of whether a document was plagiarized.
Speech recognition: the sound pattern of a spoken word is matched against the sound patterns of standard spoken words in a database.
Computational biology: sequence alignment.
Spam filtering: compare a message with known spam messages.
Historical note: Levenshtein introduced the concept of edit distance; Needleman-Wunsch first applied it to aligning biological sequences; the algorithm studied here is closest in spirit to Smith-Waterman (though theirs computes local alignments).

Plagiarism checker – a simple test
Here is a paragraph from Chapter 6 of the text (the current chapter, on dynamic programming), with some small changes made:
“we now turn to a more powerful and subtle design technique, dynamic programming. It'll be simpler to day precisely what characterizes dynamic programming after we have seen it in action, but the basic idea is drawn from the intuition behind divide-and-conquer ad is essentially the opposite of the greedy strategy: one implicitly explores the space of all possible solutions, by carefully decomposing things into a series of subproblems, and then building up correct solutions to larger and larger subproblems. In a way, we can thus view dynamic programming as operating dangerously close to the edge of brute-force search.”
Let us see whether the detector can find the original source: http://www.dustball.com/cs/plagiarism.checker/

Edit Distance
Edit distance. [Levenshtein 1966, Needleman-Wunsch 1970]
Gap penalty δ; mismatch penalty αpq. Cost = sum of gap and mismatch penalties.
Ex: aligning CTGACCTACCT with CCTGACTACAT position by position,
C T G A C C T A C C T
C C T G A C T A C A T
costs αTC + αGT + αAG + 2αCA, while the alignment
- C T G A C C T A C C T
C C T G A C - T A C A T
costs 2δ + αCA.
(Typically αpp = 0 for matching symbols, but this assumption is not needed; αpp could even be a profit instead of a penalty.)

Edit Distance
In the simplest model, which we study first, consider just two operations, insert and delete, each costing one unit.
Example: What is DIS("abbaabab", "bbabaaba")?
abbaabab -> bbaabab (delete a) -> bbabab (delete a) -> bbabaab (insert a) -> bbabaaba (insert a)
So DIS("abbaabab", "bbabaaba") <= 4. Is there a way to perform this transformation with 3 or fewer edit operations?

Edit distance: Problem Structure
Def. DIS(i, j) = min edit distance between x1 x2 . . . xi and y1 y2 . . . yj.
Case 1: xi = yj. In this case, DIS(i, j) = DIS(i-1, j-1).
Case 2: xi ≠ yj. Now there are two choices: either delete xi and convert x1 x2 . . . xi-1 to y1 y2 . . . yj, or convert x1 x2 . . . xi to y1 y2 . . . yj-1 and insert yj. Hence DIS(i, j) = 1 + min(DIS(i-1, j), DIS(i, j-1)).
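A sketch of this insert/delete-only recurrence in Python, with the base cases DIS(i, 0) = i and DIS(0, j) = j:

```python
def dis(x, y):
    """Min number of single-character inserts/deletes turning x into y."""
    m, n = len(x), len(y)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i              # delete all of x[:i]
    for j in range(n + 1):
        D[0][j] = j              # insert all of y[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                D[i][j] = D[i - 1][j - 1]
            else:
                D[i][j] = 1 + min(D[i - 1][j],   # delete x_i
                                  D[i][j - 1])   # insert y_j
    return D[m][n]

print(dis("abbaabab", "bbabaaba"))  # 4
```

The computed distance for the slide's example is 4, so the four-operation transformation shown above is already optimal: there is no way to do it with 3 or fewer operations.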

Extending the model to include substitution
Now add another operation:
insert(i, c): insert c as the i-th symbol (cost = 1)
delete(i): delete the i-th symbol (cost = 1)
change(i, c, d): change the i-th symbol from c to d (cost = 1.5)
Definition of DIS(i, j):
DIS(i, j) = min( DIS(i-1, j) + 1, DIS(i, j-1) + 1, DIS(i-1, j-1) + cost(i, j) ),
where cost(i, j) = 0 if xi = yj and 1.5 otherwise.

Alignment problem in molecular biology
Given two fragments F1 and F2 of DNA, we want to know how closely related they are. (Closeness is an indicator of how likely it is that one evolved from the other.) This kind of analysis plays a critical role in molecular biology. DNA sequences evolve by insertions, deletions and mutations; thus the problem is almost the same as the one we studied in the previous slide.

Sequence Alignment
Goal: Given two strings X = x1 x2 . . . xm and Y = y1 y2 . . . yn, find an alignment of minimum cost.
Def. An alignment M is a set of ordered pairs xi-yj such that each item occurs in at most one pair and there are no crossings.
Def. The pairs xi-yj and xi'-yj' cross if i < i' but j > j'.
Ex: CTACCG vs. TACATG. Sol: M = x2-y1, x3-y2, x4-y3, x5-y4, x6-y6.
x1 x2 x3 x4 x5    x6
C  T  A  C  C  -  G
-  T  A  C  A  T  G
   y1 y2 y3 y4 y5 y6

Sequence Alignment: Problem Structure
Def. OPT(i, j) = min cost of aligning strings x1 x2 . . . xi and y1 y2 . . . yj.
Case 1: OPT matches xi-yj. Pay the mismatch cost for xi-yj plus the min cost of aligning x1 x2 . . . xi-1 and y1 y2 . . . yj-1.
Case 2a: OPT leaves xi unmatched. Pay the gap cost for xi plus the min cost of aligning x1 x2 . . . xi-1 and y1 y2 . . . yj.
Case 2b: OPT leaves yj unmatched. Pay the gap cost for yj plus the min cost of aligning x1 x2 . . . xi and y1 y2 . . . yj-1.

Sequence Alignment: Algorithm

Sequence-Alignment(m, n, x1x2...xm, y1y2...yn, δ, α) {
   for i = 0 to m
      M[i, 0] = iδ
   for j = 0 to n
      M[0, j] = jδ
   for i = 1 to m
      for j = 1 to n
         M[i, j] = min(α[xi, yj] + M[i-1, j-1],
                       δ + M[i-1, j],
                       δ + M[i, j-1])
   return M[m, n]
}

Analysis. Θ(mn) time and space.
Computational biology: m = n = 10^5 or more is common. 10 billion operations are OK (may take a few minutes), but a 10 GB array?
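As a concrete instantiation of the pseudocode, the sketch below takes δ and α as parameters; the particular penalty values in the example (gap = 2, mismatch = 3, match = 0) are made up for illustration.

```python
def alignment_cost(x, y, delta, alpha):
    """Min cost of aligning x and y, as in the Sequence-Alignment pseudocode.
    delta = gap penalty; alpha(p, q) = mismatch penalty for pairing p with q."""
    m, n = len(x), len(y)
    M = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        M[i][0] = i * delta              # align x[:i] against all gaps
    for j in range(n + 1):
        M[0][j] = j * delta              # align y[:j] against all gaps
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = min(alpha(x[i - 1], y[j - 1]) + M[i - 1][j - 1],
                          delta + M[i - 1][j],
                          delta + M[i][j - 1])
    return M[m][n]

# Hypothetical penalties: gap costs 2, mismatch costs 3, match costs 0.
cost = alignment_cost("ocurrance", "occurrence",
                      delta=2, alpha=lambda p, q: 0 if p == q else 3)
print(cost)  # 5.0: one gap (2) plus one mismatch (3) beats three gaps (6)
```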

Problem 6.4 (HW # 4, Problem 1)
Input: NY[j], j = 1, 2, …, n and SF[j], j = 1, 2, …, n, and moving cost k (= 10).
Output: optimum cost of operating the business for n months.
Solution idea: Define OPT(j, NY) as the optimal cost of operating the business for the first j months under the condition that the operation is located in NY in the j-th month, and similarly for OPT(j, SF). Now write a recurrence for OPT(j, NY) in terms of OPT(j-1, NY) and OPT(j-1, SF), and similarly for OPT(j, SF). From these it is easy to compute OPT(j, NY) and OPT(j, SF).
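The recurrence described above can be written out as follows; the monthly cost arrays in the example are hypothetical (the problem only fixes k = 10).

```python
def min_operating_cost(NY, SF, k):
    """OPT(j, L) = optimal cost of months 1..j ending in location L.
    Each month pay the local cost; a move between months costs k."""
    opt_ny, opt_sf = NY[0], SF[0]                 # month 1 in each city
    for j in range(1, len(NY)):
        opt_ny, opt_sf = (NY[j] + min(opt_ny, opt_sf + k),
                          SF[j] + min(opt_sf, opt_ny + k))
    return min(opt_ny, opt_sf)

# Hypothetical data: 4 months of costs in each city, moving cost k = 10.
# Best plan: NY, NY, move, SF, SF = 1 + 3 + 10 + 2 + 4 = 20.
print(min_operating_cost([1, 3, 20, 30], [50, 20, 2, 4], 10))  # 20
```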

6.5 RNA Secondary Structure

RNA Secondary Structure
RNA. String B = b1b2...bn over alphabet { A, C, G, U }.
Secondary structure. RNA is single-stranded, so it tends to loop back and form base pairs with itself. This structure is essential for understanding the behavior of the molecule.
Ex: GUCGAUUGAGCGAAUGUAACAACGUGGCUACGGCGAGA
Complementary base pairs: A-U, C-G. (The slide's folded-structure figure is omitted here.)

RNA Secondary Structure
Secondary structure. A set of pairs S = { (bi, bj) } that satisfy:
[Watson-Crick.] S is a matching, and each pair in S is a Watson-Crick complement: A-U, U-A, C-G, or G-C.
[No sharp turns.] The ends of each pair are separated by at least 4 intervening bases: if (bi, bj) ∈ S, then i < j - 4.
[Non-crossing.] If (bi, bj) and (bk, bl) are two pairs in S, then we cannot have i < k < j < l.
Free energy. The usual hypothesis is that an RNA molecule will form the secondary structure with the optimum total free energy, which we approximate by the number of base pairs.
Goal. Given an RNA molecule B = b1b2...bn, find a secondary structure S that maximizes the number of base pairs.
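The three conditions can be checked mechanically. A small sketch (the helper name is made up; pairs are given as 0-indexed (i, j) tuples, so the no-sharp-turns condition keeps the same form i < j - 4):

```python
def is_valid_structure(B, S):
    """Check Watson-Crick pairing, no sharp turns, and non-crossing
    for a candidate set of pairs S over the RNA string B."""
    wc = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
    ends = [b for pair in S for b in pair]
    if len(ends) != len(set(ends)):        # S must be a matching
        return False
    for i, j in S:
        if (B[i], B[j]) not in wc:         # Watson-Crick complement
            return False
        if i >= j - 4:                     # sharp turn
            return False
    for (i, j) in S:
        for (k, l) in S:
            if i < k < j < l:              # crossing
                return False
    return True

print(is_valid_structure("AUGUGGCCAU", [(0, 9), (1, 8)]))  # True: nested pairs
print(is_valid_structure("AUGUGGCCAU", [(2, 6)]))          # False: sharp turn
```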

RNA Secondary Structure: Examples
Three candidate structures for the strings AUGUGGCCAU, AUGGGGCAU, and AGUUGGCCAU (the base-pair figure is omitted here): the first is a valid secondary structure; the second violates the no-sharp-turns condition (pair ends separated by ≤ 4 bases); the third violates the non-crossing condition.

RNA Secondary Structure: Sub-problems
First attempt. OPT(j) = maximum number of base pairs in a secondary structure of the substring b1b2...bj.
Difficulty. Matching bt with bn results in two sub-problems:
Finding a secondary structure in b1b2...bt-1, which is OPT(t-1).
Finding a secondary structure in bt+1bt+2...bn-1, which is not of the prefix form b1b2...bj. We need more sub-problems.

Dynamic Programming formulation
Notation. OPT(i, j) = maximum number of base pairs in a secondary structure of the substring bibi+1...bj.
Case 1. If i ≥ j - 4: OPT(i, j) = 0 by the no-sharp-turns condition.
Case 2. Base bj is not involved in a pair: OPT(i, j) = OPT(i, j-1).
Case 3. Base bj pairs with bt for some i ≤ t < j - 4. The non-crossing constraint decouples the resulting sub-problems, so
OPT(i, j) = 1 + max over t of { OPT(i, t-1) + OPT(t+1, j-1) },
where the max is taken over t such that i ≤ t < j - 4 and bt and bj are Watson-Crick complements.

Dynamic Programming - algorithm
Q. In what order should we solve the sub-problems?
A. Shortest intervals first.

RNA(b1,…,bn) {
   for k = 5, 6, …, n-1
      for i = 1, 2, …, n-k
         j = i + k
         Compute M[i, j] using the recurrence
   return M[1, n]
}

Running time. O(n^3).
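A direct Python rendering of the recurrence and the shortest-intervals-first loop (0-indexed; the test string is a made-up example):

```python
def rna_max_pairs(B):
    """Max number of base pairs in a secondary structure of RNA string B."""
    wc = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
    n = len(B)
    if n == 0:
        return 0
    M = [[0] * n for _ in range(n)]        # M[i][j] = 0 when j - i <= 4
    for k in range(5, n):                  # interval length j - i, shortest first
        for i in range(n - k):
            j = i + k
            best = M[i][j - 1]             # Case 2: b_j unpaired
            for t in range(i, j - 4):      # Case 3: b_j pairs with b_t
                if (B[t], B[j]) in wc:
                    left = M[i][t - 1] if t > i else 0
                    best = max(best, 1 + left + M[t + 1][j - 1])
            M[i][j] = best
    return M[0][n - 1]

print(rna_max_pairs("ACCGGUAGU"))  # 2: e.g. pair A-U (ends) and an inner C-G
```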