Data Structures and Algorithm Analysis Lecture 15


1 Data Structures and Algorithm Analysis Lecture 15
CSCI 256. Some slides by Kevin Wayne (copyright 2005, Pearson Addison Wesley, all rights reserved) and some by Iker Gondra.

2 Subset Sum Problem
Given n items, where item i weighs wi > 0.
Goal: find a subset of the items whose total weight is as large as possible without exceeding W.
Example (W = 10):
  Item  Weight
  1     1
  2     2
  3     5
  4     6
  5     7

3 Subset Sum Problem
Greedy approach? Select items by increasing weight? Select items by decreasing weight?
Example (W = 10):
  Item  Weight
  1     1
  2     2
  3     5
  4     5
  5     6
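
To see why neither greedy order works on this example, the sketch below scans the weights in sorted order and takes every item that still fits. That "take it if it fits" rule is an assumption; the slide does not spell out the greedy procedure. The function name `greedy_subset_sum` is hypothetical.

```python
def greedy_subset_sum(weights, W, decreasing):
    """Scan items in sorted weight order, taking each one that still fits."""
    total = 0
    for w in sorted(weights, reverse=decreasing):
        if total + w <= W:
            total += w
    return total


weights = [1, 2, 5, 5, 6]  # the slide's example, W = 10
print(greedy_subset_sum(weights, 10, decreasing=False))  # 8  (takes 1, 2, 5)
print(greedy_subset_sum(weights, 10, decreasing=True))   # 9  (takes 6, 2, 1)
# Neither order finds the two items of weight 5, which reach exactly 10.
```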

4 Dynamic Programming: False Start
Let’s try the same strategy that we used for the Weighted Interval Scheduling problem.
Def: OPT(i) = max weight subset of items 1, …, i.
Case 1: OPT does not select item i.
  OPT selects the best of { 1, 2, …, i-1 }.
Case 2: OPT selects item i.
  Accepting item i does not immediately imply that we have to reject other items (there is no overlap structure as before).
  Without knowing which other items were selected before i, we don't even know whether there is enough room for item i.
Conclusion: we need more sub-problems! To find the value of OPT(i) we need not only the value of OPT(i-1), but also the best solution obtainable from a subset of the first i-1 items with total remaining allowed weight W - wi.

5 Dynamic Programming: Adding a New Variable w (the value of W – wi)
Def: OPT(i, w) = max total weight over all subsets of items 1, …, i with weight limit w. (Ultimately, we need OPT(n, W).)
Example: items with weights {2, 4, 7, 10}. Find:
  Opt[2, 7] =
  Opt[3, 7] =
  Opt[3, 12] =
  Opt[4, 12] =
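
One way to check answers to the example is to evaluate the definition of OPT(i, w) literally, by enumerating every subset of the first i items. This is a brute-force sketch for small inputs only, not the dynamic program the lecture builds; the name `opt_by_definition` is hypothetical.

```python
from itertools import combinations


def opt_by_definition(weights, i, w):
    """OPT(i, w) straight from the definition: the best total weight over all
    subsets of the first i items whose total does not exceed w."""
    first_i = weights[:i]
    best = 0  # the empty subset is always feasible
    for r in range(1, i + 1):
        for subset in combinations(first_i, r):
            total = sum(subset)
            if total <= w:
                best = max(best, total)
    return best


weights = [2, 4, 7, 10]
for i, w in [(2, 7), (3, 7), (3, 12), (4, 12)]:
    print(f"OPT({i}, {w}) = {opt_by_definition(weights, i, w)}")
```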

6 Dynamic Programming: Adding a New Variable w (not to be confused with the constant W)
Case 1: OPT does not select item i.
  OPT selects the best of { 1, 2, …, i-1 } using weight limit w.
Case 2: OPT selects item i.
  New weight limit = w – wi.
  OPT selects the best of { 1, 2, …, i-1 } using this new weight limit.

7 Dynamic Programming Approach
We want an algorithm that builds up a table of all OPT(i, w) values while computing each of them at most once, using the recurrence given by Cases 1 and 2.
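
The recurrence itself did not survive in this transcript. Reconstructed from Cases 1 and 2 and from the table entries M[i-1, w] and M[i-1, w-wi] that the algorithm slides refer to, it presumably reads as follows (the base case i = 0 matches the initialization M[0, w] = 0):

```latex
\[
\mathrm{OPT}(i, w) =
\begin{cases}
0 & \text{if } i = 0,\\[2pt]
\mathrm{OPT}(i-1, w) & \text{if } w_i > w,\\[2pt]
\max\bigl(\mathrm{OPT}(i-1, w),\; w_i + \mathrm{OPT}(i-1, w - w_i)\bigr) & \text{otherwise.}
\end{cases}
\]
```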

8 Subset-Sum Algorithm
Subset_sum(n, W)
  Array M[0,…,n, 0,…,W]
  Initialize M[0, w] = 0 for each w = 0,…,W
  For i = 1,…,n (the items from 1 to i are considered)
    For w = 0,…,W
      Use the recurrence relation from the previous slide to compute M[i, w]
    Endfor
  Endfor
  Return M[n, W]
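
The pseudocode translates almost line for line into Python. The sketch below assumes integer weights and a 0-based Python list for the items (so item i on the slides is `weights[i - 1]`); the function name `subset_sum` mirrors the slide.

```python
def subset_sum(weights, W):
    """Largest total weight of a subset of `weights` that does not exceed W."""
    n = len(weights)
    # M[i][w] holds OPT(i, w); row 0 (no items available) is all zeros.
    M = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        wi = weights[i - 1]  # items are 1-indexed on the slides
        for w in range(W + 1):
            M[i][w] = M[i - 1][w]  # Case 1: skip item i
            if wi <= w:            # Case 2: take item i, if it fits
                M[i][w] = max(M[i][w], wi + M[i - 1][w - wi])
    return M[n][W]


print(subset_sum([1, 2, 5, 6, 7], 10))  # 10, e.g. 1 + 2 + 7
```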

9 Subset-Sum Algorithm
Build up the table M (a two-dimensional array with rows labelled by i) row by row, computing each value M[i, w] in O(1) time from the previously computed values M[i-1, w] and M[i-1, w-wi]. The running time is therefore proportional to the number of entries, which is O(nW).
To recover the optimal set S of items, we trace back through the array M by a procedure similar to those developed in our previous DP problems.
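
The trace-back is not spelled out on the slide. One possible version, sketched under the assumption that ties are broken in favour of skipping an item, walks from M[n, W] back to row 0. The name `subset_sum_items` is hypothetical.

```python
def subset_sum_items(weights, W):
    """Return (best_total, chosen), where chosen lists 1-based item numbers."""
    n = len(weights)
    M = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        wi = weights[i - 1]
        for w in range(W + 1):
            M[i][w] = M[i - 1][w]
            if wi <= w:
                M[i][w] = max(M[i][w], wi + M[i - 1][w - wi])

    # Trace back: if skipping item i cannot reproduce the stored value,
    # item i must be part of the solution for the current limit w.
    chosen, w = [], W
    for i in range(n, 0, -1):
        if M[i][w] != M[i - 1][w]:
            chosen.append(i)
            w -= weights[i - 1]
    return M[n][W], sorted(chosen)


print(subset_sum_items([1, 2, 5, 6, 7], 10))  # (10, [1, 2, 5])
```

The trace-back visits one row per item, so it adds only O(n) time on top of the O(nW) table fill.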

10 Knapsack Problem (values and weights)
Given n objects and a "knapsack".
Item i weighs wi > 0 kilograms and has value vi > 0.
The knapsack has a capacity of W kilograms (constraint).
Goal: fill the knapsack so as to maximize total value.
Ex: the set { 3, 4 } has value 40 and weight 11.
Greedy: repeatedly add the item with maximum ratio vi / wi.
Ex: { 5, 2, 1 } achieves only value = 35, so greedy is not optimal.
W = 11
  Item  Value  Weight
  1     1      1
  2     6      2
  3     18     5
  4     22     6
  5     28     7
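
The greedy rule on this slide can be replayed in a few lines. The sketch assumes that an item which does not fit is skipped and the scan continues, which is what produces the set { 5, 2, 1 } quoted above. The name `greedy_by_ratio` is hypothetical.

```python
def greedy_by_ratio(values, weights, W):
    """Repeatedly add the item with the largest value/weight ratio that still fits."""
    order = sorted(range(len(values)),
                   key=lambda k: values[k] / weights[k],
                   reverse=True)
    total_value, remaining, chosen = 0, W, []
    for k in order:
        if weights[k] <= remaining:
            chosen.append(k + 1)  # report 1-based item numbers, as on the slide
            total_value += values[k]
            remaining -= weights[k]
    return total_value, chosen


values = [1, 6, 18, 22, 28]
weights = [1, 2, 5, 6, 7]
print(greedy_by_ratio(values, weights, 11))  # (35, [5, 2, 1])
```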

11 Knapsack Problem: OPT(i, w)?
What is the difference in strategy between this and the subset-sum problem? We still need to constrain by weight w, but we are now optimizing the values (the v’s).

12 Knapsack Problem
Input: n, W, w1,…,wn, v1,…,vn
for w = 0 to W
  Opt[0, w] = 0
for i = 1 to n
  for w = 0 to W
    if (wi > w)
      Opt[i, w] = Opt[i-1, w]
    else
      Opt[i, w] = max {Opt[i-1, w], vi + Opt[i-1, w-wi]}
return Opt[n, W]
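
A runnable sketch of this pseudocode follows. It assumes integer weights and returns the whole table rather than only Opt[n, W], so that individual entries can be inspected; the name `knapsack_table` is hypothetical, and the sample data is the item table from the knapsack slide.

```python
def knapsack_table(values, weights, W):
    """Fill opt[i][w] = best total value using items 1..i with capacity w."""
    n = len(values)
    opt = [[0] * (W + 1) for _ in range(n + 1)]  # row 0: no items, value 0
    for i in range(1, n + 1):
        vi, wi = values[i - 1], weights[i - 1]
        for w in range(W + 1):
            if wi > w:
                opt[i][w] = opt[i - 1][w]  # item i cannot fit
            else:
                opt[i][w] = max(opt[i - 1][w], vi + opt[i - 1][w - wi])
    return opt


opt = knapsack_table([1, 6, 18, 22, 28], [1, 2, 5, 6, 7], 11)
print(opt[5][11])  # 40, achieved by items 3 and 4
```

Compared with the subset-sum table, the only change is that the entry stores value while the index w still tracks weight.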

13 Knapsack Grid -- Try This!!!!
[Grid exercise: one row per item set { 1 }, { 1, 2 }, { 1, 2, 3 }, { 1, 2, 3, 4 }, { 1, 2, 3, 4, 5 } and one column per capacity w up to W = 11, shown next to the Item / Value / Weight table.]
Find OPT(4, 3). (First few rows from class.)

14 String Similarity
Consider a dictionary interface or a spell checker. How similar are two strings, e.g. ocurrance and occurrence?
Alignment 1: 6 mismatches, 1 gap
  o c u r r a n c e -
  o c c u r r e n c e
Alignment 2: 1 mismatch, 1 gap
  o c - u r r a n c e
  o c c u r r e n c e
Alignment 3: 0 mismatches, 3 gaps
  o c - u r r - a n c e
  o c c u r r e - n c e
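
Counting mismatches and gaps by eye is error-prone, so a small checker can help. The sketch below assumes each alignment is written as two equal-length rows with "-" marking a gap; the name `count_mismatches_and_gaps` is hypothetical.

```python
def count_mismatches_and_gaps(top, bottom, gap="-"):
    """Count mismatched columns and gap columns of two equal-length aligned rows."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    mismatches = gaps = 0
    for a, b in zip(top, bottom):
        if a == gap or b == gap:
            gaps += 1
        elif a != b:
            mismatches += 1
    return mismatches, gaps


print(count_mismatches_and_gaps("ocurrance-", "occurrence"))    # (6, 1)
print(count_mismatches_and_gaps("oc-urrance", "occurrence"))    # (1, 1)
print(count_mismatches_and_gaps("oc-urr-ance", "occurre-nce"))  # (0, 3)
```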

15 String Similarity
Dictionary interfaces and spell checkers are not the most computationally intensive applications of this type of problem.
Determining similarities among strings is one of the central computational problems facing molecular biologists today.
Strings arise very naturally in biology: an organism’s genome is divided up into giant linear DNA molecules known as chromosomes, and a chromosome can be thought of as an enormous linear tape containing a string over the alphabet {A, C, G, T}.
A certain substring in the DNA of some organism may code for a certain kind of toxin. If we discover a very “similar” substring in the DNA of another organism, we might be able to hypothesize, without any experimentation, that it codes for a similar toxin.

16 Edit Distance
Applications: basis for Unix diff, speech recognition, computational biology.
Edit distance [Levenshtein 1966, Needleman-Wunsch 1970]:
  Gap penalty δ; mismatch penalty α_pq.
  Cost = sum of gap and mismatch penalties.
Alignment 1: cost α_TC + α_GT + α_AG + 2α_CA
  C T G A C C T A C C T
  C C T G A C T A C A T
Alignment 2: cost 2δ + α_CA
  - C T G A C C T A C C T
  C C T G A C - T A C A T

17 Crossing
Given strings (x1, …, xm) and (y1, …, yn), a crossing is two pairs (xi, yj), (xi’, yj’) where i < i’ and j’ < j.

18 Sequence Alignment
Def: An alignment M is a set of ordered pairs (xi, yj) such that each item occurs in at most one pair and there are no crossings.
Goal: Given two strings X = x1 x2 … xm and Y = y1 y2 … yn, find an alignment of minimum cost.
Ex: given CTACCG vs. TACATG, find the cost of several alignments, expressed in terms of the gap penalty δ and the mismatch penalty α_pq.
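
The definition of an alignment can be turned into a small validity check. The sketch below represents an alignment as a collection of 1-based index pairs (i, j), meaning xi is matched with yj; that representation and the name `is_alignment` are assumptions made for illustration.

```python
def is_alignment(pairs):
    """Check that (i, j) index pairs form a valid alignment: each position
    appears in at most one pair, and no two pairs cross."""
    pairs = sorted(pairs)
    xs = [i for i, _ in pairs]
    ys = [j for _, j in pairs]
    if len(set(xs)) != len(xs) or len(set(ys)) != len(ys):
        return False  # some xi or yj is matched twice
    # With pairs sorted by i (all distinct), a crossing shows up as a drop in j.
    return all(ys[k] < ys[k + 1] for k in range(len(ys) - 1))


# CTACCG vs. TACATG: match x2-y1, x3-y2, x4-y3, x5-y4, x6-y6.
print(is_alignment({(2, 1), (3, 2), (4, 3), (5, 4), (6, 6)}))  # True
print(is_alignment({(1, 2), (2, 1)}))                          # False: crossing
```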

19 Sequence Alignment: Problem Structure
In the optimal alignment M of X = x1 x2 … xm and Y = y1 y2 … yn, either (xm, yn) ∈ M or (xm, yn) ∉ M. That is, either the last two symbols in the two strings are matched to each other, or they aren’t. By itself, is this fact enough to provide us with a DP solution?
Lemma: Let M be an optimal alignment of X = x1 x2 … xm and Y = y1 y2 … yn. If (xm, yn) ∉ M, then either xm or yn is not matched in M.
Proof (DONE IN CLASS): show that if they are both matched, you’d have a crossing.

20 Sequence Alignment: Problem Structure
By contradiction: suppose we have an optimal alignment M such that (xm, yn) ∉ M, and both xm and yn are matched in M. Since xm and yn are the last elements of their strings, xm is matched with yn’ for some n’ < n, and yn is matched with xm’ for some m’ < m. Thus M contains the pairs (xm’, yn) and (xm, yn’) with m’ < m but n > n’, which is a crossing. This is a contradiction, since crossings are not allowed in alignments.
We can use the Lemma to conclude:

21 Sequence Alignment: Problem Structure
In an optimal alignment M of X = x1 x2 … xm and Y = y1 y2 … yn, at least one of the following is true:
  (xm, yn) ∈ M; or
  xm is not matched; or
  yn is not matched.
Def: OPT(i, j) = min cost of aligning strings x1 x2 … xi and y1 y2 … yj.
Write down the final recurrence (DONE IN CLASS).

22 Sequence Alignment
Def: OPT(i, j) = min cost of aligning strings x1 x2 … xi and y1 y2 … yj.
Case 1: OPT matches xi-yj.
  Pay the mismatch penalty for xi-yj + min cost of aligning x1 x2 … xi-1 and y1 y2 … yj-1.
Case 2a: OPT leaves xi unmatched.
  Pay the gap penalty for xi + min cost of aligning x1 x2 … xi-1 and y1 y2 … yj.
Case 2b: OPT leaves yj unmatched.
  Pay the gap penalty for yj + min cost of aligning x1 x2 … xi and y1 y2 … yj-1.
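
Putting the three cases together gives a recurrence of the following shape. It is written here with δ for the gap penalty and α for the mismatch penalty; the base cases are an assumption based on the observation that aligning a string with the empty string costs one gap per symbol.

```latex
\[
\mathrm{OPT}(i, j) =
\begin{cases}
j\,\delta & \text{if } i = 0,\\[2pt]
i\,\delta & \text{if } j = 0,\\[2pt]
\min\bigl\{\alpha_{x_i y_j} + \mathrm{OPT}(i-1, j-1),\;
           \delta + \mathrm{OPT}(i-1, j),\;
           \delta + \mathrm{OPT}(i, j-1)\bigr\} & \text{otherwise.}
\end{cases}
\]
```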

23 Sequence Alignment: Algorithm
Sequence-Alignment(m, n, x1x2...xm, y1y2...yn, δ, α) {
  for i = 0 to m
    Opt[i, 0] = iδ
  for j = 0 to n
    Opt[0, j] = jδ
  for i = 1 to m
    for j = 1 to n
      Opt[i, j] = min(α[xi, yj] + Opt[i-1, j-1],
                      δ + Opt[i-1, j],
                      δ + Opt[i, j-1])
  return Opt[m, n]
}
Analysis: the array Opt[i, j] has O(mn) entries and we spend at most constant time on each; hence O(mn) time and space.
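
A direct Python rendering of the algorithm might look like this. The mismatch penalty is passed as a function `alpha(p, q)`, and the unit-cost helper `unit_mismatch` is a hypothetical choice for the demo, not something the slides prescribe. With gap penalty 1 and unit mismatches the result is the Levenshtein edit distance.

```python
def alignment_cost(x, y, delta, alpha):
    """Minimum alignment cost of strings x and y.

    delta: gap penalty; alpha(p, q): mismatch penalty for aligning p with q
    (expected to return 0 when p == q).
    """
    m, n = len(x), len(y)
    opt = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        opt[i][0] = i * delta  # x1..xi against the empty string: i gaps
    for j in range(n + 1):
        opt[0][j] = j * delta  # empty string against y1..yj: j gaps
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            opt[i][j] = min(
                alpha(x[i - 1], y[j - 1]) + opt[i - 1][j - 1],  # match xi-yj
                delta + opt[i - 1][j],                          # xi unmatched
                delta + opt[i][j - 1],                          # yj unmatched
            )
    return opt[m][n]


def unit_mismatch(p, q):
    """Hypothetical penalty: 0 for equal symbols, 1 otherwise."""
    return 0 if p == q else 1


print(alignment_cost("ocurrance", "occurrence", 1, unit_mismatch))  # 2
```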

24 English words or sentences: m, n ≤ 10
Computational biology: m = n = 100,000. 10 billion operations is OK, but an array with 10 billion entries?
Luckily, a reduction to O(m+n) space (with O(mn) running time) is possible using a combination of divide and conquer and dynamic programming techniques; the interested reader can see Ch 6.7.
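
Part of the space saving is easy to see: each row of the table depends only on the row before it, so the optimal cost can be computed with two rows. The sketch below does only that. It is not the divide and conquer method mentioned above, which is what recovers the alignment itself in linear space. The names `alignment_cost_two_rows` and `unit_mismatch` are hypothetical.

```python
def alignment_cost_two_rows(x, y, delta, alpha):
    """Optimal alignment cost, keeping only two table rows: O(n) extra space."""
    n = len(y)
    prev = [j * delta for j in range(n + 1)]  # row 0: Opt[0, j] = j * delta
    for i in range(1, len(x) + 1):
        curr = [i * delta] + [0] * n          # Opt[i, 0] = i * delta
        for j in range(1, n + 1):
            curr[j] = min(
                alpha(x[i - 1], y[j - 1]) + prev[j - 1],
                delta + prev[j],
                delta + curr[j - 1],
            )
        prev = curr  # row i becomes the previous row for i + 1
    return prev[n]


def unit_mismatch(p, q):
    """Hypothetical penalty: 0 for equal symbols, 1 otherwise."""
    return 0 if p == q else 1


print(alignment_cost_two_rows("ocurrance", "occurrence", 1, unit_mismatch))  # 2
```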

