Prof. Swarat Chaudhuri COMP 482: Design and Analysis of Algorithms Spring 2012 Lecture 16.


1 Prof. Swarat Chaudhuri COMP 482: Design and Analysis of Algorithms Spring 2012 Lecture 16

2 6.3 Segmented Least Squares

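3 Segmented Least Squares
Least squares.
- Foundational problem in statistics and numerical analysis.
- Given n points in the plane: (x_1, y_1), (x_2, y_2), ..., (x_n, y_n).
- Find a line y = ax + b that minimizes the sum of the squared errors:
  $SSE = \sum_{i=1}^{n} (y_i - a x_i - b)^2$
Solution. Calculus ⇒ the minimum error is achieved when
  $a = \frac{n \sum_i x_i y_i - (\sum_i x_i)(\sum_i y_i)}{n \sum_i x_i^2 - (\sum_i x_i)^2}, \quad b = \frac{\sum_i y_i - a \sum_i x_i}{n}$
[Figure: scatter plot of the points with the fitted line; axes x and y.]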
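The closed-form least-squares fit can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `least_squares_line` and the fallback to a horizontal line when all x-values coincide are assumptions made for the example.

```python
def least_squares_line(points):
    """Return (a, b, error) for the line y = a*x + b minimizing squared error.

    Uses the standard closed form:
      a = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2),  b = (Sy - a*Sx) / n
    """
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    denom = n * sxx - sx * sx
    if denom == 0:
        # Assumption for this sketch: if all x are equal (or n == 1),
        # fall back to the horizontal line through the mean of y.
        a = 0.0
        b = sy / n
    else:
        a = (n * sxy - sx * sy) / denom
        b = (sy - a * sx) / n
    error = sum((y - a * x - b) ** 2 for x, y in points)
    return a, b, error


# Three collinear points on y = 2x + 1 are fit exactly.
a, b, err = least_squares_line([(0, 1), (1, 3), (2, 5)])
```

Each fit takes O(n) time, which is the per-pair cost referred to in the running-time analysis of the segmented version.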

4 Segmented Least Squares
Segmented least squares.
- Points lie roughly on a sequence of several line segments.
- Given n points in the plane (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) with x_1 < x_2 < ... < x_n, find a sequence of lines that minimizes f(x).
Q. What's a reasonable choice for f(x) to balance accuracy (goodness of fit) and parsimony (number of lines)?
[Figure: points lying roughly on several line segments; axes x and y.]

5 Segmented Least Squares
Segmented least squares.
- Points lie roughly on a sequence of several line segments.
- Given n points in the plane (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) with x_1 < x_2 < ... < x_n, find a sequence of lines that minimizes:
  - the sum E of the sums of the squared errors in each segment
  - the number of lines L
- Tradeoff function: E + cL, for some constant c > 0.
[Figure: points lying roughly on several line segments; axes x and y.]

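6 Dynamic Programming: Multiway Choice
Notation.
- OPT(j) = minimum cost for points p_1, p_2, ..., p_j.
- e(i, j) = minimum sum of squares for points p_i, p_{i+1}, ..., p_j.
To compute OPT(j):
- Last segment uses points p_i, p_{i+1}, ..., p_j for some i.
- Cost = e(i, j) + c + OPT(i-1).
Taking the best choice of i gives the recurrence
  $OPT(j) = 0$ if $j = 0$, and $OPT(j) = \min_{1 \le i \le j} \{ e(i, j) + c + OPT(i-1) \}$ otherwise.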

7 Segmented Least Squares: Algorithm
INPUT: n, p_1, …, p_n, c
Segmented-Least-Squares() {
   M[0] = 0
   for j = 1 to n
      for i = 1 to j
         compute the least squares error e_ij for the segment p_i, …, p_j
   for j = 1 to n
      M[j] = min over 1 ≤ i ≤ j of (e_ij + c + M[i-1])
   return M[n]
}
Running time. O(n^3).
- Bottleneck = computing e(i, j) for O(n^2) pairs, O(n) per pair using the previous formula.
- Can be improved to O(n^2) by pre-computing various statistics.
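A runnable Python sketch of the O(n^3) algorithm follows. It is an illustration under stated assumptions rather than the lecture's own code: the names `segmented_least_squares`, `sse`, and `back` are hypothetical, a segment whose x-values all coincide is fit with a horizontal line, and the traceback that recovers the segments is an addition beyond the pseudocode, which returns only the cost.

```python
def segmented_least_squares(points, c):
    """Return (min cost, segments) where segments are 1-based (i, j) index pairs."""
    pts = sorted(points)  # order points by x-coordinate
    n = len(pts)

    def sse(i, j):
        """Least squares error of the best line through p_i..p_j (1-based, inclusive)."""
        seg = pts[i - 1:j]
        m = len(seg)
        sx = sum(x for x, _ in seg)
        sy = sum(y for _, y in seg)
        sxy = sum(x * y for x, y in seg)
        sxx = sum(x * x for x, _ in seg)
        denom = m * sxx - sx * sx
        if denom == 0:  # assumption: degenerate segment, use a horizontal line
            a, b = 0.0, sy / m
        else:
            a = (m * sxy - sx * sy) / denom
            b = (sy - a * sx) / m
        return sum((y - a * x - b) ** 2 for x, y in seg)

    # e[i][j] for all pairs i <= j: O(n^2) pairs, O(n) work each.
    e = [[0.0] * (n + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        for i in range(1, j + 1):
            e[i][j] = sse(i, j)

    # M[j] = min over 1 <= i <= j of e[i][j] + c + M[i-1]
    M = [0.0] * (n + 1)
    back = [0] * (n + 1)  # back[j] = start index of the last segment in OPT(j)
    for j in range(1, n + 1):
        best_i = min(range(1, j + 1), key=lambda i: e[i][j] + c + M[i - 1])
        M[j] = e[best_i][j] + c + M[best_i - 1]
        back[j] = best_i

    # Follow back-pointers from n to recover the segments.
    segments = []
    j = n
    while j > 0:
        segments.append((back[j], j))
        j = back[j] - 1
    segments.reverse()
    return M[n], segments


# Points on y = x followed by points on y = 2x + 2, with c = 1.
cost, segs = segmented_least_squares(
    [(1, 1), (2, 2), (3, 3), (4, 10), (5, 12), (6, 14)], c=1
)
```

With this input, two exact segments cost 2c = 2, while a single line would pay a much larger squared error, so the penalty c decides how many segments are worth using.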

8 6.6 Sequence Alignment

9 String Similarity
How similar are two strings?
- ocurrance
- occurrence
Three possible alignments:
  o c u r r a n c e -
  o c c u r r e n c e        6 mismatches, 1 gap
  o c - u r r a n c e
  o c c u r r e n c e        1 mismatch, 1 gap
  o c - u r r - a n c e
  o c c u r r e - n c e      0 mismatches, 3 gaps

10 Edit Distance
Applications.
- Basis for Unix diff.
- Speech recognition.
- Computational biology.
Edit distance. [Levenshtein 1966, Needleman-Wunsch 1970]
- Gap penalty δ; mismatch penalty α_pq.
- Cost = sum of gap and mismatch penalties.
Two alignments of CTGACCTACCT and CCTGACTACAT:
  C T G A C C T A C C T
  C C T G A C T A C A T        cost = α_TC + α_GT + α_AG + 2α_CA
  - C T G A C C T A C C T
  C C T G A C - T A C A T      cost = 2δ + α_CA

11 Sequence Alignment
Goal: Given two strings X = x_1 x_2 ... x_m and Y = y_1 y_2 ... y_n, find an alignment of minimum cost.
Def. An alignment M is a set of ordered pairs x_i-y_j such that each item occurs in at most one pair and there are no crossings.
Def. The pairs x_i-y_j and x_i'-y_j' cross if i < i' but j > j'.
Ex: CTACCG vs. TACATG.
Sol: M = x_2-y_1, x_3-y_2, x_4-y_3, x_5-y_4, x_6-y_6.
  C T A C C - G
  - T A C A T G

12 Sequence Alignment: Problem Structure
Def. OPT(i, j) = min cost of aligning strings x_1 x_2 ... x_i and y_1 y_2 ... y_j.
- Case 1: OPT matches x_i-y_j.
  - pay mismatch for x_i-y_j + min cost of aligning two strings x_1 x_2 ... x_{i-1} and y_1 y_2 ... y_{j-1}
- Case 2a: OPT leaves x_i unmatched.
  - pay gap for x_i and min cost of aligning x_1 x_2 ... x_{i-1} and y_1 y_2 ... y_j
- Case 2b: OPT leaves y_j unmatched.
  - pay gap for y_j and min cost of aligning x_1 x_2 ... x_i and y_1 y_2 ... y_{j-1}

13 Sequence Alignment: Algorithm
Sequence-Alignment(m, n, x_1 x_2 ... x_m, y_1 y_2 ... y_n, δ, α) {
   for i = 0 to m
      M[i, 0] = iδ
   for j = 0 to n
      M[0, j] = jδ
   for i = 1 to m
      for j = 1 to n
         M[i, j] = min(α[x_i, y_j] + M[i-1, j-1],
                       δ + M[i-1, j],
                       δ + M[i, j-1])
   return M[m, n]
}
Analysis. Θ(mn) time and space.
English words or sentences: m, n ≤ 10.
Computational biology: m = n = 100,000. 10 billion ops OK, but a 10GB array? (Solution: sequence alignment in linear space.)
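The table-filling algorithm translates directly into Python. The sketch below is illustrative only: the function names and the unit-cost `unit_mismatch` penalty are assumptions for the example, with `delta` standing for the gap penalty δ and `alpha` for the mismatch penalty α. The second function keeps just two rows of the table, which is enough to compute the cost in O(n) space but not to recover the alignment itself; the linear-space method mentioned in the lecture addresses that harder problem and is not shown here.

```python
def sequence_alignment(x, y, delta, alpha):
    """Minimum alignment cost of x and y using the full (m+1) x (n+1) table."""
    m, n = len(x), len(y)
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        M[i][0] = i * delta  # x_1..x_i aligned against the empty string
    for j in range(n + 1):
        M[0][j] = j * delta  # empty string aligned against y_1..y_j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = min(
                alpha(x[i - 1], y[j - 1]) + M[i - 1][j - 1],  # match x_i with y_j
                delta + M[i - 1][j],                          # x_i unmatched
                delta + M[i][j - 1],                          # y_j unmatched
            )
    return M[m][n]


def alignment_cost_two_rows(x, y, delta, alpha):
    """Same cost, storing only the previous and current rows (no traceback)."""
    prev = [j * delta for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        curr = [i * delta] + [0] * len(y)
        for j in range(1, len(y) + 1):
            curr[j] = min(
                alpha(x[i - 1], y[j - 1]) + prev[j - 1],
                delta + prev[j],
                delta + curr[j - 1],
            )
        prev = curr
    return prev[len(y)]


def unit_mismatch(p, q):
    """Hypothetical penalty for this example: 0 for equal symbols, 1 otherwise."""
    return 0 if p == q else 1


cost = sequence_alignment("ocurrance", "occurrence", 1, unit_mismatch)
```

With unit gap and mismatch penalties the cost is the Levenshtein edit distance, so the misspelling example costs one gap plus one mismatch.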

14 Q1: Least-cost concatenation
You have a DNA sequence A of length n; you also have a "library" of shorter strings, each of length m < n. Your goal is to generate a concatenation C of strings B_1, …, B_k from the library such that the cost of aligning C to A is as low as possible. You can assume a gap cost δ and a mismatch cost α_pq for determining the cost of alignment.

15 Answer
Let A[x:y] denote the substring of A consisting of its symbols from position x to position y.
Let c(x, y) be the cost of optimally aligning A[x:y] to any string in the library.
Let OPT(j) be the alignment cost for the optimal solution on the string A[1:j].
  OPT(j) = min over t < j of [ c(t, j) + OPT(t – 1) ]   for j ≥ 1
  OPT(0) = 0
[Essentially, t is a "breakpoint" where you choose to use a new library component.]
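The recurrence for OPT(j) can be sketched in Python as below. Several details are assumptions made for illustration and are not stated on the slide: the function names, the unit mismatch penalty used in the usage lines, recomputing c(t, j) from scratch for every pair rather than caching it, and letting the breakpoint t range over 1..j (the slide writes t < j), so that a single symbol of A may be covered by one library string.

```python
def align_cost(x, y, delta, alpha):
    """Minimum alignment cost of x and y (two-row dynamic program)."""
    prev = [j * delta for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        curr = [i * delta] + [0] * len(y)
        for j in range(1, len(y) + 1):
            curr[j] = min(
                alpha(x[i - 1], y[j - 1]) + prev[j - 1],
                delta + prev[j],
                delta + curr[j - 1],
            )
        prev = curr
    return prev[len(y)]


def least_cost_concatenation(A, library, delta, alpha):
    """Return OPT(n): the least cost of aligning A to a concatenation of library strings."""
    n = len(A)
    OPT = [0] + [float("inf")] * n  # OPT[0] = 0
    for j in range(1, n + 1):
        for t in range(1, j + 1):  # assumption: t ranges over 1..j
            # c(t, j): cheapest alignment of A[t:j] to any single library string
            c_tj = min(align_cost(A[t - 1:j], B, delta, alpha) for B in library)
            OPT[j] = min(OPT[j], c_tj + OPT[t - 1])
    return OPT[n]


def unit_mismatch(p, q):
    """Hypothetical penalty for this example: 0 for equal symbols, 1 otherwise."""
    return 0 if p == q else 1


exact = least_cost_concatenation("ABAB", ["AB"], 1, unit_mismatch)
one_gap = least_cost_concatenation("ABA", ["AB"], 1, unit_mismatch)
```

Written this way the sketch performs O(n^2) alignments per library string; precomputing c(t, j) more cleverly would reduce the work but is outside what the slide specifies.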

16 6.4 Knapsack Problem

17 Q2: Knapsack Problem
Knapsack problem.
- Given n objects and a "knapsack."
- Item i weighs w_i > 0 kilograms and has value v_i > 0.
- Knapsack has capacity of W kilograms.
- Goal: fill knapsack so as to maximize total value.
Example instance (W = 11):
  Item    Value   Weight
  1       1       1
  2       6       2
  3       18      5
  4       22      6
  5       28      7
Ex: { 3, 4 } has value 40.
Greedy: repeatedly add the item with maximum ratio v_i / w_i.
Ex: { 5, 2, 1 } achieves only value = 35 ⇒ greedy not optimal.
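The gap between greedy and optimal on the lecture's instance (values 1, 6, 18, 22, 28; weights 1, 2, 5, 6, 7; W = 11) can be checked with a short Python sketch. The function names are hypothetical, the greedy rule is read here as "scan items by decreasing v_i / w_i and take each one that still fits", and the exhaustive search is only a check for this tiny instance, not a proposed algorithm.

```python
from itertools import combinations

# item number -> (value, weight), the instance from the lecture
ITEMS = {1: (1, 1), 2: (6, 2), 3: (18, 5), 4: (22, 6), 5: (28, 7)}
CAPACITY = 11


def greedy_by_ratio(items, capacity):
    """Take items in decreasing value/weight order whenever they still fit."""
    chosen, remaining = [], capacity
    by_ratio = sorted(items, key=lambda i: items[i][0] / items[i][1], reverse=True)
    for i in by_ratio:
        value, weight = items[i]
        if weight <= remaining:
            chosen.append(i)
            remaining -= weight
    return chosen, sum(items[i][0] for i in chosen)


def brute_force(items, capacity):
    """Try every subset; exponential time, for verification only."""
    best_subset, best_value = [], 0
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            weight = sum(items[i][1] for i in subset)
            value = sum(items[i][0] for i in subset)
            if weight <= capacity and value > best_value:
                best_subset, best_value = list(subset), value
    return best_subset, best_value


greedy_items, greedy_value = greedy_by_ratio(ITEMS, CAPACITY)   # items 5, 2, 1
optimal_items, optimal_value = brute_force(ITEMS, CAPACITY)     # items 3, 4
```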

18 Dynamic Programming: False Start
Def. OPT(i) = max profit subset of items 1, …, i.
- Case 1: OPT does not select item i.
  - OPT selects best of { 1, 2, …, i-1 }
- Case 2: OPT selects item i.
  - accepting item i does not immediately imply that we will have to reject other items
  - without knowing what other items were selected before i, we don't even know if we have enough room for i
Conclusion. Need more sub-problems!

19 Dynamic Programming: Adding a New Variable
Def. OPT(i, w) = max profit subset of items 1, …, i with weight limit w.
- Case 1: OPT does not select item i.
  - OPT selects best of { 1, 2, …, i-1 } using weight limit w
- Case 2: OPT selects item i.
  - new weight limit = w – w_i
  - OPT selects best of { 1, 2, …, i–1 } using this new weight limit

20 Knapsack Problem: Bottom-Up
Knapsack. Fill up an n-by-W array.
Input: n, W, w_1, …, w_n, v_1, …, v_n
for w = 0 to W
   M[0, w] = 0
for i = 1 to n
   for w = 0 to W
      if (w_i > w)
         M[i, w] = M[i-1, w]
      else
         M[i, w] = max { M[i-1, w], v_i + M[i-1, w-w_i] }
return M[n, W]
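A Python sketch of the bottom-up table fill is shown below, run on the lecture's instance (values 1, 6, 18, 22, 28; weights 1, 2, 5, 6, 7; W = 11). The function name `knapsack` is hypothetical, and the traceback that recovers the chosen items is an addition for illustration; the pseudocode itself returns only the optimal value. Integer weights are assumed, since they index the table.

```python
def knapsack(values, weights, W):
    """Return (optimal value, chosen item numbers) for the 0/1 knapsack problem.

    Items are numbered 1..n; weights and W must be non-negative integers.
    """
    n = len(values)
    # M[i][w] = max value using items 1..i with weight limit w; row 0 is all zeros.
    M = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v_i, w_i = values[i - 1], weights[i - 1]
        for w in range(W + 1):
            if w_i > w:
                M[i][w] = M[i - 1][w]  # item i does not fit
            else:
                M[i][w] = max(M[i - 1][w], v_i + M[i - 1][w - w_i])

    # Traceback: item i was taken exactly when it changed the table entry.
    chosen = []
    w = W
    for i in range(n, 0, -1):
        if M[i][w] != M[i - 1][w]:
            chosen.append(i)
            w -= weights[i - 1]
    chosen.reverse()
    return M[n][W], chosen


best_value, best_items = knapsack([1, 6, 18, 22, 28], [1, 2, 5, 6, 7], 11)
```

The table has (n + 1)(W + 1) entries and each is filled in constant time, which is where the Θ(nW) running time comes from.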

21 Knapsack Algorithm
Items (W = 11):
  Item    Value   Weight
  1       1       1
  2       6       2
  3       18      5
  4       22      6
  5       28      7
Table M[i, w]: n + 1 item subsets by W + 1 weight limits.
  Subset \ w         0   1   2   3   4   5   6   7   8   9  10  11
  ∅                  0   0   0   0   0   0   0   0   0   0   0   0
  { 1 }              0   1   1   1   1   1   1   1   1   1   1   1
  { 1, 2 }           0   1   6   7   7   7   7   7   7   7   7   7
  { 1, 2, 3 }        0   1   6   7   7  18  19  24  25  25  25  25
  { 1, 2, 3, 4 }     0   1   6   7   7  18  22  24  28  29  29  40
  { 1, 2, 3, 4, 5 }  0   1   6   7   7  18  22  28  29  34  35  40
OPT: { 4, 3 }, value = 22 + 18 = 40

22 Knapsack Problem: Running Time
Running time. Θ(nW).
- Not polynomial in input size!
- "Pseudo-polynomial."
- Decision version of Knapsack is NP-complete. [Chapter 8]
Knapsack approximation algorithm. There exists a polynomial algorithm that produces a feasible solution that has value within 0.01% of optimum. [Section 11.8]

23 Q3: Longest palindromic subsequence
Give an algorithm to find the longest subsequence of a given string A that is a palindrome.
Ex: "amantwocamelsacrazyplanacanalpanama"

24 Q3-a: Palindromes (contd.)
Every string can be decomposed into a sequence of palindromes. Give an efficient algorithm to compute the smallest number of palindromes that make up a given string.

