2 Greedy Algorithms CSc 4520/6520 Fall 2013

3 Problems Considered
Activity Selection Problem
Knapsack Problem
– 0–1 Knapsack
– Fractional Knapsack
Huffman Codes

4 CS3381 Des & Anal of Alg (2001-2002 SemA) City Univ of HK / Dept of CS / Helena Wong 5. Greedy Algorithms http://www.cs.cityu.edu.hk/~helena
Greedy Algorithms
Two techniques for solving optimization problems:
1. Dynamic Programming
2. Greedy Algorithms ("Greedy Strategy")
Both can solve optimization problems, but for some of them Dynamic Programming is overkill: the Greedy Strategy is simpler and more efficient.

5 Activity-Selection Problem
For a set of proposed activities that wish to use a lecture hall, select a maximum-size subset of "compatible activities".
Set of activities: S = {a_1, a_2, …, a_n}
Duration of activity a_i: [start_time_i, finish_time_i)
Activities sorted in increasing order of finish time:

  i             1  2  3  4  5  6  7   8   9   10  11
  start_time_i  1  3  0  5  3  5  6   8   8   2   12
  finish_time_i 4  5  6  7  8  9  10  11  12  13  14

6 Activity-Selection Problem
For the table above, some maximal sets of compatible activities are:
{a_3, a_9, a_11}, {a_1, a_4, a_8, a_11}, {a_2, a_4, a_9, a_11}

7 Activity-Selection Problem: Dynamic Programming Solution (Step 1)
Step 1. Characterize the structure of an optimal solution.
Definition: S_{i,j} = {a_k in S : finish_time_i <= start_time_k < finish_time_k <= start_time_j}
That is, S_{i,j} is the set of activities that start after a_i finishes and finish before a_j starts.
e.g. S_{2,11} = {a_4, a_6, a_7, a_8, a_9}

8 Activity-Selection Problem: Dynamic Programming Solution (Step 1)
Add fictitious activities a_0 and a_{n+1}, with finish_time_0 = 0 and start_time_{n+1} = infinity:

  i             0  1  2  3  4  5  6  7   8   9   10  11  12
  start_time_i  -  1  3  0  5  3  5  6   8   8   2   12  inf
  finish_time_i 0  4  5  6  7  8  9  10  11  12  13  14  -

i.e. S_{0,n+1} = {a_1, a_2, a_3, a_4, a_5, a_6, a_7, a_8, a_9, a_10, a_11} = S.
Note: if i >= j then S_{i,j} = Ø.

9 Activity-Selection Problem: Dynamic Programming Solution (Step 1)
The problem: select a maximum-size subset of compatible activities from S_{0,n+1}.
Substructure: suppose a solution to S_{i,j} includes activity a_k; then two subproblems are generated: S_{i,k} and S_{k,j}.
e.g. suppose a solution to S_{0,n+1} contains a_7; then the two subproblems are S_{0,7} and S_{7,n+1}.
The maximum-size subset A_{i,j} of compatible activities is then A_{i,j} = A_{i,k} U {a_k} U A_{k,j}.

10 Activity-Selection Problem: Dynamic Programming Solution (Steps 2-4)
Step 2. Recursively define an optimal solution.
Let c[i,j] = number of activities in a maximum-size subset of compatible activities in S_{i,j}. If i >= j, then S_{i,j} = Ø, i.e. c[i,j] = 0.

  c[i,j] = 0                                      if S_{i,j} = Ø
  c[i,j] = max_{i<k<j} { c[i,k] + c[k,j] + 1 }    if S_{i,j} != Ø

Step 3. Compute the value of an optimal solution in a bottom-up fashion.
Step 4. Construct an optimal solution from the computed information.
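As an illustration (my own sketch, not from the slides, which compute it bottom-up), the c[i,j] recurrence can be evaluated top-down with memoization. The arrays hard-code the example table, with fictitious a_0 and a_12:

```python
# Memoized evaluation of the c[i,j] recurrence (illustrative sketch).
# Index 0 is the fictitious a_0 (finish time 0); index 12 is a_{n+1}
# (start time infinity).
from functools import lru_cache

start  = [0, 1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12, float("inf")]
finish = [0, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, float("inf")]

@lru_cache(maxsize=None)
def c(i, j):
    best = 0
    for k in range(i + 1, j):
        # a_k is in S_{i,j} iff it starts after a_i finishes
        # and finishes before a_j starts
        if start[k] >= finish[i] and finish[k] <= start[j]:
            best = max(best, c(i, k) + c(k, j) + 1)
    return best

print(c(0, 12))  # 4, e.g. {a_1, a_4, a_8, a_11}
```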

11 Activity-Selection Problem: Greedy Strategy Solution
Consider any nonempty subproblem S_{i,j}, and let a_m be the activity in S_{i,j} with the earliest finish time. Then:
1. a_m is used in some maximum-size subset of compatible activities of S_{i,j}.
2. The subproblem S_{i,m} is empty, so choosing a_m leaves the subproblem S_{m,j} as the only one that may be nonempty.
e.g. S_{2,11} = {a_4, a_6, a_7, a_8, a_9}. Among these, a_4 finishes earliest:
1. a_4 is used in the solution.
2. After choosing a_4, there are two subproblems, S_{2,4} and S_{4,11}; but S_{2,4} is empty, so only S_{4,11} remains as a subproblem.

12 Activity-Selection Problem: Greedy Strategy Solution
Hence, to solve S_{i,j}:
1. Choose the activity a_m with the earliest finish time. This is the greedy choice (a locally optimal choice): it leaves as much opportunity as possible for the remaining activities to be scheduled.
2. Solution of S_{i,j} = {a_m} U solution of subproblem S_{m,j}.
This solves the problem in a top-down fashion:
To solve S_{0,12}, select a_1, which finishes earliest, and solve S_{1,12}.
To solve S_{1,12}, select a_4, which finishes earliest, and solve S_{4,12}.
To solve S_{4,12}, select a_8, which finishes earliest, and solve S_{8,12}. …

13 Activity-Selection Problem: Greedy Strategy Solution

Recursive-Activity-Selector(i, j)
1  m = i + 1                  // find first activity in S_{i,j}
2  while m < j and start_time_m < finish_time_i
3      do m = m + 1
4  if m < j
5      then return {a_m} U Recursive-Activity-Selector(m, j)
6      else return Ø

Order of calls on the example (selections shown):
Recursive-Activity-Selector(0,12)  selects a_1
Recursive-Activity-Selector(1,12)  selects a_4 (m=2, m=3 start too early; m=4 is compatible)
Recursive-Activity-Selector(4,12)  selects a_8
Recursive-Activity-Selector(8,12)  selects a_11
Recursive-Activity-Selector(11,12) returns Ø
Result: {a_1, a_4, a_8, a_11}
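A direct Python transcription of this pseudocode (a sketch: the 1-indexed arrays hard-code the example table, with the fictitious a_0 of finish time 0 at index 0):

```python
# Recursive-Activity-Selector on the example data (1-indexed; index 0 is
# the fictitious activity a_0 with finish time 0).
start  = [None, 1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
finish = [0,    4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

def recursive_activity_selector(i, j):
    m = i + 1
    while m < j and start[m] < finish[i]:   # find first activity in S_{i,j}
        m += 1
    if m < j:
        return [m] + recursive_activity_selector(m, j)
    return []

print(recursive_activity_selector(0, 12))  # [1, 4, 8, 11]
```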

14 Activity-Selection Problem: Greedy Strategy Solution

Iterative-Activity-Selector()
1  Answer = {a_1}
2  last_selected = 1
3  for m = 2 to n
4      if start_time_m >= finish_time_last_selected
5          then Answer = Answer U {a_m}
6               last_selected = m
7  return Answer
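The iterative version in Python (again a sketch run against the example table):

```python
# Iterative-Activity-Selector: one pass over activities sorted by finish time.
start  = [None, 1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
finish = [None, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

def iterative_activity_selector():
    answer = [1]          # a_1 finishes earliest, so it is always selected
    last = 1
    for m in range(2, len(start)):
        if start[m] >= finish[last]:   # compatible with the last selection
            answer.append(m)
            last = m
    return answer

print(iterative_activity_selector())  # [1, 4, 8, 11]
```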

15 Activity-Selection Problem: Greedy Strategy Solution
For both Recursive-Activity-Selector and Iterative-Activity-Selector, the running time is Θ(n), since each a_m is examined exactly once.

16 Greedy Algorithm Design
Steps of greedy algorithm design:
1. Formulate the optimization problem in the form: we make a choice and are left with one subproblem to solve.
2. Show that the greedy choice can lead to an optimal solution, so that the greedy choice is always safe (the Greedy-Choice Property).
3. Demonstrate that an optimal solution to the original problem = greedy choice + an optimal solution to the subproblem (the Optimal Substructure Property).
These two properties are a good clue that a greedy strategy will solve the problem.

17 Greedy Algorithm Design
Comparison:
Dynamic Programming:
– At each step, the choice is determined based on solutions of subproblems.
– Bottom-up approach: subproblems are solved first.
– Can be slower, more complex.
Greedy Algorithms:
– At each step, we quickly make the choice that currently looks best: a locally optimal (greedy) choice.
– Top-down approach: the greedy choice can be made before solving further subproblems.
– Usually faster, simpler.

18 Greedy Algorithms
Similar to dynamic programming, but a simpler approach; also used for optimization problems.
Idea: when we have a choice to make, make the one that looks best right now, i.e. make a locally optimal choice in the hope of reaching a globally optimal solution.
Note: greedy algorithms don't always yield an optimal solution.

19 Fractional Knapsack Problem
Knapsack capacity: W.
There are n items: the i-th item has value v_i and weight w_i.
Goal: find fractions x_i with 0 <= x_i <= 1, i = 1, 2, …, n, such that Σ w_i x_i <= W and Σ x_i v_i is maximized.

20 Fractional Knapsack - Example
W = 50. Item 1: weight 10, value $60 ($6/pound). Item 2: weight 20, value $100 ($5/pound). Item 3: weight 30, value $120 ($4/pound).
Take all of Item 1 ($60) and all of Item 2 ($100), then fill the remaining 20 pounds with 20/30 of Item 3 ($80): total value $60 + $100 + $80 = $240.

21 Fractional Knapsack Problem
Greedy strategy 1: pick the item with the maximum value.
E.g.: W = 1; w_1 = 100, v_1 = 2; w_2 = 1, v_2 = 1.
Taking from the item with the maximum value: total value taken = v_1/w_1 = 2/100.
This is smaller than what the thief can take by choosing the other item: total value (choose item 2) = v_2/w_2 = 1.
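The failure of this strategy can be checked numerically (a small sketch of the counterexample above):

```python
# Greedy-by-total-value vs. the better choice on the counterexample above.
W = 1
w1, v1 = 100, 2     # item 1: heavy, higher total value, low value per pound
w2, v2 = 1, 1       # item 2: light, lower total value, high value per pound

greedy_by_value = (W / w1) * v1   # fraction of item 1 that fits
take_item_2     = (W / w2) * v2   # all of item 2 fits
print(greedy_by_value, take_item_2)  # 0.02 1.0
```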

22 Fractional Knapsack Problem
Greedy strategy 2: pick the item with the maximum value per pound, v_i/w_i.
If the supply of that item is exhausted and the thief can carry more, take as much as possible from the item with the next greatest value per pound.
It is therefore good to order items by their value per pound.

23 Fractional Knapsack Problem
Alg.: Fractional-Knapsack(W, v[n], w[n])
1. while w > 0 and there are items remaining
2.     pick the item i with maximum v_i/w_i
3.     x_i ← min(1, w/w_i)
4.     remove item i from the list
(Here w is the amount of space remaining in the knapsack; initially w = W.)
Running time: Θ(n) if the items are already ordered by v_i/w_i; otherwise Θ(n lg n).
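A runnable sketch of this procedure (an assumed implementation, run on the earlier example: weights 10, 20, 30; values $60, $100, $120; W = 50):

```python
# Fractional knapsack: greedily take items in order of value per pound.
def fractional_knapsack(W, items):
    # items: list of (value, weight) pairs
    items = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)
    total = 0.0
    for v, w in items:
        if W <= 0:
            break
        take = min(w, W)          # whole item if it fits, else a fraction
        total += take * v / w
        W -= take
    return total

print(fractional_knapsack(50, [(60, 10), (100, 20), (120, 30)]))  # 240.0
```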

24 Huffman Codes
For compressing data (a sequence of characters). Widely used and very efficient (savings of 20-90%).
Uses a table of the frequencies of occurrence of the characters; outputs a binary string.
e.g. "Today's weather is nice" → "001 0110 0 0 100 1000 1110"

25 Huffman Code Problem
Huffman's algorithm achieves data compression by finding the best variable-length binary encoding scheme for the symbols that occur in the file to be compressed.

26 Huffman Code Problem
The more frequently a symbol occurs, the shorter the Huffman codeword representing it should be.
The Huffman code is a prefix-free code: no codeword is a prefix of another codeword.

27 Overview
Huffman codes: compressing data (savings of 20% to 90%).
Huffman's greedy algorithm uses a table of the frequencies of occurrence of each character in the alphabet C to build up an optimal way of representing each character as a binary string.

28 Huffman Codes
Example: a file of 100,000 characters, containing only 'a' to 'f':

  Character  Frequency (thousands)  Fixed-length codeword  Variable-length codeword
  'a'        45                     000                    0
  'b'        13                     001                    101
  'c'        12                     010                    100
  'd'        16                     011                    111
  'e'        9                      100                    1101
  'f'        5                      101                    1100

Fixed-length encoding: 3 x 100,000 = 300,000 bits.
Variable-length encoding: 1*45000 + 3*13000 + 3*12000 + 3*16000 + 4*9000 + 4*5000 = 224,000 bits.
e.g. "abc" = "000001010" (fixed-length) vs "abc" = "0101100" (variable-length).

29 Huffman Codes
The coding schemes can be represented by binary trees (for the file of 100,000 characters).
Fixed-length code: leaves a:45, b:13, c:12, d:16, e:9, f:5 all at depth 3. This tree is not a full binary tree (a full binary tree is one in which every nonleaf node has 2 children).
Variable-length code (a=0, b=101, c=100, d=111, e=1101, f=1100): a full binary tree with root 100 = a:45 + 55, where 55 = 25 + 30, 25 = c:12 + b:13, 30 = 14 + d:16, and 14 = f:5 + e:9. Less frequent characters sit deeper in the tree.

30 Huffman Codes
To find an optimal code for a file:
1. The coding must be unambiguous. Consider codes in which no codeword is also a prefix of another codeword: Prefix Codes. Prefix codes are unambiguous, and once the codewords are decided, it is easy to compress (encode) and decompress (decode).
2. The file size must be smallest. Such a code can be represented by a full binary tree, usually with the less frequent characters at the bottom.
Let C be the alphabet (e.g. C = {'a','b','c','d','e','f'}). For each character c, the number of bits to encode all of c's occurrences is freq_c * depth_c, so the file size is B(T) = Σ_{c in C} freq_c * depth_c.
e.g. with the variable-length code above, "abc" is coded as "0101100".

31 Huffman Codes
How do we find the optimal prefix code? The Huffman code (1952) was invented to solve this. It is a greedy approach built on a min-priority queue Q.
Starting from Q = (f:5, e:9, c:12, b:13, d:16, a:45), repeatedly merge the two least-frequent trees:
merge f:5 and e:9 into a node of frequency 14; then c:12 and b:13 into 25; then 14 and d:16 into 30; then 25 and 30 into 55; finally a:45 and 55 into the root, 100.

32 Huffman Codes

HUFFMAN(C)
1  Build Q from C
2  for i = 1 to |C| - 1
3      allocate a new node z
4      z.left = x = EXTRACT-MIN(Q)
5      z.right = y = EXTRACT-MIN(Q)
6      z.freq = x.freq + y.freq
7      insert z into Q at the correct position
8  return EXTRACT-MIN(Q)

If Q is implemented as a binary min-heap: "Build Q from C" is O(n), each EXTRACT-MIN(Q) is O(lg n), and each insert is O(lg n), so HUFFMAN(C) is O(n lg n).
How is it "greedy"? At each step it merges the two least-frequent subtrees: the choice that looks best at the moment.
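A compact Python sketch of HUFFMAN(C), using heapq as the min-priority queue and run on the example frequencies (the counter used as a tie-breaker is my own device so the heap never compares trees; it is not part of the slides):

```python
# Huffman coding via a min-heap of (frequency, tie-breaker, subtree) tuples.
import heapq
from itertools import count

def huffman(freqs):
    tie = count()                     # unique tie-breaker for equal frequencies
    q = [(f, next(tie), ch) for ch, f in freqs.items()]
    heapq.heapify(q)                  # "Build Q from C": O(n)
    while len(q) > 1:
        fx, _, x = heapq.heappop(q)   # two least-frequent subtrees
        fy, _, y = heapq.heappop(q)
        heapq.heappush(q, (fx + fy, next(tie), (x, y)))  # merged node z
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):   # internal node: left edge 0, right edge 1
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix
    walk(q[0][2], "")
    return codes

codes = huffman({'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5})
print({ch: len(cw) for ch, cw in sorted(codes.items())})
# codeword lengths: a:1, b:3, c:3, d:3, e:4, f:4, matching the example
```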

33 Cost of a Tree T
For each character c in the alphabet C, let f(c) be the frequency of c in the file and let d_T(c) be the depth of c in the tree. The depth is also the length of c's codeword, since each edge on the root-to-leaf path contributes one bit.
Let B(T) = Σ_{c in C} f(c) d_T(c) be the number of bits required to encode the file (called the cost of T).

34 Huffman Code Problem
In the pseudocode above, C is a set of n characters, and each character c in C is an object with a defined frequency f[c]. The algorithm builds the tree T corresponding to the optimal code.
A min-priority queue Q is used to identify the two least-frequent objects to merge together. The result of merging two objects is a new object whose frequency is the sum of the frequencies of the two objects that were merged.

35 Running Time of Huffman's Algorithm
The analysis of Huffman's algorithm assumes that Q is implemented as a binary min-heap.
For a set C of n characters, the initialization of Q in line 1 can be performed in O(n) time using BUILD-MIN-HEAP.
The for loop in lines 2-8 is executed exactly n - 1 times, and since each heap operation requires O(lg n) time, the loop contributes O(n lg n) to the running time.
Thus, the total running time of HUFFMAN on a set of n characters is O(n lg n).

36 Prefix Code
Prefix(-free) code: no codeword is also a prefix of some other codeword (unambiguous).
An optimal data compression achievable by a character code can always be achieved with a prefix code, and prefix codes simplify encoding (compression) and decoding.
Encoding: "abc" → 0 . 101 . 100 = "0101100".
Decoding: "001011101" = 0 . 0 . 101 . 1101 → "aabe".
Use a binary tree to represent prefix codes for easy decoding. An optimal code is always represented by a full binary tree, in which every non-leaf node has two children: |C| leaves and |C| - 1 internal nodes.
Cost: B(T) = Σ_{c in C} f(c) d_T(c), where f(c) is the frequency of c and d_T(c) is its depth (the length of its codeword).

37 Huffman Code
Reduces the size of data by 20%-90% in general. If no characters occur more frequently than others, there is no advantage over ASCII.
Encoding: given the characters and their frequencies, run the algorithm to generate a code, then write the characters using the code.
Decoding: given the Huffman tree, figure out what each character is (possible because of the prefix property).

38 How to Decode?
With a fixed-length code, it is easy: break the bit string up into groups of 3, for instance.
For a variable-length code, ensure that no character's code is the prefix of another, so there is no ambiguity.
e.g. "101111110100" decodes as 101 . 111 . 1101 . 0 . 0 = b d e a a.

39 Huffman Algorithm Correctness
We need to prove two things.
Greedy-Choice Property: there exists a minimum-cost prefix tree where the two smallest-frequency characters are indeed siblings on the longest path from the root. This means that the greedy choice does not hurt finding the optimum.

40 Algorithm Correctness
Optimal-Substructure Property: an optimal solution to the reduced problem, obtained by choosing the two least-frequent elements and combining them into one, yields an optimal solution to the original problem once the two elements are added back.

41 Algorithm Correctness
Claim: there exists a minimum-cost tree where the minimum-frequency elements are siblings on the longest path.
Assume that is not the situation. Let a, b be the elements with the smallest frequencies, and let x, y be the sibling elements on the longest path.

42 Algorithm Correctness
In the tree CT, a sits at depth d_a and y at depth d_y. We know about depth and frequency:
d_a <= d_y and f_a <= f_y.

43 Algorithm Correctness
We also know that the code tree CT is optimal: Σ_σ f_σ d_σ is smallest possible.
Now exchange a and y.

44 Algorithm Correctness
Let CT' be the tree after exchanging a and y. Since d_a <= d_y and f_a <= f_y, we have
f_a d_a + f_y d_y >= f_y d_a + f_a d_y (their difference is (f_y - f_a)(d_y - d_a) >= 0). Therefore
cost(CT) = Σ_σ f_σ d_σ = Σ_{σ ≠ a,y} f_σ d_σ + f_a d_a + f_y d_y
         >= Σ_{σ ≠ a,y} f_σ d_σ + f_y d_a + f_a d_y = cost(CT')

45 Algorithm Correctness
Now do the same exchange for b and x (b at depth d_b, x at depth d_x in CT').

46 Algorithm Correctness
The result is a tree CT'' of no greater cost: an optimal code tree in which a and b are siblings on the longest path.

47 Algorithm Correctness
Optimal-substructure property: let a, b be the symbols with the smallest frequencies, and let x be a new symbol whose frequency is f_x = f_a + f_b. Delete characters a and b, and find the optimal code tree CT for the reduced alphabet. Then CT' = CT U {a,b}, i.e. CT with the leaf x replaced by an internal node having children a and b, is an optimal tree for the original alphabet.

48 Algorithm Correctness
(CT contains a leaf x with f_x = f_a + f_b; CT' replaces that leaf by an internal node with children a and b.)

49 Algorithm Correctness
cost(CT') = Σ_σ f_σ d'_σ
          = Σ_{σ ≠ a,b} f_σ d'_σ + f_a d'_a + f_b d'_b
          = Σ_{σ ≠ a,b} f_σ d'_σ + f_a (d_x + 1) + f_b (d_x + 1)
          = Σ_{σ ≠ a,b} f_σ d'_σ + (f_a + f_b)(d_x + 1)
          = Σ_{σ ≠ a,b} f_σ d_σ + f_x d_x + f_x
          = cost(CT) + f_x

50 Algorithm Correctness
So, with f_x = f_a + f_b: cost(CT) + f_x = cost(CT').

51 Algorithm Correctness
Assume CT' is not optimal. By the previous lemma there is a tree CT'' that is optimal and in which a and b are siblings. So cost(CT'') < cost(CT').

52 Algorithm Correctness
Consider the tree CT''' obtained from CT'' by replacing the sibling pair a, b with a single leaf x of frequency f_x = f_a + f_b. By a similar argument: cost(CT''') + f_x = cost(CT'').

53 Algorithm Correctness
We get: cost(CT''') = cost(CT'') - f_x < cost(CT') - f_x = cost(CT), and this contradicts the minimality of cost(CT). Hence CT' is optimal.

54 Application of Huffman Codes
Both the .mp3 and .jpg file formats use Huffman coding at one stage of the compression.

55 Dynamic Programming vs. Greedy Algorithms
Dynamic programming:
– We make a choice at each step.
– The choice depends on solutions to subproblems.
– Bottom-up solution, from smaller to larger subproblems.
Greedy algorithm:
– Make the greedy choice and THEN solve the subproblem arising after the choice is made.
– The choice we make may depend on previous choices, but not on solutions to subproblems.
– Top-down solution; problems decrease in size.

56 Looking Ahead
More greedy algorithms to come when considering graph algorithms:
– Minimum spanning tree (Kruskal, Prim)
– Dijkstra's algorithm for shortest paths from a single source

