CS420 Lecture Eight: Greedy Algorithms

Going from A to G

Starting with a full tank, we can drive 350 miles before we need to gas up. Minimize the number of times we need to stop at a gas station.

Station    A    B    C    D    E    F     G
Mile       0  250  300  600  850  900  1100

[Figure: a possible (non-greedy) optimal solution marked on the route from A to G.]

We can make it greedy:

[Figure: the greedy solution marked on the same route.]

A greedy solution goes as far as possible before gassing up. We can turn an optimal solution into a greedy one by making the first step greedy, and then applying induction to the rest of the trip.
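The "go as far as possible before gassing up" rule can be sketched in a few lines of Python. This is an illustrative sketch, not code from the lecture: the function name `greedy_stops` and the list-of-mile-markers representation are assumptions made here.

```python
def greedy_stops(stations, tank_range):
    """Return the mile markers where we refuel under the greedy rule.

    stations: sorted mile markers; the first is the start (full tank),
    the last is the destination. Hypothetical helper for illustration.
    """
    stops = []
    fuel_from = stations[0]  # position where the tank was last full
    for here, nxt in zip(stations, stations[1:]):
        if nxt - here > tank_range:
            raise ValueError("two stations are farther apart than one tank")
        # Only refuel at `here` if the next station is out of reach.
        if nxt - fuel_from > tank_range:
            stops.append(here)
            fuel_from = here
    return stops


# Mile markers of A..G from the slide, 350-mile range.
route = [0, 250, 300, 600, 850, 900, 1100]
print(greedy_stops(route, 350))  # [300, 600, 900], i.e. stations C, D, F
```

Three stops is also a lower bound here, since 1100 miles cannot be covered with fewer than four full tanks.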

Greedy algorithms

Greedy algorithms determine a global optimum via a sequence of locally optimal choices. For each problem we must prove that these local choices really lead to a global optimum.

Activity Selection

Given a set of activities S = {1, 2, 3, ..., n} that use a resource, where activity i has a start time S_i and a finish time F_i with S_i <= F_i.

Activities i and j are compatible if the intervals [S_i, F_i) and [S_j, F_j) do not overlap: S_i >= F_j or S_j >= F_i. (The half-open interval [S, F) includes its left endpoint S but not its right endpoint F.)

The activity-selection problem is to select a maximum-size set of mutually compatible activities.

Greedy algorithm for Activity Selection How would you do it?

Greedy algorithm for Activity Selection

Sort the activities by finish time: F_1 <= F_2 <= ... <= F_n

A = {1}
j = 1
for i = 2 to n
    if S_i >= F_j
        include i in A
        j = i

Example from Cormen et al.

i      1   2   3   4   5   6   7   8   9  10  11
S_i    1   3   0   5   3   5   6   8   8   2  12
F_i    4   5   6   7   8   9  10  11  12  13  14

A = {1, 4, 8, 11}
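As a rough sketch, the greedy selection can be written in Python and checked against the example table. The function name `select_activities` and the use of two parallel lists are choices made here for illustration; activity numbers are reported 1-based to match the slides.

```python
def select_activities(start, finish):
    """Greedy activity selection: always take the compatible activity
    that finishes first. Returns 1-based activity numbers."""
    if not start:
        return []
    # Sort activity indices by finish time (the slides assume this order).
    order = sorted(range(len(start)), key=lambda i: finish[i])
    chosen = [order[0]]
    last = order[0]  # most recently chosen activity
    for i in order[1:]:
        if start[i] >= finish[last]:  # compatible with the last choice
            chosen.append(i)
            last = i
    return [i + 1 for i in chosen]


S = [1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
F = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
print(select_activities(S, F))  # [1, 4, 8, 11]
```

Sorting dominates the running time, so the sketch runs in O(n lg n); the scan itself is linear.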

Activity selection Are there other ways to do it? sure...

Greedy works for Activity Selection

BASE: There is an optimal solution that contains activity 1 (the one that finishes first) as its first activity. Let A be an optimal solution with activity k != 1 as first activity. Since F_k >= F_1, we can replace activity k by activity 1: every other activity in A starts at or after F_k, so it is still compatible, and the solution keeps the same size. So, picking the first element in a greedy fashion works.

STEP: After the first choice is made, remove all activities that are incompatible with the first chosen activity and recursively define a new problem consisting of the remaining activities. The first activity for this reduced problem can be chosen in a greedy fashion by the base case. By induction, Greedy is optimal.

What did we do?

We assumed there was a non-greedy optimal solution, then we morphed this solution step by step into a greedy optimal solution, thereby showing that the greedy solution is optimal as well.

MST: Minimal Spanning Tree

Given a connected and undirected graph with labeled edges (label = distance), find a tree that
– is a sub-graph of the given graph (has nodes and edges from the given graph),
– is a spanning tree: it reaches each node,
– and is minimal: the sum of its edge labels is as small as possible (MST).

[Figure: example graph with edge labels 12, 3, 10, 11, 7, 4, 8, 6, 15.]

Greedy solution for MST?

Pick a node and a minimal edge emanating from it; now we have an MST in the making. Keep adding the minimal edge that connects the tree built so far to a node not yet in it, until all nodes are connected.
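The "grow the tree by the cheapest outgoing edge" idea is commonly known as Prim's algorithm. Below is a minimal sketch using Python's `heapq`. The topology of the example graph on the slide is not recoverable from the transcript, so the four-node graph here is hypothetical, as are the names `prim_mst` and `build_graph`.

```python
import heapq


def build_graph(edge_list):
    """Turn (u, v, weight) triples into an undirected adjacency dict."""
    graph = {}
    for u, v, w in edge_list:
        graph.setdefault(u, []).append((w, v))
        graph.setdefault(v, []).append((w, u))
    return graph


def prim_mst(graph, start):
    """Return (total_weight, tree_edges) for a connected, undirected graph."""
    in_tree = {start}
    # Heap of candidate edges (weight, tree_node, outside_node).
    frontier = [(w, start, v) for w, v in graph[start]]
    heapq.heapify(frontier)
    tree_edges, total = [], 0
    while frontier and len(in_tree) < len(graph):
        w, u, v = heapq.heappop(frontier)
        if v in in_tree:
            continue  # both endpoints already connected: would form a cycle
        in_tree.add(v)
        tree_edges.append((u, v, w))
        total += w
        for w2, x in graph[v]:
            if x not in in_tree:
                heapq.heappush(frontier, (w2, v, x))
    return total, tree_edges


# Hypothetical graph, not the one from the slide.
g = build_graph([("A", "B", 3), ("A", "C", 10), ("B", "C", 4),
                 ("B", "D", 12), ("C", "D", 6)])
print(prim_mst(g, "A"))  # (13, [('A', 'B', 3), ('B', 'C', 4), ('C', 'D', 6)])
```

This variant keeps edges rather than nodes in the heap, which is simpler to write and has the same O(|E| lg |V|) bound up to constants.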

Greedy works for MST

Lemma 1: Let G = (V, E) be a connected and undirected graph and S = (V, T) be a spanning tree of G. Then for all nodes v1, v2 in V, the path from v1 to v2 in S is unique.
Why? Otherwise S would contain a cycle, so it wouldn't be a tree.

Lemma 2: Let G = (V, E) be a connected and undirected graph and S = (V, T) be a spanning tree of G. If any edge in E - T is added to S, a unique cycle results.
Why? Because there already is a unique path between the endpoints of the added edge.
Also, any edge on that cycle can then be taken away, making the graph a spanning tree again.

Greedy works for MST

Proof by contradiction: Suppose we can create an MST by, at some stage, not taking the minimal-cost edge "min" but a non-minimal edge "other". We build the rest of the spanning tree, so now all vertices are connected. Adding "min" now creates a unique cycle (Lemma 2), and we can make a lower-cost spanning tree by removing "other" and keeping "min". Hence the spanning tree with "other" in it was not minimal.

Bounds for MST

Lower bound: MST = Ω(|V|) (we need to touch all nodes).
Upper bound: Greedy with a priority heap for nodes is O(|E| lg |V|).
– See the lecture on Shortest Paths.
There is no known deterministic linear-time algorithm for MST.
– MST has an algorithmic gap.

Huffman codes

Say I have a file consisting of the letters a, b, c, d, e, f with frequencies (x1000) 45, 13, 12, 16, 9, 5. What would a fixed bit encoding look like? And a variable one?

                      a    b    c    d     e     f
frequency (x1000)    45   13   12   16     9     5
fixed encoding      000  001  010  011   100   101
variable encoding     0  101  100  111  1101  1100

Fixed vs variable

100,000 characters
Fixed: 3 * 100,000 = 300,000 bits
Variable: (1*45 + 3*13 + 3*12 + 3*16 + 4*9 + 4*5) * 1000 = 224,000 bits
About 25% saving
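The bit counts above can be reproduced with a short Python check. The dictionaries simply restate the table from the slides; the function name `encoded_bits` is an illustrative choice.

```python
# Frequencies in thousands of characters, as on the slide.
FREQ = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
FIXED = {"a": "000", "b": "001", "c": "010",
         "d": "011", "e": "100", "f": "101"}
VARIABLE = {"a": "0", "b": "101", "c": "100",
            "d": "111", "e": "1101", "f": "1100"}


def encoded_bits(freq, code):
    """Total bits: occurrences of each character times its code length."""
    return sum(f * 1000 * len(code[ch]) for ch, f in freq.items())


fixed = encoded_bits(FREQ, FIXED)        # 300000
variable = encoded_bits(FREQ, VARIABLE)  # 224000
print(fixed, variable, f"{1 - variable / fixed:.1%} saved")
```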

Variable prefix encoding

                      a    b    c    d     e     f
frequency (x1000)    45   13   12   16     9     5
variable encoding     0  101  100  111  1101  1100

What is special about our encoding? No code is a prefix of another.
Why does it matter? We can concatenate the codes without ambiguities.

Decode:
0101100 =
001011101 =
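Because no code is a prefix of another, a decoder can read bits left to right and emit a character as soon as the bits read so far match a code. The following sketch illustrates this on the two bit strings from the slide; the function name `decode` is an assumption made here.

```python
CODE = {"a": "0", "b": "101", "c": "100",
        "d": "111", "e": "1101", "f": "1100"}


def decode(bits, code):
    """Decode a bit string written with a prefix code."""
    inverse = {word: ch for ch, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        # Prefix property: the first match is the only possible match.
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a code word")
    return "".join(out)


print(decode("0101100", CODE))    # abc
print(decode("001011101", CODE))  # aabe
```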

Representing an encoding

A binary tree is a nice representation: the leaves are the characters (with their frequencies), the intermediate nodes contain frequencies, and the path from the root to a leaf is the code of that character.

        100
      0/   \1
    a:45    55
          0/  \1
         25     30
       0/ \1  0/  \1
    c:12 b:13 14   d:16
            0/  \1
          f:5    e:9

The frequencies of the internal nodes are the sums of the frequencies of their children.

If the tree is not full, the encoding is non-optimal. Why?

An optimal code is represented by a full binary tree, where each internal node has two children. If a tree is not full, it has an internal node with one child, whose edge is labeled with a redundant bit (check the fixed encoding).

The tree of the fixed encoding:

                100
             0/     \1
            86        14
         0/    \1     |0   (redundant 0)
        58      28    14
      0/ \1   0/ \1  0/ \1
   a:45 b:13 c:12 d:16 e:9 f:5

Cost of encoding a file

For each character c in C, f(c) is its frequency and d(c) is its depth in the tree, which equals the number of bits it takes to encode c. Then the cost of the encoding is the number of bits needed to encode the file, which is

$$\mathrm{cost}(T) = \sum_{c \in C} f(c)\, d(c)$$
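The cost described above, frequency times depth summed over all characters, can be computed directly from a tree. In this sketch the tree is represented as nested tuples `(left, right)` with single-character strings as leaves; that representation and the name `tree_cost` are assumptions made for illustration.

```python
# Frequencies in thousands, as on the slides.
FREQ = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}

# The variable-length encoding tree from the slides: (left, right) pairs.
TREE = ("a", (("c", "b"), (("f", "e"), "d")))


def tree_cost(node, freq, depth=0):
    """Sum of f(c) * d(c) over all leaves c below `node`."""
    if isinstance(node, str):  # leaf: a single character
        return freq[node] * depth
    left, right = node
    return (tree_cost(left, freq, depth + 1)
            + tree_cost(right, freq, depth + 1))


print(tree_cost(TREE, FREQ))  # 224 (thousand bits)
```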

Huffman code An optimal encoding of a file has a minimal cost. Huffman invented a greedy algorithm to construct an optimal prefix code called the Huffman code.

Huffman algorithm

Create |C| leaves, one for each character.
Perform |C| - 1 merge operations. Each merge creates a new node whose children are the two nodes with the lowest frequencies and whose frequency is the sum of these two frequencies.
By using a heap for the collection of intermediate trees, this algorithm takes O(n lg n) time, where n = |C|.

1)  f:5   e:9   c:12   b:13   d:16   a:45

2)  c:12   b:13    14    d:16   a:45
                  /  \
                 f    e

3)   14    d:16    25    a:45
    /  \          /  \
   f    e        c    b

4)    25        30      a:45
     /  \      /  \
    c    b   14    d
            /  \
           f    e

5)  a:45       55
             /    \
           25      30
          /  \    /  \
         c    b 14    d
               /  \
              f    e

6)       100
        /   \
       a     55
           /    \
         25      30
        /  \    /  \
       c    b 14    d
             /  \
            f    e
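The merge steps above map naturally onto a min-heap. The sketch below uses Python's `heapq`; the function name `huffman_codes`, the nested-tuple tree, and the convention that the lower-frequency subtree gets bit 0 are choices made here, the last one to match the tree on the slides. A running counter breaks ties so that the heap never has to compare subtrees.

```python
import heapq
from itertools import count


def huffman_codes(freq):
    """Build a Huffman code for a non-empty dict {character: frequency}."""
    tie = count()  # tie-breaker so equal frequencies never compare trees
    heap = [(f, next(tie), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    # |C| - 1 merges: combine the two lowest-frequency trees each time.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    _, _, root = heap[0]

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"  # single-character alphabet

    walk(root, "")
    return codes


freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
print(huffman_codes(freq))
# {'a': '0', 'c': '100', 'b': '101', 'f': '1100', 'e': '1101', 'd': '111'}
```

With other tie-breaking or left/right conventions the individual code words can differ, but the total cost stays the same.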

Huffman is optimal

Inductive approach, base step. First we show: let x and y be the two characters with the minimal frequencies; then there is a minimal-cost encoding tree in which x and y have equal and greatest depth (see e and f in our example above). How?

Greedy proof technique The proof technique is the same as we have used for the previous two problems (activities and MST): If the greedy choice is not taken then we show that by taking the greedy choice we get a solution that is as good or better.

Lowest leaves

x, y: lowest frequencies. Assume that two other characters a and b with higher frequencies are siblings at the lowest level of the tree:

        T
       / \
      O   x
     / \
    y   O
       / \
      a   b

Since the frequencies of x and y are lowest, the cost of the tree can only improve (or stay the same) if we swap y and a, and x and b:

        T
       / \
      O   b
     / \
    a   O
       / \
      y   x

Why? What is the cost of an encoding tree?
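One way to answer the question is to write out how the cost changes when y and a trade places. The derivation below is a sketch under the assumption that d(·) denotes depth before the swap and that cost is the sum of f(c)·d(c) over all characters; only the terms for y and a change.

```latex
\begin{align*}
\Delta &= \bigl[f(y)\,d(a) + f(a)\,d(y)\bigr] - \bigl[f(y)\,d(y) + f(a)\,d(a)\bigr] \\
       &= \bigl(f(y) - f(a)\bigr)\bigl(d(a) - d(y)\bigr) \;\le\; 0,
\end{align*}
% because f(y) \le f(a) (y has a lowest frequency)
% and d(a) \ge d(y) (a sits at the lowest level of the tree).
```

The same argument applies to swapping x and b, so neither swap can increase the cost.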

Greedy start

We have shown that putting the two lowest-frequency characters lowest in the tree is a good greedy starting point for our algorithm: this is the base of the induction.

Step

Let C' be the alphabet C with x and y replaced by a new character z with frequency f(z) = f(x) + f(y), and let T' be an optimal encoding tree for C' (e.g., the tree created in steps 2 to 6 of the example). We need to show that the tree T, obtained from T' by replacing leaf z with an internal node of frequency f(z) with children x:f(x) and y:f(y), is an optimal encoding tree for C (the tree created in steps 1 to 6 of the example).

Proof of step: by contradiction

d(x) = d(y) = d(z) + 1, so
f(x)d(x) + f(y)d(y) = (f(x) + f(y))(d(z) + 1) = f(z)d(z) + f(x) + f(y), because f(z) = f(x) + f(y).
So cost(T) = cost(T') + f(x) + f(y).

Now suppose T is not an optimal encoding; then there is another tree T'' for C with cost(T'') < cost(T). We have shown that we can put x and y as siblings at the lowest level of T'' without increasing its cost. Let T''' be T'' with x and y replaced by z; then cost(T''') = cost(T'') - f(x) - f(y) < cost(T) - f(x) - f(y) = cost(T'). But that contradicts the assumption that T' was optimal for C'. Hence Huffman (i.e., Greedy) produces an optimal prefix encoding tree.
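The identity cost(T) = cost(T') + f(x) + f(y) used in the proof can be checked numerically on the running example, where x = f, y = e and the merged character z has frequency 14. This is only a sanity check on one instance, not a proof; the depth tables are read off the example trees and the names are illustrative.

```python
# Alphabet C with the depths of its leaves in the final tree T (step 6).
FREQ_C = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
DEPTH_T = {"a": 1, "b": 3, "c": 3, "d": 3, "e": 4, "f": 4}

# Alphabet C': e and f merged into z with f(z) = 9 + 5 = 14,
# with leaf depths in T' (the same tree with z as a leaf).
FREQ_C_PRIME = {"a": 45, "b": 13, "c": 12, "d": 16, "z": 14}
DEPTH_T_PRIME = {"a": 1, "b": 3, "c": 3, "d": 3, "z": 3}


def cost(freq, depth):
    """Sum of frequency times depth over all characters."""
    return sum(freq[ch] * depth[ch] for ch in freq)


cost_t = cost(FREQ_C, DEPTH_T)                    # 224
cost_t_prime = cost(FREQ_C_PRIME, DEPTH_T_PRIME)  # 210
print(cost_t == cost_t_prime + FREQ_C["e"] + FREQ_C["f"])  # True
```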

Conclusion

All our proofs that a greedy method works are based on the same principle. We show that taking the greedy choice first and then recursively solving the reduced problem is optimal, because either we can change an optimal solution without the greedy first choice into one with it, or we can show that an optimal solution must have the greedy first choice.
