Analysis & Design of Algorithms (CSCE 321)
Prof. Amr Goneid
Department of Computer Science, AUC
Part 8. Greedy Algorithms



Greedy Algorithms


Greedy Algorithms
- The General Method
- Continuous Knapsack Problem
- Optimal Merge Patterns

1. Greedy Algorithms
Methodology:
- Start with a solution to a small sub-problem.
- Build up to the whole problem.
- Make choices that look good in the short term but not necessarily in the long term.

Greedy Algorithms
Disadvantages:
- They do not always work; short-term choices may be disastrous in the long term.
- Correctness is hard to prove.
Advantages:
- When they work, they work fast.
- They are simple and easy to implement.

2. The General Method
Let a[ ] be an array of elements that may contribute to a solution, and let S be a solution.

    Greedy(a[ ], n)
    {
        S = empty;
        for each element (i) from a[ ], i = 1:n
        {
            x = Select(a, i);
            if (Feasible(S, x)) S = Union(S, x);
        }
        return S;
    }

The General Method (continued)
Select: selects an element from a[ ] and removes it. Selection is optimized to satisfy an objective function.
Feasible: true if the selected value can be included in the solution vector, false otherwise.
Union: combines the value with the solution and updates the objective function.
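As an illustration, here is a minimal C++ rendering of this skeleton. The Select/Feasible/Union hooks and the toy instance in main are assumptions for demonstration, not part of the slides; the knapsack algorithm in the next section instantiates the same pattern.

    #include <algorithm>
    #include <iostream>
    #include <numeric>
    #include <vector>

    // Generic greedy skeleton: repeatedly Select the locally best element,
    // keep it only if the partial solution stays Feasible.
    template <typename T, typename SelectFn, typename FeasibleFn, typename UniteFn>
    std::vector<T> greedy(std::vector<T> a, SelectFn select,
                          FeasibleFn feasible, UniteFn unite) {
        std::vector<T> S;                       // solution starts empty
        while (!a.empty()) {
            T x = select(a);                    // pick (and remove) the best element
            if (feasible(S, x)) unite(S, x);    // add it only if still feasible
        }
        return S;
    }

    int main() {
        // Toy instance: greedily pick the largest values while their sum stays <= 10.
        std::vector<int> a = {5, 7, 3, 2};
        auto S = greedy<int>(
            a,
            [](std::vector<int>& v) {           // Select: remove the current maximum
                auto it = std::max_element(v.begin(), v.end());
                int x = *it;
                v.erase(it);
                return x;
            },
            [](const std::vector<int>& S, int x) {  // Feasible: total stays <= 10
                return std::accumulate(S.begin(), S.end(), 0) + x <= 10;
            },
            [](std::vector<int>& S, int x) { S.push_back(x); });  // Union
        for (int x : S) std::cout << x << ' ';  // prints: 7 3
    }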

3. Continuous Knapsack Problem

Continuous Knapsack Problem
Environment:
- Objects: object (i) has total weight w_i and total profit p_i; the fraction x_i taken of object (i) is continuous (0 ≤ x_i ≤ 1).
- A number of objects, 1 ≤ i ≤ n.
- A knapsack of capacity m.
(Figure: n objects and a knapsack of capacity m.)

The Problem
Problem statement: for n objects with weights w_i and profits p_i, obtain the set of fractions of objects x_i that maximizes the total profit without exceeding a total weight m.
Formally: obtain the set X = (x_1, x_2, ..., x_n) that maximizes
    Σ_{1≤i≤n} p_i x_i
subject to the constraints:
    Σ_{1≤i≤n} w_i x_i ≤ m,   0 ≤ x_i ≤ 1,   1 ≤ i ≤ n

Optimal Solution
Feasible solution: one satisfying the constraints.
Optimal solution: a feasible solution that maximizes the profit.
Lemma 1: If Σ_{1≤i≤n} w_i ≤ m, then x_i = 1 for all i is optimal.
Lemma 2: An optimal solution will give Σ_{1≤i≤n} w_i x_i = m.

Greedy Algorithm
To maximize profit, choose the highest p first. Also choose the highest possible fraction x, i.e., the smallest w first. In other words, define the "value" of an object (i) to be the ratio v_i = p_i / w_i, and choose first the object with the highest value v_i.

Algorithm

    GreedyKnapsack(p[ ], w[ ], m, n, x[ ])
    {
        insert indices (i) of the items in a maximum heap on value v_i = p_i / w_i;
        zero the vector x;  Rem = m;
        for k = 1..n
        {
            remove the top of the heap to get index (i);
            if (w[i] > Rem) then break;
            x[i] = 1.0;  Rem = Rem - w[i];
        }
        if (k <= n) x[i] = Rem / w[i];
    }
    // T(n) = O(n log n)

Example
n = 3 objects, m = 20
P = (25, 24, 15), W = (18, 15, 10), V = (1.39, 1.60, 1.50)
Objects in decreasing order of V: {2, 3, 1}
Set X = (0, 0, 0) and Rem = m = 20.
k = 1: choose object i = 2: w_2 < Rem, set x_2 = 1, w_2 x_2 = 15, Rem = 5.
k = 2: choose object i = 3: w_3 > Rem, break.
k ≤ n, so x_3 = Rem / w_3 = 0.5.
Optimal solution: X = (0, 1.0, 0.5).
Total profit: Σ_{1≤i≤n} p_i x_i = 31.5. Total weight: Σ_{1≤i≤n} w_i x_i = m = 20.
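A minimal C++ sketch of GreedyKnapsack, run on the example data above; sorting indices by value ratio plays the role of the maximum heap, and the names are illustrative:

    #include <algorithm>
    #include <iostream>
    #include <numeric>
    #include <vector>

    // Fills x[i] with the fraction of object i to take; returns the total profit.
    double greedyKnapsack(const std::vector<double>& p,
                          const std::vector<double>& w,
                          double m, std::vector<double>& x) {
        int n = (int)p.size();
        x.assign(n, 0.0);
        // Sort indices by decreasing value v_i = p_i / w_i
        // (equivalent to repeatedly removing the top of a max-heap).
        std::vector<int> idx(n);
        std::iota(idx.begin(), idx.end(), 0);
        std::sort(idx.begin(), idx.end(),
                  [&](int a, int b) { return p[a] / w[a] > p[b] / w[b]; });
        double rem = m, profit = 0.0;
        for (int i : idx) {
            if (w[i] > rem) {            // object does not fit entirely:
                x[i] = rem / w[i];       // take the fraction that fills the sack
                profit += p[i] * x[i];
                break;
            }
            x[i] = 1.0;                  // take the whole object
            rem -= w[i];
            profit += p[i];
        }
        return profit;
    }

    int main() {
        std::vector<double> p = {25, 24, 15}, w = {18, 15, 10}, x;
        double profit = greedyKnapsack(p, w, 20.0, x);
        for (double xi : x) std::cout << xi << ' ';
        std::cout << "profit = " << profit << '\n';
    }

Compiling and running this prints x = (0, 1, 0.5) and profit = 31.5, matching the hand trace above.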

4. Optimal Merge Patterns
(a) Definitions
Binary merge tree: a binary tree with external nodes representing entities and internal nodes representing merges of these entities.
Optimal binary merge tree: the weighted sum of paths from the root to the external nodes is optimal (e.g., minimum).
Assuming that node (i) contributes to the cost by p_i and the path from the root to that node has length L_i, optimality requires a pattern that minimizes Σ_i p_i L_i.

Optimal Binary Merge Tree
If the items {A, B, C} contribute to the merge cost by P_A, P_B, P_C, respectively, then the following 3 different patterns will cost:
P1 = 2(P_A + P_B) + P_C
P2 = P_A + 2(P_B + P_C)
P3 = 2P_A + P_B + 2P_C
Which of these merge patterns is optimal?

(b) Optimal Merging of Lists
Lists {A, B, C} have lengths 30, 25, 10, respectively. The cost of merging two lists of lengths n, m is n + m. The following 3 different merge patterns will cost:
P1 = 2(30 + 25) + 10 = 120
P2 = 30 + 2(25 + 10) = 100
P3 = 25 + 2(30 + 10) = 105
P2 is optimal, so the merge order is {{B, C}, A}.

The Greedy Method
Insert the lists and their lengths in a minimum heap of lengths.
Repeat:
- Remove the two lowest-length lists (p_i, p_j) from the heap.
- Merge the lists with lengths (p_i, p_j) to form a new list with length p_ij = p_i + p_j.
- Insert p_ij and its list into the heap.
until all lists are merged into one final list.
(Merge tree: C(10) and B(25) merge into BC(35); BC(35) and A(30) merge into BCA(65).)
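A minimal C++ sketch of this greedy merge, assuming the list lengths 30, 25, 10 from the example above; it computes the total merge cost rather than building the lists themselves:

    #include <iostream>
    #include <queue>
    #include <vector>

    // Returns the total cost (element moves) of optimally merging all lists.
    long long optimalMergeCost(const std::vector<long long>& lengths) {
        // Min-heap of list lengths.
        std::priority_queue<long long, std::vector<long long>,
                            std::greater<long long>> heap(lengths.begin(),
                                                          lengths.end());
        long long total = 0;
        while (heap.size() > 1) {
            long long a = heap.top(); heap.pop();   // two shortest lists
            long long b = heap.top(); heap.pop();
            total += a + b;                         // cost of this merge
            heap.push(a + b);                       // merged list goes back
        }
        return total;
    }

    int main() {
        std::cout << optimalMergeCost({30, 25, 10}) << '\n';  // prints 100
    }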

The Greedy Method
Notice that lists B (25 elements) and C (10 elements) have each been merged (moved) twice, while list A (30 elements) has been merged (moved) only once. Hence the total number of element moves is 2(25 + 10) + 30 = 100. This is optimal among the three merge patterns.

(c) Huffman Coding
Terminology:
Symbol: a one-to-one representation of a single entity.
Alphabet: a finite set of symbols.
Message: a sequence of symbols.
Encoding: translating symbols to a string of bits.
Decoding: the reverse.

Example: Coding Tree for the 4-Symbol Alphabet (a, b, c, d)
Encoding: a → 00, b → 01, c → 10, d → 11
Decoding: 01 10 00 11 00 → b c a d a
This is fixed-length coding.
(Tree: the root branches on the first bit and each child on the second bit, with leaves a, b, c, d.)

Coding Efficiency & Redundancy
L_i = length of the path from the root to symbol (i) = number of bits representing that symbol.
P_i = probability of occurrence of symbol (i) in the message.
n = size of the alphabet.
L̄ = average symbol length = Σ_{1≤i≤n} P_i L_i bits/symbol (bps).
For fixed-length coding, L_i = L = constant, so L̄ = L (bps).
Is this optimal (minimum)? Not necessarily.

Coding Efficiency & Redundancy
The absolute minimum average symbol length for a message is called the entropy. The concept of entropy as a measure of the average content of information in a message was introduced by Claude Shannon (1948).

Coding Efficiency & Redundancy
Shannon's entropy represents an absolute limit on the best possible lossless compression of any communication. It is computed as:
    H = − Σ_{1≤i≤n} p_i log2 p_i   (bits/symbol)

Coding Efficiency & Redundancy
Coding efficiency: η = H / L̄, with 0 ≤ η ≤ 1.
Coding redundancy: R = 1 − η, with 0 ≤ R ≤ 1.
(Diagram: a scale of L̄ values; the actual L̄ lies above the optimal value, and coding is perfect when L̄ reaches H.)

Example: Fixed-Length Coding
4-symbol alphabet (a, b, c, d); all symbols have the same length L = 2 bits.
Message: abbcaada, so L̄ = 2 (bps).

Symbol (i) | p_i   | −log p_i | −p_i log p_i | code | L_i
a          | 0.5   | 1        | 0.5          | 00   | 2
b          | 0.25  | 2        | 0.5          | 01   | 2
c          | 0.125 | 3        | 0.375        | 10   | 2
d          | 0.125 | 3        | 0.375        | 11   | 2

H = 1.75

Example (continued)
Entropy H = 1.75 (bps).
Coding efficiency η = H / L̄ = 1.75 / 2 = 0.875.
Coding redundancy R = 1 − η = 0.125.
This is not optimal.
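A minimal C++ sketch that reproduces these figures for the message "abbcaada"; the 2-bit fixed code length is taken from the example:

    #include <cmath>
    #include <iostream>
    #include <map>
    #include <string>

    int main() {
        std::string msg = "abbcaada";
        std::map<char, int> freq;
        for (char c : msg) ++freq[c];          // count symbol occurrences

        double H = 0.0;                        // entropy, bits/symbol
        for (auto& [sym, count] : freq) {
            double p = double(count) / msg.size();
            H -= p * std::log2(p);
        }
        double Lavg = 2.0;                     // fixed-length code: 2 bits/symbol
        std::cout << "H = " << H                         // 1.75
                  << ", efficiency = " << H / Lavg       // 0.875
                  << ", redundancy = " << 1 - H / Lavg   // 0.125
                  << '\n';
    }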

Result
Fixed-length coding is optimal (perfect) only when all symbol probabilities are equal. To prove this: with n = 2^m symbols, L = m bits and L̄ = m (bps). If all probabilities are equal, p_i = 1/n = 2^(−m), then
    H = − Σ_{1≤i≤n} 2^(−m) log2(2^(−m)) = n · 2^(−m) · m = m = L̄
so η = H / L̄ = 1 and the coding is perfect.

Variable-Length Coding (Huffman Coding)
The problem: given a set of symbols and their probabilities, find a set of binary codewords that minimizes the average symbol length.

Variable-Length Coding (Huffman Coding)
Formally:
Input: a message M(A, P) with a symbol alphabet A = {a_1, a_2, ..., a_n} of size (n) and a set of probabilities for the symbols P = {p_1, p_2, ..., p_n}.
Output: a set of binary codewords C = {c_1, c_2, ..., c_n} with bit lengths L = {L_1, L_2, ..., L_n}.
Condition: minimize the average symbol length L̄ = Σ_{1≤i≤n} p_i L_i.

Variable-Length Coding (Huffman Coding)
To achieve optimality, we use optimal binary merge trees to code symbols of unequal probabilities.
Huffman coding: more frequent symbols occur nearer to the root (shorter code lengths); less frequent symbols occur at deeper levels (longer code lengths).

The Greedy Method
Store each symbol in a parentless node of a binary tree. Insert the symbols and their probabilities in a minimum heap of probabilities.
Repeat:
- Remove the lowest two probabilities (p_i, p_j) from the heap.
- Merge the symbols with (p_i, p_j) to form a new symbol (a_i a_j) with probability p_ij = p_i + p_j.
- Store symbol (a_i a_j) in a parentless node with two children a_i and a_j.
- Insert p_ij and its symbol into the heap.
until all symbols are merged into one final alphabet (root).
Trace the path from the root to each leaf (symbol) to form the bit string for that symbol, concatenating "0" for a left branch and "1" for a right branch.
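A minimal C++ sketch of this method, assuming the alphabet and probabilities of Example (1) below. Heap tie-breaking among equal probabilities is unspecified, so the exact bit patterns may differ from the slides while the code lengths (and hence L̄) stay the same; names are illustrative and nodes are deliberately leaked for brevity.

    #include <iostream>
    #include <map>
    #include <queue>
    #include <string>
    #include <vector>

    struct Node {
        double p;                 // probability of this symbol or merge
        char sym;                 // leaf symbol ('\0' for internal nodes)
        Node *left, *right;
    };

    // Walk the tree, emitting "0" for left branches and "1" for right.
    void assignCodes(const Node* t, const std::string& code,
                     std::map<char, std::string>& out) {
        if (!t->left && !t->right) { out[t->sym] = code; return; }
        assignCodes(t->left, code + "0", out);
        assignCodes(t->right, code + "1", out);
    }

    int main() {
        std::vector<std::pair<char, double>> alpha =
            {{'a', 0.5}, {'b', 0.25}, {'c', 0.125}, {'d', 0.125}};

        // Min-heap of parentless nodes, ordered by probability.
        auto cmp = [](const Node* x, const Node* y) { return x->p > y->p; };
        std::priority_queue<Node*, std::vector<Node*>, decltype(cmp)> heap(cmp);
        for (auto& [s, p] : alpha) heap.push(new Node{p, s, nullptr, nullptr});

        while (heap.size() > 1) {
            Node* i = heap.top(); heap.pop();      // two lowest probabilities
            Node* j = heap.top(); heap.pop();
            heap.push(new Node{i->p + j->p, '\0', i, j});  // merged node
        }

        std::map<char, std::string> codes;
        assignCodes(heap.top(), "", codes);
        for (auto& [s, c] : codes)                 // lengths: a:1, b:2, c:3, d:3
            std::cout << s << " : " << c << '\n';
    }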

Example (1): 4-Symbol Alphabet
A = {a, b, c, d} of size (4). Message M(A, P): abbcaada, P = {0.5, 0.25, 0.125, 0.125}.

Symbol (i) | p_i   | −log p_i | −p_i log p_i
a          | 0.5   | 1        | 0.5
b          | 0.25  | 2        | 0.5
c          | 0.125 | 3        | 0.375
d          | 0.125 | 3        | 0.375

H = 1.75

Building the Optimal Merge Table
Heap contents (s_i : p_i) at each merge step:
Step 1: d : 0.125, c : 0.125, b : 0.25, a : 0.5
Step 2: cd : 0.25, b : 0.25, a : 0.5
Step 3: bcd : 0.5, a : 0.5
Step 4: abcd : 1.0

Optimal Merge Tree for Example (1)
a (50%), b (25%), c (12.5%), d (12.5%)
(Tree, built bottom-up: c and d merge into cd (25%); b and cd merge into bcd (50%); a and bcd merge into the root abcd (100%). Each left branch is labeled 0 and each right branch 1.)

The resulting code table:
a_i | c_i | L_i (bits)
a   | 0   | 1
b   | 10  | 2
c   | 110 | 3
d   | 111 | 3

Coding Efficiency for Example (1)
L̄ = (1 × 0.5 + 2 × 0.25 + 3 × 0.125 + 3 × 0.125) = 1.75 (bps)
H = 1.75 (bps), so η = H / L̄ = 1.75 / 1.75 = 1.00 and R = 0.0.
Notice that symbols exist only at leaves, i.e., no symbol code is the prefix of another symbol code. This is why the method is also called "prefix coding".

Analysis
- The cost of insertion into a minimum heap is O(n log n).
- The repeat loop is executed (n − 1) times. In each iteration, the worst-case removal of the two least elements costs 2 log n and the insertion of the merged element costs log n.
- Hence, the complexity of the Huffman algorithm is O(n log n).

Example (2): 4-Symbol Alphabet
A = {a, b, c, d} of size (4). P = {0.4, 0.25, 0.18, 0.17}.

Symbol (i) | p_i  | −log p_i | −p_i log p_i
a          | 0.40 | 1.32     | 0.53
b          | 0.25 | 2.00     | 0.50
c          | 0.18 | 2.47     | 0.45
d          | 0.17 | 2.56     | 0.43

H ≈ 1.91

Example (2): Merge Table
Heap contents (s_i : p_i) at each merge step:
Step 1: d : 0.17, c : 0.18, b : 0.25, a : 0.40
Step 2: b : 0.25, cd : 0.35, a : 0.40
Step 3: a : 0.40, cdb : 0.60
Step 4: cdba : 1.0

Optimal Merge Tree for Example (2)
(Tree, built bottom-up: d and c merge into cd (0.35); cd and b merge into cdb (0.60); cdb and a merge into the root cdba (1.0).)
a_i | c_i | L_i (bits)
a   | 1   | 1
b   | 01  | 2
c   | 001 | 3
d   | 000 | 3

Coding Efficiency for Example (2)
a (40%), b (25%), c (18%), d (17%)
L̄ = 1 × 0.40 + 2 × 0.25 + 3 × 0.18 + 3 × 0.17 = 1.95 bps (optimal)
H ≈ 1.91, η = H / L̄ ≈ 97.9%, R ≈ 2.1%
Coding is optimal (97.9%) but not perfect.
Important result: perfect coding (η = 100%) can be achieved only for probability values of the form 2^(−m) (1/2, 1/4, 1/8, ... etc.).

File Compression
Variable-length codes can be used to compress files. Symbols are initially coded using ASCII (8-bit) fixed-length codes.
Steps:
1. Determine the probabilities of the symbols in the file.
2. Build the merge tree (or table).
3. Assign variable-length codes to the symbols.
4. Encode the symbols using the new codes.
5. Save the coded symbols in another file, together with the symbol code table.
The compression ratio = L̄ / 8 (see the sketch below).
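A minimal C++ sketch of step 1 together with an entropy-based estimate of the achievable compression ratio (a Huffman code attains H ≤ L̄ < H + 1, so H/8 approximates the best ratio); the file name "input.txt" is an assumption:

    #include <cmath>
    #include <fstream>
    #include <iostream>
    #include <vector>

    int main() {
        std::ifstream in("input.txt", std::ios::binary);
        std::vector<long long> freq(256, 0);
        long long total = 0;
        char c;
        while (in.get(c)) { ++freq[(unsigned char)c]; ++total; }  // step 1
        if (total == 0) { std::cout << "empty or missing file\n"; return 0; }

        double H = 0.0;                        // entropy, bits/symbol
        for (long long f : freq)
            if (f > 0) {
                double p = double(f) / total;
                H -= p * std::log2(p);
            }
        // Huffman gives L-bar close to H; ratio is relative to 8-bit ASCII codes.
        std::cout << "H = " << H << " bps, best ratio ~ " << H / 8 << '\n';
    }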

Huffman Coding Animations
For examples of animations of Huffman coding, see: Huffman.html