Optimal Merging Of Runs


Weighted External Path Length WEPL(T) = Σ (weight of external node i) * (distance of node i from root of T). For a tree with external weights 4, 3, 6, and 9, all at depth 2: WEPL(T) = 4*2 + 3*2 + 6*2 + 9*2 = 44.

Weighted External Path Length WEPL(T) = Σ (weight of external node i) * (distance of node i from root of T). For a skewed tree with weights 4 and 3 at depth 3, 6 at depth 2, and 9 at depth 1: WEPL(T) = 4*3 + 3*3 + 6*2 + 9*1 = 42.
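The WEPL computation on these two slides can be sketched in code. The tree representation below (nested tuples for internal nodes, bare numbers for external nodes) is an assumption for illustration, not part of the slides.

```python
# Sketch of the WEPL formula: internal nodes are (left, right) tuples,
# external nodes are bare weights. Representation is an assumption.

def wepl(tree, depth=0):
    """Weighted external path length: sum over external nodes of
    weight * distance of the node from the root."""
    if isinstance(tree, (int, float)):      # external node
        return tree * depth
    left, right = tree                      # internal node
    return wepl(left, depth + 1) + wepl(right, depth + 1)

# Balanced tree: all four weights at depth 2.
print(wepl(((4, 3), (6, 9))))    # 44

# Skewed tree: 4 and 3 at depth 3, 6 at depth 2, 9 at depth 1.
print(wepl((((4, 3), 6), 9)))    # 42
```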

Other Applications Message coding and decoding. Lossless data compression.

Message Coding & Decoding Messages M0, M1, M2, …, Mn-1 are to be transmitted. The messages do not change. Both sender and receiver know the messages. So, it is adequate to transmit a code that identifies the message (e.g., the message index). Mi is sent with frequency fi. Select message codes so as to minimize transmission and decoding times.

Example n = 4 messages. The frequencies are [2, 4, 8, 100]. Use 2-bit codes [00, 01, 10, 11]. Transmission cost = 2*2 + 4*2 + 8*2 + 100*2 = 228. Decoding is done using a binary tree.

Example The decode tree has external nodes M0, M1, M2, and M3, each at depth 2. Decoding cost = 2*2 + 4*2 + 8*2 + 100*2 = 228 = transmission cost = WEPL.

Example Every binary tree with n external nodes defines a code set for n messages. With M0 and M1 at depth 3, M2 at depth 2, and M3 at depth 1: Decoding cost = 2*3 + 4*3 + 8*2 + 100*1 = 134 = transmission cost = WEPL.

Another Example (a decode tree with external nodes M0 through M9) In every such code set, no code is a prefix of another!

Lossless Data Compression Alphabet = {a, b, c, d}. String with 10 as, 5 bs, 100 cs, and 900 ds. Use a 2-bit code.  a = 00, b = 01, c = 10, d = 11.  Size of string = 10*2 + 5*2 + 100*2 + 900*2 = 2030 bits.  Plus size of code table.

Lossless Data Compression Use a variable length code that satisfies the prefix property (no code is a prefix of another).  a = 000, b = 001, c = 01, d = 1.  Size of string = 10*3 + 5*3 + 100*2 + 900*1 = 1145 bits.  Plus size of code table.  Compression ratio is approx. 2030/1145 = 1.8.
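The two size computations above can be checked with a short sketch. The code tables and character counts are taken from the slides; the function name is an assumption.

```python
# Compare the fixed-length and prefix-property code tables from the
# slides on the same string (10 as, 5 bs, 100 cs, 900 ds).

counts = {'a': 10, 'b': 5, 'c': 100, 'd': 900}

fixed  = {'a': '00',  'b': '01',  'c': '10', 'd': '11'}
prefix = {'a': '000', 'b': '001', 'c': '01', 'd': '1'}

def encoded_bits(counts, code):
    """Total bits to encode the string under the given code table."""
    return sum(n * len(code[ch]) for ch, n in counts.items())

print(encoded_bits(counts, fixed))    # 2030
print(encoded_bits(counts, prefix))   # 1145
```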

Lossless Data Compression Decode by walking the decode tree from the root: follow one branch per bit until an external node is reached; emit that node's character and restart at the root (e.g., a coded bit string decodes to addbc…). Compression ratio is maximized when the decode tree has minimum WEPL.
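Decoding against the prefix code above can be sketched as follows. The nested-tuple tree layout and the convention that 0 means the left branch are assumptions chosen to match the code a = 000, b = 001, c = 01, d = 1.

```python
# Sketch of prefix-code decoding for a = 000, b = 001, c = 01, d = 1.
# Tree is (left_subtree, right_subtree); characters sit at external
# nodes; bit 0 selects the left branch, bit 1 the right branch.

tree = ((('a', 'b'), 'c'), 'd')

def decode(bits, tree):
    out, node = [], tree
    for bit in bits:
        node = node[int(bit)]          # follow one branch per bit
        if isinstance(node, str):      # reached an external node
            out.append(node)
            node = tree                # restart at the root
    return ''.join(out)

print(decode('0001100101', tree))   # 'addbc'
```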

Huffman Trees Trees that have minimum WEPL. Binary trees with minimum WEPL may be constructed using a greedy algorithm. For higher order trees with minimum WEPL, a preprocessing step followed by the greedy algorithm may be used. Huffman codes: codes defined by minimum WEPL trees.

Greedy Algorithm For Binary Trees Start with a collection of external nodes, each with one of the given weights. Each external node defines a different tree. Reduce the number of trees by 1:  Select the 2 trees with minimum weight.  Combine them by making them children of a new root node.  The weight of the new tree is the sum of the weights of the two trees.  Add the new tree to the tree collection. Repeat the reduce step until only 1 tree remains.
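The greedy reduce step can be sketched with a min heap (as the later data-structure slide suggests). The nested-tuple tree output and the counter used as a heap tiebreaker are assumptions for illustration.

```python
# Sketch of the greedy Huffman-tree construction using a min heap.
# Each heap entry is (weight, tiebreaker, tree); the tiebreaker keeps
# heapq from ever comparing two trees when weights are equal.
import heapq
from itertools import count

def huffman(weights):
    tie = count()
    heap = [(w, next(tie), w) for w in weights]   # external nodes
    heapq.heapify(heap)                           # initialize: O(n)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)           # two minimum-weight trees
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), (t1, t2)))
    return heap[0][2]

print(huffman([2, 5, 4, 7, 9]))   # ((5, (2, 4)), (7, 9))
```

Each of the n - 1 reduce steps does two remove-min operations and one insert, giving the O(n log n) total claimed on the data-structure slide.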

Example n = 5, w[0:4] = [2, 5, 4, 7, 9]. Combine the two minimum-weight trees, 2 and 4, into a tree of weight 6; the collection is now {5, 6, 7, 9}. Combine 5 and 6 into a tree of weight 11; the collection is now {7, 9, 11}. Combine 7 and 9 into a tree of weight 16; the collection is now {11, 16}. Combine 11 and 16 into the final tree of weight 27.

Data Structure For Tree Collection Operations are:  Initialize with n trees.  Remove 2 trees with least weight.  Insert new tree. Use a min heap. Initialize … O(n). 2(n – 1) remove min operations … O(n log n). n – 1 insert operations … O(n log n). Total time is O(n log n). Or, (n – 1) remove mins and (n – 1) change mins.

Higher Order Trees The greedy scheme doesn't work directly! For a 3-way tree with weights [3, 6, 1, 9]: Greedy Tree Cost = 29, Optimal Tree Cost = 23.

Cause Of Failure In the greedy tree (cost = 29), one internal node is not a 3-way node. A 2-way node is like a 3-way node one of whose children has a weight of 0. Must start with enough runs/weights of length 0 so that all internal nodes are 3-way nodes.

How Many Length 0 Runs To Add? k-way tree, k > 1. Initial number of runs is r. Add the least q >= 0 runs of length 0. Each k-way merge reduces the number of runs by k – 1. The number of runs after s k-way merges is r + q – s(k – 1). For some positive integer s, the number of remaining runs must become 1.

How Many Length 0 Runs To Add? So, we want r + q – s(k–1) = 1 for some positive integer s. So, r + q – 1 = s(k – 1). Or, (r + q – 1) mod (k – 1) = 0. Or, r + q – 1 is divisible by k – 1.  This implies that q < k – 1. (r – 1) mod (k – 1) = 0 => q = 0. (r – 1) mod (k – 1) != 0 => q = k –1 – (r – 1) mod (k – 1). Or, q = (1 – r) mod (k – 1).
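The formula q = (1 – r) mod (k – 1) can be sketched directly (Python's % already returns a nonnegative result for a negative left operand, matching the mod used on the slide). The function name is an assumption.

```python
# Least number q >= 0 of length-0 runs to add so that r + q runs
# reduce to exactly 1 run under k-way merges (each merge removes
# k - 1 runs).

def zero_runs_to_add(r, k):
    assert k > 1 and r >= 1
    return (1 - r) % (k - 1)

print(zero_runs_to_add(6, 4))   # 1  (start with 7 runs)
print(zero_runs_to_add(6, 2))   # 0  (binary merging never needs padding)
```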

Examples k = 2.  q = (1 – r) mod (k – 1) = (1 – r) mod 1 = 0.  So, no runs of length 0 are to be added. k = 4, r = 6.  q = (1 – r) mod (k – 1) = (1 – 6) mod 3 = (–5) mod 3 = (6 – 5) mod 3 = 1.  So, must start with 7 runs (the original 6 plus one run of length 0), and then apply the greedy method.