Chapter 9: Huffman Codes


The Design and Analysis of Algorithms, Chapter 9: Huffman Codes

Huffman Codes
- Basic Idea
- Building the Tree
- Implementation
- Analysis
- Encoding and Decoding
- Analysis
- Discussion

Basic Idea
Fixed-length encoding: every character is assigned a codeword of the same length (e.g. ASCII, Unicode).
Variable-length encoding: assign longer codewords to less frequent characters and shorter codewords to more frequent characters.
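
As a small illustration (an invented mini-alphabet, not from the original slides): with frequencies A:5, B:2, C:1, D:1, a fixed-length code needs 2 bits per character, i.e. 9 × 2 = 18 bits for the whole text, while the variable-length code A=0, B=10, C=110, D=111 needs 5·1 + 2·2 + 1·3 + 1·3 = 15 bits, and it happens to be a Huffman code for these frequencies.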

Building a Huffman Tree
1. Compute the frequencies of each character in the alphabet.
2. Build a forest of one-node trees, where each node corresponds to a character and contains the frequency of that character in the text to be encoded.
3. Select the two parentless nodes with the lowest frequencies.
4. Create a new node which is the parent of those two nodes.
5. Label the left link with 0 and the right link with 1.
6. Assign the new node a frequency equal to the sum of its children's frequencies.
7. Repeat Steps 3 through 6 until there is only one parentless node left.
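
As a concrete sketch of these steps (not part of the original slides), here is a minimal Python version built on a priority queue; the tuple node layout (character, left child, right child) and the tie-breaking counter are assumptions of this illustration:

    import heapq
    from collections import Counter

    def build_huffman_tree(text):
        # Steps 1-2: one leaf (char, None, None) per distinct character.
        # Heap entries are (frequency, tie_breaker, node); the unique
        # tie_breaker keeps heapq from ever comparing the node tuples.
        heap = [(f, i, (ch, None, None))
                for i, (ch, f) in enumerate(Counter(text).items())]
        heapq.heapify(heap)
        next_id = len(heap)
        while len(heap) > 1:                    # Steps 3-7
            f1, _, left = heapq.heappop(heap)   # lowest frequency
            f2, _, right = heapq.heappop(heap)  # second lowest
            # New parent node; its left edge reads 0, its right edge 1.
            heapq.heappush(heap, (f1 + f2, next_id, (None, left, right)))
            next_id += 1
        return heap[0][2]                       # the root (text must be non-empty)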

Example
[Figure: the Huffman tree built from the frequency table below, with the merged weights at its internal nodes and the total weight 24 at the root.]
Character frequencies (24 characters in total): _ (space) 4, A 2, E 5, H 1, I 2, L 2, M 2, P 2, R 1, S 2, X 1.
The code for each symbol may be obtained by tracing a path from the root of the tree to that symbol.

Implementation
Array frequencies[0..2N]: node frequencies. If frequencies[k] > 0 for 0 ≤ k ≤ N-1, then node k is a terminal (leaf) node.
Array parents[0..2N]: the parents of the nodes in frequencies[]. The parent of node k is given by abs(parents[k]). If parents[k] > 0, node k is linked to the left of its parent; otherwise, to the right.
Priority queue with elements (k, frequencies[k]), where the priority is frequencies[k].

Algorithm
compute frequencies[k] and insert (k, frequencies[k]) in a PQueue for each k with frequencies[k] > 0
m ← N
while PQueue not empty do
    deleteMin from PQueue → (node1, frequency1)
    if PQueue empty then break
    else
        deleteMin from PQueue → (node2, frequency2)
        create new node m (next slide) and insert it in the PQueue
        m ← m + 1
end
// the tree is built, with root = node1

Algorithm: Create New Node m
frequencies[m] ← frequency1 + frequency2    // the new node's frequency
parents[node1] ← m                          // left link
parents[node2] ← -m                         // right link
insert (m, frequency1 + frequency2) in the PQueue
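
Putting the two pseudocode fragments together, here is a Python sketch of the array-based construction described on the Implementation slide; the exact array sizes and the handling of an initially empty queue are assumptions of this illustration:

    import heapq

    def build_huffman_arrays(char_freqs):
        # char_freqs: frequencies of the N characters, indexed 0..N-1.
        N = len(char_freqs)
        frequencies = list(char_freqs) + [0] * (N + 1)  # room for internal nodes
        parents = [0] * (2 * N + 1)
        pq = [(frequencies[k], k) for k in range(N) if frequencies[k] > 0]
        heapq.heapify(pq)
        m = N                                   # next free internal node index
        while pq:
            f1, node1 = heapq.heappop(pq)
            if not pq:                          # node1 is the last parentless node
                return frequencies, parents, node1
            f2, node2 = heapq.heappop(pq)
            frequencies[m] = f1 + f2            # the new node's frequency
            parents[node1] = m                  # left link (positive)
            parents[node2] = -m                 # right link (negative)
            heapq.heappush(pq, (frequencies[m], m))
            m += 1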

To Encode a Character
Start with the leaf corresponding to that character and follow the path to the root.
The labels on the links along the path, read in reverse, give the codeword.
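
In the array representation above, this walk can be written in a few lines of Python (a sketch under the same assumptions as the earlier code):

    def encode_char(k, parents):
        # k is the leaf index of the character in the parents[] array.
        bits = []
        while parents[k] != 0:                  # climb until the root
            bits.append('0' if parents[k] > 0 else '1')
            k = abs(parents[k])
        return ''.join(reversed(bits))          # labels were collected leaf-to-root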

To Restore the Encoded Text
Start at the root and follow the path that matches the bits in the encoded text: go left on '0', go right on '1'.
Output the character found in the leaf at the end of the path.
Repeat for the remaining bits in the encoded text.
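
Using the tuple-shaped tree from the first sketch, decoding might look as follows (again an illustrative sketch, not the original slides' code):

    def decode(bits, root):
        # root is a (char, left, right) node as built by build_huffman_tree.
        out, node = [], root
        for b in bits:
            _, left, right = node
            node = left if b == '0' else right  # '0' = go left, '1' = go right
            if node[1] is None and node[2] is None:   # a leaf: emit its character
                out.append(node[0])
                node = root                     # restart at the root
        return ''.join(out)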

Analysis
Time efficiency of building the Huffman tree: the insert and deleteMin operations each take O(log N) time, and they are performed at most 2N times, since each of the at most N-1 merges removes two nodes from the queue and inserts one.
Therefore the running time is O(2N log N) = O(N log N).

Discussion
Huffman trees give prefix-free codes: no codeword is a prefix of any other codeword, so no delimiter between codewords is necessary.
Huffman trees are full trees, i.e. every node except the leaves has exactly two children.
The length of the encoded message is equal to the weighted external path length of the Huffman frequency tree.

Weighted External Path Length
Definition: Let T be a tree with weights w1, ..., wn at its leaf nodes. The weighted leaf path length L(T) of T is defined as the sum

    L(T) = Σ_{i ∈ leaf(T)} l_i · w_i

where leaf(T) is the set of all leaves of T, and l_i is the path length, i.e. the length of the path from the root to leaf i.
Huffman codes solve the more general problem: given a set of weighted leaves, construct a tree with the minimum weighted path length.
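
To make the definition concrete, here is a small sketch that evaluates L(T) over the tuple-shaped tree used in the earlier sketches; the freq mapping from character to weight w_i is an assumption of this illustration:

    def weighted_path_length(node, freq, depth=0):
        # node is a (char, left, right) tuple; freq maps char -> weight w_i.
        ch, left, right = node
        if left is None and right is None:      # leaf: contributes l_i * w_i
            return depth * freq[ch]
        return (weighted_path_length(left, freq, depth + 1) +
                weighted_path_length(right, freq, depth + 1))

When freq holds the character counts of the message, L(T) is exactly the length in bits of the encoded message.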

Discussion
Optimality: no tree with the same frequencies at its external nodes has a lower weighted external path length than the Huffman tree. This property can be proved by induction (the proof is omitted here).
The tree must be saved and sent along with the message in order to decode it. Alternative: adaptive Huffman coding.

Adaptive Huffman Coding
The frequencies are initially assumed to be all equal, and are then adjusted as the message is encoded so that they reflect the actual frequencies. Since the decoder has access to the same information as the encoder, it can be arranged that the decoder changes its coding tree at the same points as the encoder does.

Discussion
For truly random files the code is not effective, since each character occurs with approximately the same frequency.
Widely used compression schemes such as zip are based in part on Huffman encoding.
A copy of one of David Huffman's original publications about his algorithm may be found at http://compression.graphicon.ru/download/articles/huff/huffman_1952_minimum-redundancy-codes.pdf