Lossless Decomposition and Huffman Codes Sophia Soohoo CS 157B.


Lossless Data Compression
o Any compression algorithm can be viewed as a function that maps sequences of units into other sequences of units.
o If the original data can be reconstructed exactly from the compressed data, the compression is lossless.
o Lossless compression is in contrast to lossy data compression, which only allows an approximation of the original data to be reconstructed in exchange for better compression rates.

David A. Huffman
o BS in Electrical Engineering at Ohio State University
o Worked as a radar maintenance officer for the US Navy
o PhD student in Electrical Engineering at MIT, 1952
o Was given the choice of writing a term paper or taking a final exam
o Paper topic: the most efficient method for representing numbers, letters, or other symbols as binary code

Huffman Coding
o Uses the minimum number of bits for the given symbol frequencies
o Variable length coding – good for data transfer
o Different symbols have codewords of different lengths
o The most frequent symbols get the shortest codewords
o Symbols with lower frequency get longer codewords
o "Z" will have a longer code representation than "E" if looking at the frequency of character occurrences in an alphabet
o No codeword is a prefix of another codeword!

Decoding

Symbol  Code
E       0
T       11
N       100
I       1010
S       1011

To determine the original message, read the string of bits from left to right and use the table to determine the individual symbols.

Decode the following: 11010010010101011

Decoding

Symbol  Code
E       0
T       11
N       100
I       1010
S       1011

11 | 0 | 100 | 100 | 1010 | 1011  ->  T E N N I S

Original string: TENNIS
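
A small sketch of this left-to-right decoding (the CODE_TABLE and decode names are illustrative, not from the slides): because no codeword is a prefix of another, the first codeword matched while scanning is always the right one.

```python
CODE_TABLE = {"0": "E", "11": "T", "100": "N", "1010": "I", "1011": "S"}

def decode(bits, table):
    """Decode a bit string using a {codeword: symbol} table."""
    symbols, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in table:          # prefix property: this match is unambiguous
            symbols.append(table[buffer])
            buffer = ""
    if buffer:
        raise ValueError("leftover bits do not form a complete codeword")
    return "".join(symbols)

print(decode("11010010010101011", CODE_TABLE))  # -> TENNIS
```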

Representing a Huffman Table as a Binary Tree
o Codewords are represented by a binary tree
o Each leaf stores a character
o Each internal node has two children
o Left = 0
o Right = 1
o The codeword for a character is the path from the root to the leaf storing that character (see the sketch below)
o The code represented by the leaves of the tree is a prefix code
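
A minimal sketch of this tree view, using a hypothetical Node class (not from the slides): each leaf stores a symbol, and a codeword is read off the 0/1 edges on the root-to-leaf path. The same Node and codewords names are reused in the later sketches.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    symbol: Optional[str] = None    # set only on leaf nodes
    left: Optional["Node"] = None   # child reached by a 0 edge
    right: Optional["Node"] = None  # child reached by a 1 edge

def codewords(node, prefix=""):
    """Collect {symbol: codeword} by walking every root-to-leaf path."""
    if node.symbol is not None:              # leaf: the path so far is the codeword
        return {node.symbol: prefix}
    table = {}
    if node.left is not None:
        table.update(codewords(node.left, prefix + "0"))
    if node.right is not None:
        table.update(codewords(node.right, prefix + "1"))
    return table

# The tree for the decoding table above (E=0, T=11, N=100, I=1010, S=1011)
tree = Node(
    left=Node("E"),
    right=Node(
        left=Node(left=Node("N"),
                  right=Node(left=Node("I"), right=Node("S"))),
        right=Node("T"),
    ),
)
print(codewords(tree))  # {'E': '0', 'N': '100', 'I': '1010', 'S': '1011', 'T': '11'}
```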

Constructing Huffman Codes
o Goal: construct a prefix code for Σ: associate each letter i with a codeword w_i so as to minimize the average codeword length L = Σ_i p_i · |w_i|, where p_i is the probability of letter i and |w_i| is the length of its codeword.

Example

Letter  p_i    w_i
A
B
C       0.2    01
D       0.3    10
E       0.3    11

where p_i is the probability of letter i and w_i is its codeword

Algorithm
o Make a leaf node for each symbol
o Add the generation probability of each symbol to its leaf node
o Take the two leaf nodes with the smallest probability (p_i) and connect them into a new node (which becomes the parent of those nodes)
o Add 1 for the right edge
o Add 0 for the left edge
o The probability of the new node is the sum of the probabilities of the two connected nodes
o If there is only one node left, the code construction is completed. If not, go back to (2). (A sketch of these steps follows this list.)
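
A minimal sketch of these steps (not the slides' own code), reusing the hypothetical Node class and codewords helper from the sketch above: a min-heap keyed on probability supplies the two smallest nodes at each step, with a running counter used only to break ties.

```python
import heapq
import itertools

def build_huffman(probabilities):
    """Build a Huffman tree from {symbol: probability} and return its root."""
    counter = itertools.count()          # tie-breaker so Node objects are never compared
    heap = [(p, next(counter), Node(sym)) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, n1 = heapq.heappop(heap)  # the two nodes with the smallest probabilities
        p2, _, n2 = heapq.heappop(heap)
        parent = Node(left=n1, right=n2)           # left edge = 0, right edge = 1
        heapq.heappush(heap, (p1 + p2, next(counter), parent))
    return heap[0][2]
```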

Example

Symbol  Probability
A       0.387
B       0.194
C       0.161
D       0.129
E       0.129

Example – Creating the Tree

Start with one leaf node per symbol: D, C, A, B, E.

Symbol  Probability
A       0.387
B       0.194
C       0.161
D       0.129
E       0.129

Example – Iterate Step 2

Take the two leaf nodes with the smallest probability (p_i) and connect them into a new node (which becomes the parent of those nodes).

o Green nodes – nodes to be evaluated
o White nodes – nodes which have already been evaluated
o Blue nodes – nodes added in this iteration

(Tree figure: D and E, the two nodes with the smallest probabilities, are joined under a new parent.)

Example – Iterate Step 2

(Tree figure for this iteration.)

Note: when two nodes are connected by a parent, the parent should be evaluated in the next iteration.

Example – Iterate Step 2

(Tree figure for the next iteration.)

Example: Completed Tree

(Figure: the finished Huffman tree with leaves A, B, C, D, E.)

Example: Table for Huffman Code

Symbol  Huffman Code
A       0
B       111
C       110
D       100
E       101

Generate the table by reading from the root node to the leaf for each symbol.
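
As a check, running the construction sketch above on the example table (assuming E's probability is 0.129, so the probabilities sum to 1) happens to reproduce these codes with the sketch's tie-breaking; a different but equally good assignment is possible when probabilities tie or when left/right labels are swapped.

```python
example = {"A": 0.387, "B": 0.194, "C": 0.161, "D": 0.129, "E": 0.129}
print(codewords(build_huffman(example)))
# {'A': '0', 'D': '100', 'E': '101', 'C': '110', 'B': '111'}
```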

Practice

Symbol  Probability  Huffman Code
A       0.45         ?
B       0.13         ?
C       0.12         ?
D       0.16         ?
E       0.09         ?
F       0.05         ?

Practice Solution

(Figure: the Huffman tree for the practice table; F and E, the two least frequent symbols, are merged first.)
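
For reference, the same sketch can be run on the practice table (hypothetical usage, not part of the slides). The exact 0/1 labels depend on tie-breaking and on which child gets the 0 edge, but any optimal code for these probabilities averages about 2.24 bits per symbol.

```python
practice = {"A": 0.45, "B": 0.13, "C": 0.12, "D": 0.16, "E": 0.09, "F": 0.05}
codes = codewords(build_huffman(practice))
print(codes)  # e.g. {'A': '0', 'C': '100', 'B': '101', 'F': '1100', 'E': '1101', 'D': '111'}
avg = sum(practice[s] * len(code) for s, code in codes.items())
print(round(avg, 2))  # 2.24 bits per symbol on average
```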

Questions?
