Design & Analysis of Algorithm: Huffman Coding


Design & Analysis of Algorithm Huffman Coding Informatics Department Parahyangan Catholic University

How Does a Computer Store Data? Example: the string “WOMBAT” has 6 characters at 8 bits each, so 48 bits are needed to store it.
character stream:  W         O         M         B         A         T
ASCII code:        87        79        77        66        65        84
in binary:         01010111  01001111  01001101  01000010  01000001  01010100
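The slide's table can be reproduced directly in Python as a small illustrative sketch; `format(..., "08b")` gives the 8-bit binary form of each ASCII code:

```python
# Reproduce the slide's table: each character of "WOMBAT" as its
# ASCII code and its 8-bit binary form.
for ch in "WOMBAT":
    print(ch, ord(ch), format(ord(ch), "08b"))

# 6 characters at 8 bits each:
total_bits = len("WOMBAT") * 8
print(total_bits)  # 48
```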

ASCII Table: Not all characters are used on every occasion! For example, a chat app usually doesn’t use ÜÃÊпæ, etc.

New Code? Only 52 characters (the letters A–Z and a–z) are needed, so each one can be coded using 6 bits. The string “WOMBAT” can then be stored using only 36 bits.
A = 0, B = 1, C = 2, D = 3, E = 4, F = 5, G = 6, H = 7, …, Z = 25
a = 26, b = 27, c = 28, d = 29, e = 30, f = 31, g = 32, h = 33, …, z = 51
What’s the problem? Computers nowadays use the ASCII code as a standard. A string using our own set of codes cannot be read properly unless we specifically tell the program how to read it.
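As a sketch, the slide's 6-bit scheme can be written out as follows. The mapping (A–Z to 0–25, a–z to 26–51) is the one on the slide; 52 symbols fit in 6 bits because 2^6 = 64 ≥ 52:

```python
import string

# The slide's 6-bit code: A-Z -> 0..25, a-z -> 26..51.
# 52 symbols fit in 6 bits because 2**6 = 64 >= 52.
code = {ch: i for i, ch in
        enumerate(string.ascii_uppercase + string.ascii_lowercase)}

# Encode "WOMBAT" with 6 bits per character.
encoded = "".join(format(code[ch], "06b") for ch in "WOMBAT")
print(len(encoded))  # 36 bits instead of 48
```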

Compression: In signal processing, data compression, source coding, or bit-rate reduction involves encoding information using fewer bits than the original representation.
Original Data → Compression Technique → Compressed Data (usually smaller)

Compression: Two types:
lossless compression: the compressed data can be reverted back to its original version (e.g., zip, rar)
lossy compression: some information is discarded, so the compressed data cannot be reverted back to its original version (e.g., jpg, mp3)

Huffman Coding: Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to input characters; the lengths of the assigned codes are based on the frequencies of the corresponding characters. The most frequent character gets the shortest code and the least frequent character gets the longest code.

Example: String AABAABBAAABCAACAABAA (20 characters): A appears 13 times, B appears 5 times, C appears 2 times.
Normal (ASCII) coding: 20 × 8 bits = 160 bits
2-bit coding (A = 00, B = 01, C = 10): 20 × 2 bits = 40 bits
Huffman coding (A = 0, B = 10, C = 11): (13 × 1 bit) + (5 × 2 bits) + (2 × 2 bits) = 27 bits
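The bit counts above can be checked with a short sketch; the Huffman code table is the one given on the slide:

```python
from collections import Counter

s = "AABAABBAAABCAACAABAA"
freq = Counter(s)                    # A: 13, B: 5, C: 2

ascii_bits  = len(s) * 8                        # 160 bits
fixed2_bits = len(s) * 2                        # 40 bits
huffman     = {"A": "0", "B": "10", "C": "11"}  # codes from the slide
huff_bits   = sum(n * len(huffman[c]) for c, n in freq.items())  # 27 bits
print(ascii_bits, fixed2_bits, huff_bits)
```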

How to Build a Huffman Code? The algorithm was developed by David A. Huffman while he was a Ph.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes". Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix code.
Prefix code: the code of a particular symbol is never a prefix of another symbol's code.

How to Build a Huffman Code? The algorithm uses a greedy approach.
STEP 1: count each character's frequency.
STEP 2: build a binary tree whose leaves contain each symbol's frequency. The tree is built by iteratively combining the 2 nodes with the smallest frequencies.
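The two steps above can be sketched with Python's `heapq` as the priority queue. Note this is a sketch: when frequencies tie, the tie-breaking order may produce a differently shaped (but equally optimal) tree than the one drawn on the following slides.

```python
import heapq

def huffman_tree(freqs):
    """Greedy construction: repeatedly merge the two nodes with the
    smallest frequencies until a single tree remains."""
    # Heap entries are (frequency, tie_breaker, tree); the counter
    # keeps Python from comparing the (uncomparable) tree parts.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (t1, t2)))
        count += 1
    total, _, root = heap[0]
    return total, root

total, root = huffman_tree({"D": 1, "B": 3, "F": 3, "E": 7,
                            "C": 8, "G": 10, "A": 12})
print(total)  # 44, the weight of the root of the finished tree
```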

Example: Start with a priority queue sorted by frequency: D(1), B(3), F(3), E(7), C(8), G(10), A(12)

Example: Merge the two smallest nodes, D(1) and B(3), into a new node of weight 4. Queue: F(3), 4, E(7), C(8), G(10), A(12)

Example: Merge F(3) and the node of weight 4 into a new node of weight 7. Queue: E(7), 7, C(8), G(10), A(12)

Example: Merge E(7) and the node of weight 7 into a new node of weight 14. Queue: C(8), G(10), A(12), 14

Example: Merge C(8) and G(10) into a new node of weight 18. Queue: A(12), 14, 18

Example: Merge A(12) and the node of weight 14 into a new node of weight 26. Queue: 18, 26

Finished Binary Tree: Merging the last two nodes, 18 and 26, gives the root of weight 44. The root's left child is 18 (leaves C(8) and G(10)); its right child is 26 (leaf A(12) and node 14, which splits into leaf E(7) and node 7, which splits into leaf F(3) and node 4, which splits into leaves D(1) and B(3)).

Finished Binary Tree: Label each edge: left → 0, right → 1.

Finished Binary Tree: Each symbol's code is the path from the root to that symbol's leaf:
C = 00, G = 01, A = 10, E = 110, F = 1110, D = 11110, B = 11111
Example: CAGE = 00 10 01 110, BEAD = 11111 110 10 11110
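Reading the codes off the finished tree can be sketched as a recursive walk. Here the slide's tree is written as nested pairs exactly as drawn, with left = 0 and right = 1:

```python
# The finished tree from the slides, as nested pairs:
# ((C, G), (A, (E, (F, (D, B)))))
tree = (("C", "G"), ("A", ("E", ("F", ("D", "B")))))

def codes(node, prefix=""):
    if isinstance(node, str):          # leaf: its code is the path taken
        return {node: prefix}
    left, right = node
    return {**codes(left, prefix + "0"), **codes(right, prefix + "1")}

table = codes(tree)
encode = lambda s: " ".join(table[ch] for ch in s)
print(encode("CAGE"))  # 00 10 01 110
print(encode("BEAD"))  # 11111 110 10 11110
```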

Decoding: What does this code mean? 1100011100110101111101
The reader needs the Huffman tree to be able to decode.
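Decoding can be sketched as a walk down the tree: a 0-bit goes left, a 1-bit goes right, and on reaching a leaf we emit its symbol and restart at the root (the prefix-code property guarantees this is unambiguous):

```python
# The finished tree from the slides, as nested pairs.
tree = (("C", "G"), ("A", ("E", ("F", ("D", "B")))))

def decode(bits, tree):
    out, node = [], tree
    for b in bits:
        node = node[int(b)]           # 0 -> left child, 1 -> right child
        if isinstance(node, str):     # reached a leaf: emit and restart
            out.append(node)
            node = tree
    return "".join(out)

print(decode("1100011100110101111101", tree))
```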

Huffman Tree: The tree structure and the leaves' symbols are sufficient to reconstruct the code. In practice, we cannot write anything other than a 0-bit or a 1-bit, so each letter is replaced by its 8-bit ASCII code. Writing the tree in DFS preorder, with 0 for an internal node and 1 followed by the symbol for a leaf, gives: 0 0 1C 1G 0 1A 0 1E 0 1F 0 1D 1B
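The slide's preorder serialization can be sketched in a few lines, writing "0" for an internal node and "1" followed by the symbol for a leaf:

```python
# The finished tree from the slides, as nested pairs.
tree = (("C", "G"), ("A", ("E", ("F", ("D", "B")))))

def serialize(node):
    if isinstance(node, str):         # leaf: 1 followed by the symbol
        return "1" + node
    return "0" + serialize(node[0]) + serialize(node[1])

print(serialize(tree))  # 001C1G01A01E01F01D1B
```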

Exercise: Draw the Huffman tree from this serialization: 001C1G01A01E01F01D1B
Then decode this message: 1100011100110101111101

Exercise: Build the Huffman tree for this data (space is also a symbol): TWINKLE TWINKLE LITTLE STARS
Then encode the string “TWINKLE”.