Compression & Huffman Codes

Fawzi Emad, Chau-Wen Tseng
Department of Computer Science, University of Maryland, College Park


Compression
Definition
- Reduce size of data (number of bits needed to represent the data)
Benefits
- Reduce storage needed
- Reduce transmission cost / latency / bandwidth

Sources of Compressibility
Redundancy
- Recognize repeating patterns
- Exploit using a dictionary or variable-length encoding
Human perception
- Less sensitive to some information
- Can discard less important data

Types of Compression
Lossless
- Preserves all information
- Exploits redundancy in data
- Applied to general data
Lossy
- May lose some information
- Exploits redundancy & human perception
- Applied to audio, image, video

Effectiveness of Compression
Metrics
- Bits per byte (8 bits): 2 bits/byte → ¼ original size; 8 bits/byte → no compression
- Percentage: 75% compression → ¼ original size

Effectiveness of Compression
Depends on the data
- Random data → hard. Example: 1001110100 → ? (no obvious shorter representation)
- Organized data → easy. Example: 1111111111 → (1, 10), i.e., a run of ten 1s
Corollary
- There is no universally best compression algorithm

Effectiveness of Compression
Lossless compression is not always possible
- Suppose (for contradiction) compression were always possible:
- Compress the file (reducing its size by at least 1 bit)
- Recompress the output
- Repeat — eventually the data would be stored in 0 bits, which is impossible, so some inputs cannot be compressed

Lossless Compression Techniques
- LZW (Lempel-Ziv-Welch) compression: build a pattern dictionary, replace patterns with an index into the dictionary
- Run-length encoding: find & compress repetitive sequences
- Huffman codes: use variable-length codes based on symbol frequency
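The run-length idea can be sketched in a few lines (a minimal illustration, not a production codec; the function name is hypothetical):

```python
from itertools import groupby

def run_length_encode(s):
    # Replace each run of identical characters with a (char, count) pair.
    return [(ch, len(list(group))) for ch, group in groupby(s)]

# Organized data collapses to a single pair; random-looking data
# gains almost nothing (nearly one pair per character).
print(run_length_encode("1111111111"))  # [('1', 10)]
print(run_length_encode("1001110100"))  # [('1', 1), ('0', 2), ('1', 3), ('0', 1), ('1', 1), ('0', 2)]
```

This mirrors the earlier point that compressibility depends on the data: the same algorithm shrinks one string and inflates the other.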

Huffman Code
Approach
- Variable-length encoding of symbols
- Exploit statistical frequency of symbols
- Efficient when symbol probabilities vary widely
Principle
- Use fewer bits to represent frequent symbols
- Use more bits to represent infrequent symbols
Example input: A A B A A A B A

Huffman Code Example

Symbol             A     B     C     D
Frequency          13%   25%   50%   12%
Original Encoding  00    01    10    11   (2 bits each)
Huffman Encoding   110   10    0     111  (3, 2, 1, 3 bits)

Expected size
- Original: 1/8·2 + 1/4·2 + 1/2·2 + 1/8·2 = 2 bits/symbol
- Huffman: 1/8·3 + 1/4·2 + 1/2·1 + 1/8·3 = 1.75 bits/symbol
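The expected-size arithmetic on this slide can be checked directly (frequencies rounded to the exact fractions used in the formulas):

```python
freqs = {"A": 1/8, "B": 1/4, "C": 1/2, "D": 1/8}
fixed = {s: 2 for s in freqs}              # 00, 01, 10, 11: 2 bits each
huff  = {"A": 3, "B": 2, "C": 1, "D": 3}   # 110, 10, 0, 111

def expected_bits(freqs, lengths):
    # Average code length weighted by symbol frequency.
    return sum(freqs[s] * lengths[s] for s in freqs)

print(expected_bits(freqs, fixed))  # 2.0 bits/symbol
print(expected_bits(freqs, huff))   # 1.75 bits/symbol
```

The frequent symbol C pays only 1 bit, which more than offsets the 3-bit codes given to the rare symbols A and D.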

Huffman Code Data Structures
Binary (Huffman) tree
- Represents the Huffman code
- Edge → code bit (0 or 1)
- Leaf → symbol
- Path from root to leaf → encoding
- Example: A = "110", B = "10", C = "0"
Priority queue
- Used to efficiently build the binary tree
[Tree diagram with leaves A, B, C, D]

Huffman Code Algorithm Overview
Encoding
- Calculate the frequency of each symbol in the file
- Create a binary tree representing the "best" encoding
- Use the binary tree to encode the compressed file: for each symbol, output the path from root to leaf (size of encoding = length of path)
- Save the binary tree

Huffman Code – Creating Tree Algorithm
- Place each symbol in a leaf; weight of leaf = symbol frequency
- Select two trees L and R (initially leaves) such that L and R have the lowest frequencies in the forest
- Create a new (internal) node with left child L and right child R; new frequency = frequency(L) + frequency(R)
- Repeat until all nodes are merged into one tree
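The merging loop above can be sketched with Python's `heapq` as the priority queue (a minimal version; ties between equal weights are broken arbitrarily, so the exact codes may differ from the slides even though the code lengths match):

```python
import heapq
from itertools import count

def huffman_codes(freq):
    # Heap entries are (weight, tiebreaker, tree); a tree is either a
    # symbol string (leaf) or a (left, right) pair (internal node).
    tick = count()
    heap = [(w, next(tick), sym) for sym, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        wl, _, left = heapq.heappop(heap)    # two lowest-weight trees
        wr, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (wl + wr, next(tick), (left, right)))
    # Walk the finished tree: left edge = 0, right edge = 1.
    codes = {}
    def walk(tree, path):
        if isinstance(tree, tuple):
            walk(tree[0], path + "0")
            walk(tree[1], path + "1")
        else:
            codes[tree] = path or "0"        # single-symbol edge case
    walk(heap[0][2], "")
    return codes

# Frequencies from the construction example in the next slides.
codes = huffman_codes({"A": 3, "C": 5, "E": 8, "H": 2, "I": 7})
```

Each iteration removes the two cheapest trees and inserts their merge, so a heap gives O(n log n) construction overall.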

Huffman Tree Construction (steps 1–5)
Symbols and frequencies: A=3, C=5, E=8, H=2, I=7
1. Start with five leaves of weights 5, 8, 2, 7, 3
2. Merge the two lowest, H(2) and A(3) → node of weight 5
3. Merge that node (5) with C(5) → node of weight 10
4. Merge I(7) and E(8) → node of weight 15
5. Merge 10 and 15 → root of weight 25; all symbols are now in one tree

Huffman Coding Example
Code: I = 00, E = 01, C = 10, H = 110, A = 111
Input: ACE
Output: (111)(10)(01) = 1111001
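With the code table fixed, encoding is just concatenation of variable-length codewords (a minimal sketch using the table from this slide):

```python
codes = {"I": "00", "E": "01", "C": "10", "H": "110", "A": "111"}

def encode(text, codes):
    # Concatenate the codeword of each input symbol.
    return "".join(codes[ch] for ch in text)

print(encode("ACE", codes))  # 1111001
```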

Huffman Code Algorithm Overview
Decoding
- Read the compressed file & the binary tree
- Use the binary tree to decode the file: follow the path from root to leaf for each codeword

Huffman Decoding (steps 1–7)
Decode 1111001 with the tree above (I = 00, E = 01, C = 10, H = 110, A = 111):
- Bits 1, 1, 1 walk from the root to leaf A → output "A"
- Bits 1, 0 walk from the root to leaf C → output "AC"
- Bits 0, 1 walk from the root to leaf E → output "ACE"
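The root-to-leaf walk in these decoding steps can be sketched directly, representing the tree from the construction example as nested pairs (left child = bit 0, right child = bit 1):

```python
# Leaves give the codes I=00, E=01, C=10, H=110, A=111.
tree = (("I", "E"), ("C", ("H", "A")))

def decode(bits, tree):
    out, node = [], tree
    for b in bits:
        node = node[int(b)]           # follow one edge per bit
        if isinstance(node, str):     # reached a leaf: emit the symbol
            out.append(node)          # and restart at the root
            node = tree
    return "".join(out)

print(decode("1111001", tree))  # ACE
```

Because the code has the prefix property, hitting a leaf unambiguously ends a codeword, so no separators are needed in the bit stream.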

Huffman Code Properties
Prefix code
- No code is a prefix of another code
- Example: Huffman("I") → 00 together with a code X → 001 would not be a legal prefix code
- The decoder can stop as soon as a complete code is found; no end-of-code marker is needed
Nondeterministic
- Multiple Huffman codings are possible for the same input (when more than two trees have the same minimal weight)
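The prefix property can be checked mechanically; here is a hypothetical helper (not from the slides) that tests whether any codeword is a prefix of another:

```python
def is_prefix_code(codes):
    # A prefix code: no codeword is a proper prefix of a different codeword.
    words = list(codes.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

print(is_prefix_code({"I": "00", "E": "01", "C": "10", "H": "110", "A": "111"}))  # True
print(is_prefix_code({"I": "00", "X": "001"}))                                    # False
```

Any code read off the leaves of a binary tree passes this check automatically, since one leaf's path can never continue into another leaf.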

Huffman Code Properties
Greedy algorithm
- Chooses the best local solution at each step: combines the 2 trees with the lowest frequency
- Still yields the overall best solution: an optimal prefix code based on statistical frequency
- Better compression may still be possible (depends on the data) using other approaches (e.g., a pattern dictionary)