DL - 2004, Compression – Beeri/Feitelson
Compression (דחיסה): Introduction, Information theory, Text compression, IL compression


DL Compression – Beeri/Feitelson, slide 1: Compression (דחיסה): Introduction, Information theory, Text compression, IL compression

Slide 2: General
Compression methods depend on the data's characteristics, so there is no universal (best) method.
Requirements:
- text, EL's: lossless; images: may be lossy
- efficiency: how many bits per byte of data? (often given as a percentage)
- coding should be fast, decoding super-fast

Slide 3: Compression vs. communications
Minor difference: communication is always on-line; compression is on/off-line (off-line: the complete file is given in advance).
(Diagram: a file at the source is sent over a noisy line to the destination.)

Slide 4: A general model for statistics-based compression
- The same model must be used on both sides.
- The model is (often) stored in the compressed file; its size affects compression efficiency.
(Diagram: model + coder on the sending side, model + decoder on the receiving side.)

Slide 5: Appetizer: Huffman coding
(Standard) binary coding:
- uniquely decodable
- model = a table
- efficiency: ceil(log2 q) bits/symbol for an alphabet of q symbols (no/little compression)
Can do better if symbol frequencies are known: frequent symbol – short code, rare symbol – long code. Huffman coding minimizes the average code length.

Slide 6: Huffman's Algorithm (eager construction of the code tree). Assume the symbol probabilities are given.
- Allocate a node for each symbol, with weight = symbol probability
- Enter the nodes into a priority queue Q (smallest weights first)
- While |Q| > 1:
  - Remove the first two nodes (smallest weights)
  - Create a new node, make it their parent, and assign it the sum of their weights
  - Enter the new node into Q
- Return the single node left in Q (the root of the tree)
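The loop above maps directly onto a binary heap. A minimal Python sketch (not from the slides; the symbol names and the nested-pair tree representation are my own choices):

```python
import heapq

def huffman_tree(probs):
    """Build a Huffman tree from {symbol: probability}.
    A leaf is a symbol; an internal node is a (left, right) pair."""
    # Priority queue of (weight, tiebreak, node); smallest weight first.
    heap = [(p, i, s) for i, (s, p) in enumerate(sorted(probs.items()))]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, n1 = heapq.heappop(heap)           # two smallest weights
        w2, _, n2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, count, (n1, n2)))  # their new parent
        count += 1
    return heap[0][2]                             # root of the tree

def code_table(tree, prefix=""):
    """Code of each symbol = binary path from the root, 0 = left, 1 = right."""
    if not isinstance(tree, tuple):               # leaf: emit accumulated path
        return {tree: prefix or "0"}
    table = code_table(tree[0], prefix + "0")
    table.update(code_table(tree[1], prefix + "1"))
    return table
```

For the probabilities 1/2, 1/4, 1/8, 1/8 of the example on the next slide, this yields code lengths 1, 2, 3, 3.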

Slide 7: Example, with probabilities 1/2, 1/4, 1/8, 1/8.
Q: {1/8, 1/8, 1/4, 1/2} → merge the two 1/8's into 1/4 → {1/4, 1/4, 1/2} → merge into 1/2 → {1/2, 1/2} → merge into the root, weight 1.

Slide 8: How are the trees used?
Coding: for each symbol s, output the binary path from the root to leaf(s).
Decoding: read the incoming stream of bits and follow the path from the root of the tree; when leaf(s) is reached, output s and return to the root.
The common model (stored on both sides): the tree.
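A sketch of both procedures on the example code 0, 10, 110, 111 (hypothetical symbols a–d; the nested-pair tree below mirrors that code and is my own representation, not the slides'):

```python
# A leaf is a symbol; an internal node is a (left, right) pair.
tree = ('a', ('b', ('c', 'd')))                   # codes 0, 10, 110, 111
codes = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}

def encode(text):
    """Coding: concatenate the root-to-leaf path of each symbol."""
    return ''.join(codes[s] for s in text)

def decode(bits):
    """Decoding: walk from the root; at a leaf, output it and restart."""
    out, node = [], tree
    for b in bits:
        node = node[0] if b == '0' else node[1]   # 0 = left, 1 = right
        if isinstance(node, str):                 # reached leaf(s)
            out.append(node)
            node = tree                           # return to root
    return ''.join(out)
```

For instance, decode(encode("abacad")) returns "abacad".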

Slide 9: Expected cost, in bits/symbol.
Binary: ceil(log2 q). Huffman: sum of p_i * l_i (probability times code length).
In the example: binary: 2; Huffman: 1/2*1 + 1/4*2 + 1/8*3 + 1/8*3 = 1.75.
Q: what would be the tree and cost for probabilities 5/12, 1/3, 1/6, 1/12?
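Both costs can be checked with a short sketch (not from the slides). It exploits the fact that each merge adds one bit to the code of every symbol beneath it, so the total cost equals the sum of all merge weights; exact rationals avoid floating-point noise:

```python
import heapq
from fractions import Fraction as F

def huffman_cost(probs):
    """Expected Huffman code length sum(p_i * l_i), computed as the
    sum of all merged weights (each merge adds 1 bit below it)."""
    heap = list(probs)
    heapq.heapify(heap)
    cost = 0
    while len(heap) > 1:
        w = heapq.heappop(heap) + heapq.heappop(heap)  # merge two smallest
        cost += w
        heapq.heappush(heap, w)
    return cost

print(huffman_cost([F(1, 2), F(1, 4), F(1, 8), F(1, 8)]))    # 7/4 = 1.75
print(huffman_cost([F(5, 12), F(1, 3), F(1, 6), F(1, 12)]))  # 11/6
```

So for the distribution in the question the cost is 11/6, about 1.83 bits/symbol.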

Slide 10: A note on Huffman trees. The algorithm is non-deterministic:
- In each step, either node can become the left child of the new parent; if the two children of a node are exchanged, the result is also a Huffman tree (closure under exchanging sibling subtrees).
- Ties can be broken differently: consider 0.4, 0.2, 0.2, 0.1, 0.1; after the 1st step, any 2 out of 3 nodes of weight 0.2 may be selected.
⇒ There are many Huffman trees for a given probability distribution.
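The tie in the 0.4, 0.2, 0.2, 0.1, 0.1 example can be checked directly. The two length assignments below were worked out by hand for the two possible tie-breaks (they are an assumption, not from the slides); the length profiles differ but the costs agree:

```python
from fractions import Fraction as F

# p = (0.4, 0.2, 0.2, 0.1, 0.1), as exact rationals.
p = (F(2, 5), F(1, 5), F(1, 5), F(1, 10), F(1, 10))
lengths_a = (2, 2, 2, 3, 3)   # merged 0.2 later pairs with the original 0.4
lengths_b = (1, 3, 3, 3, 3)   # merged 0.2 pairs with the merged 0.4

def cost(lengths):
    """Expected code length: sum of p_i * l_i."""
    return sum(pi * li for pi, li in zip(p, lengths))

print(cost(lengths_a), cost(lengths_b))  # both 11/5 = 2.2
```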

Slide 11: Concepts
- variable-length code (e.g. Huffman)
- uniquely decodable code: each legal code sequence is generated by a unique source sequence
- instantaneous/prefix code (מיידי, "instantaneous"): the end of the code of each symbol can be recognized as soon as it is read
Examples:
- 0, 010, 01, 10 (not uniquely decodable)
- 10, 00, 11, 110 (uniquely decodable but not prefix)
- 0, 10, 110, 111 (the Huffman code of the example; a comma code)
- 0, 01, 011, 111 (inverted comma code)

Slide 12: A prefix code = a binary tree.
Every binary tree with q leaves is a prefix code for q symbols; the lengths of the code words are the lengths of the paths.
Kraft inequality: there exists a q-leaf tree with path lengths l_1, ..., l_q iff sum of 2^(-l_i) <= 1, with equality iff the tree is complete.
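The inequality is easy to test numerically; a small sketch using exact rationals:

```python
from fractions import Fraction

def kraft_sum(lengths):
    """Sum of 2^(-l_i) over the given codeword lengths."""
    return sum(Fraction(1, 2 ** l) for l in lengths)

# Huffman code of the example, lengths 1, 2, 3, 3: complete tree, sum = 1.
print(kraft_sum([1, 2, 3, 3]))  # 1
# Lengths 1, 3, 3: a tree exists (sum <= 1), but it is not complete.
print(kraft_sum([1, 3, 3]))     # 3/4
```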

Slide 13: Proof (⇒): assume a tree T with path lengths l_1, ..., l_q exists.
Take T' to be the full tree of depth l_max = max l_i ("full": all paths have the same length). The number of its leaves is 2^l_max.
A leaf of T at distance l_i from the root has 2^(l_max - l_i) leaves of T' under it.
Summing over all leaves of T: sum of 2^(l_max - l_i) <= 2^l_max, hence sum of 2^(-l_i) <= 1.

Slide 14: If T is not complete (complete: every node has 0 or 2 children), it has a node with a single child, so that path can be shortened. The new tree still satisfies sum of 2^(-l_i) <= 1, and since shortening strictly increases the sum, the given tree must satisfy sum of 2^(-l_i) < 1.
⇒ Only complete trees achieve equality.
Comment: in general, a prefix code that is not a complete tree is dominated by a tree with smaller cost.
From now on: trees are assumed complete.

Slide 15: (⇐): assume sum of 2^(-l_i) <= 1, with the lengths sorted, l_1 <= ... <= l_q.
Lemma: the two largest lengths may be assumed equal, l_{q-1} = l_q (if l_{q-1} < l_q, the last length can be decreased to l_{q-1} while preserving the inequality).
Replace these two lengths by the single length l_q - 1 (their terms sum: 2^(-l_q) + 2^(-l_q) = 2^(-(l_q - 1))), giving q-1 lengths that still satisfy the inequality, and use induction.
Exercise: assume sum of 2^(-l_i) = 1; must the tree be complete?
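The (⇐) direction can also be carried out constructively. A sketch using the canonical-code construction (assign codewords in order of increasing length, taking the next free binary value at each length) rather than the slide's induction; the function name is my own:

```python
def code_from_lengths(lengths):
    """Build prefix codewords for lengths satisfying Kraft's inequality,
    returned in the order of the sorted lengths (canonical code)."""
    assert sum(2 ** -l for l in lengths) <= 1, "Kraft inequality violated"
    code, next_val, prev_len = [], 0, 0
    for l in sorted(lengths):
        next_val <<= (l - prev_len)            # extend to the new length
        code.append(format(next_val, f"0{l}b"))
        next_val += 1                          # next free value at length l
        prev_len = l
    return code

print(code_from_lengths([2, 1, 3, 3]))  # ['0', '10', '110', '111']
```

Because each codeword is taken past the previous one before extending, no codeword is a prefix of a later one.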

Slide 16: McMillan's theorem: there exists a uniquely decodable code with lengths l_1, ..., l_q iff sum of 2^(-l_i) <= 1.
Corollary: when there is a uniquely decodable code, there is also a prefix code with the same lengths (same cost).
⇒ No need to consider the wider class. (Diagram: prefix codes are a subset of uniquely decodable codes.)

Slide 17: On the optimality of Huffman.
Cost of a tree/code T: L(T) = sum of p_i * l_i.
(*) More probable symbols get codes at least as short: if p_i > p_j then l_i <= l_j.
Claim: if a tree T does not satisfy (*), it is dominated by a tree with smaller cost (swap the two offending leaves).
Claim: for any T, L(Huffman) <= L(T).
Proof: we can assume T satisfies (*). Use induction on q; for q = 2, both trees have lengths 1, 1.

Slide 18: For q > 2: in the Huffman tree, there are two maximal paths that end in sibling nodes.
In T, the paths for the last two symbols (the two smallest probabilities) are longest (by (*)), but their ends may not be siblings.
But T is complete, hence the leaf of the last symbol has a sibling with the same path length; exchange that sibling with the leaf corresponding to the second-to-last symbol.
Now, in both trees, these two longest paths can be replaced by their parent ⇒ the case of q-1 symbols (induction hypothesis).

Slide 19: Summary
- Huffman trees are optimal, hence satisfy (*)
- Any two Huffman trees have equal costs
- Huffman trees have minimal cost among all trees (codes)