
Lossless Compression CIS 465 Multimedia

Compression Compression: the process of coding that will effectively reduce the total number of bits needed to represent certain information.

Compression There are two main categories  Lossless  Lossy Compression ratio = B0 / B1, where B0 is the number of bits before compression and B1 is the number of bits after compression

Information Theory We define the entropy η of an information source with alphabet S = {s_1, s_2, …, s_n} as η = H(S) = Σ_{i=1}^{n} p_i log_2(1/p_i), where p_i is the probability that s_i occurs in the source and log_2(1/p_i) is the amount of information contained in s_i

Information Theory A uniform distribution, e.g., an image histogram with all 256 gray levels equally likely, has the maximum entropy: 256 × (1/256 × log_2 256) = 8 bits per symbol. Any other distribution has lower entropy

Entropy and Code Length The entropy η gives a lower bound on the average number of bits needed to code a symbol in the alphabet: η ≤ l̄, where l̄ is the average bit length of the codewords produced by the encoder, assuming a memoryless source
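A minimal sketch in Python (function name ours) that computes the entropy of a distribution and reproduces the 8 bits/symbol figure for the uniform 256-level histogram:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = sum of p * log2(1/p) over nonzero p."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Uniform distribution over 256 gray levels: maximum entropy, 8 bits/symbol.
print(entropy([1 / 256] * 256))            # 8.0

# Any skewed distribution has lower entropy.
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75
```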

Run-Length Coding Run-length coding is a very widely used and simple compression technique which does not assume a memoryless source  We replace runs of symbols (possibly of length one) with pairs of (run-length, symbol)  For images, the maximum run-length is the size of a row
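As a minimal sketch of the idea in Python (names ours; a real image codec would pack the pairs into a compact binary format):

```python
def rle_encode(data):
    """Replace runs of symbols (possibly of length one) with
    (run-length, symbol) pairs."""
    runs = []
    for symbol in data:
        if runs and runs[-1][1] == symbol:
            runs[-1][0] += 1          # extend the current run
        else:
            runs.append([1, symbol])  # start a new run
    return [tuple(r) for r in runs]

def rle_decode(runs):
    return "".join(symbol * length for length, symbol in runs)

pairs = rle_encode("WWWWBBBW")        # [(4, 'W'), (3, 'B'), (1, 'W')]
assert rle_decode(pairs) == "WWWWBBBW"
```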

Variable Length Coding A number of compression techniques are based on the entropy ideas seen previously. These are known as entropy coding or variable length coding  The number of bits used to code symbols in the alphabet is variable  Two famous entropy coding techniques are Huffman coding and Arithmetic coding

Huffman Coding Huffman coding constructs a binary tree starting with the probabilities of each symbol in the alphabet  The tree is built in a bottom-up manner  The tree is then used to find the codeword for each symbol  An algorithm for finding the Huffman code for a given alphabet with associated probabilities is given in the following slide

Huffman Coding Algorithm 1. Initialization: Put all symbols on a list sorted according to their frequency counts. 2. Repeat until the list has only one symbol left: a. From the list pick two symbols with the lowest frequency counts. Form a Huffman subtree that has these two symbols as child nodes and create a parent node.

Huffman Coding Algorithm b. Assign the sum of the children's frequency counts to the parent and insert it into the list such that the order is maintained. c. Delete the children from the list. 3. Assign a codeword for each leaf based on the path from the root.
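A minimal Python sketch of this procedure (names ours); a min-heap plays the role of the sorted list, and codewords are read off the tree with 0 for the left branch and 1 for the right:

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build Huffman codewords from {symbol: frequency}."""
    tiebreak = count()  # keeps heap comparisons away from the tree tuples
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two lowest frequency counts
        f2, _, right = heapq.heappop(heap)
        # parent gets the sum of the children's counts
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):         # internal node: recurse
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                               # leaf: path from root is the code
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

print(huffman_codes({"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}))
# e.g. {'A': '0', 'B': '10', 'C': '110', 'D': '111'}
```

For this distribution the average code length is 0.5·1 + 0.25·2 + 0.125·3 + 0.125·3 = 1.75 bits, which meets the entropy lower bound exactly because every probability is a power of 1/2.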


Properties of Huffman Codes No Huffman codeword is a prefix of any other codeword, so decoding is unambiguous The Huffman coding technique is optimal (but we must know the probabilities of each symbol for this to be true) Symbols that occur more frequently have shorter Huffman codes

Huffman Coding Variants:  In extended Huffman coding we group symbols into blocks of k, giving an extended alphabet of n^k symbols  This leads to somewhat better compression  In adaptive Huffman coding we don’t assume that we know the exact probabilities  Start with an estimate and update the tree as we encode/decode Arithmetic coding is a newer (and more complicated) alternative which usually performs better

Dictionary-based Coding LZW uses fixed-length codewords to represent variable-length strings of symbols/characters that commonly occur together, e.g., words in English text. The LZW encoder and decoder build up the same dictionary dynamically while receiving the data. LZW places longer and longer repeated entries into a dictionary, and then emits the code for an element, rather than the string itself, if the element has already been placed in the dictionary.

LZW Compression Algorithm
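A minimal Python sketch of the standard LZW encoding loop (names ours). The three-symbol initial dictionary is taken from the example on the following slide; a real implementation would start with an entry for every possible input symbol:

```python
def lzw_encode(text):
    """Emit dictionary codes for the longest match found so far,
    adding (match + next character) to the dictionary as we go."""
    dictionary = {"A": 1, "B": 2, "C": 3}  # initial single-symbol entries
    next_code = len(dictionary) + 1
    s, output = "", []
    for c in text:
        if s + c in dictionary:
            s = s + c                      # grow the current match
        else:
            output.append(dictionary[s])   # emit code for the longest match
            dictionary[s + c] = next_code  # remember the new string
            next_code += 1
            s = c
    if s:
        output.append(dictionary[s])       # flush the final match
    return output

print(lzw_encode("ABABBABCABABBA"))  # [1, 2, 4, 5, 2, 3, 4, 6, 1]
```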

LZW Compression Example We will compress the string  "ABABBABCABABBA" Initially the dictionary is the following

LZW Example
Code  String
1     A
2     B
3     C

LZW Example Coding "ABABBABCABABBA" produces the output sequence 1 2 4 5 2 3 4 6 1, while the dictionary grows with the entries 4 = AB, 5 = BA, 6 = ABB, 7 = BAB, 8 = BC, 9 = CA, 10 = ABA, 11 = ABBA

LZW Decompression
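A matching decoder sketch in Python (names ours). The only subtlety is a code that is not yet in the dictionary, which can only refer to the entry currently being built (the previous string plus its own first character):

```python
def lzw_decode(codes):
    """Rebuild the encoder's dictionary while decoding."""
    dictionary = {1: "A", 2: "B", 3: "C"}  # same initial entries as encoder
    next_code = len(dictionary) + 1
    prev = dictionary[codes[0]]
    output = [prev]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:                              # code being defined: prev + prev[0]
            entry = prev + prev[0]
        output.append(entry)
        dictionary[next_code] = prev + entry[0]
        next_code += 1
        prev = entry
    return "".join(output)

print(lzw_decode([1, 2, 4, 5, 2, 3, 4, 6, 1]))  # ABABBABCABABBA
```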

LZW Decompression Example Feeding the codes 1 2 4 5 2 3 4 6 1 through the decoder reproduces "ABABBABCABABBA": it outputs A, B, AB, BA, B, C, AB, ABB, A while rebuilding the same dictionary entries 4 = AB, 5 = BA, … as the encoder

Quadtrees Quadtrees are both an indexing structure and a compression scheme for binary images  A quadtree is a tree where each non-leaf node has four children  Each node is labelled either B (black), W (white) or G (gray)  Leaf nodes can only be B or W

Quadtrees Algorithm for construction of a quadtree for an N  N binary image:  1. If the binary image contains only black pixels, label the root node B and quit.  2. Else if the binary image contains only white pixels, label the root node W and quit.  3. Otherwise label the root node G and create four child nodes corresponding to the four N/2  N/2 quadrants of the binary image.  4. For each of the quadrants, recursively repeat steps 1 to 3. (In the worst case, recursion ends when each sub-quadrant is a single pixel.)
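A minimal recursive sketch in Python (names ours), assuming N is a power of two and pixels are 0 (white) / 1 (black):

```python
def quadtree(img):
    """img: N x N list of rows. Returns 'B', 'W', or ('G', nw, ne, sw, se)."""
    flat = [p for row in img for p in row]
    if all(p == 1 for p in flat):
        return "B"                        # all black: single leaf
    if all(p == 0 for p in flat):
        return "W"                        # all white: single leaf
    n = len(img) // 2                     # split into four N/2 x N/2 quadrants
    quads = [[row[:n] for row in img[:n]], [row[n:] for row in img[:n]],
             [row[:n] for row in img[n:]], [row[n:] for row in img[n:]]]
    return ("G",) + tuple(quadtree(q) for q in quads)

img = [[1, 1, 0, 0],
       [1, 1, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 1]]
print(quadtree(img))  # ('G', 'B', 'W', 'W', ('G', 'W', 'W', 'W', 'B'))
```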


Lossless JPEG JPEG offers both lossy (common) and lossless (uncommon) modes. Lossless mode works very differently from lossy mode (and also gives much worse compression ratios)  Added to the JPEG standard for completeness

Lossless JPEG Lossless JPEG employs a predictive method combined with entropy coding. The prediction for the value of a pixel (greyscale or color component) is based on the value of up to three neighboring pixels

Lossless JPEG One of 7 predictors is used (choose the one which gives the best result for this pixel). With A = pixel to the left, B = pixel above, and C = pixel above-left:
P1 = A
P2 = B
P3 = C
P4 = A + B - C
P5 = A + (B - C)/2
P6 = B + (A - C)/2
P7 = (A + B)/2

Lossless JPEG Now code the pixel as the pair (predictor used, difference from predicted value) Code this pair using a lossless method such as Huffman coding  The difference is usually small so entropy coding gives good results  Only a limited number of predictors can be used on the edges of the image, where some neighbors are missing
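A sketch of the prediction step in Python (names ours), using the predictor table above; integer division stands in for the standard's integer arithmetic, and edge handling is omitted:

```python
def predict(A, B, C, predictor):
    """The seven neighbour predictors: A = left, B = above, C = above-left."""
    return {1: A, 2: B, 3: C,
            4: A + B - C,
            5: A + (B - C) // 2,
            6: B + (A - C) // 2,
            7: (A + B) // 2}[predictor]

def residuals(img, predictor=4):
    """Prediction differences for interior pixels; small residuals
    entropy-code well."""
    out = []
    for y in range(1, len(img)):
        for x in range(1, len(img[0])):
            A, B, C = img[y][x - 1], img[y - 1][x], img[y - 1][x - 1]
            out.append(img[y][x] - predict(A, B, C, predictor))
    return out

img = [[100, 102, 104],
       [101, 103, 105],
       [102, 104, 106]]
print(residuals(img))  # [0, 0, 0, 0] -- a smooth ramp is predicted perfectly
```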