Data Coding Run Length Coding


Data Coding Run Length Coding Rather than code each symbol of a lengthy run of the same symbol, describe the run as a single symbol together with the run length, i.e. the number of times that symbol repeats.
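
As an illustration, here is a minimal Python sketch of this idea (the function names and the (symbol, length) pair format are just for this example, not taken from the slides):

    def rle_encode(data):
        """Replace each run of identical symbols with a (symbol, run length) pair."""
        runs = []
        i = 0
        while i < len(data):
            j = i
            while j < len(data) and data[j] == data[i]:
                j += 1                      # extend the current run
            runs.append((data[i], j - i))   # one pair describes the whole run
            i = j
        return runs

    def rle_decode(runs):
        """Expand each (symbol, run length) pair back into the original run."""
        return "".join(sym * length for sym, length in runs)

    print(rle_encode("WWWWWBBBWWW"))                    # [('W', 5), ('B', 3), ('W', 3)]
    print(rle_decode([('W', 5), ('B', 3), ('W', 3)]))   # "WWWWWBBBWWW"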

Data Coding Entropy The entropy of a sequence of symbols is the theoretical minimum average number of bits that can represent the sequence, given the probability of occurrence of each symbol. Suppose we run a coin-flipping experiment and record the probability of a head or a tail: Prob(head) = 0.5, Prob(tail) = 0.5. Log2(1/prob(symbol)) gives the optimum number of bits required to encode a symbol given its probability of occurrence: Log2(1/prob(head)) = 1 ("0"), Log2(1/prob(tail)) = 1 ("1"). Multiplying each by the probability of occurrence of the symbol and summing gives the average number of bits required per symbol: [prob(head) x Log2(1/prob(head))] + [prob(tail) x Log2(1/prob(tail))] = [0.5 x 1] + [0.5 x 1] = 1. In practice the number of symbols and the probability of occurrence of each symbol are more complicated than in this simple example, so an intuitive assignment of bits to symbols is not possible. Instead, the following algorithms have been devised to achieve this: Shannon-Fano, Huffman, Arithmetic and Lempel-Ziv-Welch.
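
A small Python sketch of the calculation above (the entropy function name is just illustrative):

    from math import log2

    def entropy(probs):
        """Average of log2(1/p) weighted by p: the minimum average bits per symbol."""
        return sum(p * log2(1.0 / p) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit per symbol
    print(entropy([0.9, 0.1]))   # biased coin: about 0.47 bits per symbol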

Data Coding Huffman Coding Encoding for Huffman Coding Algorithm (a bottom-up approach):
Initialization: put all nodes in an OPEN list and keep it sorted at all times (e.g., ABCDE).
Repeat until the OPEN list has only one node left:
(a) From OPEN pick the two nodes having the lowest frequencies/probabilities and create a parent node for them.
(b) Assign the sum of the children's frequencies/probabilities to the parent node and insert it into OPEN.
(c) Assign codes 0 and 1 to the two branches of the tree, and delete the children from OPEN.
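
A minimal Python sketch of this bottom-up procedure, using a min-heap as the OPEN list (the names are illustrative, not from the slides):

    import heapq
    from itertools import count

    def huffman_codes(freqs):
        """Repeatedly merge the two lowest-weight nodes, then label branches 0 and 1."""
        tick = count()                       # tie-breaker so tuples never compare nodes
        heap = [(f, next(tick), sym) for sym, f in freqs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            f1, _, a = heapq.heappop(heap)   # two lowest frequencies
            f2, _, b = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, next(tick), (a, b)))  # parent = sum of children
        codes = {}
        def assign(node, prefix=""):
            if isinstance(node, tuple):      # internal node: branch 0 / branch 1
                assign(node[0], prefix + "0")
                assign(node[1], prefix + "1")
            else:
                codes[node] = prefix or "0"
        assign(heap[0][2])
        return codes

    print(huffman_codes({"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}))
    # A gets a 1-bit code; B, C, D and E get 3-bit codes (one of several equivalent optimal codes)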

Data Coding Huffman Coding: Encoding Algorithm Example (with re-sorting after each merge), for symbol counts A (15), B (7), C (6), D (6), E (5), total 39.
[Huffman tree diagram: the two lowest-weight nodes are merged repeatedly: E (5) with one of the weight-6 symbols into a node of weight 11, B (7) with the other weight-6 symbol into a node of weight 13, those two into a node of weight 24, and finally A (15) with that node at the root (39). Labelling the branches 0/1 gives A a 1-bit code and B, C, D, E 3-bit codes.]

Data Coding Huffman Coding: Encoding Algorithm Side note (change of base, for evaluating the logarithms): if loge(x) = y then x = e^y, so log10(x) = y · log10(e) and y = log10(x) / log10(e); the same technique gives log2 values from log10.
Total symbol count: 39. TOTAL (# of bits): 87.
Entropy = (15 x 1.38 + 7 x 2.48 + 6 x 2.70 + 6 x 2.70 + 5 x 2.96) / 39 = 85.26 / 39 ≈ 2.19 bits/symbol.
Number of bits per symbol needed by the Huffman code: 87/39 ≈ 2.23.
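
A quick Python check of the arithmetic on this slide, assuming the code lengths from the tree above (1 bit for A, 3 bits for each of B, C, D, E):

    from math import log2

    counts = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
    lengths = {"A": 1, "B": 3, "C": 3, "D": 3, "E": 3}        # from the tree above
    total = sum(counts.values())                               # 39 symbols

    entropy = sum(c / total * log2(total / c) for c in counts.values())
    huffman_bits = sum(counts[s] * lengths[s] for s in counts)

    print(round(entropy, 2))                                   # ~2.19 bits/symbol (minimum)
    print(huffman_bits, round(huffman_bits / total, 2))        # 87 bits, ~2.23 bits/symbol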

Data Coding Adaptive Huffman Coding: Motivations The previous algorithms require a priori statistical knowledge, which is often not available (e.g., live audio, video). Even when it is available, it can be a heavy overhead, especially when many tables have to be sent, as happens when a non-order-0 model is used, i.e. one that takes into account the influence of the previous symbol on the probability of the current symbol (e.g., "qu" often come together, ...). The solution is to use adaptive algorithms.

Data Coding Adaptive Huffman Coding: Algorithm The encoder and decoder use exactly the same initialization and update_model routines.

ENCODER:
    Initialize_model();
    while ((c = getc(input)) != eof) {
        encode(c, output);
        update_model(c);
    }

DECODER:
    Initialize_model();
    while ((c = decode(input)) != eof) {
        putc(c, output);
        update_model(c);
    }
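
To make the encoder/decoder symmetry concrete, here is a small Python sketch. It is not the sibling-property update described on the following slides; as a deliberately simplified stand-in for update_model, it keeps running symbol counts and rebuilds a Huffman code from scratch before every symbol (all names are illustrative, and the alphabet is assumed known to both sides in advance). Because encoder and decoder start from the same counts and update them identically, they always agree on the current code:

    import heapq
    from itertools import count

    def build_code(freqs):
        """Rebuild a Huffman code from the current symbol counts."""
        tick = count()   # deterministic tie-breaker so encoder and decoder agree
        heap = [(f, next(tick), sym) for sym, f in sorted(freqs.items())]
        heapq.heapify(heap)
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)
            f2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, next(tick), (left, right)))
        codes = {}
        def walk(node, prefix):
            if isinstance(node, tuple):
                walk(node[0], prefix + "0")
                walk(node[1], prefix + "1")
            else:
                codes[node] = prefix or "0"
        walk(heap[0][2], "")
        return codes

    def encode(message, alphabet):
        freqs = {s: 1 for s in alphabet}        # Initialize_model()
        bits = ""
        for c in message:
            bits += build_code(freqs)[c]        # encode(c, output)
            freqs[c] += 1                       # update_model(c)
        return bits

    def decode(bits, alphabet, n_symbols):
        freqs = {s: 1 for s in alphabet}        # same initialization as the encoder
        out, i = "", 0
        for _ in range(n_symbols):
            codes = build_code(freqs)
            inverse = {v: k for k, v in codes.items()}
            j = i + 1
            while bits[i:j] not in inverse:     # extend until a valid codeword is read
                j += 1
            c = inverse[bits[i:j]]
            out += c                            # putc(c, output)
            freqs[c] += 1                       # update_model(c)
            i = j
        return out

    msg = "ABRACADABRA"
    bits = encode(msg, set(msg))
    assert decode(bits, set(msg), len(msg)) == msg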

Data Coding Adaptive Huffman Coding: Example update_model does two things: (a) it increments the count, and (b) it updates the Huffman tree. During the updates, the Huffman tree maintains its sibling property, i.e. the nodes (internal and leaf) are arranged in order of increasing weight (see figure).

Data Coding Adaptive Huffman Coding: Example (cont.) When swapping is necessary, the farthest node with weight W is swapped with the node whose weight has just been increased to W+1. Note: If the node with weight W has a subtree beneath it, then the subtree will go with it. The Huffman tree could look very different after node swapping, e.g., in the third tree, node A is again swapped and becomes the #5 node. It is now encoded using only 2 bits.

Data Coding Adaptive Huffman Coding: Example (cont.)

Data Coding Arithmetic Coding: Algorithm A message is represented by an interval of real numbers (floating point) between 0.0 and 1.0. As the message becomes longer, the interval needed to represent it becomes smaller, and the number of bits needed to specify that interval grows. A single number (floating point) within the interval can be uniquely decoded to recreate the exact stream of symbols that went into its construction.

Data Coding Arithmetic: Example Consider the letters (a, e, i, o, u, EOS), where EOS represents the end of the message. As an example, the message "eaiiEOS" is coded.
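
The slide's symbol probabilities are not reproduced here, so the Python sketch below assumes the values commonly used in textbook presentations of this example (a: 0.2, e: 0.3, i: 0.1, o: 0.2, u: 0.1, EOS: 0.1), with "!" standing in for EOS; the function names are illustrative. It is a toy floating-point version of the interval-narrowing idea, not a production coder (real implementations use integer arithmetic with renormalisation):

    def arithmetic_encode(message, model):
        """Narrow [low, high) for each symbol; any number in the final interval identifies the message."""
        cum, start = {}, 0.0
        for sym, p in model:                      # cumulative interval for each symbol
            cum[sym] = (start, start + p)
            start += p
        low, high = 0.0, 1.0
        for c in message:
            lo, hi = cum[c]
            span = high - low
            low, high = low + span * lo, low + span * hi
        return low, high

    def arithmetic_decode(x, model, eos):
        cum, start = {}, 0.0
        for sym, p in model:
            cum[sym] = (start, start + p)
            start += p
        out = ""
        while True:
            for sym, (lo, hi) in cum.items():
                if lo <= x < hi:
                    if sym == eos:
                        return out
                    out += sym
                    x = (x - lo) / (hi - lo)      # rescale and continue
                    break

    # Assumed probabilities (not given on the slide): a 0.2, e 0.3, i 0.1, o 0.2, u 0.1, EOS 0.1
    model = [("a", 0.2), ("e", 0.3), ("i", 0.1), ("o", 0.2), ("u", 0.1), ("!", 0.1)]
    low, high = arithmetic_encode("eaii!", model)       # "!" stands in for EOS
    print(low, high)                                    # roughly [0.23354, 0.2336)
    print(arithmetic_decode((low + high) / 2, model, "!"))   # -> "eaii"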

Data Coding Arithmetic: Example

Data Coding Lempel-Ziv-Welch Compression Algorithms Huffman and Arithmetic coding assume a stationary source. Lempel-Ziv-Welch is an adaptive lossless coding technique which “learns” its symbols as it codes. The original methods are due to Ziv and Lempel (1977 and 1978); Terry Welch improved the scheme in 1984 (the result is called LZW compression). It is used in, e.g., zip, gzip, pkzip, winzip, GIF, V.42 bis and Stacker. Reference: Terry A. Welch, "A Technique for High Performance Data Compression", IEEE Computer, Vol. 17, No. 6, 1984, pp. 8-19.

Data Coding: Lempel-Ziv (LZ78) Binary Compression Algorithm The LZ78 algorithm maintains a dictionary (organised as a tree) containing all the phrases into which it has divided the portion of the data sequence it has parsed so far. The next phrase is formed by concatenating two items: 1. the phrase in the structure that achieves the longest match with the beginning of the as-yet-unparsed portion of the data; 2. the source datum immediately beyond the end of this maximal match.
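
A minimal Python sketch of this parsing rule (the function name and the (ancestor index, symbol) output format are illustrative):

    def lz78_parse(data):
        """Each phrase is the longest previously seen phrase plus the next symbol,
        coded as (index of that phrase, new symbol)."""
        phrases = {"": 0}            # the dictionary (conceptually a tree rooted at the empty phrase)
        output = []
        w = ""
        for c in data:
            if w + c in phrases:
                w = w + c            # keep extending the match
            else:
                output.append((phrases[w], c))   # (ancestor index, new symbol)
                phrases[w + c] = len(phrases)
                w = ""
        if w:                        # flush a final phrase that matched completely
            output.append((phrases[w], ""))
        return output

    print(lz78_parse("00010110000010100100100010011"))
    # [(0, '0'), (1, '0'), (0, '1'), (1, '1'), (3, '0'), ...]
    # i.e. the phrases 0, 00, 1, 01, 10, 000, 010, 100, 1001, 0001, 001, 1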

Data Coding Lempel-Ziv LZ78 Binary Coding Example The input string is parsed into the phrases: 0, 00, 1, 01, 10, 000, 010, 100, 1001, 0001, 001, 1 .......
The LZ78 tree is: [tree diagram not reproduced; each phrase is a node whose parent is the longest previously parsed phrase it extends]
Side note (change of base): if log2(k) = x then k = 2^x, so log10(k) = x · log10(2) and x = log10(k) / log10(2).

Data Coding LZ78S, LZ78E Compression Algorithms
The LZ78S algorithm improves compression performance by noting that when a node is specified for the second time as the ancestor node of the current phrase, the decoder immediately knows that this phrase must end with the remaining descendant of that node rather than the descendant that was appended earlier (e.g., if the extension by 0 has already been created, the new phrase must extend it by 1).
The LZ78E algorithm improves compression performance by noting that after phrase 3 has been parsed, the root node will never again be used as an ancestor node, because both of its descendants are now in the tree; likewise, after phrase 4 has been parsed, node 1 will never again be used as an ancestor node, because both of its descendants are now in the tree. These "dead" nodes no longer need to be coded.

Data Coding Lempel-Ziv-Welch Compression Algorithm Given that Webster's English dictionary contains about 159,000 entries, we need a way to build the dictionary adaptively.

    w = NIL;
    while ( read a character k ) {
        if wk exists in the dictionary {
            w = wk;
        } else {
            add wk to the dictionary;
            output the code for w;
            w = k;
        }
    }

Data Coding Lempel-Ziv-Welch Coding: Example The input string is "^WED^WE^WEE^WEB^WET". The 19-symbol input has been reduced to an output of 7 symbols plus 5 codes. Each code/symbol will need more than 8 bits, say 9 bits. Usually, compression doesn't start to pay off until a large number of bytes (e.g., > 100) have been read in.
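
A runnable Python sketch of the compression pseudocode above, applied to this example. Seeding the dictionary with only the characters that occur in the input is a simplification for illustration; a real LZW coder seeds it with all 256 byte values and numbers new entries from 256:

    def lzw_compress(data):
        """LZW sketch: output the code for the longest known phrase w, then add wk."""
        dictionary = {c: c for c in sorted(set(data))}   # single-character strings
        next_code = 256
        w, output = "", []
        for k in data:
            if w + k in dictionary:
                w = w + k
            else:
                output.append(dictionary[w])             # output the code for w
                dictionary[w + k] = next_code            # add wk to the dictionary
                next_code += 1
                w = k
        output.append(dictionary[w])                     # flush the last phrase
        return output

    print(lzw_compress("^WED^WE^WEE^WEB^WET"))
    # ['^', 'W', 'E', 'D', 256, 'E', 260, 261, 257, 'B', 260, 'T']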

Data Coding Lempel-Ziv-Welch Decompression Algorithm

    read a character k;
    output k;
    w = k;
    while ( read a character k )   /* k could be a character or a code. */
    {
        entry = dictionary entry for k;
        /* special case: if k is not yet in the dictionary, entry = w + first character of w */
        output entry;
        add w + entry[0] to the dictionary;
        w = entry;
    }

Data Coding: Lempel-Ziv-Welch Decoding Example Input string is "^WED<256>E<260><261><257>B<260>T".
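
A matching Python sketch of the decompression side, decoding the code sequence above back to the original text. The list-of-codes representation and the function name are illustrative; the special-case branch handles a code that is not yet in the dictionary, which does not occur in this particular example:

    def lzw_decompress(codes):
        """Rebuild the dictionary while decoding, mirroring the compressor."""
        dictionary = {}            # code -> string, filled in as decoding proceeds
        next_code = 256
        first = codes[0]
        output = first             # the first code is always a literal character here
        w = first
        for k in codes[1:]:
            if isinstance(k, str):
                entry = k
            elif k in dictionary:
                entry = dictionary[k]
            else:                  # special case: code not yet in the dictionary
                entry = w + w[0]
            output += entry
            dictionary[next_code] = w + entry[0]   # add w + first character of entry
            next_code += 1
            w = entry
        return output

    codes = ['^', 'W', 'E', 'D', 256, 'E', 260, 261, 257, 'B', 260, 'T']
    print(lzw_decompress(codes))   # "^WED^WE^WEE^WEB^WET"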

Data Coding Binary Lempel-Ziv-Welch Coding: Example Input string is “00010110000010100100100010011”.