
1 Data Compression Hae-sun Jung CS146 Dr. Sin-Min Lee Spring 2004


3 Introduction
• Compression is used to reduce the volume of information to be stored, or to reduce the communication bandwidth required to transmit it over a network


5 Compression Principles
• Entropy Encoding
• Run-length encoding
 – Lossless, and independent of the type of source information
 – Used when the source information comprises long substrings of the same character or binary digit; each run is sent as a (string or bit pattern, # of occurrences) pair, as in FAX
 – e.g. 0000000 1111111111 00000 11 → (0,7) (1,10) (0,5) (1,2) → 7,10,5,2 (once the first bit is known, the bits alternate, so only the run lengths need to be sent)
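A minimal run-length encoder in Python illustrates the idea (the function name is illustrative, not from the slides):

```python
def rle_encode(bits: str) -> list[tuple[str, int]]:
    """Encode a string as (symbol, run length) pairs."""
    runs = []
    i = 0
    while i < len(bits):
        j = i
        while j < len(bits) and bits[j] == bits[i]:
            j += 1                      # extend the current run
        runs.append((bits[i], j - i))   # record (symbol, run length)
        i = j
    return runs

print(rle_encode("000000011111111110000011"))
# [('0', 7), ('1', 10), ('0', 5), ('1', 2)]
```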

6 Compression Principles
• Entropy Encoding
• Statistical encoding
 – Based on the probability of occurrence of a pattern
 – The more probable a pattern is, the shorter its codeword
 – “Prefix property”: a shorter codeword must not form the start of a longer codeword

7 Compression Principles
• Huffman Encoding
• Entropy, H: the theoretical minimum average number of bits required to transmit a particular stream
   $H = -\sum_{i=1}^{n} P_i \log_2 P_i$
   where n is the number of symbols and $P_i$ is the probability of symbol i
• Efficiency, $E = H / H'$
   where $H'$ = average number of bits per codeword $= \sum_{i=1}^{n} N_i P_i$
   and $N_i$ is the number of bits in the codeword for symbol i

8
• E.g. symbols M(10), F(11), Y(010), N(011), 0(000), 1(001) with probabilities 0.25, 0.25, 0.125, 0.125, 0.125, 0.125
• $H' = \sum_{i=1}^{6} N_i P_i = 2(2 \times 0.25) + 4(3 \times 0.125) = 2.5$ bits/codeword
• $H = -\sum_{i=1}^{6} P_i \log_2 P_i = -(2 \times 0.25 \log_2 0.25 + 4 \times 0.125 \log_2 0.125) = 2.5$
• $E = H/H' = 100\%$
• A fixed-length code would need 3 bits/codeword for six symbols
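The same numbers can be checked in a few lines of Python:

```python
import math

# Codebook from the slide: symbol -> (codeword, probability)
code = {"M": ("10", 0.25), "F": ("11", 0.25), "Y": ("010", 0.125),
        "N": ("011", 0.125), "0": ("000", 0.125), "1": ("001", 0.125)}

H = -sum(p * math.log2(p) for _, p in code.values())   # entropy
Hp = sum(len(cw) * p for cw, p in code.values())       # avg bits per codeword
print(H, Hp, H / Hp)                                   # 2.5 2.5 1.0 -> E = 100%
```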

9 Huffman Algorithm
• A method for constructing an encoding tree
• Full binary tree representation
• Each edge of the tree has a value: 0 on the edge to the left child, 1 on the edge to the right child
• Data is stored at the leaves, not at internal nodes
• Result: an encoding tree that yields a “variable-length encoding”

10 Huffman Algorithm
1. Maintain a forest of trees
2. Weight of a tree = sum of the frequencies of its leaves
3. Repeat N−1 times:
 – Select the two trees of smallest weight
 – Merge them to form a new tree (see the sketch below)
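A compact Python sketch of this loop, using a heap to find the two smallest-weight trees (the helper name and nested-tuple tree representation are assumptions for illustration):

```python
import heapq
import itertools

def build_huffman_tree(freq: dict[str, int]):
    """Merge the two smallest-weight trees until a single tree remains.

    A leaf is a symbol string; an internal node is a (left, right) tuple.
    """
    tie = itertools.count()  # unique tie-breaker so heap entries always compare
    forest = [(w, next(tie), sym) for sym, w in freq.items()]
    heapq.heapify(forest)
    for _ in range(len(freq) - 1):          # N-1 merges
        w1, _, t1 = heapq.heappop(forest)   # smallest weight
        w2, _, t2 = heapq.heappop(forest)   # second smallest
        heapq.heappush(forest, (w1 + w2, next(tie), (t1, t2)))
    return forest[0][2]

# Frequencies from the 64-symbol example on slide 15
tree = build_huffman_tree({"R": 19, "K": 17, "G": 14, "B": 7, "C": 4, "M": 2, "Y": 1})
```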

11 Huffman coding
• A variable-length code: the more frequent a character, the shorter its codeword
• Must satisfy the prefix property (no codeword is a prefix of another) to be uniquely decodable
• A two-pass algorithm:
 – the first pass accumulates the character frequencies and generates the codebook
 – the second pass does the compression using the codebook
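The first pass is just a frequency count; in Python, for example:

```python
from collections import Counter

text = "AAAABBCD"
freq = Counter(text)   # pass 1: accumulate character frequencies
print(freq)            # Counter({'A': 4, 'B': 2, 'C': 1, 'D': 1})
```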

12 Huffman coding
Create the codes by constructing a binary tree:
1. Consider all characters as free nodes
2. Attach the two free nodes with the lowest frequencies to a new parent node whose weight equals the sum of their frequencies
3. Remove the two free nodes and add the newly created parent node to the list of free nodes
4. Repeat steps 2 and 3 until there is one free node left; it becomes the root of the tree
The codewords are then read off the finished tree, as sketched below.
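Reading the codewords off the finished tree is a simple traversal: append 0 when going left, 1 when going right. A sketch, assuming the nested-tuple tree produced by the `build_huffman_tree` sketch above:

```python
def make_codebook(tree, prefix: str = "") -> dict[str, str]:
    """The path from the root to each leaf is that symbol's codeword."""
    if isinstance(tree, str):           # leaf: a single symbol
        return {tree: prefix or "0"}    # lone-symbol edge case
    left, right = tree
    codes = make_codebook(left, prefix + "0")
    codes.update(make_codebook(right, prefix + "1"))
    return codes

print(make_codebook(build_huffman_tree({"A": 4, "B": 2, "C": 1, "D": 1})))
# {'A': '0', 'B': '10', 'C': '110', 'D': '111'} -- same lengths as slide 17
# (which 0/1 pattern you get depends on the left/right labeling convention)
```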

13
• Edge to the right child of a node: 1; edge to the left child: 0
• Prefix violation (example):
 – e: “01”, b: “010”
 – “01” is a prefix of “010”, so the bits “010” could be read either as b or as e followed by the start of another codeword (“e0…”): decoding would be ambiguous
• When symbols have the same frequency, tie-breaking must be consistent (always left, or always right)
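A small check for the prefix property (sorting puts any codeword immediately before its extensions, so only adjacent pairs need testing):

```python
def is_prefix_free(codewords: list[str]) -> bool:
    """True if no codeword is a prefix of another (prefix property holds)."""
    ordered = sorted(codewords)
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))

print(is_prefix_free(["01", "010"]))   # False: "01" is a prefix of "010"
print(is_prefix_free(["00", "01", "10", "110", "1110", "11110", "11111"]))  # True
```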

14 Example (64 data)
RKKKKKKK
KKKRRKKK
KKRRRRGG
KKBCCCRR
GGGMCBRR
BBBMYBBR
GGGGGGGR
GRRRRGRR

15
Color   Frequency   Huffman code
=================================
R       19          00
K       17          01
G       14          10
B        7          110
C        4          1110
M        2          11110
Y        1          11111
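With this codebook the 64 symbols need 19×2 + 17×2 + 14×2 + 7×3 + 4×4 + 2×5 + 1×5 = 152 bits, versus 64×3 = 192 bits for a fixed 3-bit code over seven symbols. A quick check:

```python
codebook = {"R": "00", "K": "01", "G": "10", "B": "110",
            "C": "1110", "M": "11110", "Y": "11111"}
freq = {"R": 19, "K": 17, "G": 14, "B": 7, "C": 4, "M": 2, "Y": 1}

huffman_bits = sum(freq[s] * len(codebook[s]) for s in freq)
fixed_bits = sum(freq.values()) * 3    # 3 bits per symbol covers 7 symbols
print(huffman_bits, fixed_bits)        # 152 192
```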


17 Static Huffman Coding
• Huffman (Code) Tree
• Given: a number of symbols (or characters) and their relative probabilities in advance
• The codes must hold the “prefix property”

Symbol   Occurrence   Code
A        4/8          1
B        2/8          01
C        1/8          001
D        1/8          000

• 4×1 + 2×2 + 1×3 + 1×3 = 14 bits are required to transmit “AAAABBCD”
(Figure: the Huffman code tree, with root node, branch nodes, and leaf nodes A, B, C, D, whose 0/1 edge labels illustrate the prefix property)
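Encoding and decoding the example message with this static codebook confirms the 14-bit count; the prefix property is what lets the decoder recognize each codeword as soon as it sees it:

```python
codebook = {"A": "1", "B": "01", "C": "001", "D": "000"}
message = "AAAABBCD"

encoded = "".join(codebook[ch] for ch in message)
print(encoded, len(encoded))           # 11110101001000 14

# Decode bit by bit: the prefix property guarantees a unique match.
inverse = {code: sym for sym, code in codebook.items()}
decoded, buf = [], ""
for bit in encoded:
    buf += bit
    if buf in inverse:
        decoded.append(inverse[buf])
        buf = ""
print("".join(decoded))                # AAAABBCD
```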

18 The end