Compression techniques. Why we need compression. Types of compression: lossy and lossless, concentrating on lossless techniques. Run-length coding. Entropy.

Compression techniques. Why we need compression. Types of compression: lossy and lossless; we concentrate on lossless techniques. Run-length coding. Entropy (variable-length) coding and Huffman coding. DCT (Discrete Cosine Transform): not a compression technique in itself, but it allows the introduction of other techniques.

Compression Digitised sound and video produce a lot of data. In particular, digitised television-quality pictures produce data at 270 Mbit/s, which is faster than most hard disks, CD-ROMs and network devices can accommodate. We therefore need to compress data for use on computers.

Compression There are two types of compression: lossy and lossless. As the names suggest, lossy compression loses some of the original signal, while lossless compression does not. Lossless techniques such as run-length encoding and Huffman coding achieve compression by assigning shorter codes to the data; as we shall see, this is not always possible.

Compression Lossy techniques rely on throwing away some information which the viewer or listener will not notice too much. This usually involves changing the data to some other form (a transform). Most lossy techniques are noticeable: the more lossy compression is applied, the more noticeable the effect becomes.

Probability. Consider the throwing of a die. What is the probability of throwing, say, a 5? In this equal-probability problem, the probability of throwing any specified number between 1 and 6 is 1/6.

Probability. Now make up a short sentence, for example: "This is the best class that I have ever taught." (The sentence does not have to be true for the exercise.) Work out the probability of finding an e in the sentence. The probability of finding any given letter is not equal in this example. There are five es in the sentence, which has a total of 37 letters, so the probability of finding an e is 5/37.
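As a quick check of such counts, a few lines of Python (not part of the original slides) tally the letter frequencies of the sentence and turn them into probabilities:

```python
from collections import Counter

sentence = "This is the best class that I have ever taught"
letters = [c.lower() for c in sentence if c.isalpha()]  # ignore spaces

counts = Counter(letters)
total = len(letters)                      # 37 letters in this sentence
probs = {ch: n / total for ch, n in counts.items()}
print(total, counts["e"], probs["e"])     # 37 5 0.135...
```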

Information. When we send pictures, sound and text we are sending information. Information is closely related to probability. For example, if the die had the same number on each side, then we would know the answer of any throw without being given any information. The lower the probability of a piece of data, the greater the information it carries.
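This relationship is usually made precise as follows (standard information theory, stated here for reference rather than taken from the slides). A symbol occurring with probability p(x) carries

$$I(x) = \log_2 \frac{1}{p(x)} \ \text{bits},$$

and the average information per symbol of a source is its entropy

$$H = \sum_i p_i \log_2 \frac{1}{p_i} \ \text{bits/symbol}.$$

For the fair die, every face has p = 1/6, so each throw conveys log2 6 ≈ 2.58 bits.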

Entropy (variable-length) coding (VLC) The idea is to give shorter codes to values (symbols) which occur most frequently, and longer codes to infrequently occurring values. Thus symbols carrying more information get longer codes, and symbols carrying less information get shorter codes. Huffman coding is an example of such a variable-length code.

Huffman coding The following algorithm generates a Huffman code:
–Find (or assume) the probability of each value's occurrence.
–Order the values in a row of a table according to their probability.
–Take the two symbols with the lowest probability and place them as leaves on a binary tree.
–Form a new row in the table, replacing these two symbols with a new symbol. This new symbol forms a branch node in the tree; draw it in the tree with branches to its leaf (component) symbols.
–Assign the new symbol a probability equal to the sum of the component symbols' probabilities.

Huffman coding
–Repeat the above until there is only one symbol left. This is the root of the tree.
–Nominally assign 1s to the right-hand branches and 0s to the left-hand branches at each node.
–Read the code for each symbol from the root of the tree. (A small code sketch implementing these steps follows.)
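As an illustration of the algorithm above, here is a minimal Python sketch (not from the original slides). It uses a heap to repeatedly merge the two lowest-probability nodes, then reads codes off the tree with 0 on left branches and 1 on right branches, as in the slides. The exact bit patterns depend on how ties are broken; the code lengths are what matter.

```python
import heapq

def huffman_codes(probs):
    """Build a Huffman code table from a {symbol: probability} dict."""
    # Heap entries are (probability, tiebreak, node); a leaf node is a
    # symbol string, a branch node is a (left, right) tuple.
    heap = [(p, i, sym) for i, (sym, p) in enumerate(sorted(probs.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)    # two lowest-probability nodes
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, tiebreak, (left, right)))
        tiebreak += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # branch node: recurse down
            walk(node[0], prefix + "0")      # 0 on the left branch
            walk(node[1], prefix + "1")      # 1 on the right branch
        else:
            codes[node] = prefix or "0"      # lone-symbol edge case
    walk(heap[0][2], "")
    return codes

print(huffman_codes({"A": 0.5, "B": 0.15, "C": 0.15, "D": 0.1, "E": 0.1}))
# One valid result: A gets a 1-bit code; B, C, D and E get 3-bit codes.
```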

Huffman coding Examples
–Form a Huffman code based upon the following symbols and associated probabilities (in brackets): A(0.5) B(0.15) C(0.15) D(0.1) E(0.1)
–Form the Huffman tree: take the two symbols with the lowest probability, add them as leaves to the tree (see next slide), and create a new row combining these two symbols, with a probability equal to the sum of the two symbols' probabilities:
A(0.5) B(0.15) C(0.15) DE(0.2)
–Draw branch node DE on the tree, connecting to D and E.
–Continue: repeat the above until one symbol is left:
A(0.5) BC(0.3) DE(0.2)
A(0.5) BCDE(0.5)
ABCDE(1)
–Try your own with the following symbols: A(0.2) B(0.1) C(0.3) D(0.05) E(0.35)
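Reading the tree for this example gives one valid assignment (the exact bits depend on which branch is labelled 1, but the lengths do not): A = 0, B = 100, C = 101, D = 110, E = 111. The average code length is 0.5×1 + (0.15 + 0.15 + 0.1 + 0.1)×3 = 2.0 bits per symbol, against 3 bits for a fixed-length code over five symbols.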

Huffman coding Examples (this slide showed the Huffman tree diagram for the example above; the figure is not reproduced in the transcript).

Limits of Huffman coding (worst case) The worst case is when all the probabilities are equal, i.e. there is no statistical bias. Example: A(1/8), B(1/8), C(1/8), D(1/8), E(1/8), F(1/8), G(1/8), H(1/8) (figures in brackets are probabilities). Construct the Huffman tree:
A(1/8), B(1/8), C(1/8), D(1/8), E(1/8), F(1/8), G(1/8), H(1/8)
AB(1/4), C(1/8), D(1/8), E(1/8), F(1/8), G(1/8), H(1/8)
AB(1/4), CD(1/4), E(1/8), F(1/8), G(1/8), H(1/8)
AB(1/4), CD(1/4), EF(1/4), G(1/8), H(1/8)
AB(1/4), CD(1/4), EF(1/4), GH(1/4)
ABCD(1/2), EFGH(1/2)
ABCDEFGH(1)

Limits of Huffman coding (worst case) Reading the codes:
A = 111, B = 110, C = 101, D = 100
E = 011, F = 010, G = 001, H = 000
Every symbol gets a 3-bit code, exactly the same as a fixed-length code for eight symbols (log2 8 = 3), so no compression is achieved.

Limits of Huffman coding (best case) The best case is when the probabilities change in powers of 2, i.e. there is optimum statistical bias. Example: A(1/128), B(1/128), C(1/64), D(1/32), E(1/16), F(1/8), G(1/4), H(1/2) (figures in brackets are probabilities). Construct the Huffman tree:
A(1/128), B(1/128), C(1/64), D(1/32), E(1/16), F(1/8), G(1/4), H(1/2)
AB(1/64), C(1/64), D(1/32), E(1/16), F(1/8), G(1/4), H(1/2)
ABC(1/32), D(1/32), E(1/16), F(1/8), G(1/4), H(1/2)
ABCD(1/16), E(1/16), F(1/8), G(1/4), H(1/2)
ABCDE(1/8), F(1/8), G(1/4), H(1/2)
ABCDEF(1/4), G(1/4), H(1/2)
ABCDEFG(1/2), H(1/2)
ABCDEFGH(1)

Limits of Huffman coding (best case) Reading the codes:
A = 1111111, B = 1111110, C = 111110, D = 11110
E = 1110, F = 110, G = 10, H = 0
The average code length is 7/128 + 7/128 + 6/64 + 5/32 + 4/16 + 3/8 + 2/4 + 1/2 ≈ 1.98 bits per symbol, against 3 bits for a fixed-length code. Because every probability is a power of 2, this equals the source entropy exactly.

Run-length coding Another lossless technique. Suppose we have a sequence of values:
–S = 1 2 2 2 1 1 3 3 3 3 3 1 1 6 6 6 6
–The sequence uses 17 separate values. We could code this by saying: we have one 1, three 2s, two 1s, five 3s, two 1s and four 6s. In run-length code this would be:
–1 1 3 2 2 1 5 3 2 1 4 6
–taking only 12 values.
–It is no use if we don't have runs: a five-value sequence with no repeats would be coded as five (run length, value) pairs, taking ten values.

Run-length coding We also have to decide and specify how many digits we allocate to the value and how many to the run length. In the example above, the values and the run lengths are all less than 10, and the spaces were inserted only to explain the principle. Without knowing the allocation, the code 1 1 3 2 2 1 5 3 2 1 4 6 could be misread as 11 3s, 22 1s, 53 2s and 14 6s. It would be inefficient to choose this allocation without considering the original data.
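To make the scheme concrete, here is a small Python sketch (not part of the original slides) that encodes and decodes runs as explicit (run length, value) pairs, which sidesteps the allocation ambiguity discussed above:

```python
def rle_encode(seq):
    """Encode a sequence as a list of (run_length, value) pairs."""
    runs = []
    for v in seq:
        if runs and runs[-1][1] == v:
            runs[-1] = (runs[-1][0] + 1, v)   # extend the current run
        else:
            runs.append((1, v))               # start a new run
    return runs

def rle_decode(runs):
    """Expand (run_length, value) pairs back into the original sequence."""
    return [v for n, v in runs for _ in range(n)]

s = [1, 2, 2, 2, 1, 1, 3, 3, 3, 3, 3, 1, 1, 6, 6, 6, 6]
code = rle_encode(s)
print(code)                    # [(1, 1), (3, 2), (2, 1), (5, 3), (2, 1), (4, 6)]
assert rle_decode(code) == s   # lossless: decoding restores the input
```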

Exercises Calculate a Huffman code for your sentence above, and check what compression is achieved. Express the following sequence as a run-length code, specifying your data allocation.