SIMS-201 Compressing Information

2  Overview
Chapter 7: Compression
- Introduction
- Entropy
- Huffman coding
- Universal coding

3  Introduction
Compression techniques can significantly reduce the bandwidth and memory required for sending, receiving, and storing data. Most computers are equipped with modems that compress or decompress all information leaving or entering via the phone line. With a mutually recognized system (e.g. WinZip), the amount of data can be significantly diminished.
Examples of compression techniques:
- Compressing binary data streams: variable length coding (e.g. Huffman coding) and universal coding (e.g. WinZip)
- Image-specific compression (we will see that images are well suited for compression): GIF and JPEG
- Video compression: MPEG
“World Wide Web, not World Wide Wait”

4  Why can we compress information?
Compression is possible because information usually contains redundancies, or information that is often repeated. For example, two still images from a video sequence are often similar; this fact can be exploited by transmitting only the changes from one image to the next. Likewise, a line of text often contains redundancies, which file compression programs remove:
“Ask not what your country can do for you - ask what you can do for your country.”

5  Some characters occur more frequently than others
It’s possible to represent frequently occurring characters with a smaller number of bits during transmission. This may be accomplished by a variable length code, as opposed to a fixed length code like ASCII. An example of a simple variable length code is Morse code. “E” occurs more frequently than “Z”, so we represent “E” with a shorter length code:
. = E
- = T
--.. = Z
--.- = Q

6  Information Theory
Variable length coding exploits the fact that some information occurs more frequently than other information. The mathematical theory behind this concept is known as INFORMATION THEORY. Claude E. Shannon developed modern information theory at Bell Labs in 1948. He saw the relationship between the probability of appearance of a transmitted signal and its information content. This realization enabled the development of compression techniques.

7  A Little Probability
Shannon (and others) found that information can be related to probability.
- An event has a probability of 1 (or 100%) if we believe this event will occur.
- An event has a probability of 0 (or 0%) if we believe this event will not occur.
- The probability that an event will occur takes on values anywhere from 0 to 1.
Consider a coin toss: heads and tails each have a probability of 0.5.
- In two tosses, the probability of tossing two heads is 1/2 x 1/2 = 1/4, or 0.25.
- In three tosses, the probability of tossing all tails is 1/2 x 1/2 x 1/2 = 1/8, or 0.125.
We compute probability this way because the result of each toss is independent of the results of the other tosses.
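As a side note (not part of the slides), the multiplication rule for independent events is easy to check numerically; a tiny Python sketch of the coin calculations above:

```python
# The probability of independent events multiplies.
p_heads = 0.5

p_two_heads = p_heads * p_heads    # two heads in two tosses -> 0.25
p_three_tails = 0.5 ** 3           # three tails in three tosses -> 0.125

print(p_two_heads, p_three_tails)  # 0.25 0.125
```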

8  Entropy
As part of information theory, Shannon developed the concept of ENTROPY. If the probability of a binary event is 0.5 (like a coin), then, on average, you need one bit to represent the result of this event. As the probability of a binary event increases or decreases, the number of bits you need, on average, to represent the result decreases.
[Figure: entropy curve, with bits on the vertical axis and probability of an event on the horizontal axis]
The figure expresses that unless an event is totally random, you can convey the information of the event in fewer bits, on average, than it might first appear. Let’s do an example...
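The curve on this slide is the binary entropy function. The formula is not spelled out on the slide, but it is the standard Shannon entropy; a minimal Python sketch:

```python
import math

def binary_entropy(p: float) -> float:
    """Average number of bits needed per outcome of a binary event with probability p."""
    if p in (0.0, 1.0):
        return 0.0                       # no uncertainty means no bits are needed
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

print(binary_entropy(0.5))   # 1.0   (a fair coin needs a full bit)
print(binary_entropy(0.8))   # ~0.72 (a biased event needs fewer bits on average)
```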

9  Example from the text: a men’s specialty store
- The probability of male patrons is 0.8.
- The probability of female patrons is 0.2.
Assume for this example that patrons enter the store in groups of two. Calculate the probabilities of the different pairings:
- Event A, Male-Male: P(MM) = 0.8 x 0.8 = 0.64
- Event B, Male-Female: P(MF) = 0.8 x 0.2 = 0.16
- Event C, Female-Male: P(FM) = 0.2 x 0.8 = 0.16
- Event D, Female-Female: P(FF) = 0.2 x 0.2 = 0.04
We could assign the longest codes to the most infrequent events while maintaining unique decodability.
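As a quick check of these numbers (using the standard entropy formula, which the slide does not show), the pair probabilities and the theoretical lower bound on bits per pair can be computed directly:

```python
import math

p = {"M": 0.8, "F": 0.2}

# Probabilities of the four pairings (two independent customers)
pairs = {a + b: p[a] * p[b] for a in "MF" for b in "MF"}
print(pairs)   # {'MM': 0.64..., 'MF': 0.16..., 'FM': 0.16..., 'FF': 0.04...}

# Entropy of the pair distribution: the lower bound on average bits per pair
H = -sum(q * math.log2(q) for q in pairs.values())
print(round(H, 3))   # 1.444 bits per pair, versus 2 bits for a fixed-length code
```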

10  Example (cont.)
Let’s assign a unique string of bits to each event, based on the probability of that event occurring:
Event  Name           Code
A      Male-Male      0
B      Male-Female    10
C      Female-Male    110
D      Female-Female  111
Given the received code 0 10 10 110 10 0, determine the events:
0 = A (MM), 10 = B (MF), 10 = B (MF), 110 = C (FM), 10 = B (MF), 0 = A (MM)
The above example has used a variable length code.
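Because no code word is a prefix of another, a single left-to-right scan decodes the stream; a small Python sketch using the table above (the bit string is the one reconstructed from the slide’s answer):

```python
code = {"0": "MM", "10": "MF", "110": "FM", "111": "FF"}

def decode(bits: str) -> list[str]:
    events, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in code:             # a complete code word has been read
            events.append(code[buffer])
            buffer = ""
    if buffer:
        raise ValueError("bit string ended in the middle of a code word")
    return events

print(decode("01010110100"))   # ['MM', 'MF', 'MF', 'FM', 'MF', 'MM']
```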

11  Variable Length Coding
Unlike fixed length codes such as ASCII, variable length codes:
- Assign the longest codes to the most infrequent events.
- Assign the shortest codes to the most frequent events.
- Require each code word to be uniquely identifiable regardless of length.
Variable length coding takes advantage of the probabilistic nature of information. If we have total uncertainty about the information we are conveying, fixed length codes are preferred.
Examples of variable length coding: Morse code and Huffman coding.
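The “uniquely identifiable” requirement above is the prefix-free property: no code word may be a prefix of another. A small sketch of that check (the helper function is illustrative, not from the slides):

```python
def is_prefix_free(code_words: list[str]) -> bool:
    """True if no code word is a prefix of a different code word."""
    return not any(
        a != b and b.startswith(a)
        for a in code_words
        for b in code_words
    )

print(is_prefix_free(["0", "10", "110", "111"]))  # True  -> decodable on the fly
print(is_prefix_free(["0", "01", "11"]))          # False -> "0" is a prefix of "01"
```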

12  Morse Code
Characters are represented by patterns of dots and dashes. More frequently used letters use short code symbols. Short pauses are used to separate the letters. Represent “Hello” using Morse code:
H = ....
E = .
L = .-..
O = ---
Hello = .... . .-.. .-.. ---
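A toy encoder for this example; the dot-dash patterns used are standard International Morse Code:

```python
# International Morse Code for the letters needed in this example
MORSE = {"H": "....", "E": ".", "L": ".-..", "O": "---"}

def to_morse(word: str) -> str:
    # A space between letters stands in for the short pause on the slide
    return " ".join(MORSE[letter] for letter in word.upper())

print(to_morse("Hello"))   # .... . .-.. .-.. ---
```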

13  Huffman Coding
Huffman coding creates a binary code tree:
- Nodes are connected by branches, with leaves at the ends.
- The top node is called the root.
- Two branches extend from each node.
[Figure: binary code tree labeled Start/Root at the top, with branches, nodes, and leaves A, B, C, D]
The Huffman coding procedure finds the optimum, uniquely decodable, variable length code associated with a set of events, given their probabilities of occurrence.

14  Huffman Coding
A = 0
B = 10
C = 110
D = 111
[Figure: the same code tree, with root, branches, nodes, and leaves A, B, C, D]
Given the adjacent Huffman code tree, decode the following sequence: 110 10 0 0 111 0
110 = C, 10 = B, 0 = A, 0 = A, 111 = D, 0 = A
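The same decoding can be done by walking the tree bit by bit: start at the root, follow the 0 or 1 branch, and emit a symbol whenever a leaf is reached. A sketch using nested pairs as a stand-in for the tree on the slide (the tree shape is assumed from the code words A = 0, B = 10, C = 110, D = 111):

```python
# Each internal node is a (zero_branch, one_branch) pair; leaves are symbols.
TREE = ("A", ("B", ("C", "D")))   # A = 0, B = 10, C = 110, D = 111

def decode(bits: str) -> str:
    out, node = [], TREE
    for bit in bits:
        node = node[int(bit)]          # follow the 0 branch or the 1 branch
        if isinstance(node, str):      # reached a leaf: emit it and restart at the root
            out.append(node)
            node = TREE
    return "".join(out)

print(decode("11010001110"))   # CBAADA
```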

15  Huffman Code Construction
First list all events in descending order of probability. Pair the two events with the lowest probabilities and add their probabilities.
Event A  .3
Event B  .3
Event C  .13
Event D  .12
Event E  .1
Event F  .05
Pairing the two lowest (Event E and Event F) gives a combined probability of 0.15.

16  Huffman Code Construction
Repeat for the pair with the next lowest probabilities.
[Figure: events A through F, with the next lowest-probability pair combined]

17  Huffman Code Construction
Repeat for the pair with the next lowest probabilities.
[Figure: the tree grows as the next lowest-probability pair is combined]

18  Huffman Code Construction
Repeat for the pair with the next lowest probabilities.
[Figure: the tree grows as the next lowest-probability pair is combined]

19  Huffman Code Construction
Repeat for the last pair, and add 0s to the left branches and 1s to the right branches.
[Figure: completed Huffman tree for events A through F]
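The whole construction can be expressed with a priority queue; a Python sketch assuming the probabilities from these slides. Tie-breaking between the equal-probability events A and B may swap their code words relative to the next slide, but every symbol receives the same code length:

```python
import heapq
import itertools

def huffman(probabilities: dict[str, float]) -> dict[str, str]:
    """Build a Huffman code, returning a symbol -> bit-string mapping."""
    tiebreak = itertools.count()   # keeps heap comparisons well defined
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p_low, _, low = heapq.heappop(heap)        # lowest-probability subtree
        p_next, _, high = heapq.heappop(heap)      # next-lowest subtree
        # Higher-probability side goes on the 0 branch, lower on the 1 branch.
        merged = {s: "0" + c for s, c in high.items()}
        merged.update({s: "1" + c for s, c in low.items()})
        heapq.heappush(heap, (p_low + p_next, next(tiebreak), merged))
    return heap[0][2]

probs = {"A": 0.3, "B": 0.3, "C": 0.13, "D": 0.12, "E": 0.1, "F": 0.05}
print(huffman(probs))
# {'B': '00', 'A': '01', 'C': '100', 'D': '101', 'E': '110', 'F': '111'}
# (A and B are tied at .3, so their 00/01 labels may swap; all lengths match the slides.)
```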

20  Exercise
Given the code we just constructed:
Event A: 00    Event B: 01
Event C: 100   Event D: 101
Event E: 110   Event F: 111
How can you decode the string 0000111010110001000000111?
Starting from the leftmost bit, find the shortest bit pattern that matches one of the codes in the list. The first bit is 0, but we don’t have an event represented by 0. We do have one represented by 00, which is event A. Continue applying this procedure:
00 = A, 00 = A, 111 = F, 01 = B, 01 = B, 100 = C, 01 = B, 00 = A, 00 = A, 00 = A, 111 = F

21  Universal Coding
Huffman coding has its limits: we must know a priori the probabilities of the characters or symbols we are encoding. What if a document is “one of a kind”?
Universal coding schemes do not require knowledge of the statistics of the events to be coded. Universal coding is based on the realization that any stream of data contains some repetition. Lempel-Ziv coding is one form of universal coding presented in the text:
- Compression results from reusing frequently occurring strings.
- It works better for long data streams and is inefficient for short strings.
- It is used by WinZip to compress information.

22  Lempel-Ziv Coding
The basis for Lempel-Ziv coding is the idea that we can achieve compression of a string by always coding a series of zeroes and ones as some previously seen string (a prefix string) plus one new bit. Compression results from reusing frequently occurring strings. We will not go through Lempel-Ziv coding in detail.
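A minimal sketch of the idea just described, in the spirit of LZ78 (the slides do not specify the exact variant): each new phrase is emitted as the index of a previously seen phrase plus one new bit. The function name and the example input are illustrative only:

```python
def lz_parse(bits: str) -> list[tuple[int, str]]:
    """Parse a bit string into (prefix_index, new_bit) pairs, LZ78-style.

    Index 0 stands for the empty string; every phrase extends an earlier phrase by one bit.
    """
    dictionary = {"": 0}
    output, phrase = [], ""
    for bit in bits:
        if phrase + bit in dictionary:
            phrase += bit                      # keep extending an already-seen phrase
        else:
            output.append((dictionary[phrase], bit))
            dictionary[phrase + bit] = len(dictionary)
            phrase = ""
    if phrase:                                 # flush a trailing, already-known phrase
        output.append((dictionary[phrase[:-1]], phrase[-1]))
    return output

print(lz_parse("1011010100010"))
# [(0, '1'), (0, '0'), (1, '1'), (2, '1'), (4, '0'), (2, '0'), (1, '0')]
```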