Computer Science 335 Data Compression.

Compression: Magic or Science?
- Only works when a MORE EFFICIENT means of encoding can be found
- Special assumptions must often be made about the data in order to gain compression benefits
- "Compression" can lead to larger files if the data does not conform to those assumptions

Why compress?
- For files on a disk: save disk space
- For internet access: reduce wait time
- In a general queueing system: keep delays down; the payback can be more than linear if the system is nearing or in saturation

A typical queueing graph (delay vs. load): near saturation the curve is steep, so a 25% decrease in load can produce roughly a 66% decrease in delay.

Example
- ASCII characters require 7 bits
- Data may not use ALL of the characters in the ASCII set: consider just the digits 0..9
- Only 10 values -> really only requires 4 bits
- There is a well-used code for this which also allows for +/-: BCD (binary-coded decimal)
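As a rough sketch (assuming the data really is a pure digit string), packing two digits per byte as 4-bit nibbles shows the saving over one 8-bit ASCII character per digit; pack_bcd is a hypothetical helper name, not a standard library call.

```python
def pack_bcd(digits: str) -> bytes:
    """Pack a string of decimal digits into 4-bit nibbles, two digits per byte."""
    if len(digits) % 2:                      # pad to an even number of digits
        digits += "0"
    packed = bytearray()
    for i in range(0, len(digits), 2):
        hi, lo = int(digits[i]), int(digits[i + 1])
        packed.append((hi << 4) | lo)
    return bytes(packed)

print(pack_bcd("20240915").hex())            # 8 ASCII bytes shrink to 4 packed bytes
```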

Other Approaches

Run-length encoding
- Preface each run with an 8-bit length byte
- aaabbabbbccdddaaaa -> 18 bytes; 3a2b1a3b2c3d4a -> 14 bytes
- Benefit from runs of 3 or more: aaa versus 3a
- No gain or loss for runs of 2: aa versus 2a
- Lose on single characters: a versus 1a
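A minimal encoder sketch of this scheme; for readability it emits a single decimal count character per run rather than an 8-bit length byte, and caps runs at 9, so it is an illustration only.

```python
def rle_encode(text: str) -> str:
    """Run-length encode: each run becomes <count><char>; counts stay single-digit here."""
    if not text:
        return ""
    pieces, run_char, run_len = [], text[0], 1
    for ch in text[1:]:
        if ch == run_char and run_len < 9:
            run_len += 1
        else:
            pieces.append(f"{run_len}{run_char}")
            run_char, run_len = ch, 1
    pieces.append(f"{run_len}{run_char}")
    return "".join(pieces)

print(rle_encode("aaabbabbbccdddaaaa"))      # -> 3a2b1a3b2c3d4a (14 characters vs. 18)
```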

Facsimile compression (an application of run-length encoding)
- The page is decomposed into black/white pixels
- There are lots of long runs of black and white pixels
- Don't encode each pixel; encode runs of pixels

Differential encoding
- Suppose values lie between 1000 and 1050
- Storing 1050 directly requires 11 bits
- Storing the difference plus a sign requires only 7 bits: 6 bits cover 64 magnitudes, plus 1 additional bit for direction (+/-)
- Differential encoding can lead to problems, because each value is relative to the last one. Like driving directions: one wrong turn and everything after it is irrelevant.
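A small sketch of the idea (hypothetical helper names): the first value is stored in full and every later value is stored as a signed difference from its predecessor, which is exactly why one corrupted difference throws off everything after it.

```python
def delta_encode(values):
    """Keep the first value in full; store each later value as a signed difference."""
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]

def delta_decode(diffs):
    values = [diffs[0]]
    for d in diffs[1:]:
        values.append(values[-1] + d)        # one corrupted difference shifts all later values
    return values

readings = [1000, 1012, 1010, 1047, 1033]
print(delta_encode(readings))                # [1000, 12, -2, 37, -14]: small numbers, few bits
print(delta_decode(delta_encode(readings)) == readings)   # True
```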

Frequency-based encoding
- Huffman: the encoding is not the same length for all values; short codes for frequently occurring symbols, longer codes for infrequently occurring ones
- Arithmetic (you are not responsible for this): interpret a string as a real number; there are infinitely many values between 0 and 1; divide the region up based on frequency. If A -> 12% and B -> 5%, then A is 0 to 0.12 and B is 0.12 to 0.17. Limited in practice by the computer's finite precision.
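A toy sketch of the interval-narrowing step only (no bit output), using the A/B frequencies above plus a made-up catch-all range so the intervals cover [0, 1):

```python
# Hypothetical model: A and B as above, everything else lumped into one range
MODEL = {"A": (0.00, 0.12), "B": (0.12, 0.17), "*": (0.17, 1.00)}

def narrow(message):
    low, high = 0.0, 1.0
    for sym in message:
        lo, hi = MODEL.get(sym, MODEL["*"])
        span = high - low
        low, high = low + span * lo, low + span * hi   # shrink to the symbol's sub-interval
    return low, high             # any number in [low, high) identifies the whole message

print(narrow("AB"))              # (0.0144, 0.0204): B's slice inside A's [0, 0.12) interval
```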

Huffman (more details)

Huffman encoding
- Must know the distribution of symbols
- Code words typically have DIFFERENT lengths, unlike most schemes you have seen (ASCII, etc.)
- Characters occurring most often get the shortest codes; characters occurring least often get the longest
- The solution is minimal but not unique

Assume the following data: A -> 30%, B -> 20%, C -> 10%, D -> 5%, E -> 35%

Let's peek at the answer. You read the encoding of a character from top to bottom in the tree; for example, C is 0110. Also note that the choice of 0 or 1 for a branch is arbitrary. (In the tree, E splits off at the root, then A, then B, with C and D at the deepest level.)

Build the solution tree
- Choose the two smallest weights at a time and group them: D (5) and C (10) -> 15; then 15 and B (20) -> 35; then that 35 and A (30) -> 65; then 65 and E (35) -> 100.
- At the step that groups the 35 subtree with A, the candidates are A (30), E (35), and the subtree (35); the tie means the algorithm could have grouped E and A instead. This is why the solution is not unique.

And the binary encoding:
A = 00, B = 010, C = 0110, D = 0111, E = 1
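A sketch of the tree-building procedure using a priority queue (one standard way to implement it, not necessarily the text's presentation). Because ties between equal weights are broken arbitrarily, it may print a different assignment than the 00/010/0110/0111/1 above, but the expected length works out to the same 2.15 bits.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Repeatedly group the two smallest weights, then read codes off the tree."""
    tiebreak = count()                       # keeps the heap from comparing tree nodes
    heap = [(w, next(tiebreak), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # the two smallest weights...
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))   # ...grouped
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: 0 down one branch, 1 down the other
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

print(huffman_codes({"A": 30, "B": 20, "C": 10, "D": 5, "E": 35}))
```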

Compute the expected length (expected bits per character):
0.3*2 + 0.2*3 + 0.1*4 + 0.05*4 + 0.35*1 = 0.6 + 0.6 + 0.4 + 0.2 + 0.35 = 2.15
Each symbol therefore takes 2.15 bits on average. With a fixed-length code you would have assumed 5 values -> 3 bits each.

Is it hard to interpret a message? NOT REALLY!
Example message: 0 0 1 0 1 0 0 0 0 1 1 1
With A = 00, B = 010, C = 0110, D = 0111, E = 1, this decodes to A E B A D.
What if the last 1 in the message were missing? The message would be illegal. So while messages are never ambiguous, illegal messages are possible.
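A small decoder sketch for the code above: because no code word is a prefix of another, reading bits left to right and emitting a symbol at the first match is unambiguous, and leftover bits signal an illegal message.

```python
CODES = {"A": "00", "B": "010", "C": "0110", "D": "0111", "E": "1"}
DECODE = {code: sym for sym, code in CODES.items()}

def decode(bits: str) -> str:
    out, buffer = [], ""
    for b in bits:
        buffer += b
        if buffer in DECODE:                 # prefix property: first match is the only match
            out.append(DECODE[buffer])
            buffer = ""
    if buffer:
        raise ValueError(f"illegal message: dangling bits {buffer!r}")
    return "".join(out)

print(decode("001010000111"))                # -> AEBAD
# decode("00101000011") raises ValueError: the trailing 011 is not a complete code
```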

Observations on Huffman
- The method creates a shorter code
- It assumes knowledge of the symbol distribution
- Different symbols get different code lengths
- Knowing the distribution ahead of time is not always possible! Another version of Huffman coding (adaptive Huffman) can solve that problem.

Revisiting facsimiles
- Huffman says one can minimize expected length by assigning different-length codes to symbols
- Fax transmissions use this principle to give short codes to common runs of white/black pixels
- Run-length encoding combined with Huffman coding
- See Table 5.7 in the text

Table 5.7 (excerpt)

TERMINATING codes
  Run length   White      Black
  0            00110101   000110111
  1            000111     010
  2            0111       11
  3            1000       10

MAKEUP codes
  Run length   White      Black
  64           11011      000001111
  128          10011      000011001000
  256          0110111    000001011011

Example: a run of 66 white pixels = 64 + 2 -> makeup code 11011 followed by terminating code 0111 = 110110111.
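A sketch of how the table is used for white runs, restricted to the excerpt shown above (the real T.4 tables cover every remainder 0..63 and longer makeup codes):

```python
# Code words copied from the Table 5.7 excerpt above (white runs only)
TERMINATING_WHITE = {0: "00110101", 1: "000111", 2: "0111", 3: "1000"}
MAKEUP_WHITE = {64: "11011", 128: "10011", 256: "0110111"}

def encode_white_run(length: int) -> str:
    """Emit one makeup code for the largest covered multiple of 64, then a terminating code."""
    bits = ""
    for makeup in sorted(MAKEUP_WHITE, reverse=True):
        if length >= makeup:
            bits += MAKEUP_WHITE[makeup]
            length -= makeup
            break
    return bits + TERMINATING_WHITE[length]  # only remainders 0..3 exist in this excerpt

print(encode_white_run(66))                  # 11011 + 0111 -> 110110111
```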

Multimedia compression
- Video, pictures, and audio mostly use techniques that are "lossy": uncompressing does not recover all of the original information
- The loss is tolerated because the inaccuracy is barely perceivable by humans
- Other (lossless) techniques achieve compression ratios of about 2-3:1; multimedia needs 10-20:1
- Such rates are achieved by lossy techniques -> a tradeoff
- Techniques: JPEG for pictures, MPEG for motion pictures, MP3 for music

Image compression
- Represented as RGB, with 8 bits typical for each color
- Or as luminance (brightness, 8 bits) and chrominance (color, 16 bits)
- Human perception responds strongly to brightness in addition to color
- These are really two ways to represent the same thing:
  Y = 0.30R + 0.59G + 0.11B (luminance)
  I = 0.60R - 0.28G - 0.32B (chrominance)
  Q = 0.21R - 0.52G + 0.31B (chrominance)
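A one-pixel sketch of the conversion using exactly the coefficients above (the sample values are illustrative, not a claim about any particular library):

```python
def rgb_to_yiq(r, g, b):
    """Convert one RGB pixel (0..255 per channel) using the coefficients above."""
    y = 0.30 * r + 0.59 * g + 0.11 * b      # luminance
    i = 0.60 * r - 0.28 * g - 0.32 * b      # chrominance
    q = 0.21 * r - 0.52 * g + 0.31 * b      # chrominance
    return y, i, q

print(rgb_to_yiq(255, 128, 0))              # orange-ish pixel: high Y, large positive I
```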

JPEG pipeline: Image -> DCT phase -> Quantization phase -> Encoding phase -> Compressed image. So how does this work?

JPEG algorithm
- Consider 8x8 blocks at a time
- Create three 8x8 arrays with the color values of each pixel (one per RGB component)
- Run each block through a complex transformation, the DCT (theory beyond us)
- When the transformation finishes, the numbers in the upper left describe the low-variation content of the block, while values further from [0,0] indicate large color variation (see Fig. 5.10: the top example has small variation, the bottom large)
- Simplify the result (eliminate small values) by dividing each term by an integer and then truncating (see Eqn 5-5); the divisor is different for each term and application dependent
- Use run-length encoding over an odd zig-zag traversal pattern (Fig. 5.11) to compress
- I don't expect you to do this on a test, but it shows how JPEG is lossy.
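A sketch of just the quantization step with a made-up divisor table (not the standard JPEG table): small divisors near [0,0] preserve the low-frequency terms, large divisors further out truncate small high-frequency terms to zero, which is where the loss comes from.

```python
import numpy as np

# Made-up divisor table: small near [0,0], growing toward the lower right
QUANT = np.array([[1 + 2 * (u + v) for u in range(8)] for v in range(8)])

def quantize(dct_block):
    """Divide each coefficient by its table entry and truncate toward zero."""
    return np.trunc(dct_block / QUANT).astype(int)

# A synthetic block whose values fall off away from [0,0], like typical DCT output
block = np.array([[200.0 / (1 + u + v) for u in range(8)] for v in range(8)])
print(quantize(block))                      # the lower-right fills with zeros -> easy to compress
```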

MPEG
- Uses differential encoding to compare successive frames of a motion picture
- Three kinds of frames:
  I -> a complete JPEG image
  P -> incremental change relative to an I frame (where blocks move); about 1/2 the size of an I frame
  B -> uses a different (bidirectional) interpolation technique; about 1/4 the size of an I frame
- Typical sequence: I B B P B B I ...
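A toy sketch of the differential idea behind P frames (block differencing only; real MPEG also does motion search, which this ignores):

```python
import numpy as np

def p_frame(prev, cur, block=8, threshold=4):
    """Keep only the 8x8 blocks whose pixels changed noticeably since the previous frame."""
    changed = {}
    h, w = cur.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            diff = cur[y:y+block, x:x+block].astype(int) - prev[y:y+block, x:x+block]
            if np.abs(diff).max() > threshold:
                changed[(y, x)] = diff       # store the difference, not the whole block
    return changed                           # usually far smaller than a full I frame

prev = np.zeros((16, 16), dtype=np.uint8)
cur = prev.copy()
cur[0:8, 8:16] = 50                          # only one block changes between frames
print(list(p_frame(prev, cur)))              # [(0, 8)]
```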

MP3
- Music/audio compression
- Uses psychoacoustic principles: some sounds can't be heard because they are drowned out by other, louder sounds (frequencies)
- Divide the sound into smaller subbands
- Eliminate sounds you can't hear anyway because others are too loud
- Three layers with varying compression:
  Layer 1 -> 4:1 (192 Kbps)
  Layer 2 -> 8:1 (128 Kbps)
  Layer 3 -> 12:1 (64 Kbps)