Lecture 4: Data Compression Techniques TSBK01 Image Coding and Data Compression Jörgen Ahlberg Div. of Sensor Technology Swedish Defence Research Agency.

Lecture 4: Data Compression Techniques TSBK01 Image Coding and Data Compression Jörgen Ahlberg Div. of Sensor Technology Swedish Defence Research Agency (FOI)

Outline
• Huffman coding
• Arithmetic coding
• Application: JBIG
• Universal coding
• LZ-coding
• LZ77, LZ78, LZW
• Applications: GIF and PNG

Repetition
• Coding: Assigning binary codewords to (blocks of) source symbols.
• Variable-length codes (VLC) and fixed-length codes.
• Instantaneous codes ⊂ Uniquely decodable codes ⊂ Non-singular codes ⊂ All codes
• Tree codes are instantaneous.
• Tree code, Kraft's Inequality.

Creating a Code: The Data Compression Problem
• Assume a source with an alphabet A and known symbol probabilities $\{p_i\}$.
• Goal: Choose the codeword lengths $l_i$ so as to minimize the bitrate, i.e., the average number of bits per symbol $\sum_i l_i \cdot p_i$.
• Trivial solution: $l_i = 0 \;\; \forall i$.
• Restriction: We want an instantaneous code, so $\sum_i 2^{-l_i} \le 1$ (KI) must be valid.
• Solution (at least in theory): $l_i = -\log p_i$
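The relations on this slide can be checked numerically. The following is a minimal sketch, assuming base-2 logarithms and reusing the dyadic distribution {½, ¼, 1/8, 1/8} that appears later in the lecture; the function names are illustrative only.

```python
import math

def ideal_lengths(probs):
    """Ideal (possibly fractional) codeword lengths l_i = -log2(p_i)."""
    return [-math.log2(p) for p in probs]

def kraft_sum(lengths):
    """Left-hand side of Kraft's inequality: sum of 2^(-l_i)."""
    return sum(2.0 ** -l for l in lengths)

def average_length(probs, lengths):
    """Bitrate in bits per symbol: sum of l_i * p_i."""
    return sum(p * l for p, l in zip(probs, lengths))

probs = [0.5, 0.25, 0.125, 0.125]
lengths = ideal_lengths(probs)          # 1, 2, 3 and 3 bits
rate = average_length(probs, lengths)   # equals the source entropy here
```

For this distribution the lengths are integers and Kraft's inequality holds with equality. For non-dyadic probabilities the ideal lengths are fractional and must be rounded up to obtain a real code.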

In practice…
• Use some nice algorithm to find the code tree
  – Huffman coding
  – Tunstall coding

Huffman Coding
• Two-step algorithm:
  1. Iterate:
     – Merge the least probable symbols.
     – Sort.
  2. Assign bits.
[Figure: worked example on the symbols a, b, c, d showing the Merge, Sort, Assign, and Get code steps.]
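The two-step procedure can be sketched as follows. This is one possible implementation, not the lecture's: it uses a priority queue instead of an explicit re-sort after each merge, and it assigns bits while merging rather than in a separate pass. The function name and the example probabilities are assumptions.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Return {symbol: bitstring} for a dict {symbol: probability}."""
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate one-symbol source
        return {s: "0" for s in heap[0][2]}
    while len(heap) > 1:
        # Merge the two least probable nodes.
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        # Prepend one bit to every codeword in each merged subtree.
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

code = huffman_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
```

With these probabilities the codeword lengths come out as 1, 2, 3 and 3 bits. The exact bit patterns depend on how ties are broken and on which branch gets the 0.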

Coding of the BMS
• Trick: Code blocks of symbols (extended source).
• Example: $p_1 = 1/4$, $p_2 = 3/4$.
• Applying the Huffman algorithm directly: 1 bit/symbol.
• Coding blocks of two symbols:

  Block   P(block)   Code
  00      9/16       0
  01      3/16       10
  10      3/16       110
  11      1/16       111

  ⇒ 27/16 bits per block, i.e. approx. 0.84 bits/symbol
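The gain from coding blocks can be verified with exact arithmetic. This sketch assumes, as the block probabilities in the table imply, that the symbol 0 has probability ¾ and the symbol 1 has probability ¼; the variable names are illustrative.

```python
from fractions import Fraction
import math

# Assumed symbol probabilities (block 00 has probability 9/16 = (3/4)^2).
p = {"0": Fraction(3, 4), "1": Fraction(1, 4)}
# Huffman code for blocks of two symbols, taken from the table.
block_code = {"00": "0", "01": "10", "10": "110", "11": "111"}

def block_probability(block):
    """Probability of a block from a memoryless source."""
    prob = Fraction(1)
    for symbol in block:
        prob *= p[symbol]
    return prob

bits_per_block = sum(block_probability(b) * len(w) for b, w in block_code.items())
bits_per_symbol = bits_per_block / 2
# Entropy of the source in bits per symbol, the lower bound for any code.
entropy = -sum(float(q) * math.log2(float(q)) for q in p.values())
```

The block code needs 27/32 bits per symbol, compared with 1 bit per symbol without blocking and an entropy of roughly 0.81 bits per symbol.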

Huffman Coding: Pros and Cons
+ Fast implementations.
+ Error resilient: resynchronizes in $\sim l^2$ steps.
- The code tree grows exponentially when the source is extended.
- The symbol probabilities are built into the code.
Hard to use Huffman coding for extended sources / large alphabets or when the symbol probabilities vary over time.

Arithmetic Coding
• Shannon-Fano-Elias
• Basic idea: Split the interval [0,1] according to the symbol probabilities.
• Example: A = {a,b,c,d}, P = {½, ¼, 1/8, 1/8}.

[Figure: successive subdivision of the interval among the symbols a, b, and c while coding the sequence c c a. Code the sequence c c a ⇒ Code the interval [0.9, 0.96]. A table with the columns Bit, Interval, and Decoder traces the decoding.]
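The interval-splitting idea can be sketched with exact fractions. This example uses the four-symbol alphabet and probabilities from the Shannon-Fano-Elias slide (so its numbers differ from those in the figure), assigns sub-intervals in alphabetical order, and assumes the decoder is told how many symbols to decode. All names are illustrative, and a practical coder would emit bits incrementally instead of keeping the full interval.

```python
from fractions import Fraction

PROBS = {"a": Fraction(1, 2), "b": Fraction(1, 4),
         "c": Fraction(1, 8), "d": Fraction(1, 8)}

def cumulative(probs):
    """Map each symbol to its sub-interval [low, high) of [0, 1)."""
    table, low = {}, Fraction(0)
    for sym, p in probs.items():
        table[sym] = (low, low + p)
        low += p
    return table

def encode_interval(sequence, probs=PROBS):
    """Narrow [0, 1) once per symbol and return the final interval."""
    table = cumulative(probs)
    low, width = Fraction(0), Fraction(1)
    for sym in sequence:
        s_low, s_high = table[sym]
        low, width = low + width * s_low, width * (s_high - s_low)
    return low, low + width

def decode(value, n, probs=PROBS):
    """Recover n symbols from any number inside the final interval."""
    table = cumulative(probs)
    out = []
    for _ in range(n):
        for sym, (s_low, s_high) in table.items():
            if s_low <= value < s_high:
                out.append(sym)
                # Rescale so the chosen sub-interval becomes [0, 1) again.
                value = (value - s_low) / (s_high - s_low)
                break
    return "".join(out)

low, high = encode_interval("cca")
```

The width of the final interval equals the product of the symbol probabilities, here 1/8 · 1/8 · 1/2 = 1/128, so about 7 bits are enough to point inside it.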

An Image Coding Application
• Consider the image content in a local environment of a pixel as a state in a Markov model.
• Example (binary image): [Figure: a pixel X and its surrounding pixels.]
• Such an environment is called a context.
• A probability distribution for X can be estimated for each state. Then arithmetic coding is used.
• This is the basic idea behind the JBIG algorithm for binary images and data.
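A per-context probability estimate can be sketched as below. The three-pixel template (left, upper-left, upper neighbour) is a hypothetical choice made for brevity and is not the template JBIG actually uses; the add-one smoothing and the function names are assumptions as well.

```python
from collections import defaultdict

def context_of(img, r, c):
    """Pack the left, upper-left and upper neighbours into a 3-bit state.

    Pixels outside the image are treated as 0.
    """
    def px(i, j):
        inside = 0 <= i < len(img) and 0 <= j < len(img[0])
        return img[i][j] if inside else 0
    return (px(r, c - 1) << 2) | (px(r - 1, c - 1) << 1) | px(r - 1, c)

def estimate_context_model(img):
    """Estimate P(X = 1 | context) for every context seen in the image."""
    counts = defaultdict(lambda: [1, 1])  # add-one smoothing for both values
    for r in range(len(img)):
        for c in range(len(img[0])):
            counts[context_of(img, r, c)][img[r][c]] += 1
    return {ctx: n1 / (n0 + n1) for ctx, (n0, n1) in counts.items()}

model = estimate_context_model([[0, 1],
                                [0, 1]])
```

An arithmetic coder would then code each pixel using the distribution selected by that pixel's context, instead of a single distribution for the whole image.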

Flushing the Coder
• The coding process is ended (restarted) and the coder flushed
  – after a given number of symbols (FIVO) or
  – when the interval is too small for a fixed number of output bits (VIFO).

Universal Coding
• A universal coder doesn't need to know the statistics in advance. Instead, it estimates them from the data.
• Forward estimation: Estimate statistics in a first pass and transmit to the decoder.
• Backward estimation: Estimate from already transmitted (received) symbols.
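Backward estimation can be illustrated with a simple counting model. This is a sketch under assumptions: every symbol starts with a count of 1, and the cost of a symbol is taken to be the ideal −log2 p bits that an arithmetic coder would approach. The class and function names are illustrative.

```python
import math

class AdaptiveModel:
    """Symbol statistics estimated from the symbols coded so far."""

    def __init__(self, alphabet):
        self.counts = {s: 1 for s in alphabet}  # assumed initial counts
        self.total = len(self.counts)

    def prob(self, symbol):
        return self.counts[symbol] / self.total

    def update(self, symbol):
        self.counts[symbol] += 1
        self.total += 1

def adaptive_code_length(sequence, alphabet):
    """Ideal number of bits when each symbol is coded with the current estimate."""
    model = AdaptiveModel(alphabet)
    bits = 0.0
    for symbol in sequence:
        bits += -math.log2(model.prob(symbol))
        model.update(symbol)  # the decoder performs the identical update
    return bits

bits = adaptive_code_length("aaaa", "ab")
```

Because the model is updated only after a symbol has been coded, the decoder can mirror every update and no statistics need to be transmitted.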

Universal Coding: Examples
1. An adaptive arithmetic coder
2. An adaptive dictionary technique
   – The LZ coders [Sayood 5]
3. An adaptive Huffman coder [Sayood 3.4]
[Figure: block diagram with an arithmetic coder and a statistics estimation block.]

Ziv-Lempel Coding (ZL or LZ)
• Named after J. Ziv and A. Lempel (1977).
• Adaptive dictionary technique.
  – Store previously coded symbols in a buffer.
  – Search for the current sequence of symbols to code.
  – If found, transmit buffer offset and length.

LZ77
[Figure: the example sequence a b c a b d a c a b d e e e f c, divided into a search buffer and a look-ahead buffer; each step produces an output triplet that is transmitted to the decoder.]
• If the size of the search buffer is N and the size of the alphabet is M we need [expression missing in the transcript] bits to code a triplet.
• Variation: Use a VLC to code the triplets! Used in PKZip, Zip, Lharc, PNG, gzip, ARJ.
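A compact LZ77 coder can be sketched as follows. Since the figure did not survive, the triplet layout (offset, match length, next symbol) is assumed to follow the common textbook form, and `search_size` and `lookahead_size` are hypothetical parameter names. The brute-force search is for clarity only.

```python
def lz77_encode(data, search_size=8, lookahead_size=8):
    """Return a list of (offset, length, next_symbol) triplets."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        start = max(0, i - search_size)
        # Keep one symbol back for the explicit next_symbol field.
        max_len = min(lookahead_size, len(data) - i - 1)
        for j in range(start, i):
            k = 0
            # A match may run past position i into the look-ahead buffer.
            while k < max_len and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_off, best_len = i - j, k
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decode(triplets):
    """Rebuild the sequence by copying from the already decoded output."""
    buf = []
    for offset, length, symbol in triplets:
        for _ in range(length):
            buf.append(buf[-offset])  # symbol by symbol, so overlaps work
        buf.append(symbol)
    return "".join(buf)

triplets = lz77_encode("abcabdacabdeeefc")
```

The fourth triplet, (3, 2, 'd'), says: go back three symbols, copy two, then append d.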

Drawback with LZ77
• Repetitive patterns with a period longer than the search buffer size are not found.
• If the search buffer size is 4, the sequence a b c d e a b c d e a b c d e a b c d e … will be expanded, not compressed.
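The effect of a too-small search buffer can be checked directly. This sketch counts the positions at which the current symbol occurs anywhere in the preceding search buffer, which is the minimum requirement for a match of any length; the function name is illustrative.

```python
def positions_with_match(seq, search_size):
    """Count positions whose symbol also occurs in the preceding search buffer."""
    return sum(
        1
        for i in range(len(seq))
        if seq[i] in seq[max(0, i - search_size):i]
    )

seq = "abcde" * 4
no_matches = positions_with_match(seq, 4)    # period 5 exceeds the buffer
with_matches = positions_with_match(seq, 5)  # buffer covers one full period
```

With a buffer of 4 no position finds a match, so every symbol costs a full triplet; with a buffer of 5 every position after the first period does.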

LZ78
• Store patterns in a dictionary
• Transmit a tuple <dictionary index, next symbol>

LZ78
Input sequence: a b c a b a b c
Output tuple: <dictionary index, next symbol>
Transmitted to decoder: <0,a> <0,b> <0,c> <1,b> <4,c>
Decoded: a, b, c, ab, abc
Dictionary:
  1  a
  2  b
  3  c
  4  ab
  5  abc
Strategy needed for limiting dictionary size!
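The example can be reproduced with a short LZ78 sketch. The tuple format (dictionary index, next symbol), with index 0 meaning "no prefix", is inferred from the transmitted values 0a 0b 0c 1b 4c. Emitting an empty symbol when the input ends in the middle of a known phrase is an assumption of this sketch, and no dictionary size limit is implemented.

```python
def lz78_encode(data):
    """Return a list of (dictionary_index, next_symbol) tuples."""
    dictionary, out, phrase = {}, [], ""
    for symbol in data:
        if phrase + symbol in dictionary:
            phrase += symbol  # keep extending the current match
        else:
            out.append((dictionary.get(phrase, 0), symbol))
            dictionary[phrase + symbol] = len(dictionary) + 1
            phrase = ""
    if phrase:  # input ended inside a known phrase
        out.append((dictionary[phrase], ""))
    return out

def lz78_decode(tuples):
    """Rebuild the dictionary on the fly while decoding."""
    entries, out = [""], []  # index 0 is the empty prefix
    for index, symbol in tuples:
        entry = entries[index] + symbol
        entries.append(entry)
        out.append(entry)
    return "".join(out)

tuples = lz78_encode("abcababc")
```

The decoder adds the same entry as the encoder after every tuple, so the two dictionaries stay in step without being transmitted.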

LZW
• Modification to LZ78 by Terry Welch.
• Applications: GIF, V.42bis
• Patented by Unisys Corp.
• Transmit only the dictionary index.
• The alphabet is stored in the dictionary in advance.

LZW
Input sequence: a b c a b a b c
Output: dictionary index
Transmitted: 1 2 3 5 5
Decoded: a, b, c, ab, ab
Encoder dictionary: 1 a, 2 b, 3 c, 4 d, 5 ab, 6 bc, 7 ca, 8 aba, 9 abc
Decoder dictionary: 1 a, 2 b, 3 c, 4 d, 5 ab, 6 bc, 7 ca, 8 aba
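The LZW example can be reproduced with the sketch below. It assumes 1-based indices with the alphabet a, b, c, d preloaded, as in the example, and that every input symbol belongs to that alphabet; the function names are illustrative and no dictionary size limit is applied.

```python
def lzw_encode(data, alphabet):
    """Return the list of dictionary indices for data."""
    dictionary = {s: i + 1 for i, s in enumerate(alphabet)}
    out, phrase = [], ""
    for symbol in data:
        if phrase + symbol in dictionary:
            phrase += symbol
        else:
            out.append(dictionary[phrase])
            dictionary[phrase + symbol] = len(dictionary) + 1
            phrase = symbol  # the unmatched symbol starts the next phrase
    if phrase:
        out.append(dictionary[phrase])  # flush the last phrase
    return out

def lzw_decode(codes, alphabet):
    """Rebuild the dictionary one step behind the encoder."""
    entries = {i + 1: s for i, s in enumerate(alphabet)}
    if not codes:
        return ""
    prev = entries[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in entries:
            entry = entries[code]
        else:
            # The index refers to the entry being built right now.
            entry = prev + prev[0]
        entries[len(entries) + 1] = prev + entry[0]
        out.append(entry)
        prev = entry
    return "".join(out)

codes = lzw_encode("abcababc", "abcd")
```

The encoder produces 1 2 3 5 5 followed by a final 3 that flushes the remaining c. The decoder's dictionary always lags one entry behind the encoder's, which is why it needs the special case for an index it has not created yet.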

And now for some applications: GIF & PNG

GIF
• CompuServe Graphics Interchange Format (1987, 89).
• Features:
  – Designed for up/downloading images to/from BBSes via PSTN.
  – 1-, 4-, or 8-bit colour palettes.
  – Interlace for progressive decoding (four passes, starts with every 8th row).
  – Transparent colour for non-rectangular images.
  – Supports multiple images in one file ("animated GIFs").

GIF: Method
• Compression by LZW.
• Dictionary size $2^{b+1}$ 8-bit symbols
  – b is the number of bits in the palette.
• Dictionary size doubled if filled (max 4096).
• Works well on computer generated images.
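The dictionary growth rule can be written out as a small calculation. This sketch reads the slide literally: the dictionary starts at 2^(b+1) entries and doubles each time it fills, up to 4096 entries. The function name is hypothetical, and details of the actual GIF bitstream are not modelled.

```python
def gif_dictionary_sizes(palette_bits, max_size=4096):
    """List the successive dictionary sizes for a palette of palette_bits bits."""
    size = 2 ** (palette_bits + 1)
    sizes = [size]
    while size < max_size:
        size *= 2  # doubled whenever the dictionary is filled
        sizes.append(size)
    return sizes

sizes_8bit = gif_dictionary_sizes(8)
sizes_1bit = gif_dictionary_sizes(1)
```

For an 8-bit palette the dictionary can only double three times before it reaches the maximum, while a 1-bit palette leaves room for ten doublings.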

GIF: Problems
• Unsuitable for natural images (photos):
  – Maximum 256 colors (⇒ bad quality).
  – Repetitive patterns uncommon (⇒ bad compression).
• LZW patented by Unisys Corp.
• Alternative: PNG

PNG: Portable Network Graphics
• Designed to replace GIF.
• Some features:
  – Indexed or true-colour images (≤ 16 bits per plane).
  – Alpha channel.
  – Gamma information.
  – Error detection.
• No support for multiple images in one file.
  – Use MNG for that.
• Method:
  – Compression by LZ77 using a 32KB search buffer.
  – The LZ77 triplets are Huffman coded.
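The combination of LZ77 and Huffman coding described under Method is the DEFLATE scheme, which Python exposes through its standard `zlib` module. The sketch below only demonstrates the compression stage on a repetitive byte string; it does not build a PNG file, and the test data is an arbitrary choice.

```python
import zlib

# Repetitive data with a period far shorter than the search buffer.
data = b"abcde" * 1000

compressed = zlib.compress(data, 9)   # level 9 = strongest compression
restored = zlib.decompress(compressed)

# zlib's largest window: 2^15 bytes, i.e. the 32 KB search buffer.
window_bytes = 2 ** zlib.MAX_WBITS
```

Because the period of the data fits easily inside the 32 KB window, the compressed stream is far shorter than the 5000 input bytes.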

Summary
• Huffman coding
  – Simple, easy, fast
  – Complexity grows exponentially with the block length
  – Statistics built into the code
• Arithmetic coding
  – Complexity grows linearly with the block size
  – Easily adapted to variable statistics ⇒ used for coding of Markov sources
• Universal coding
  – Adaptive Huffman or arithmetic coder
  – LZ77: Buffer with previously sent sequences
  – LZ78: Dictionary instead of buffer
  – LZW: Modification to LZ78

Summary, cont
• Where are the algorithms used?
  – Huffman coding: JPEG, MPEG, PNG, …
  – Arithmetic coding: JPEG, JBIG, MPEG-4, …
  – LZ77: PNG, PKZip, Zip, gzip, …
  – LZW: compress, GIF, V.42bis, …

Finally
• These methods work best if the source alphabet is small and the distribution skewed.
  – Text
  – Graphics
• Analog sources (images, sound) require other methods
  – complex dependencies
  – accepted distortion