Chapter 7 Special Section: Focus on Data Compression

7A Objectives

- Understand the essential ideas underlying data compression.
- Become familiar with the different types of compression algorithms.
- Be able to describe the most popular data compression algorithms in use today, and know the applications for which each is suitable.

7A.1 Introduction

Data compression is important to storage systems because it allows more bytes to be packed into a given storage medium than when the data is uncompressed. Some storage devices (notably tape) compress data automatically as it is written, resulting in less tape consumption and significantly faster backup operations. Compression also reduces file transfer time, saving time and communications bandwidth.

7A.1 Introduction

A good metric for compression is the compression factor (or compression ratio), given by:

compression factor = (1 − compressed size / uncompressed size) × 100%

If we have a 100KB file that we compress to 40KB, we have a compression factor of:

(1 − 40KB / 100KB) × 100% = 60%
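
The arithmetic is simple enough to check in code; this small sketch (the function name is ours) reproduces the example:

```python
def compression_factor(original_size: float, compressed_size: float) -> float:
    """Compression factor as a percentage: (1 - compressed/original) x 100."""
    return (1 - compressed_size / original_size) * 100

print(compression_factor(100, 40))  # 60.0 -- the 100KB -> 40KB example above
```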

7A.1 Introduction

Compression is achieved by removing data redundancy while preserving information content. The information content of a group of bytes (a message) is its entropy. Data with low entropy permit a larger compression ratio than data with high entropy. Entropy, H, is a function of symbol frequency. It is the weighted average of the number of bits required to encode the symbols of a message:

H = −Σᵢ P(xᵢ) log₂ P(xᵢ)

7A.1 Introduction

The entropy of the entire message is the sum of the individual symbol entropies:

Σᵢ −P(xᵢ) log₂ P(xᵢ)

The average redundancy for each character in a message is given by the average code length minus the entropy:

Σᵢ P(xᵢ) ℓᵢ − Σᵢ −P(xᵢ) log₂ P(xᵢ)

where ℓᵢ is the length of the code assigned to symbol xᵢ.

7A.2 Statistical Coding

Consider the message: HELLO WORLD!

The letter L has a probability of 3/12 = 1/4 of appearing in this message. The number of bits required to encode this symbol is −log₂(1/4) = 2. Using our formula, Σᵢ −P(xᵢ) log₂ P(xᵢ), the average entropy of the entire message is 3.022. This means that the theoretical minimum number of bits per character is 3.022. Theoretically, the message could be sent using only 37 bits (3.022 × 12 = 36.26).
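
These figures are easy to reproduce. A minimal sketch (the helper name is ours) computes a symbol's information content and a message's average entropy:

```python
from collections import Counter
from math import log2

def entropy(message: str) -> float:
    """Average entropy H = -sum of P(x_i) * log2 P(x_i) over the message's symbols."""
    n = len(message)
    return -sum((c / n) * log2(c / n) for c in Counter(message).values())

msg = "HELLO WORLD!"
print(-log2(3 / 12))             # 2.0 bits: the information content of 'L'
print(entropy(msg))              # ~3.022 bits per character
print(entropy(msg) * len(msg))   # ~36.26, so 37 whole bits at minimum
```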

7A.2 Statistical Coding

The entropy metric just described forms the basis for statistical data compression. Two widely used statistical coding algorithms are Huffman coding and arithmetic coding. Huffman coding builds a binary tree from the letter frequencies in the message. The binary symbols for each character are read directly from the tree. Symbols with the highest frequencies end up at the top of the tree, and result in the shortest codes.

7A.2 Statistical Coding

The process of building the tree begins by counting the occurrences of each symbol in the text to be encoded:

HIGGLETY PIGGLTY POP THE DOG HAS EATEN THE MOP THE PIGS IN A HURRY THE CATS IN A FLURRY

7A.2 Statistical Coding

Next, place the letters and their frequencies into a forest of trees that each have two nodes: one for the letter, and one for its frequency.

7A.2 Statistical Coding

We start building the tree by joining the nodes having the two lowest frequencies.

7A.2 Statistical Coding

And then we again join the nodes with the two lowest frequencies.

7A.2 Statistical Coding

And again ...

7A.2 Statistical Coding

Here is our finished tree. [Slide figure: the completed Huffman tree; not reproduced in this transcript.]

7A.2 Statistical Coding

This is the code derived from this tree. [Slide figure: the resulting code table; not reproduced in this transcript. The sketch below derives an equivalent code.]
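
A compact implementation makes the whole procedure concrete. The sketch below follows the standard algorithm (count frequencies, repeatedly join the two lowest-frequency trees, read codes off the tree); because symbols with equal frequencies can be joined in any order, the codes it prints may differ from the slide's figure while having the same lengths. The closing lines compute the average code length, which ties back to the redundancy formula in Section 7A.1:

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict[str, str]:
    """Build a Huffman tree by repeatedly joining the two lowest-frequency
    nodes, then read each symbol's code off the tree (0 = left, 1 = right)."""
    heap = [[freq, i, sym] for i, (sym, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        heapq.heappush(heap, [lo[0] + hi[0], next_id, (lo, hi)])
        next_id += 1
    codes = {}
    def walk(node, prefix):
        payload = node[2]
        if isinstance(payload, str):       # leaf: a symbol
            codes[payload] = prefix or "0"
        else:                              # internal node: (left, right)
            walk(payload[0], prefix + "0")
            walk(payload[1], prefix + "1")
    walk(heap[0], "")
    return codes

text = ("HIGGLETY PIGGLTY POP THE DOG HAS EATEN THE MOP "
        "THE PIGS IN A HURRY THE CATS IN A FLURRY")
codes = huffman_codes(text)
for sym, code in sorted(codes.items(), key=lambda kv: (len(kv[1]), kv[0])):
    print(repr(sym), code)

# Average code length (sum of P(x_i) * l_i); subtracting the entropy H
# gives the average redundancy per character.
n = len(text)
avg = sum((cnt / n) * len(codes[sym]) for sym, cnt in Counter(text).items())
print(f"average code length: {avg:.3f} bits/symbol")
```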

7A.2 Statistical Coding

The second type of statistical coding, arithmetic coding, partitions the real number interval between 0 and 1 into segments according to symbol probabilities. An abbreviated algorithm for this process is given in the text. Arithmetic coding is computationally intensive, and it runs the risk of causing divide underflow. Variations in floating-point representation among various systems can also cause the terminal condition (a zero value) to be missed.
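
The interval-narrowing idea fits in a few lines. The toy float-based encoder below (ours, not the text's abbreviated algorithm) assigns each symbol a sub-range of [0, 1) proportional to its probability and narrows the interval one symbol at a time; because it uses ordinary floating point, long messages drive the interval width toward zero and trigger exactly the underflow problems noted above, which is why production coders use scaled integer arithmetic:

```python
from collections import Counter

def arithmetic_encode(message: str) -> float:
    """Return a number in [0, 1) that identifies `message` (toy version)."""
    n = len(message)
    probs = {s: c / n for s, c in sorted(Counter(message).items())}
    low_bounds, cum = {}, 0.0
    for s, p in probs.items():       # cumulative lower bound of each sub-range
        low_bounds[s] = cum
        cum += p
    low, width = 0.0, 1.0
    for s in message:                # narrow the interval to s's sub-range
        low += width * low_bounds[s]
        width *= probs[s]            # width shrinks; floats eventually underflow
    return low + width / 2           # any value inside the final interval works

print(arithmetic_encode("HELLO WORLD!"))
```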

7A.2 Statistical Coding

For most data, statistical coding methods offer excellent compression ratios. Their main disadvantage is that they require two passes over the data to be encoded: the first pass calculates probabilities; the second encodes the message. This approach is unacceptably slow for storage systems, where data must be read, written, and compressed within one pass over a file.

7A.3 LZ Dictionary Systems

Ziv-Lempel (LZ) dictionary systems solve the two-pass problem by using values in the data as a dictionary with which to encode the data itself. The LZ77 compression algorithm employs a text window in conjunction with a lookahead buffer. The text window serves as the dictionary. If text is found in the lookahead buffer that matches text in the dictionary, the location and length of the matching text in the window are output.
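
A bare-bones LZ77 encoder shows the window/lookahead mechanics. This sketch is ours and deliberately simplified (tiny illustrative window and buffer sizes, brute-force match search); each token is (distance back into the window, match length, next literal character):

```python
def lz77_encode(data: str, window: int = 64, lookahead: int = 16):
    """Emit (offset, length, next_char) tokens; offset and length are 0
    when the window holds no match for the start of the lookahead buffer."""
    tokens, i = [], 0
    while i < len(data):
        start = max(0, i - window)           # left edge of the text window
        best_off, best_len = 0, 0
        for off in range(1, i - start + 1):  # brute-force search of the window
            length = 0
            while (length < lookahead and i + length < len(data) - 1
                   and data[i + length - off] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = off, length
        tokens.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return tokens

print(lz77_encode("HIGGLETY PIGGLETY POP"))
# The repeated "IGGLETY P" is replaced by a single (9, 9, 'O') token.
```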

7A.3 LZ Dictionary Systems

Well-known LZ77 implementations include PKZIP and IBM's RAMAC RVA 2 Turbo disk array. The simplicity of LZ77 lends itself well to a hardware implementation. LZ78 is another dictionary coding system. It removes the LZ77 constraint of a fixed-size window. Instead, it creates a trie as the data is read. Where LZ77 uses pointers to locations in a dictionary, LZ78 uses pointers to nodes in the trie.
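
A few lines of code show the difference. In this LZ78 sketch (ours), a flat dictionary stands in for the trie: each entry's key extends its parent phrase by one character, and each output token is (index of the longest known phrase, next character):

```python
def lz78_encode(data: str):
    """Emit (phrase_index, next_char) pairs, growing the phrase dictionary
    (a flattened trie) as the data is read. Index 0 is the empty phrase."""
    trie = {"": 0}                         # phrase -> node index
    tokens, phrase = [], ""
    for ch in data:
        if phrase + ch in trie:            # keep walking down the trie
            phrase += ch
        else:
            tokens.append((trie[phrase], ch))
            trie[phrase + ch] = len(trie)  # graft a new node onto the trie
            phrase = ""
    if phrase:                             # flush a trailing, already-known phrase
        tokens.append((trie[phrase], ""))
    return tokens

print(lz78_encode("THE THE THE DOG"))
```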

7A.4 GIF and PNG Compression

GIF compression is a variant of LZ78 called LZW, for Lempel-Ziv-Welch. It improves upon LZ78 through its efficient management of the size of the trie. Terry Welch, the designer of LZW, was employed by the Unisys Corporation when he created the algorithm, and Unisys subsequently patented it. Owing to royalty disputes, development of another format, PNG, was hastened.
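
The LZW refinement is easiest to see next to the LZ78 sketch above: by pre-loading the dictionary with every single-byte phrase, the encoder can drop the explicit next-character field and output bare indices. This version (ours) omits the variable-width code sizes and dictionary resets that a real GIF encoder also manages:

```python
def lzw_encode(data: bytes) -> list[int]:
    """Classic LZW: the dictionary starts with all 256 single bytes; each
    output is a dictionary index, and every emitted phrase spawns a new entry."""
    dictionary = {bytes([i]): i for i in range(256)}
    result, phrase = [], b""
    for byte in data:
        candidate = phrase + bytes([byte])
        if candidate in dictionary:
            phrase = candidate                       # grow the current match
        else:
            result.append(dictionary[phrase])        # emit longest known phrase
            dictionary[candidate] = len(dictionary)  # extend the dictionary
            phrase = bytes([byte])
    if phrase:
        result.append(dictionary[phrase])
    return result

print(lzw_encode(b"TOBEORNOTTOBEORTOBEORNOT"))
```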

7A.4 GIF and PNG Compression

PNG employs two stages of compression: LZ77 compression is applied first, followed by Huffman coding of the result. The advantage that GIF holds over PNG is that GIF supports multiple images in one file. MNG is an extension of PNG that supports multiple images in one file. GIF, PNG, and MNG are primarily used for graphics compression. To compress larger, photographic images, JPEG is often more suitable.
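
Python's standard zlib module implements DEFLATE, the same LZ77-plus-Huffman combination PNG applies to its image data, so the two-stage scheme is easy to try:

```python
import zlib

data = b"HIGGLETY PIGGLTY POP THE DOG HAS EATEN THE MOP " * 50
packed = zlib.compress(data, 9)          # LZ77 matching followed by Huffman coding
print(len(data), "->", len(packed), "bytes")
assert zlib.decompress(packed) == data   # lossless round trip
```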

7A.5 JPEG Compression

Photographic images incorporate a great deal of information. However, much of that information can be lost without objectionable deterioration in image quality. With this in mind, JPEG allows user-selectable image quality, but even at the "best" quality levels, JPEG makes an image file smaller owing to its multiple-step compression algorithm. It's important to remember that JPEG is lossy, even at the highest quality setting. It should be used only when the loss can be tolerated. The JPEG algorithm is illustrated on the next slide.

7A.5 JPEG Compression

[Slide figure: the multiple-step JPEG compression algorithm referenced on the previous slide; not reproduced in this transcript.]
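
The central steps of that pipeline are the transform and the quantization: each 8×8 block of (level-shifted) pixels is converted to frequency coefficients by a discrete cosine transform, and rounding those coefficients is where the loss occurs. A numpy sketch of one block, using a uniform quantizer step for illustration (real JPEG uses quality-dependent 8×8 quantization tables):

```python
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix (rows are frequencies)."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0, :] /= np.sqrt(2)
    return m * np.sqrt(2 / n)

D = dct_matrix()
block = np.random.randint(0, 256, (8, 8)).astype(float) - 128  # level-shifted pixels
coeffs = D @ block @ D.T            # 2-D DCT: energy concentrates in the top-left
q = 16                              # illustrative uniform quantizer step
quantized = np.round(coeffs / q)    # the lossy step: small coefficients become 0
restored = D.T @ (quantized * q) @ D
print(np.abs(block - restored).max())  # reconstruction error due to quantization
```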

Section 7A Conclusion

Two approaches to data compression are statistical data compression and dictionary systems. Statistical coding requires two passes over the input; dictionary systems require only one. LZ77 and LZ78 are two popular dictionary systems. GIF, PNG, MNG, and JPEG are used for image compression. JPEG is lossy, so it is not suited for all types of images.

End of Section 7A