Data Compression Michael J. Watts http://mike.watts.net.nz.

Lecture Outline
- What is compression?
- Huffman encoding
- Lempel-Ziv encoding
- Fraunhofer compression
- JPEG compression

What is Compression?
- Reduction of the length of symbols used to represent data
- Symbols can be anything
- Based on redundancy
- Information theory
- Lossless vs lossy
- Degree of compression depends on amount of redundancy
- Random data compression?
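The link between redundancy and information theory can be made concrete with Shannon entropy, which estimates the average number of bits each symbol needs. The following is a minimal sketch, assuming symbols are independent and that their probabilities are estimated from frequencies in the data; the function name `entropy_bits_per_symbol` is made up for this illustration.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(data):
    """Shannon entropy of the symbol frequencies in `data`, in bits per symbol."""
    counts = Counter(data)
    total = len(data)
    # Sum p * log2(1/p) over all distinct symbols.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Fewer distinct, more predictable symbols give lower entropy,
# which means more redundancy for a compressor to remove.
for sample in ("aaaaaaaa", "abababab", "abcdefgh"):
    print(sample, entropy_bits_per_symbol(sample))
```

This measure only looks at symbol frequencies, so it ignores redundancy in the ordering of symbols, such as the repeated "ab" pattern in the second sample.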

What is Compression?
- Represent symbols with codewords
- One symbol == one codeword
- Length of codeword < length of symbol
- Many algorithms exist
  - Huffman
  - Lempel-Ziv
  - Fraunhofer

Huffman Encoding
- Based on probability (frequency) of symbols
- Assign codewords as bits
- Builds a binary tree based on probability
- Frequent / high probability symbols at the top
- Infrequent / low probability symbols lower down
- Branches labelled with 0 and 1
- Frequent symbols get short bit strings
- Infrequent symbols get longer bit strings

Huffman Encoding Algorithm
- Arrange symbols in order of decreasing probability
- Combine two symbols with lowest probability
- Assign a new name, add their probabilities
- Rebuild the list
- Combine the next two lowest symbols
- Repeat until there is one symbol, with probability of one
- Build a binary tree
- Form codewords by tracing down the tree to the symbol
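The steps above can be sketched in a few lines of Python. This is an illustrative sketch rather than the lecture's own code: it assumes symbols are single characters, uses raw counts instead of probabilities (the ordering is the same), and replaces the repeatedly re-sorted list with a priority queue. The function name `huffman_codes` is hypothetical.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return a dict mapping each symbol in `text` to its bit string."""
    counts = Counter(text)
    if not counts:
        return {}
    # Heap entries are (weight, tie_breaker, tree). A tree is either a
    # symbol (leaf) or a (left, right) tuple (combined node).
    heap = [(w, i, sym) for i, (sym, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single distinct symbol still needs a one-bit codeword.
        return {heap[0][2]: "0"}
    next_id = len(heap)
    while len(heap) > 1:
        # Combine the two lowest-weight entries and add their weights.
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (t1, t2)))
        next_id += 1

    codes = {}

    def walk(tree, prefix):
        # Trace down the tree, labelling branches 0 (left) and 1 (right).
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix

    walk(heap[0][2], "")
    return codes


message = "abracadabra"
codes = huffman_codes(message)
encoded = "".join(codes[s] for s in message)
print(codes)
print(len(encoded), "bits, versus", 8 * len(message), "bits at 8 bits per symbol")
```

Ties between equal weights can be broken in different ways, so other valid Huffman codes exist for the same input; the total encoded length is the same for all of them.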

Huffman Encoding
- Decompression is the reverse of compression
  - Follow the bit sequence down the tree to a terminal node
- Needs to see the entire data set
  - Must know symbol probabilities
- Also combined with other techniques
  - MP3 encoding
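Decoding by following bits to a terminal node might look like the sketch below. The code table `CODES` is a made-up prefix-free example, not one derived from real data, and the tree is stored as nested dictionaries purely for brevity.

```python
def decode(bits, codes):
    """Decode a bit string using a prefix-free table {symbol: bit string}."""
    # Rebuild the binary tree: each bit leads to a child node, and a
    # terminal node stores its symbol under the key "symbol".
    root = {}
    for symbol, word in codes.items():
        node = root
        for bit in word:
            node = node.setdefault(bit, {})
        node["symbol"] = symbol

    out = []
    node = root
    for bit in bits:
        node = node[bit]
        if "symbol" in node:
            # Reached a terminal node: emit the symbol, restart at the root.
            out.append(node["symbol"])
            node = root
    if node is not root:
        raise ValueError("bit string ends in the middle of a codeword")
    return "".join(out)


CODES = {"a": "0", "b": "10", "c": "11"}  # hypothetical code table
encoded = "".join(CODES[s] for s in "abacab")
print(encoded)
print(decode(encoded, CODES))
```

No separators are needed between codewords because no codeword is a prefix of another, which is what building the codes from a tree guarantees.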

Lempel-Ziv
- Dictionary-based encoder
- Lossless
- Replaces phrases with tokens
  - Tokens are smaller than the phrases
- Converts input stream to compressed output stream
- Doesn't need to see the entire data set
  - Dictionary is built as it moves through the stream

Lempel-Ziv
- Separates input stream into tokens
- Each token represents the shortest phrase that has not been seen before
- Tokens are numbered
- Tokens contain other tokens
- Compression is inefficient at the start
  - Improves later, once the dictionary is larger
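The slide does not say which Lempel-Ziv variant is meant. The sketch below assumes an LZ78-style scheme, where each token is a pair of (number of an earlier phrase, one new character), which matches the description of tokens containing other tokens. The function names are hypothetical, and a real encoder would also pack the tokens into bits.

```python
def lz78_compress(text):
    """Split `text` into (prefix_token_number, next_char) tokens."""
    dictionary = {"": 0}  # phrase -> token number; 0 is the empty phrase
    tokens = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending while the phrase is already known
        else:
            # Shortest phrase not seen before: a known phrase plus one char.
            tokens.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: emit it with no new character.
        tokens.append((dictionary[phrase], ""))
    return tokens


def lz78_decompress(tokens):
    """Rebuild the text; the dictionary is reconstructed from the tokens."""
    phrases = [""]
    out = []
    for index, ch in tokens:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)


tokens = lz78_compress("abababab")
print(tokens)
print(lz78_decompress(tokens))
```

Early tokens each cover only one or two characters, while later ones reuse longer phrases, which is why compression improves as the dictionary grows.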

Fraunhofer / MP3 Encoding
- Lossy algorithm
- Compression of audio data
- Perceptual encoding
  - Psycho-acoustic model
  - Throws out what can't be heard
  - Frequency bands
  - Masking effects
- Output is further compressed
  - Uses Huffman encoding

MP3 Encoding
- Only usable on audio
- Part of the MPEG standard, which is also used on DVDs
- Rather widely used by itself

JPEG Compression
- Joint Photographic Experts Group
- Used to compress images
- Compresses groups of pixels
- Uses a Discrete Cosine Transform (DCT)
- Quantises the results of the DCT
  - Lossy!
- Further compression is applied
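The transform-then-quantise idea can be illustrated on a single 8 x 8 block of pixels. This is a simplified sketch, not the JPEG standard's procedure: it assumes an orthonormal 2-D DCT-II computed directly from its formula, and a single quantisation step size (`step`, a made-up parameter) where real JPEG uses a table of per-coefficient step sizes.

```python
import math

N = 8  # block size in pixels


def dct_2d(block):
    """2-D DCT-II of an N x N block (direct, unoptimised formula)."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantise(coeffs, step):
    """Divide by the step size and round; this is where detail is lost."""
    return [[round(value / step) for value in row] for row in coeffs]


# A smooth horizontal gradient, shifted so its values are centred on zero.
block = [[(y - 3.5) * 8 for y in range(N)] for _ in range(N)]
quantised = quantise(dct_2d(block), step=16)
nonzero = sum(1 for row in quantised for q in row if q != 0)
print(nonzero, "of", N * N, "coefficients are non-zero after quantisation")
```

For smooth image regions most quantised coefficients come out as zero, and those long runs of zeros are what the later lossless stage compresses well.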

Summary
- Compression is an application of information theory
- Encodes long strings as shorter strings
- All schemes depend on redundant / repetitive data
- Common gotchas
  - A compression algorithm cannot usefully compress its own output
  - Cannot easily compress completely random data
    - No / little redundancy
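Both gotchas can be observed with Python's standard `zlib` module (DEFLATE, which combines a Lempel-Ziv stage with Huffman coding). This is a small experiment rather than a proof: the word list and sizes are arbitrary choices made for this sketch, and exact byte counts will vary.

```python
import os
import random
import zlib

rng = random.Random(0)  # fixed seed so the redundant text is repeatable
WORDS = ["data", "compression", "symbol", "codeword",
         "redundancy", "entropy", "token", "stream"]

# Redundant input: a long text built from a small vocabulary.
text = " ".join(rng.choice(WORDS) for _ in range(5000)).encode("ascii")
# Input with little or no redundancy: bytes from the OS random source.
random_bytes = os.urandom(len(text))

once = zlib.compress(text)
twice = zlib.compress(once)          # compressing already-compressed output
noise = zlib.compress(random_bytes)

print("redundant text:  ", len(text), "->", len(once))
print("compressed again:", len(once), "->", len(twice))
print("random bytes:    ", len(random_bytes), "->", len(noise))
```

The first pass removes most of the redundancy, so a second pass typically has little left to exploit, and random bytes usually come out slightly larger than they went in because of format overhead.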