ICS 220 – Data Structures and Algorithms Lecture 11 Dr. Ken Cosh.

Review
Hash Functions:
– Spreading data evenly through a table, avoiding collisions while maintaining a simple, efficient function.
– Linear probing, quadratic probing, general increment probing.
– Tombstone deletion.

This Week: Data Compression Techniques
– We can improve the efficiency of many algorithms by compressing data.
– Obviously, storing 'M' or 'F' rather than 'Male' or 'Female' takes less space, AND less time to transfer between parts of a program.
– Let's investigate how we can efficiently encode data.

Encoding
The same value can be written as CXXVIII (Roman numerals), as ρκη (Greek numerals), or as a long run of tally marks: different encodings of the same value can differ enormously in length.

Assumptions
– We have n different symbols we can use to encode a message. In Morse code n = 3; in binary n = 2.
– All symbols mᵢ, forming the set M, have probabilities of occurrence P(mᵢ) such that P(m₁) + … + P(mₙ) = 1.
– Infrequently occurring symbols can be assigned long codewords, while short codewords are reserved for frequent symbols.

Coding Objectives
– Each codeword corresponds to exactly one symbol.
– Decoding should not require any look-ahead. This is known as the 'prefix' property.

Prefix Property
Symbols: A, B, C
Codes: 1, 2, 12
Message: 12. Is it 'AB'? Is it 'C'?

Prefix Property
Symbols: A, B, C
Codes: 1, 22, 12
Message: 1222
Read in 1: is it an A? Read in 2: was it a C? Read in another 2: should it be AB? Read in the final 2: ah, finally we can conclude it was CB.

Prefix Property
Symbols: A, B, C
Codes: 11, 12, 21
No ambiguity here.
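
The three examples above can be checked mechanically. Below is a minimal Python sketch (not from the lecture; the function name is ours) that tests whether a code set has the prefix property by checking that no codeword is a prefix of another:

```python
# A minimal sketch: check whether a set of codewords has the prefix
# property, i.e. no codeword is a prefix of another. The example code
# sets are the ones from the slides above.

def is_prefix_free(codes):
    """Return True if no codeword is a prefix of any other codeword."""
    for a in codes:
        for b in codes:
            if a != b and b.startswith(a):
                return False
    return True

print(is_prefix_free(["1", "2", "12"]))    # False: '1' is a prefix of '12'
print(is_prefix_free(["1", "22", "12"]))   # False: '1' is a prefix of '12'
print(is_prefix_free(["11", "12", "21"]))  # True: no ambiguity
```

The quadratic pairwise scan is fine for toy alphabets; for large code sets a trie makes the same check linear in the total length of the codewords.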

Code Optimisation
– The codeword for one symbol should not be longer than the codeword of a less likely symbol: if P(mᵢ) ≤ P(mⱼ) then L(mᵢ) ≥ L(mⱼ).
– There should be no unused short codes, either as stand-alone encodings or as prefixes of longer codes. For example, 01, 000, 001, 100, 101 is not ideal, as 11 is not used.
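
The first rule can be restated numerically: the expected code length is the sum of P(mᵢ) · L(mᵢ) over all symbols, so moving the short codewords onto the likely symbols lowers the average. A small Python sketch with made-up probabilities (not from the slides):

```python
# A small sketch of the optimisation rule: expected code length is
# sum(P(m) * L(m)), so giving short codes to likely symbols minimises
# it. The probabilities below are invented for illustration.

probs = {"a": 0.5, "b": 0.3, "c": 0.2}

good = {"a": "0", "b": "10", "c": "11"}   # short code on the likely symbol
bad  = {"a": "11", "b": "10", "c": "0"}   # short code on the unlikely symbol

def expected_length(probs, codes):
    return sum(p * len(codes[m]) for m, p in probs.items())

print(expected_length(probs, good))  # 0.5*1 + 0.3*2 + 0.2*2 = 1.5 bits/symbol
print(expected_length(probs, bad))   # 0.5*2 + 0.3*2 + 0.2*1 = 1.8 bits/symbol
```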

Huffman Coding
Huffman coding is a method for choosing a representation for each symbol that results in a prefix-free code: the bit string representing one symbol is never a prefix of the bit string representing any other symbol. The most common symbols are expressed using shorter strings of bits than those used for less common symbols.

Huffman Coding
Huffman coding starts by creating a heap based on the frequencies of the symbols to be encoded.
– The root of the heap should be 1.0: the sum of its two child nodes.
– Each data symbol appears in a leaf node.
Take the set of symbols A, B, C, D, E, F, G, H, I with corresponding occurrence frequencies (out of 120).
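
A minimal Python sketch of this construction, using the standard library's heapq as the min-heap. The frequencies below are illustrative stand-ins, not the slide's: repeatedly pop the two least frequent nodes, merge them, and push the merged node back until one tree remains.

```python
# A minimal sketch of Huffman tree construction with a min-heap.
import heapq
from itertools import count

def build_huffman_tree(freqs):
    """Return the root of the Huffman tree as nested tuples.

    A leaf is (freq, symbol); an internal node is (freq, left, right).
    """
    tick = count()  # tie-breaker so the heap never compares tree nodes
    heap = [(f, next(tick), (f, s)) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tick), (f1 + f2, left, right)))
    return heap[0][2]

root = build_huffman_tree({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5})
print(root[0])  # total frequency at the root: 100
```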

Creating a Heap
There are several ways a heap can be created from the given data; here is one option:
[Tree diagram: one possible Huffman tree over the leaves A, B, C, D, E, F, G, H, I; the surviving internal node weights include 10 and 35.]

Huffman Coding
Each node is then given a code based on its position in the tree: 0 for a left branch, 1 for a right branch.
[Tree diagram as above, with the branch labels added.]
A = 000
B = 010
C = 0010
D = 011
E = 111
F = 00110
G = 110
H = 10
I = 00111
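
Continuing the Python sketch above, the codes fall out of a tree walk: append 0 when descending left and 1 when descending right, and record the accumulated bit string at each leaf.

```python
# Walk the tree built by build_huffman_tree, assigning codewords.

def assign_codes(node, prefix="", codes=None):
    """Collect symbol -> bitstring codes from a tuple tree."""
    if codes is None:
        codes = {}
    if len(node) == 2:                   # leaf: (freq, symbol)
        codes[node[1]] = prefix or "0"   # lone-symbol edge case
        return codes
    _, left, right = node                # internal: (freq, left, right)
    assign_codes(left, prefix + "0", codes)
    assign_codes(right, prefix + "1", codes)
    return codes

codes = assign_codes(root)
print(codes)  # e.g. {'a': '0', 'c': '100', 'b': '101', ...}
```

The exact codes depend on how ties between equal frequencies are broken, so different runs can legitimately produce different, equally optimal code tables.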

Huffman Decoding
A = 000, B = 010, C = 0010, D = 011, E = 111, F = 00110, G = 110, H = 10, I = 00111
Does this code fit our prefix requirements?
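
Yes: no codeword above is a prefix of another, which is exactly what makes no-look-ahead decoding possible. A minimal decoding sketch in Python, using the slide's code table (with F = 00110 filled in as the sibling of I):

```python
# Decoding a prefix-free code: scan the bit string left to right and
# emit a symbol as soon as the bits read so far match a codeword.
# No look-ahead is ever needed.

CODES = {"A": "000", "B": "010", "C": "0010", "D": "011", "E": "111",
         "F": "00110", "G": "110", "H": "10", "I": "00111"}

def decode(bits, codes):
    inverse = {code: sym for sym, code in codes.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:   # prefix property: the first match is the right one
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("dangling bits: " + buf)
    return "".join(out)

print(decode("10000011", CODES))  # "HAD": 10 | 000 | 011
```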

Adaptive Huffman
Huffman coding was invented back in the 1950s, and it rests on a key assumption: that we know the expected frequency of each symbol in advance. We could get these frequencies from a large corpus, such as the BNC, but the resulting Huffman tree might not be ideal for all circumstances.
– A technical computing paper is likely to contain more non-alphabetic symbols than 'War and Peace'.
Adaptive Huffman coding is usable in this situation, as it reconstructs the tree as the data is read in.

Adaptive Huffman
– Initially all symbols are unused, and so are stored in a leaf with value 0.
– As symbols are used, they are moved up the tree.
– As symbols are used more frequently, they are compared and, if necessary, swapped with their siblings.
– Sibling comparison is easy if the heap is stored as a doubly linked list.
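
Below is a deliberately simplified Python sketch of the adaptive idea. It is not the incremental sibling-swap update described above (nor the FGK/Vitter algorithm): instead it rebuilds the tree from the running counts before each symbol, reusing build_huffman_tree and assign_codes from the earlier sketches, which gives the same adaptive behaviour at a much higher cost.

```python
# A simplified adaptive scheme: rebuild the Huffman tree from the
# counts seen so far before encoding each symbol, so frequent symbols
# drift toward shorter codes as the message is read. Real adaptive
# Huffman updates the tree incrementally instead of rebuilding.
from collections import Counter

def adaptive_encode(message, alphabet):
    # Every symbol starts with count 1 so it has a codeword from the
    # outset; the slides instead use a shared 0-node for unused symbols.
    counts = Counter({s: 1 for s in alphabet})
    bits = []
    for symbol in message:
        codes = assign_codes(build_huffman_tree(counts))
        bits.append(codes[symbol])
        counts[symbol] += 1          # adapt after each symbol
    return bits

print(adaptive_encode("cabbage", "abcdeg"))
# later occurrences of 'a' and 'b' tend to get shorter codes
```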

Doubly Linked List
[Tree diagram as above, with the nodes additionally threaded into a doubly linked list in sibling order.]

'Cabbages are bad'
[Worked example: successive tree states while adaptively encoding the phrase, starting from a single 0-node holding the unused symbols a, b, c, d, e, g, r, s; the steps shown are 'add c', 'add a', 'add b' and 'transpose'.]

Adaptive Huffman
Note that Adaptive Huffman requires just one pass through the message to encode it.
– The first time the b is transmitted it has the code 101.
– The second time the b is transmitted it has the code 0.
As symbols rise up the tree, their codes become shorter. So long as the decoder follows the same procedure, it can decode the message.

Run-Length Encoding
The word 'cabbage' has a repeated b, so run-length encoding might be able to shorten the code by indicating that there are 2 b's.
– This is similar to the ๆ symbol in Thai, which marks repetition of the preceding word.
Naïve run-length encoding breaks down if the sequence itself contains numbers, since a repeat count can no longer be distinguished from the data.
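
A minimal run-length sketch in Python that sidesteps the digit problem by keeping (character, count) pairs as structured data rather than splicing counts back into the text:

```python
# Run-length encoding on text: each maximal run of a repeated
# character becomes a (character, count) pair. Note that a run of
# length 1 still costs two output items, so RLE only pays off when
# long runs are common.

def rle_encode(text):
    runs, i = [], 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1                     # extend the current run
        runs.append((text[i], j - i))
        i = j
    return runs

def rle_decode(runs):
    return "".join(ch * n for ch, n in runs)

runs = rle_encode("cabbage")
print(runs)                           # [('c', 1), ('a', 1), ('b', 2), ...]
print(rle_decode(runs) == "cabbage")  # True
```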

Homework
Take the following phrase:
– "Data Structures And Algorithms Analysis"
Use the Adaptive Huffman Coding method to generate a code for it.