Data Structures Week 6: Assignment #2 Problem


Requirement
- Encode a message using Huffman's algorithm
- Use a min heap as the priority queue (dynamic allocation)
- The input consists of strings
  - A string consists of alphabetic characters only
  - Upper-case and lower-case letters are treated as different characters
  - Strings are stored in a text file, given in separate lines

Requirement – cont'
- Output should be stored in a text file in the following format:
    Heap Traversal: [character or string] ...
    Huffman Tree Traversal: [character or string] ...
    character: frequency, code
    the code for the message: ...
- Due date: 2001/5/23 24:00

Encoding
- Encode the message as a long bit string:
  - assign a bit-string code to each symbol of the alphabet
  - then concatenate the individual codes of the symbols making up the message to produce an encoding for the message

Example#1

    Symbol  Code
    A       010
    B       100
    C       000
    D       111

Message: ABACCDA
- Three bits are used for each symbol
- 21 bits are needed to encode the 7-symbol message → inefficient

Example#2

    Symbol  Code
    A       00
    B       01
    C       10
    D       11

Message: ABACCDA
- Two bits are used for each symbol
- 14 bits are needed to encode the message

Example#3
Message: ABACCDA
- Each of the letters B and D appears only once in the message
- The letter A appears three times
- The letter A is therefore assigned a shorter bit string than the letters B and D

Example#3 – cont'

    Symbol  Code
    A       0
    B       110
    C       10
    D       111

Message: ABACCDA → 0 110 0 10 10 111 0
- Encoding the message requires only 13 bits → more efficient
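As a quick sanity check of the 13-bit count, here is a minimal C++ sketch (the name codeTable and the use of std::map are illustrative choices, not part of the assignment) that encodes ABACCDA with the Example#3 table:

    #include <iostream>
    #include <map>
    #include <string>

    int main() {
        // Code table from Example#3 (illustrative container choice)
        std::map<char, std::string> codeTable = {
            {'A', "0"}, {'B', "110"}, {'C', "10"}, {'D', "111"}};

        std::string message = "ABACCDA", encoded;
        for (char c : message)
            encoded += codeTable[c];  // concatenate per-symbol codes

        std::cout << encoded << " (" << encoded.size() << " bits)\n";
        // Prints: 0110010101110 (13 bits)
    }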

Variable-Length Code
- If variable-length codes are used, the code for one symbol may not be a prefix of the code for another
- Example: suppose the code for a symbol x, c(x), is a prefix of the code for another symbol y, c(y)
  - When c(x) is encountered in a left-to-right scan, it is unclear whether c(x) represents the symbol x or whether it is the first part of c(y)
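To make the prefix condition concrete, a small sketch follows; the helper isPrefixFree is hypothetical (not from the slides) and simply rejects a table in which some code is a prefix of another:

    #include <iostream>
    #include <string>
    #include <vector>

    // Hypothetical helper: true if no code is a prefix of another code.
    bool isPrefixFree(const std::vector<std::string>& codes) {
        for (const auto& a : codes)
            for (const auto& b : codes)
                if (&a != &b && b.compare(0, a.size(), a) == 0)
                    return false;  // a is a prefix of b: decoding is ambiguous
        return true;
    }

    int main() {
        std::cout << isPrefixFree({"0", "110", "10", "111"}) << "\n";  // 1: Example#3 table is OK
        std::cout << isPrefixFree({"0", "01"}) << "\n";                // 0: "0" prefixes "01"
    }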

Optimal Encoding Scheme (1)

    Symbol  Frequency
    A       3
    B       1
    C       2
    D       1

- Find the two symbols that appear least frequently: these are B and D
- Combine these two symbols into the single symbol BD
- The frequency of this new symbol is the sum of the frequencies of its two symbols, so the frequency of BD is 2

Optimal Encoding Scheme (2)

    Symbol  Frequency
    A       3
    C       2
    BD      2

- Again choose the two symbols with smallest frequency: these are C and BD
- Combine these two symbols into the single symbol CBD
- The frequency of this new symbol is the sum of the frequencies of its two symbols, so the frequency of CBD is 4

Optimal Encoding Scheme (3)

    Symbol  Frequency
    A       3
    CBD     4

- There are now only two symbols remaining; these are combined into the single symbol ACBD
- The frequency of ACBD is 7

    Symbol  Frequency
    ACBD    7

Optimal Encoding Scheme (4)
- ACBD's parts (A and CBD) are assigned the codes 0 and 1
- CBD's parts (C and BD) are assigned the codes 10 and 11
- BD's parts (B and D) are assigned the codes 110 and 111
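This repeated "merge the two least frequent symbols" step can be sketched with a priority queue; the snippet below uses std::priority_queue as a stand-in for the hand-built min heap the assignment requires, and tie-breaking among equal frequencies may merge in a different order than the slides (the resulting code lengths are the same):

    #include <functional>
    #include <iostream>
    #include <queue>
    #include <string>
    #include <vector>

    int main() {
        using Item = std::pair<int, std::string>;  // (frequency, symbol)
        std::vector<Item> symbols{{3, "A"}, {1, "B"}, {2, "C"}, {1, "D"}};

        // std::greater makes the queue min-ordered on frequency
        std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
        for (const Item& it : symbols) pq.push(it);

        while (pq.size() > 1) {
            Item a = pq.top(); pq.pop();  // least frequent
            Item b = pq.top(); pq.pop();  // second least frequent
            Item merged{a.first + b.first, a.second + b.second};
            std::cout << "merge " << a.second << " + " << b.second
                      << " -> " << merged.second << " (" << merged.first << ")\n";
            pq.push(merged);
        }
    }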

The Huffman Algorithm (1)–(7)
[Figure sequence: the tree is built bottom-up from the leaves A(3), B(1), C(2), D(1). First B(1) and D(1) are joined under a new node BD(2); then BD(2) and C(2) are joined under CBD(4); finally CBD(4) and A(3) are joined under the root ACBD(7).]

The Huffman Algorithm (8)
1. Build a min heap which contains the nodes of all symbols, with the frequency values as the keys
2. Delete two nodes from the heap, concatenate the two symbols, add their frequencies, and put the result back into the heap
3. Make the two nodes become the two children of the node of the concatenated symbol, i.e., if s = s1s2 is the symbol concatenated from s1 and s2, then s1 and s2 become the left child and right child of s
4. Repeat steps 2 and 3 until only one node, the root of the Huffman tree, remains in the heap
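One possible node layout for these steps, as a sketch (field names are illustrative): each node stores its symbol and frequency key, child pointers for step 3, and a parent pointer so that codes can later be read off by climbing from a leaf to the root:

    #include <string>

    // Sketch of a combined min-heap / Huffman-tree node (illustrative names).
    struct Node {
        std::string symbol;      // one character, or a concatenation such as "BD"
        int frequency;           // the heap key
        Node* left = nullptr;    // child reached by appending a 0
        Node* right = nullptr;   // child reached by appending a 1
        Node* parent = nullptr;  // used to climb leaf -> root when extracting codes
    };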

The Huffman Algorithm (9)
- Once the Huffman tree is constructed, the code of any symbol can be constructed by starting at the leaf representing that symbol and climbing up to the root
- The code is initialized to null
  - each time a left branch is climbed, 0 is appended to the beginning of the code
  - each time a right branch is climbed, 1 is appended to the beginning of the code

The Huffman Algorithm (10)

    VAR
      position[i]  : a pointer to the leaf node of the ith symbol
      n            : the number of symbols /* nonzero frequency */
      frequency[i] : the relative frequency of the ith symbol
      code[i]      : the code assigned to the ith symbol
      p, p1, p2    : pointers to min-heap / Huffman-tree nodes

    Main Function {
      initialization;
      count the frequency of each symbol within the message;
      // construct a node for each symbol
      for (i = 0; i < n; i++) {
        p = create a node(ith symbol, frequency[i]);
        position[i] = p;  // a pointer to the leaf containing the ith symbol
        insert p into the min heap;
      } // end for

The Huffman Algorithm (11)

      while (the min heap contains more than one item) {
        p1 = delete min from the heap;
        p2 = delete min from the heap;
        // combine p1 and p2 as branches of a single tree
        p = create a node(p1's symbol + p2's symbol, p1's frequency + p2's frequency);
        set p1 to be the left child of p;
        set p2 to be the right child of p;
        insert p into the min heap;
      } // end while

The Huffman Algorithm (12)

      // the tree is now constructed; use it to find codes
      root = delete min from the heap;  // the single remaining node
      for (i = 0; i < n; i++) {
        p = position[i];
        code[i] = NULL;
        while (p != root) {  // travel up to the root
          if (p is the left child of its parent)
            code[i] = 0 followed by code[i];
          else
            code[i] = 1 followed by code[i];
          p = p's parent;  // move to the father node
        } // end while
      } // end for
    } // end main
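Putting the three fragments together, here is a compact runnable C++ sketch of the same pipeline; std::priority_queue again stands in for the assignment's hand-built min heap, all names are illustrative, and ties may produce different (equally optimal) codes than the worked example:

    #include <iostream>
    #include <map>
    #include <memory>
    #include <queue>
    #include <string>
    #include <vector>

    struct Node {
        std::string symbol;
        int frequency;
        Node *left = nullptr, *right = nullptr, *parent = nullptr;
    };

    // Comparator that orders the priority queue as a min heap on frequency.
    struct MinByFreq {
        bool operator()(const Node* a, const Node* b) const {
            return a->frequency > b->frequency;
        }
    };

    int main() {
        std::string message = "ABACCDA";

        // Count the frequency of each symbol within the message.
        std::map<char, int> freq;
        for (char c : message) freq[c]++;

        // Construct a leaf node for each symbol and insert it into the heap.
        std::vector<std::unique_ptr<Node>> pool;  // owns every node
        std::map<char, Node*> position;           // leaf lookup, as in the pseudocode
        std::priority_queue<Node*, std::vector<Node*>, MinByFreq> heap;
        for (auto [c, f] : freq) {
            pool.push_back(std::make_unique<Node>(Node{std::string(1, c), f}));
            position[c] = pool.back().get();
            heap.push(pool.back().get());
        }

        // Repeatedly merge the two least frequent nodes (steps 2 and 3).
        while (heap.size() > 1) {
            Node* p1 = heap.top(); heap.pop();
            Node* p2 = heap.top(); heap.pop();
            pool.push_back(std::make_unique<Node>(Node{
                p1->symbol + p2->symbol, p1->frequency + p2->frequency, p1, p2}));
            Node* p = pool.back().get();
            p1->parent = p2->parent = p;
            heap.push(p);
        }
        Node* root = heap.top();  // the single remaining node

        // Climb leaf -> root to read off each code (left = 0, right = 1).
        std::map<char, std::string> code;
        for (auto [c, leaf] : position) {
            std::string bits;
            for (Node* p = leaf; p != root; p = p->parent)
                bits.insert(0, 1, p == p->parent->left ? '0' : '1');
            code[c] = bits;
            std::cout << c << ": " << freq[c] << ", " << bits << "\n";
        }

        // Encode the message by concatenating the per-symbol codes.
        std::string encoded;
        for (char c : message) encoded += code[c];
        std::cout << "the code for the message: " << encoded << "\n";
    }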