Data Structures Week 6: Assignment #2 Problem

1 Data Structures Week 6: Assignment #2 Problem http://www.cs.hongik.ac.kr/~rhanha/rhanha_teaching.html/

2 Requirement
- Encode a message using Huffman's algorithm.
- Use a min heap as the priority queue.
  - dynamic allocation
- The input consists of strings.
  - Each string consists of alphabetic characters only.
  - Uppercase and lowercase letters are treated as different characters.
  - The strings are stored in a text file.
  - The strings are given on separate lines.

3 Requirement (cont'd)
- Due date: 2001/5/23 24:00
- Output should be stored in a text file in the following format:
    Heap Traversal: [character or string]...
    Huffman Tree Traversal: [character or string]...
    character: frequency, code.
    the code for the message:

4 Encoding
- Encode the message as a long bit string.
- Assign a bit string code to each symbol of the alphabet.
- Then concatenate the individual codes of the symbols making up the message to produce an encoding for the message.

5 Example #1
Symbol  Code
A       010
B       100
C       000
D       111
Message:  ABACCDA
Encoding: 010100010000000111010
- Three bits are used for each symbol.
- 21 bits are needed to encode the message, which is inefficient.

6 Example #2
Symbol  Code
A       00
B       01
C       10
D       11
Message:  ABACCDA
Encoding: 00010010101100
- Two bits are used for each symbol.
- 14 bits are needed to encode the message.

7 Example #3
Message: ABACCDA
- Each of the letters B and D appears only once in the message.
- The letter A appears three times.
- The letter A is therefore assigned a shorter bit string than the letters B and D.

8 Example #3 (cont'd)
Symbol  Code
A       0
B       110
C       10
D       111
Message:  ABACCDA
Encoding: 0110010101110
- Encoding the message requires only 13 bits, which is more efficient.
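The three examples can be checked with a few lines of code. The following is a minimal sketch, not part of the original slides; the function name `encode` and the table names are chosen here for illustration only.

```python
def encode(message, code_table):
    """Concatenate the code of each symbol in the message."""
    return "".join(code_table[symbol] for symbol in message)


# Code tables copied from Examples #1, #2 and #3.
three_bit = {"A": "010", "B": "100", "C": "000", "D": "111"}
two_bit = {"A": "00", "B": "01", "C": "10", "D": "11"}
variable = {"A": "0", "B": "110", "C": "10", "D": "111"}

message = "ABACCDA"
for name, table in [("three-bit", three_bit),
                    ("two-bit", two_bit),
                    ("variable-length", variable)]:
    bits = encode(message, table)
    print(f"{name}: {bits} ({len(bits)} bits)")
```

The bit counts come out as 21, 14 and 13, matching the slides.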

9 Variable-Length Code
- If variable-length codes are used, the code for one symbol must not be a prefix of the code for another.
- Example: suppose the code for a symbol x, c(x), is a prefix of the code of another symbol y, c(y).
  - When c(x) is encountered in a left-to-right scan, it is unclear whether c(x) represents the symbol x or whether it is the first part of c(y).
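The prefix condition can be tested mechanically, and a code table that satisfies it can be decoded by a plain left-to-right scan. The sketch below is an illustration only: the names `is_prefix_free` and `decode` and the table `bad` are hypothetical and do not come from the slides.

```python
def is_prefix_free(code_table):
    """Return True if no code is a prefix of (or equal to) another code."""
    codes = list(code_table.values())
    for i, a in enumerate(codes):
        for j, b in enumerate(codes):
            if i != j and b.startswith(a):
                return False
    return True


def decode(bits, code_table):
    """Left-to-right scan; only meaningful for a prefix-free table."""
    symbol_of = {code: symbol for symbol, code in code_table.items()}
    message, current = [], ""
    for bit in bits:
        current += bit
        if current in symbol_of:  # a complete code has been read
            message.append(symbol_of[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(message)


good = {"A": "0", "B": "110", "C": "10", "D": "111"}  # table from Example #3
bad = {"A": "0", "B": "01", "C": "10", "D": "111"}    # hypothetical: "0" is a prefix of "01"

print(is_prefix_free(good), is_prefix_free(bad))
print(decode("0110010101110", good))
```

With the table `bad`, a scan that reads `0` cannot tell whether it has seen A or the start of B, which is exactly the ambiguity described above.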

10 Optimal Encoding Scheme (1)
Symbol  Frequency
A       3
B       1
C       2
D       1
- Find the two symbols that appear least frequently. These are B and D.
- Combine these two symbols into the single symbol BD.
- The frequency of this new symbol is the sum of the frequencies of its two symbols, so the frequency of BD is 2.

11 Optimal Encoding Scheme (2)
Symbol  Frequency
A       3
C       2
BD      2
- Again choose the two symbols with the smallest frequency. These are C and BD.
- Combine these two symbols into the single symbol CBD.
- The frequency of this new symbol is the sum of the frequencies of its two symbols, so the frequency of CBD is 4.

12 Optimal Encoding Scheme (3)
Symbol  Frequency
A       3
CBD     4
- There are now only two symbols remaining.
- These are combined into the single symbol ACBD.
- The frequency of ACBD is 7.
Symbol  Frequency
ACBD    7

13 Optimal Encoding Scheme (4)
- ACBD splits into A and CBD, which are assigned the codes 0 and 1.
- CBD splits into C and BD, which are assigned the codes 10 and 11.
- BD splits into B and D, which are assigned the codes 110 and 111.
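The table-based procedure of slides 10 to 13 can be sketched directly: merge the two least frequent symbols until one remains, then undo the merges from the top down, handing 0 to the first part and 1 to the second. This is an illustrative sketch only; `assign_codes` is a hypothetical name, it sorts the table instead of using a heap, and it assumes a non-empty table of distinct symbols.

```python
def assign_codes(frequencies):
    """Merge the two least frequent symbols repeatedly, then split top-down.

    Returns (merges, codes), where merges lists (left, right, combined) names.
    """
    table = dict(frequencies)
    merges = []
    while len(table) > 1:
        # The two symbols with the smallest frequencies (ties keep table order).
        left, right = sorted(table, key=table.get)[:2]
        combined = left + right
        table[combined] = table.pop(left) + table.pop(right)
        merges.append((left, right, combined))

    # Undo the merges in reverse: the combined symbol's code is extended
    # with 0 for its left part and 1 for its right part.
    (root,) = table
    codes = {root: ""}
    for left, right, combined in reversed(merges):
        prefix = codes.pop(combined)
        codes[left] = prefix + "0"
        codes[right] = prefix + "1"
    return merges, codes


merges, codes = assign_codes({"A": 3, "B": 1, "C": 2, "D": 1})
print(merges)
print(codes)
```

For the frequencies on slide 10 this produces the merges BD, CBD, ACBD and the codes A = 0, C = 10, B = 110, D = 111, as on slide 13.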

14 The Huffman’s Algorithm (1) [Figure: nodes D1, C2, B1, A3]

15 The Huffman’s Algorithm (2) [Figure: nodes C2, B1, D1, A3]

16 The Huffman’s Algorithm (3) [Figure: nodes B1, D1, C2, A3, BD2]

17 The Huffman’s Algorithm (4) [Figure: nodes B1, D1, A3, BD2, C2]

18 The Huffman’s Algorithm (5) [Figure: nodes B1, D1, A3, BD2, C2, CBD4]

19 The Huffman’s Algorithm (6) [Figure: nodes B1, D1, A3, BD2, C2, CBD4]

20 The Huffman’s Algorithm (7) [Figure: nodes B1, D1, A3, BD2, C2, CBD4, ACBD7]

21 The Huffman’s Algorithm (8)
1. Build a min heap that contains the nodes of all symbols, with the frequency values as the keys.
2. Delete two nodes from the heap, concatenate the two symbols, add their frequencies, and put the result back into the heap.
3. Make the two nodes the two children of the node of the concatenated symbol; i.e., if s = s1s2 is the symbol concatenated from s1 and s2, then s1 and s2 become the left child and right child of s.
4. Repeat steps 2 and 3 until only one node remains in the priority queue; that node is the root of the Huffman tree.
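The steps above can be sketched with Python's standard `heapq` module standing in for the min heap. This is not the assignment's required hand-written heap. The names `Node` and `build_huffman_tree` are assumptions made for this example, and the input table is assumed to be non-empty.

```python
import heapq
from itertools import count


class Node:
    """A Huffman tree node; leaves have no children."""

    def __init__(self, symbol, frequency, left=None, right=None):
        self.symbol = symbol
        self.frequency = frequency
        self.left = left
        self.right = right


def build_huffman_tree(frequencies):
    # The counter breaks ties so that heap entries never compare Node objects.
    tie_breaker = count()
    heap = [(freq, next(tie_breaker), Node(sym, freq))
            for sym, freq in frequencies.items()]
    heapq.heapify(heap)                       # step 1: build the min heap
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)       # step 2: delete two nodes
        f2, _, s2 = heapq.heappop(heap)
        # step 3: s1 and s2 become the left and right child of s = s1s2
        parent = Node(s1.symbol + s2.symbol, f1 + f2, s1, s2)
        heapq.heappush(heap, (parent.frequency, next(tie_breaker), parent))
    return heap[0][2]                         # the last node is the root


root = build_huffman_tree({"A": 3, "B": 1, "C": 2, "D": 1})
print(root.symbol, root.frequency)
```

With the frequencies from slide 10, the root is ACBD with frequency 7, its left child is A and its right child is CBD.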

22 The Huffman’s Algorithm (9)
- Once the Huffman tree is constructed, the code of any symbol can be constructed by starting at the leaf representing that symbol and climbing up to the root.
- The code is initialized to null.
  - Each time that a left branch is climbed, 0 is prepended to the code.
  - Each time that a right branch is climbed, 1 is prepended to the code.

24 The Huffman’s Algorithm (10)
VAR
  position[i] : a pointer to the ith symbol
  n : the number of symbols /* nonzero frequency */
  frequency[i] : the relative frequency of the ith symbol
  code[i] : the code assigned to the ith symbol
  p, p1, p2 : pointers to a min heap node or Huffman tree node
Main Function {
  initialization;
  count the frequency of each symbol within the message;
  // construct a node for each symbol
  for(i=0; i < n; i++){
    p = create a node;
    position[i] = p; // a pointer to the leaf containing the ith symbol
    insert p into Min heap;
  }//end for

25 The Huffman’s Algorithm (11)
  while(Min heap contains more than one item){
    p1 = delete Min heap;
    p2 = delete Min heap;
    // combine p1 and p2 as branches of a single tree
    p = create a node;
    set p1 to be left_child of huffman tree p;
    set p2 to be right_child of huffman tree p;
    insert p into Min heap;
  }//end while

26 The Huffman’s Algorithm (12)
  // the tree is now constructed; use it to find codes
  root = delete Min heap;
  for(i=0; i<n; i++){
    p = position[i];
    code[i] = NULL;
    while(p != root){ // travel up to the root
      if(p is a left child) code[i] = 0 followed by code[i];
      else code[i] = 1 followed by code[i];
      p = move to father node;
    }// end while
  }//end for
}//end main
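The pseudocode of slides 24 to 26 can be turned into a runnable sketch as follows. It is one possible reading, not the reference solution: the class names `Node` and `MinHeap`, the `is_left` flag, and the use of a dictionary for `position` and `code` are assumptions made here, and the message is assumed to be non-empty.

```python
from collections import Counter


class Node:
    def __init__(self, symbol, frequency):
        self.symbol = symbol
        self.frequency = frequency
        self.father = None     # parent pointer, used when climbing to the root
        self.is_left = False   # True if this node is its father's left child


class MinHeap:
    """Array-based binary min heap keyed on node frequency."""

    def __init__(self):
        self.items = []

    def __len__(self):
        return len(self.items)

    def insert(self, node):
        items = self.items
        items.append(node)
        i = len(items) - 1
        # Sift up while the new node is smaller than its parent.
        while i > 0 and items[i].frequency < items[(i - 1) // 2].frequency:
            items[i], items[(i - 1) // 2] = items[(i - 1) // 2], items[i]
            i = (i - 1) // 2

    def delete_min(self):
        items = self.items
        smallest = items[0]
        last = items.pop()
        if items:
            items[0] = last
            i = 0
            while 2 * i + 1 < len(items):
                child = 2 * i + 1
                # Pick the smaller of the two children.
                if (child + 1 < len(items)
                        and items[child + 1].frequency < items[child].frequency):
                    child += 1
                if items[i].frequency <= items[child].frequency:
                    break
                items[i], items[child] = items[child], items[i]
                i = child
        return smallest


def huffman_codes(message):
    frequency = Counter(message)   # count the frequency of each symbol
    heap = MinHeap()
    position = {}                  # symbol -> leaf node
    for symbol, freq in frequency.items():
        p = Node(symbol, freq)
        position[symbol] = p
        heap.insert(p)
    while len(heap) > 1:
        p1 = heap.delete_min()
        p2 = heap.delete_min()
        p = Node(p1.symbol + p2.symbol, p1.frequency + p2.frequency)
        p1.father, p1.is_left = p, True    # p1 becomes the left child
        p2.father, p2.is_left = p, False   # p2 becomes the right child
        heap.insert(p)
    root = heap.delete_min()
    code = {}
    for symbol, p in position.items():
        code[symbol] = ""
        while p is not root:               # travel up to the root
            code[symbol] = ("0" if p.is_left else "1") + code[symbol]
            p = p.father
    return code


codes = huffman_codes("ABACCDA")
encoded = "".join(codes[s] for s in "ABACCDA")
print(codes, len(encoded))
```

For the message ABACCDA the code lengths are 1, 3, 2 and 3 bits for A, B, C and D, giving 13 bits in total. A message with a single distinct symbol yields an empty code under this sketch, so a real submission would need to decide how to handle that case.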

