Homework #5 New York University Computer Science Department Data Structures Fall 2008 Eugene Weinstein.


1 Homework #5 New York University Computer Science Department Data Structures Fall 2008 Eugene Weinstein

2 Homework #4 Review Huffman coding is a variable-length binary encoding for text We implemented Huffman's optimal code-finding algorithm (book, pp. 389-395) o Builds tree representing shortest possible code Input for HW#4: letters, frequencies: o A 20 E 24... Construct Huffman tree Navigate tree to find code: o c: 0, a: 10, b: 11
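As a refresher, the tree-building step can be sketched as follows. This is a minimal sketch under stated assumptions, not the HW#4 solution: it uses java.util.PriorityQueue in place of the course's BinaryHeap, and the names HuffmanBuildSketch, Node, build and codeLength are hypothetical. The frequencies in the usage note are made up for illustration.

```java
import java.util.PriorityQueue;

class HuffmanBuildSketch {
    /** Hypothetical node type; the HW#4 HuffmanNode class may look different. */
    static final class Node implements Comparable<Node> {
        final char letter;       // only meaningful for leaves
        final int frequency;
        final Node left, right;  // both null for a leaf

        Node(char letter, int frequency, Node left, Node right) {
            this.letter = letter;
            this.frequency = frequency;
            this.left = left;
            this.right = right;
        }

        @Override
        public int compareTo(Node other) {
            return Integer.compare(frequency, other.frequency);
        }
    }

    /** Repeatedly merges the two least frequent nodes until one tree remains. */
    static Node build(char[] letters, int[] frequencies) {
        // Assumption: PriorityQueue stands in for the course's BinaryHeap.
        PriorityQueue<Node> heap = new PriorityQueue<>();
        for (int i = 0; i < letters.length; i++) {
            heap.add(new Node(letters[i], frequencies[i], null, null));
        }
        while (heap.size() > 1) {
            Node first = heap.poll();
            Node second = heap.poll();
            heap.add(new Node('\0', first.frequency + second.frequency, first, second));
        }
        return heap.poll(); // null when there were no letters at all
    }

    /** Depth of the leaf holding the letter (its code length), or -1 if absent. */
    static int codeLength(Node node, char letter, int depth) {
        if (node == null) {
            return -1;
        }
        if (node.left == null && node.right == null) {
            return node.letter == letter ? depth : -1;
        }
        int inLeft = codeLength(node.left, letter, depth + 1);
        return inLeft >= 0 ? inLeft : codeLength(node.right, letter, depth + 1);
    }
}
```

With the illustrative frequencies a=2, b=3, c=5, the letter c ends up one level below the root and a and b two levels below, which matches the code lengths of c: 0, a: 10, b: 11. Which child gets 0 and which gets 1 depends on how ties are broken, so only the lengths are stable.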

3 Homework #5 Overview Given a document o Calculate letter frequencies o Construct Huffman code o Encode document o Calculate memory savings of Huffman binary encoding vs 8-bit ASCII o Correctly decode document We can use Huffman code building algorithm from HW#4 o So we will keep HuffmanTree and HuffmanNode
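The memory-savings step is plain arithmetic over the frequency and code tables. The sketch below is one possible way to compute it; the class and method names are hypothetical, and it assumes count[] and code[] are indexed by character value as described for HuffmanConverter.

```java
class SavingsSketch {
    /** Total bits when every character c is written with code[c]. */
    static long huffmanBits(int[] count, String[] code) {
        long bits = 0;
        for (int c = 0; c < count.length; c++) {
            if (count[c] > 0) {
                bits += (long) count[c] * code[c].length();
            }
        }
        return bits;
    }

    /** Total bits at a fixed 8 bits per character. */
    static long asciiBits(int[] count) {
        long characters = 0;
        for (int n : count) {
            characters += n;
        }
        return 8 * characters;
    }

    /** Fraction of the 8-bit size saved; NaN for an empty document. */
    static double savings(int[] count, String[] code) {
        return 1.0 - (double) huffmanBits(count, code) / asciiBits(count);
    }
}
```

For example, with made-up frequencies c=5, a=2, b=3 and the code c: 0, a: 10, b: 11, the Huffman encoding needs 5·1 + 2·2 + 3·2 = 15 bits against 8·10 = 80 bits, a saving of 81.25%.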

4 Organization The new code for this assignment should go into HuffmanConverter.java o The name of the file to encode is passed as a parameter on the command line o So if my file is foo.txt, I should be able to run java HuffmanConverter foo.txt o Then foo.txt shows up in args[0] o If you use an IDE, specify command-line options through the menus Test inputs and outputs linked from assignment page (2007 version)

5 HuffmanConverter Instance Vars String contents - stores file to process o Lines are separated by '\n' - line break character o e.g., twoLines = line1 + '\n' + line2; HuffmanTree huffmanTree - output of HW4 int count[] - frequencies in input file o Indexed on ASCII value of characters, e.g., count[(int)'a'] is frequency of 'a' String code[] - binary string per character o Also indexed on ASCII value, e.g., code[(int)'a'] == "10001"
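Indexing count[] by character value could look like the sketch below. The table size of 256 is an assumption (one slot per 8-bit value), and FrequencySketch and TABLE_SIZE are hypothetical names; in the assignment this logic would live in recordFrequencies() and write to the instance variable instead of returning an array.

```java
class FrequencySketch {
    // Assumption: one slot per 8-bit character value.
    static final int TABLE_SIZE = 256;

    /** Counts how often each character occurs, indexed by its character value. */
    static int[] recordFrequencies(String contents) {
        int[] count = new int[TABLE_SIZE];
        for (int i = 0; i < contents.length(); i++) {
            char c = contents.charAt(i);
            if (c < TABLE_SIZE) {   // ignore characters that do not fit the table
                count[c]++;         // a char widens to int, so no cast is needed
            }
        }
        return count;
    }
}
```

Note that '\n' is counted like any other character, so line breaks get their own entry and, later, their own code.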

6 To Implement readContents() - reads in a file and stores in String contents recordFrequencies() - process file stored in contents and store frequencies in count[] frequenciesToTree() - use HW4 code to produce Huffman tree treeToCode() - slight modification of HW4: traverse Huffman tree and populate code[] encodeMessage() - use code[] to encode decodeMessage() - use inverse of code[]

7 Implementation Notes readContents() can use Scanner o Read a line at a time, and append to contents inserting '\n' to separate lines recordFrequencies(): iterate over contents one character at a time frequenciesToTree() o Very similar to main() method of HW4 o Create a BinaryHeap object o For every non-zero-count letter, create a HuffmanNode object, insert into heap o Then run Huffman algorithm
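A line-at-a-time readContents() with Scanner might look roughly like this. It is a sketch, not the required design: the class name and the split into two overloads are assumptions made so the loop can be tried on an in-memory string, and it puts '\n' between lines rather than after every line.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class ReadContentsSketch {
    /** Reads every line from the scanner, joining lines with '\n'. */
    static String readContents(Scanner in) {
        StringBuilder contents = new StringBuilder();
        boolean firstLine = true;
        while (in.hasNextLine()) {
            if (!firstLine) {
                contents.append('\n'); // separator between lines, none at the end
            }
            contents.append(in.nextLine());
            firstLine = false;
        }
        return contents.toString();
    }

    /** Opens the named file (e.g. args[0]) and reads it as above. */
    static String readContents(String fileName) throws FileNotFoundException {
        try (Scanner in = new Scanner(new File(fileName))) {
            return readContents(in);
        }
    }
}
```

A StringBuilder is used here because repeated String concatenation in a loop copies the whole text on every append; for small test files either approach works.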

8 Implementation Notes, Cont'd treeToCode() o Similar to printCode() of HW4 o Instead of printing code, store in code[] encodeMessage() o For each character of contents, look up its binary string in code[], append
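The two methods on this slide can be sketched together. The Node class below is a hypothetical stand-in for HuffmanNode, and the tree in the usage note is built by hand so that it reproduces the code from the HW#4 review (c: 0, a: 10, b: 11); the convention of 0 for left and 1 for right is also an assumption.

```java
class EncodeSketch {
    /** Hypothetical stand-in for HuffmanNode: a leaf holds a letter, an inner node two children. */
    static final class Node {
        final char letter;
        final Node left, right;

        Node(char letter) {
            this(letter, null, null);
        }

        Node(Node left, Node right) {
            this('\0', left, right);
        }

        private Node(char letter, Node left, Node right) {
            this.letter = letter;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    /** Walks the tree, storing each leaf's path (0 = left, 1 = right) in code[]. */
    static void treeToCode(Node node, String prefix, String[] code) {
        if (node == null) {
            return;
        }
        if (node.isLeaf()) {
            code[node.letter] = prefix;
            return;
        }
        treeToCode(node.left, prefix + "0", code);
        treeToCode(node.right, prefix + "1", code);
    }

    /** Looks up each character's bit string and appends it to the output. */
    static String encodeMessage(String contents, String[] code) {
        StringBuilder bits = new StringBuilder();
        for (int i = 0; i < contents.length(); i++) {
            bits.append(code[contents.charAt(i)]);
        }
        return bits.toString();
    }
}
```

With the tree new Node(new Node('c'), new Node(new Node('a'), new Node('b'))), treeToCode(root, "", code) fills in "0", "10" and "11", and encoding "cab" yields "01011". A document with only one distinct character would get an empty code under this scheme, so that case likely needs special handling.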

9 Implementation Notes, Cont'd decodeMessage() o Need to implement inverse mapping of code[]: binary strings to characters o Several possible implementations  Traverse Huffman tree as you read binary string, output character when you reach a leaf  Build HashMap mapping strings to ASCII values of characters
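The HashMap option can be sketched as follows; the tree-walking option is equally valid. The class and method names are hypothetical, and the sketch assumes code[] holds null for characters that never occur. It relies on the Huffman code being prefix-free: no code is the beginning of another, so the first match while reading bits left to right is always the right one.

```java
import java.util.HashMap;
import java.util.Map;

class DecodeSketch {
    /** Builds the inverse of code[]: bit string -> character. */
    static Map<String, Character> invert(String[] code) {
        Map<String, Character> inverse = new HashMap<>();
        for (int c = 0; c < code.length; c++) {
            if (code[c] != null) {
                inverse.put(code[c], (char) c);
            }
        }
        return inverse;
    }

    /** Accumulates bits until they match a code, emits that character, and starts over. */
    static String decodeMessage(String bits, Map<String, Character> inverse) {
        StringBuilder text = new StringBuilder();
        StringBuilder pending = new StringBuilder();
        for (int i = 0; i < bits.length(); i++) {
            pending.append(bits.charAt(i));
            Character letter = inverse.get(pending.toString());
            if (letter != null) {
                text.append(letter.charValue());
                pending.setLength(0);
            }
        }
        return text.toString();
    }
}
```

With the code c: 0, a: 10, b: 11, decoding "01011" gives back "cab". Any bits left in pending at the end would indicate a corrupted or truncated input; this sketch silently drops them.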

10 HashMap An array maps integers to Objects o e.g., String args[]: args[i] returns ith String A HashMap maps Objects to Objects Access with put() and get(), e.g., o HashMap ids = new HashMap(); o ids.put("Alice", 123456789); o ids.put("Ben", 321654987); o int id = (Integer) ids.get("Alice"); o // id gets 123456789 For decode, map bit Strings to characters
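The example on this slide uses a raw HashMap, which is why the cast to Integer is needed. A sketch of the same example with type parameters, which lets the compiler check key and value types and removes the cast (the class and method names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

class HashMapSketch {
    /** Same entries as the slide, but with declared key and value types. */
    static Map<String, Integer> sampleIds() {
        Map<String, Integer> ids = new HashMap<>();
        ids.put("Alice", 123456789);
        ids.put("Ben", 321654987);
        return ids;
    }

    /** Returns the id for the name, or -1 if the name is not in the map. */
    static int idOf(Map<String, Integer> ids, String name) {
        Integer id = ids.get(name); // get() returns null for a missing key
        return id == null ? -1 : id;
    }
}
```

Assigning ids.get(name) straight to an int throws a NullPointerException when the key is missing, because the null result cannot be unboxed; checking for null first avoids that. For decoding, the same pattern applies with a Map<String, Character>.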

11 Homework #5 Tips Keep checking intermediate results Make use of sample outputs here Print out intermediate results! You might need special cases for newline ('\n') Your encoding might differ from the examples o Depends on the BinaryHeap implementation o Same-frequency items are returned in arbitrary order (e.g., in love_poem_58, 'N', '-', '.', 'W', and 'p' all have frequency one) However, Huffman encoding length must match! o Guaranteed to be shortest-length encoding

