
1 UTILITIES
Group 3: Xin Li, Soma Reddy

2 Data Compression
Goal: to reduce the size of files stored on disk and to increase the effective rate of transmission by modems.

3 A Standard Coding Scheme
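A standard coding scheme such as ASCII gives every character a code of the same length. With C distinct characters, each code needs ⌈log₂ C⌉ bits, so the size of a file is simply its number of characters times that width. The small sketch below computes both; `FixedLengthCode` and its methods are hypothetical names used only for illustration.

```java
// Hypothetical helper: sizes under a standard fixed-length code.
final class FixedLengthCode {
    // Smallest b with 2^b >= c, i.e. ceil(log2 c) bits for c distinct characters.
    static int bitsPerChar(int c) {
        if (c < 1) throw new IllegalArgumentException("need at least one character");
        int bits = 0;
        while ((1L << bits) < c) {
            bits++;
        }
        return bits;
    }

    // Total bits for a file, given one frequency per character of the alphabet.
    static long totalBits(int[] freqs) {
        long n = 0;
        for (int f : freqs) {
            n += f;            // n = number of characters in the file
        }
        return n * bitsPerChar(freqs.length);
    }
}
```

For example, `bitsPerChar(7)` is 3, so an alphabet of seven characters needs three bits per character no matter how often each one occurs.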

4 File Compression
Compression: reducing the number of bits required to represent data.
Two phases:
– The encoding phase (compressing)
– The decoding phase (uncompressing)
Strategy: ensure that the most frequent characters have the shortest representations.

5 A Binary Trie
A left branch represents 0 and a right branch represents 1. The path from the root to a node gives that node's representation.
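Decoding with such a trie is a walk from the root: follow the left child on a 0 and the right child on a 1, and emit a character whenever a leaf is reached, then restart at the root. The sketch below assumes characters are stored only in leaves and, for readability, takes the bits as a string of '0'/'1' characters; `TrieDecoder` and its `Node` type are hypothetical, not the project's classes.

```java
// Hypothetical sketch of decoding with a binary trie: 0 = go left, 1 = go right.
final class TrieDecoder {
    static final class Node {
        final char ch;           // meaningful only at a leaf
        final Node left, right;  // both null at a leaf
        Node(char ch) { this.ch = ch; this.left = null; this.right = null; }
        Node(Node left, Node right) { this.ch = '\0'; this.left = left; this.right = right; }
        boolean isLeaf() { return left == null && right == null; }
    }

    // Each root-to-leaf path spells out one character's code.
    static String decode(Node root, String bits) {
        StringBuilder out = new StringBuilder();
        Node cur = root;
        for (int i = 0; i < bits.length(); i++) {
            char b = bits.charAt(i);
            if (b == '0') cur = cur.left;
            else if (b == '1') cur = cur.right;
            else throw new IllegalArgumentException("not a bit: " + b);
            if (cur == null) throw new IllegalArgumentException("bits leave the trie");
            if (cur.isLeaf()) {
                out.append(cur.ch);
                cur = root;      // restart at the root for the next character
            }
        }
        if (cur != root) throw new IllegalArgumentException("trailing partial code");
        return out.toString();
    }
}
```

With a = 0, b = 10 and c = 11, `decode(root, "01011")` returns "abc".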

6 Representation of the original code by a tree

7 A Slightly Better Tree

8 A Full Tree
All nodes either are leaves or have two children.
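Fullness matters because a node with only one child wastes a bit: that child can be moved up one level, shortening every code below it without making any code ambiguous. An optimal code tree is therefore always full. The check below is a minimal sketch on a hypothetical `Node` type that has only child links.

```java
// Hypothetical sketch: is every node either a leaf or a node with two children?
final class FullTreeCheck {
    static final class Node {
        final Node left, right;
        Node() { this(null, null); }                  // a leaf
        Node(Node left, Node right) { this.left = left; this.right = right; }
    }

    static boolean isFull(Node n) {
        if (n == null) return true;                   // empty tree is trivially full
        boolean leaf = n.left == null && n.right == null;
        boolean twoChildren = n.left != null && n.right != null;
        if (!leaf && !twoChildren) return false;      // exactly one child: not full
        return isFull(n.left) && isFull(n.right);
    }
}
```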

9 A Prefix Code
No character code is a prefix of another character code.
This is guaranteed if characters appear only in leaves.
Such a code can be decoded unambiguously.
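The prefix property can also be checked directly on a list of codewords. The sketch below, a hypothetical helper `PrefixCheck.isPrefixFree`, sorts the codewords and compares neighbours: after sorting, a codeword that is a prefix of any other codeword is necessarily a prefix of the one immediately after it.

```java
import java.util.Arrays;

// Hypothetical helper: true if no codeword is a prefix of another.
final class PrefixCheck {
    static boolean isPrefixFree(String[] codes) {
        String[] sorted = codes.clone();
        Arrays.sort(sorted);
        // Every string sorting between a and a string that starts with a also
        // starts with a, so comparing adjacent entries is enough.
        for (int i = 0; i + 1 < sorted.length; i++) {
            if (sorted[i + 1].startsWith(sorted[i])) return false;
        }
        return true;
    }
}
```

For example, {"0", "10", "11"} is prefix-free, while {"0", "01", "11"} is not, because 0 is a prefix of 01.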

10 An Optimal Prefix Code Tree

11 Optimal Prefix Code
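What "optimal" minimizes can be stated explicitly. Assuming character c_i occurs f_i times in the file and its leaf sits at depth d_i (so its code is d_i bits long), the size of the encoded data, ignoring the encoding table, is:

```latex
\mathrm{cost}(T) = \sum_{i=1}^{C} f_i \, d_i
\qquad \text{fixed-length code: } d_i = \lceil \log_2 C \rceil
\;\Rightarrow\; \mathrm{cost} = \lceil \log_2 C \rceil \sum_{i=1}^{C} f_i
```

An optimal prefix code tree is one that minimizes this sum over all prefix code trees for the same characters; a fixed-length code is just the special case where every d_i is equal.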

12 Huffman’s Algorithm
Constructs an optimal prefix code.
The weight of a tree is the sum of the frequencies of its leaves.
Works by repeatedly merging the two minimum-weight trees until only one tree remains.
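The merging loop maps naturally onto a priority queue keyed by tree weight. The sketch below uses `java.util.PriorityQueue` and then reads each character's code off the final tree (0 for left, 1 for right). It is illustrative only: `HuffmanSketch`, its `Node` type and the creation-order tie-breaker are assumptions, not the project's `HuffmanTree.createTree`. Ties may be broken either way and every choice gives the same total cost, but a decoder that rebuilds the tree from stored counts must break ties exactly as the encoder did.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch of Huffman's algorithm, not the project's HuffmanTree.
final class HuffmanSketch {
    static final class Node {
        final int weight;        // sum of the frequencies of the leaves below
        final char ch;           // meaningful only at a leaf
        final Node left, right;
        final int order;         // creation order, used only to break ties deterministically
        Node(char ch, int weight, int order) { this(ch, weight, null, null, order); }
        Node(char ch, int weight, Node left, Node right, int order) {
            this.ch = ch; this.weight = weight; this.left = left; this.right = right; this.order = order;
        }
    }

    // Returns a code (string of '0'/'1') for every character with a nonzero frequency.
    static Map<Character, String> buildCodes(Map<Character, Integer> freqs) {
        PriorityQueue<Node> pq = new PriorityQueue<>((a, b) ->
                a.weight != b.weight ? Integer.compare(a.weight, b.weight)
                                     : Integer.compare(a.order, b.order));
        int order = 0;
        for (Map.Entry<Character, Integer> e : freqs.entrySet()) {
            if (e.getValue() > 0) pq.add(new Node(e.getKey(), e.getValue(), order++));
        }
        Map<Character, String> codes = new HashMap<>();
        if (pq.isEmpty()) return codes;
        if (pq.size() == 1) {                   // a lone character still needs one bit
            codes.put(pq.peek().ch, "0");
            return codes;
        }
        while (pq.size() > 1) {                 // merge the two minimum-weight trees
            Node t1 = pq.poll();
            Node t2 = pq.poll();
            pq.add(new Node('\0', t1.weight + t2.weight, t1, t2, order++));
        }
        assign(pq.poll(), "", codes);
        return codes;
    }

    // A left edge appends '0', a right edge appends '1'; leaves receive the path so far.
    private static void assign(Node n, String path, Map<Character, String> codes) {
        if (n.left == null && n.right == null) {
            codes.put(n.ch, path);
            return;
        }
        assign(n.left, path + "0", codes);
        assign(n.right, path + "1", codes);
    }
}
```

With hypothetical counts a = 10, e = 15, i = 12, s = 3, t = 4, space = 13 and newline = 1 (58 characters), the resulting codes total 146 bits, versus 174 bits for a 3-bit fixed-length code.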

13 Initial Stage of Huffman’s Algorithm

14 Huffman’s Algorithm After the First Merge

15 Huffman’s Algorithm After the Second Merge

16 Huffman’s Algorithm After the Third Merge

17 Huffman’s Algorithm After the Fourth Merge

18 Huffman’s Algorithm After the Fifth Merge

19 Huffman’s Algorithm After the Final Merge

20 Implementation
– BitInputStream Class
– BitOutputStream Class
– CharCounter Class
– HuffmanTree Class
– Hzip Class
– HZIPInputStream Class
– HZIPOutputStream Class

21 BitInputStream Class
Wraps an InputStream and provides bit-at-a-time input.
Main methods:
– readBit: reads one bit, returned as 0 or 1
– getBit: gets an individual bit in an 8-bit byte
– close: closes the underlying stream
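A minimal sketch of the bit-unpacking logic follows. It assumes bits are delivered least-significant bit first within each byte and that readBit returns -1 at end of stream; the real BitInputStream may use a different bit order or end-of-stream convention. The class name `SimpleBitInputStream` is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of bit-at-a-time input (LSB-first order is an assumption).
final class SimpleBitInputStream implements AutoCloseable {
    private static final int BITS_PER_BYTE = 8;
    private final InputStream in;
    private int buffer;                      // the byte currently being unpacked
    private int bufferPos = BITS_PER_BYTE;   // next bit to return; 8 means "buffer empty"

    SimpleBitInputStream(InputStream in) { this.in = in; }

    // Returns the next bit as 0 or 1, or -1 at end of stream.
    int readBit() throws IOException {
        if (bufferPos == BITS_PER_BYTE) {
            buffer = in.read();              // 0..255, or -1 at end of stream
            if (buffer == -1) return -1;
            bufferPos = 0;
        }
        return getBit(buffer, bufferPos++);
    }

    // Extracts bit number pos (0 = least significant) from an 8-bit value.
    static int getBit(int pack, int pos) {
        return (pack >> pos) & 1;
    }

    @Override
    public void close() throws IOException { in.close(); }
}
```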

22 BitOutputStream Class
Wraps an OutputStream and provides bit-at-a-time output.
Main methods:
– writeBit: writes one bit (0 or 1)
– writeBits: writes an array of bits
– setBit: sets an individual bit in an 8-bit byte
– flush: flushes buffered bits
– close: closes the underlying stream
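The writing side packs bits into a byte buffer and emits the byte once it is full. In this minimal sketch a partial last byte is zero-padded when flushed, which is why a compressed format needs some separate way to mark where the real data ends. It assumes the same least-significant-bit-first order a matching reader would use; `SimpleBitOutputStream` is a hypothetical name, not the project's class.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch of bit-at-a-time output (LSB-first order is an assumption).
final class SimpleBitOutputStream implements AutoCloseable {
    private static final int BITS_PER_BYTE = 8;
    private final OutputStream out;
    private int buffer;      // bits collected so far for the current byte
    private int bufferPos;   // how many bits of buffer are in use

    SimpleBitOutputStream(OutputStream out) { this.out = out; }

    // Writes one bit (0 or 1); a full byte is passed to the underlying stream.
    void writeBit(int val) throws IOException {
        buffer = setBit(buffer, bufferPos++, val);
        if (bufferPos == BITS_PER_BYTE) emitByte();
    }

    void writeBits(int[] vals) throws IOException {
        for (int v : vals) writeBit(v);
    }

    // Writes any partial byte (unused high bits are 0), then flushes the underlying stream.
    void flush() throws IOException {
        if (bufferPos > 0) emitByte();
        out.flush();
    }

    @Override
    public void close() throws IOException {
        flush();
        out.close();
    }

    // Returns pack with bit number pos (0 = least significant) set to 1 if val == 1, else cleared.
    static int setBit(int pack, int pos, int val) {
        return val == 1 ? pack | (1 << pos) : pack & ~(1 << pos);
    }

    private void emitByte() throws IOException {
        out.write(buffer);
        buffer = 0;
        bufferPos = 0;
    }
}
```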

23 CharCounter Class
Maintains character counts.
Main methods:
– getCount: returns the number of occurrences of a character
– setCount: sets the number of occurrences of a character
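A minimal counter needs only an array indexed by byte value. The sketch below assumes 256 possible byte values (valid indices 0 to 255) and includes a constructor that counts every byte of a stream, matching the way a stream is passed to the CharCounter constructor in the Part 1 code; `SimpleCharCounter` is a hypothetical name and its exact behaviour is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of a character counter: one count per byte value 0..255.
final class SimpleCharCounter {
    private static final int DIFF_BYTES = 256;   // number of distinct 8-bit values
    private final int[] counts = new int[DIFF_BYTES];

    SimpleCharCounter() { }

    // Counts every byte of the stream, reading until end of stream.
    SimpleCharCounter(InputStream in) throws IOException {
        int ch;
        while ((ch = in.read()) != -1) {
            counts[ch]++;                        // read() returns 0..255 here
        }
    }

    int getCount(int ch) { return counts[ch]; }

    void setCount(int ch, int count) { counts[ch] = count; }
}
```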

24 HuffmanTree Class
Manipulates Huffman coding trees.
Main methods:
– getCode: obtains the code of a given character
– getChar: obtains the character for a given code
– createTree: constructs the Huffman coding tree

25 HuffmanTree Class (cont)
Main methods:
– writeEncodingTable: writes the encoding table to an output stream
– readEncodingTable: reads the encoding table from an input stream
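Storing counts rather than codes is enough, because the uncompressor can run createTree on the same counts and obtain the same tree. The sketch below shows one plausible table layout, assumed rather than taken from the project: for each byte value that occurs, one byte for the character and a 4-byte count, ended by a (0, 0) pair. With this layout every distinct character adds 5 bytes of overhead to the compressed file. `EncodingTable` is a hypothetical name, and it uses a plain `int[]` in place of CharCounter.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical encoding-table format: (byte value, count) pairs, then a (0, 0) terminator.
final class EncodingTable {
    static final int DIFF_BYTES = 256;

    static void write(int[] counts, DataOutputStream out) throws IOException {
        for (int ch = 0; ch < DIFF_BYTES; ch++) {
            if (counts[ch] > 0) {
                out.writeByte(ch);        // low 8 bits of ch
                out.writeInt(counts[ch]); // 4-byte big-endian count
            }
        }
        out.writeByte(0);                 // terminator: no real entry has a zero count
        out.writeInt(0);
    }

    static int[] read(DataInputStream in) throws IOException {
        int[] counts = new int[DIFF_BYTES];
        while (true) {
            int ch = in.readUnsignedByte();
            int count = in.readInt();
            if (count == 0) break;        // reached the terminator
            counts[ch] = count;
        }
        return counts;
    }
}
```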

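26 Hzip Class
Main methods:
– compress: compresses a file, writing the result to the filename with “.huf” appended
– uncompress: uncompresses a file, adding “.uc” to the output filename
– main: the command-line driver (for example, java Hzip -c file1.txt)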

27 HZIPInputStream Class
An uncompression wrapper around an input stream.
Main method:
– read: returns the next uncompressed byte from the wrapped input stream

28 HZIPOutputStream Class
A compression wrapper around an output stream. Bytes written to an HZIPOutputStream are compressed and sent to the wrapped output stream. No writing is actually done until close.
Main method:
– close
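Deferring all output to close follows from how Huffman coding works: the tree cannot be built until the character counts of the entire input are known. The sketch below shows only that buffering structure. `DeferredCompressingStream` and its `Encoder` interface are hypothetical; `Encoder` stands in for the real work of counting characters, building the tree, and writing the encoding table and the coded bits.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch: buffer everything, compress only when the stream is closed.
final class DeferredCompressingStream extends OutputStream {
    // Placeholder for the real Huffman encoding step.
    interface Encoder {
        void encode(byte[] wholeInput, OutputStream out) throws IOException;
    }

    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private final OutputStream out;
    private final Encoder encoder;
    private boolean closed = false;

    DeferredCompressingStream(OutputStream out, Encoder encoder) {
        this.out = out;
        this.encoder = encoder;
    }

    @Override
    public void write(int b) throws IOException {
        if (closed) throw new IOException("stream closed");
        pending.write(b);                 // only buffered: counts are not final yet
    }

    @Override
    public void close() throws IOException {
        if (closed) return;
        closed = true;
        encoder.encode(pending.toByteArray(), out);  // the whole input is known now
        out.close();
    }
}
```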

29 Programming Project: Part 1
Storing the character counts in the encoding table gives the uncompression algorithm the ability to perform extra consistency checks. We add code that verifies that the uncompressed result has the same character counts as the encoding table claimed.

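30 Part 1 Implementation
Add several public accessor methods.
In the HZIPInputStream class:
    // Gives Hzip access to the tree rebuilt from the encoding table
    public HuffmanTree getTree() { return codeTree; }
In the HuffmanTree class:
    // Gives access to the character counts read from the encoding table
    public CharCounter getCharCounter() { return theCounts; }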

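31 Part 1 Implementation (cont)
In the Hzip class, uncompress method, after the uncompressed output file inFile has been written and closed:
    HuffmanTree tree = hzin.getTree();
    CharCounter newcc1 = tree.getCharCounter();         // counts claimed by the encoding table
    // Re-read the uncompressed output to count what was actually produced
    InputStream check = new BufferedInputStream(new FileInputStream(inFile));
    CharCounter newcc2 = new CharCounter(check);
    check.close();
    for (int i = 0; i < BitUtils.DIFF_BYTES; i++) {
        if (newcc2.getCount(i) != newcc1.getCount(i)) {
            System.out.println("There is an error in the uncompressing process.");
            File file1 = new File(inFile);
            file1.delete();                             // discard the inconsistent output
            break;                                      // one mismatch is enough
        }
    }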

32 Part 2
Check the size of the resulting compressed file and abort if it is larger than or equal to the original.

33 Part 2 Implementation
In the Hzip class, compress method, after the compressed output stream has been closed (nothing is written until close):
    File originFile = new File(inFile);
    File compreFile = new File(compressedFile);
    if (compreFile.length() >= originFile.length()) {
        String relation = compreFile.length() > originFile.length() ? "larger than" : "equal to";
        System.out.println(
            "The size of the resulting compressed file is " + relation + " the original.");
        compreFile.delete();   // keep only the original file
        return;
    }

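34 Run Example
Compressing a text file whose size is six bytes:
    C:\>set path=c:/j2sdk1.4.1_01/bin
    C:\>javac Hzip.java
    C:\>javac HZIPInputStream.java
    C:\>javac HZIPOutputStream.java
    C:\>java Hzip -c file1.txt
    The size of the resulting compressed file is larger than the original.
    C:\>
The compressed file is larger because the encoding table stored at its start outweighs any savings on only six bytes of data, so the Part 2 check deletes it.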

35 Conclusion
Text compression is an important technique that increases both effective disk capacity and effective modem speed. It is an area of active research. Huffman’s algorithm typically reduces the size of text files by about 25%.

