Lossless Data Compression Using run-length and Huffman Compression pages 291-294.

Summary
The principle of data compression is to store data using as few bits as possible. Often this just involves eliminating redundancy: for example, instead of storing “AAAAAAAAAAAA,” one method of lossless data compression would store it as “A12.” If the data is binary and consists of only two symbols, 0 and 1, it is also possible to count the occurrences of one symbol between occurrences of the other; for example, you can count the number of 0’s between each pair of 1’s. Both of these methods are known as “run-length encoding.” Another commonly used type of encoding is Huffman coding, in which you assign shorter codes to symbols that occur more frequently and longer codes to symbols that occur less frequently.

Run-length encoding with multiple symbols
Original data: JJJJJJJAAAAAAAAAAAUUUUURLLLLLLLLLLLL (36 characters, 252 bits)
Compressed data: J07A11U05R01L12 (15 characters, 105 bits)
Both bit counts assume 7 bits per character.
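A minimal Python sketch of the multiple-symbol scheme, assuming (as the example suggests) that each run is written as the symbol followed by a two-digit, zero-padded count. The names `rle_encode` and `rle_decode` are hypothetical, and splitting runs longer than 99 into several records is an assumption of this sketch rather than something the slide specifies.

```python
from itertools import groupby


def rle_encode(text: str, width: int = 2) -> str:
    """Encode each run as its symbol followed by a zero-padded count, e.g. 'J07'."""
    max_run = 10 ** width - 1  # largest count that fits in `width` digits (99 for two)
    records = []
    for symbol, group in groupby(text):
        run = len(list(group))
        # Assumption: runs longer than max_run are split into several records.
        while run > 0:
            chunk = min(run, max_run)
            records.append(f"{symbol}{chunk:0{width}d}")
            run -= chunk
    return "".join(records)


def rle_decode(encoded: str, width: int = 2) -> str:
    """Invert rle_encode: every record is one symbol plus `width` digits."""
    step = 1 + width
    return "".join(
        encoded[i] * int(encoded[i + 1 : i + step])
        for i in range(0, len(encoded), step)
    )


original = "JJJJJJJAAAAAAAAAAAUUUUURLLLLLLLLLLLL"
packed = rle_encode(original)
print(packed)                              # J07A11U05R01L12
print(len(original) * 7, len(packed) * 7)  # 252 105 (at 7 bits per character)
assert rle_decode(packed) == original
```

Because every record has the same width, the decoder always knows where a count ends, even if the data itself contains digits.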

Run-length encoding with two symbols
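For binary data, the summary describes counting the 0’s between each pair of 1’s. A minimal Python sketch, assuming the output is simply a list of counts (the helpers `encode_zero_runs` and `decode_zero_runs` are hypothetical); a practical encoder would also have to decide how many bits to store each count in, which this sketch leaves out.

```python
def encode_zero_runs(bits: str) -> list[int]:
    """Return the number of 0s before each 1, followed by the count of trailing 0s."""
    if set(bits) - {"0", "1"}:
        raise ValueError("input must contain only '0' and '1'")
    # Splitting on '1' leaves exactly the runs of 0s that sit between the 1s.
    return [len(run) for run in bits.split("1")]


def decode_zero_runs(counts: list[int]) -> str:
    """Rebuild the bit string: consecutive runs of 0s are separated by a single 1."""
    return "1".join("0" * n for n in counts)


data = "0001000001100"
runs = encode_zero_runs(data)
print(runs)  # [3, 5, 0, 2]
assert decode_zero_runs(runs) == data
```

A count of 0 records two adjacent 1’s, and the final count records any 0’s after the last 1, so the original string can always be rebuilt.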

Huffman Coding
In Huffman coding, you assign shorter codes to symbols that occur more frequently and longer codes to symbols that are used less frequently.

Huffman Coding Example
Before you can assign bit patterns to the characters, you assign each character a weight based on its frequency of use. Example:
Character   A    B    C    D    E
Frequency   17   12   12   27   32
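The weights can come straight from the data to be compressed. A small sketch, assuming the weight is simply the number of times each character occurs (`character_weights` is a hypothetical helper); scaling every weight by the same factor does not change which nodes are combined, so exact counts and exact percentages lead to the same tree.

```python
from collections import Counter


def character_weights(text: str) -> dict[str, int]:
    """Hypothetical helper: weight each character by how often it occurs in text."""
    return dict(Counter(text))


# Sample string taken from the encoding example later in this presentation.
weights = character_weights("EAEBAECDEA")
print(sorted(weights.items()))
# [('A', 3), ('B', 1), ('C', 1), ('D', 1), ('E', 4)]
```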

Huffman Coding Process
Once you have established the weight of each character, you must build a tree based on those values. The process follows three basic steps, as detailed on the following slides.

Huffman Coding Tree Building
1. Put the entire character set in a row. Each character is now a node at the lowest level of the tree.

Huffman Coding Tree Building (cont.)
2. Find the two nodes with the smallest weights and join them to form a third node. The weight of the new node is the combined weight of the two original nodes. This second-level node is now eligible for combining with any other node. However, every combination must join the two nodes with the lowest weights currently available.

Huffman Coding Tree Building (cont.)
3. Repeat step 2 until all nodes, on every level, are combined into one tree.
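Steps 1 to 3 map naturally onto a priority queue. The sketch below uses Python's `heapq` and represents the tree as nested `(left, right)` tuples with characters as leaves; that representation, the tie-breaking counter, and putting the lighter node on the left are assumptions of this sketch, since the slides do not fix them. With those choices, the example weights produce a tree whose codes match the ones used in the encoding example.

```python
import heapq
from itertools import count


def build_huffman_tree(weights: dict[str, int]):
    """Build a Huffman tree as nested (left, right) tuples; leaves are characters."""
    tiebreak = count()  # keeps heap entries comparable when two weights are equal
    # Step 1: every character starts as its own node.
    heap = [(w, next(tiebreak), ch) for ch, w in weights.items()]
    heapq.heapify(heap)
    # Steps 2 and 3: repeatedly join the two lightest nodes until one tree remains.
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # lighter node goes on the left (assumption)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    return heap[0][2]


tree = build_huffman_tree({"A": 17, "B": 12, "C": 12, "D": 27, "E": 32})
print(tree)  # (('A', ('B', 'C')), ('D', 'E'))
```

When two weights tie, either node may be taken first; a different choice can change individual codes, but every tree built this way gives the same total encoded length.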

Huffman Coding Tree Building (cont.)
Once the tree is complete, use it to assign codes to each character. First, assign a bit value to each branch. Starting from the root (top node), assign 0 to the left branch and 1 to the right branch, and repeat this pattern at each node.

Huffman Coding Tree Building (cont.)
A character’s code is found by starting at the root and following the branches that lead to that character; the code is the sequence of branch bits read along that path.
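Reading codes off the tree is a short recursive walk. In this sketch (`assign_codes` is a hypothetical helper), the tree is written as nested `(left, right)` tuples; the example tree is the one implied by the codes A:00, B:010, C:011, D:10, E:11, with 0 on every left branch and 1 on every right branch.

```python
def assign_codes(tree, prefix: str = "") -> dict[str, str]:
    """Walk from the root, appending 0 for each left branch and 1 for each right."""
    if isinstance(tree, str):  # a leaf: the path taken so far is its code
        return {tree: prefix}
    left, right = tree
    codes = assign_codes(left, prefix + "0")
    codes.update(assign_codes(right, prefix + "1"))
    return codes


# Tree for the example weights, written as nested (left, right) pairs.
tree = (("A", ("B", "C")), ("D", "E"))
print(assign_codes(tree))
# {'A': '00', 'B': '010', 'C': '011', 'D': '10', 'E': '11'}
```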

Huffman Coding Encoding
Once the characters’ codes have been determined, they can be used to encode data. For example, using the codes derived from the tree, the string EAEBAECDEA (10 characters, 70 bits at 7 bits per character) would be encoded as 1100110100011011101100 (22 bits).
A: 00   B: 010   C: 011   D: 10   E: 11
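Encoding is a table lookup followed by concatenation. In this sketch the bits are kept as a string of '0' and '1' characters for readability, whereas a real compressor would pack them into bytes; `huffman_encode` is a hypothetical name.

```python
CODES = {"A": "00", "B": "010", "C": "011", "D": "10", "E": "11"}


def huffman_encode(text: str, codes: dict[str, str]) -> str:
    """Replace each character with its code and concatenate the bits."""
    return "".join(codes[ch] for ch in text)


bits = huffman_encode("EAEBAECDEA", CODES)
print(bits)       # 1100110100011011101100
print(len(bits))  # 22, versus 10 characters * 7 bits = 70
```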

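Huffman Coding Decoding
Just as the character codes can be used to encode strings of data, they can be applied to decoding data. The steps are reversed: the string of 1’s and 0’s is read against the code chart, and the original text is output. Decoding is unambiguous because no character’s code is a prefix of another’s, so as soon as the bits read so far match a code, that character can be output and matching restarts with the next bit.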
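A minimal decoder sketch: it inverts the code chart and accumulates bits until they match a code. This greedy matching is correct only because the codes are prefix-free, which a Huffman tree guarantees since every character sits at a leaf. `huffman_decode` is a hypothetical name, and raising an error on leftover bits is a choice of this sketch.

```python
CODES = {"A": "00", "B": "010", "C": "011", "D": "10", "E": "11"}


def huffman_decode(bits: str, codes: dict[str, str]) -> str:
    """Read bits left to right, emitting a character whenever they match a code."""
    lookup = {code: ch for ch, code in codes.items()}  # invert the code chart
    out, current = [], ""
    for bit in bits:
        current += bit
        # Prefix property: no code begins another, so the first match is correct.
        if current in lookup:
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError(f"leftover bits do not form a complete code: {current!r}")
    return "".join(out)


print(huffman_decode("1100110100011011101100", CODES))  # EAEBAECDEA
```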
