Presentation on theme: "Michael Alves, Patrick Dugan, Robert Daniels, Carlos Vicuna"— Presentation transcript:
1Michael Alves, Patrick Dugan, Robert Daniels, Carlos Vicuna 9.4 Huffman TreesMichael Alves, Patrick Dugan,Robert Daniels, Carlos Vicuna
2EncodingIn computer technology, encoding is the process of putting a sequence of characters into a special format for transmission or storage purposes.Is the term used to reference to the processes of analog-to-digital conversion, and can be used in the context of any type of data such as text, images, audio, video or multimedia.
3Huffman CodeHuffman coding is an entropy encoding algorithm used for lossless data compression. The term refers to the use of a variable-length code table for encoding a source symbol (such as a character in a file) where the variable-length code table has been derived in a particular way based on the estimated probability of occurrence for each possible value of the source symbol.
4Huffman CodeHuffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix code that expresses the most common source symbols using shorter strings of bits than are used for less common source symbols. Huffman was able to design the most efficient compression method of this type: no other mapping of individual source symbols to unique strings of bits will produce a smaller average output size when the actual symbol frequencies agree with those used to create the code.
5Huffman CodeThe running time of Huffman's method is fairly efficient, it takes O(n log n) operations to construct it. A method was later found to design a Huffman code in linear time if input probabilities (also known as weights) are sorted
6Huffman TreesInitialize n single-node trees labeled with the symbols from the given alphabet. Record the frequency of each symbol in it’s trees root to indicate the tree’s weight.Repeat these steps until a single tree is obtained. Find two trees with the smallest weight. Make them the left and right subtree of a new tree and record the sum of their weights in the root of the new tree as its weight.
7Huffman TreesExample: Construct a Huffman tree from the following dataSymbolABCD_Frequency0.40.10.20.15
81.0In order to generate binary prefix-free codes for each symbol in our alphabet, we label each left edge of our tree with 0 and every right edge with 1.10.610.250.35B0.1D0.15_0.15C0.2A0.411
9Huffman Trees Resulting codewords: Symbol A B C D _ Frequency 0.4 0.1 0.20.15Codeword100111101110Example : CAB_ is encoded asAverage number of bits per symbol = 1 ∙ ∙ ∙ ∙ 0.15 = 1.75Compression ratio = (3 – 1.75)/3 * 100 = 42% less memory used than fixed-length encoding
10PseudocodeHuffmanTree(B[0..n − 1]) Constructs Huffman’s tree Input : An array B[0..n − 1] of weights Output : A Huffman tree with the given weights assigned to its leaves initialize priority queue S of size n with single-node trees and priorities equal to the elements of B[0..n − 1] while S has more than one element do Tl ← the smallest-weight tree in S delete the smallest-weight tree in S Tr ← the smallest-weight tree in S create a new tree T with Tl and Tr as its left and right subtrees and the weight equal to the sum of Tl and Tr weights insert T into S return T
11Huffman EncodingHuffman encoding provides the optimal encoding for all codes using individual letters and those corresponding frequencies. Algorithms that use more then that can lead to better encoding but require more analyzing of the file. The idea is that if we have letters that are more frequent than others we represent those with less bits.
12Compression RatioThe compression ration is a standard used to compare other ways of coding: fixed bit length - coded bit length fixed bit length This will give the percent of memory used on the encoding compared to fixed encoding(same length code strings) If we wanted to compare to different algorithms would substitute their average bit length instead of fixed* 100
13Compression Ratio cont. Event NameProbabilityCodeLengthA0.3002B01C0.131003D0.12101E0.1110F0.05111Huffman avg bits = (.3 * 2) + (.3 * 2) + (.13 * 3) + (.12 * 3) + (.1 * 3) + (.05 * 3) = 2.4 Compression Ratio = (( ) / 3) * 100 = 20% This Huffman encoding uses 20% less memory than it’s fixed length implementation. Extensive testing with Huffman have shown that it typically falls between 20%-80% better depending on text.
14Real Life ApplicationHuffman trees aren’t just used in encoding- they can be used in any sort of problem involving “yes or no” decision making.“Yes or no” refers to asking multiple questions with only 2 possible answers (i.e. true or false, heads or tails, etc.)By breaking the problem down into a series of yes or no questions, you can build a binary tree out of the possible outcomes. This is called a Decision Tree.
15But What Will A Huffman Tree Do? Huffman’s algorithm is designed to create a minimal length path for a tree with a given weighted path length.This means that the binary tree created will have the shortest paths possible.RootRootParentVS.ParentLeafLeafLeafLeafTo get to a given leaf, theaverage path length is 2To get to a given leaf, theaverage path length is 1.5
16So…?A Huffman Tree is essentially an optimal binary tree that gives frequently accessed nodes shorter paths and less frequently accessed nodes longer paths.When applied to a Decision Tree, this means that we will reach the solution, on average, with fewer “questions”. Hence, we solve the problem faster.
17Let’s Try it!Consider the problem of guessing which cup a marble is under. There are 4 cups (c1, c2, c3, and c4), and you do not get to see which cup the marble is placed under.One decision tree could be:Is it c1?c1Is it c2?Is it c3?c2Average length:(.25)(1) + (.25)(2) + (.25)(3)(2)= 2.25c4c3Guide: No <- Question -> Yes
18Let’s Try It!But what if the person, who cannot be truly random, is more likely to put it under c4? Assume that c4 now has a 40% chance, rather than 25%. The other cups now have a 20% chance.Is it c4?Why is this better than the otherTree, which has an average of 2, ratherThan this tree’s seemingly 2.25?0.4Is it under c3?c40.2Is it c2?c3Because of c4’s weight, we are morelikely to pick c4. Therefore, we saveTime and the average weight is(.4)(1) + (.2)(2) + (.4)(3) = 20.20.2c1c2