Data Compression

If you’ve ever sent a large file to a friend, you may have compressed it into a zip archive before doing so. At its core, file compression means representing the same data using fewer bits than the standard encoding requires.

Huffman Example

ASCII (fixed-length):            A 01000001   B 01000010   C 01000011   D 01000100
Huffman (variable-length, e.g.): A 01

There are many ways to compress data, but we’ll focus here on a method called Huffman coding. ASCII is a fixed-length encoding scheme: files are decoded by reading blocks of constant length (8 bits per character). Huffman encodings, on the other hand, are variable-length encodings, meaning that different characters can have representations of different lengths. If we can figure out which characters occur most often and encode them with fewer than 8 bits, we will use less space than ASCII. We will then have to encode some of the less frequently occurring characters with more than 8 bits, but that’s okay as long as the net effect is a decrease in the total number of bits used!
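To make that trade-off concrete, here is a minimal sketch comparing the bit counts of a fixed-length and a variable-length encoding (the example string and code lengths are illustrative assumptions, not taken from the slides):

```python
# Compare total bits: fixed-length (8 bits per character) vs. a
# variable-length code where frequent characters get short codes.
text = "AAAAABBC"  # 'A' is common, 'C' is rare

fixed_bits = 8 * len(text)  # ASCII-style: every character costs 8 bits

# Hypothetical prefix-free code: A=0, B=10, C=11
code_lengths = {"A": 1, "B": 2, "C": 2}
variable_bits = sum(code_lengths[ch] for ch in text)

print(fixed_bits, variable_bits)  # 64 vs. 11
```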

Prefix Property

Coding 1 (ambiguous):     A 0   B 1    C 01   D 11
Coding 2 (prefix-free):   A 0   B 10   C 110  D 111

An important property of Huffman coding is that no character’s bit representation is a prefix of any other character’s representation. This is called the prefix property, and an encoding scheme with the prefix property is said to be immediately decodable. Why is the prefix property so important? Consider the following example: if we use Coding 1 to encode “CAB”, we get the bit-string 0101. However, presented with the bit-string 0101, there is no way to tell whether it represents “ABAB”, “CAB”, “ABC”, or “CC”. In contrast, Coding 2 is unambiguous: the bit-string “110010” can only represent “CAB” (confirm it for yourself!).
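Here is a minimal decoder sketch showing why the prefix property makes decoding immediate: with a prefix-free code, a character can be emitted as soon as the buffered bits match a codeword, with no lookahead (the dictionary mirrors Coding 2 above; function and variable names are mine):

```python
# Decode a bit-string with a prefix-free code: accumulate bits until the
# buffer matches a codeword, emit that character, then start over.
prefix_free = {"0": "A", "10": "B", "110": "C", "111": "D"}

def decode(bits: str, code: dict) -> str:
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in code:  # safe: no codeword is a prefix of another
            out.append(code[buf])
            buf = ""
    if buf:
        raise ValueError("leftover bits: not a valid encoding")
    return "".join(out)

print(decode("110010", prefix_free))  # -> CAB
```

With Coding 1, this greedy loop would emit a character after the very first bit every time (“0” and “1” are both complete codewords), so “C” and “D” could never be decoded at all; that is exactly the ambiguity the prefix property rules out.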

Step 1: Frequency Analysis

String: “ECEABEADCAEDEEEECEADEEEEEDBAAEABDBBAAEAAACDDCCEABEEDCBEEDEAEEEEEAEEDBCEBEEADEAEEDAEBCDEDEAEEDCEEAEEE”

We will use a binary tree to construct Huffman encodings for each character. First, however, we must figure out how often each character occurs in the file by building a frequency table:

character:  A     B     C     D     E
frequency:  0.2   0.1   0.1   0.15  0.45

Test Yourself! After Huffman encoding, which of the letters in this string will be represented by the shortest bit-string?
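In code, this step is a single counting pass; a quick sketch (variable names are mine):

```python
from collections import Counter

text = ("ECEABEADCAEDEEEECEADEEEEEDBAAEABDBBAAEAAACDDCCEABEEDCB"
        "EEDEAEEEEEAEEDBCEBEEADEAEEDAEBCDEDEAEEDCEEAEEE")

counts = Counter(text)  # absolute count of each character
freqs = {ch: n / len(text) for ch, n in counts.items()}
for ch in sorted(freqs):
    print(ch, freqs[ch])
# A 0.2, B 0.1, C 0.1, D 0.15, E 0.45  (matches the table above)
```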

Step 2: Forest of Single Trees

Next, we’ll create a forest of trees, where each tree is a single node containing a character and the frequency of that character:

B: 0.1   C: 0.1   D: 0.15   A: 0.2   E: 0.45

Step 3: Combine Trees

Then, we combine trees until the forest contains only one tree. While there’s still more than one tree:

1. Remove from the forest the two trees that have the smallest frequencies in their roots.
2. Create a new node to be the root of a new tree, with the two trees removed in step 1 as its children. The frequency of this parent node is set to the sum of the frequencies of its two children.
3. Insert this new tree back into the forest.

First pass: B (0.1) and C (0.1) have the smallest frequencies, so they become the children of a new root with frequency 0.2. The forest is now: (B, C): 0.2, D: 0.15, A: 0.2, E: 0.45.

Step 3: Combine Trees (continued)

Second pass: the two smallest roots are now D (0.15) and the (B, C) tree (0.2); combining them gives a tree with root frequency 0.35. The forest is now: 0.35, A: 0.2, E: 0.45.

Step 3: Combine Trees (continued)

Third pass: the two smallest roots are A (0.2) and the 0.35 tree; combining them gives a tree with root frequency 0.55. The forest is now: 0.55, E: 0.45.

Step 3: Combine Trees (continued)

Fourth pass: only the 0.55 tree and E (0.45) remain; combining them gives a single tree with root frequency 1.0. Nice! We now have only one tree. Notice that all the frequencies add up to a total of 1.0. A code sketch of this combining loop follows.
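Here is a minimal sketch of the loop in Python, using a priority queue in place of scanning the forest for the two smallest roots (names are mine). Note that ties, such as the two 0.2 roots above, may be broken in a different order than in the slides, which can yield a different but equally optimal tree:

```python
import heapq
from itertools import count

def build_huffman_tree(freqs):
    """Combine the two lowest-frequency trees until only one remains."""
    tiebreak = count()  # unique ints keep the heap from comparing trees
    # Forest entries: (frequency, tiebreaker, tree). A leaf is a single
    # character; an internal node is a (left, right) pair.
    forest = [(f, next(tiebreak), ch) for ch, f in freqs.items()]
    heapq.heapify(forest)
    while len(forest) > 1:
        f1, _, t1 = heapq.heappop(forest)  # smallest root frequency
        f2, _, t2 = heapq.heappop(forest)  # second smallest
        heapq.heappush(forest, (f1 + f2, next(tiebreak), (t1, t2)))
    return forest[0][2]

tree = build_huffman_tree({"A": 0.2, "B": 0.1, "C": 0.1, "D": 0.15, "E": 0.45})
```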

Step 4: Traverse Tree

To convert each letter into its Huffman encoding, we traverse the tree, recording a 0 for every step to a left child and a 1 for every step to a right child. For the letter “E,” we step right just once, so “E”’s encoding is 1. For the letter “A,” we step left once and right once, so “A”’s encoding is 01. “D” is 001. What are the Huffman representations of “C” and “B”?
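The same traversal is easy to write recursively; this sketch hard-codes the slides’ final tree as nested (left, right) pairs and prints every code, including the answers for “B” and “C”:

```python
# The final tree from the slides, as nested (left, right) pairs;
# leaves are single-character strings.
tree = (((("B", "C"), "D"), "A"), "E")

def assign_codes(node, prefix=""):
    """Walk the tree, appending '0' for a left step and '1' for a right step."""
    if isinstance(node, str):  # leaf holding one character
        return {node: prefix}
    left, right = node
    return {**assign_codes(left, prefix + "0"),
            **assign_codes(right, prefix + "1")}

print(assign_codes(tree))
# {'B': '0000', 'C': '0001', 'D': '001', 'A': '01', 'E': '1'}
```

With these codes, the 100-character example string costs 0.45·1 + 0.2·2 + 0.15·3 + 0.1·4 + 0.1·4 = 2.1 bits per character on average, or 210 bits in total, versus 800 bits at 8 bits per character.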