Huffman Encoding Veronica Morales.

Background
- Introduced by David Huffman in 1952
- A method for encoding data for compression
- Typical savings of 20%-90%, depending on the data
- A variable-length encoding scheme
- Used in digital imaging and video

Fixed-length vs. Variable-length Encoding
Fixed-length: every character code is composed of the same, "fixed" number of bits; ASCII is a fixed-length code, and the ASCII standard uses 7 bits per character.
Variable-length: character code lengths vary. Huffman encoding uses shorter bit patterns for more common characters and longer bit patterns for less common characters.
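To make the contrast concrete, here is a small sketch comparing 7-bit ASCII codes with a variable-length table (the variable codes below are the ones used in the decoding example later in the slides):

```python
# Fixed-length: every character costs the same 7 bits in ASCII.
fixed = {c: format(ord(c), "07b") for c in "NEAT"}
# Variable-length: codes differ in length, shorter for more common letters.
variable = {"E": "110", "A": "111", "T": "011", "N": "000"}

for c in "NEAT":
    print(c, fixed[c], variable[c])
```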

How does it work? The "greedy" approach
- Relies on the frequency of occurrence (probability) of each character to build up an optimal encoding.
- Each character and its frequency are placed on a leaf of a full binary tree.
- The two nodes with the smallest frequencies are merged, and their sum becomes the frequency of the parent node.
- This process repeats until the root of the tree holds the sum of all leaf frequencies.
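The merging loop above can be sketched with a binary heap as the priority queue (a minimal sketch; the function name and the nested-tuple tree representation are illustrative choices, not from the slides):

```python
import heapq
from itertools import count

def build_huffman_tree(freqs):
    """Greedy construction: repeatedly merge the two least-frequent nodes.

    freqs maps character -> frequency; the tree is returned as nested
    (left, right) tuples with single characters at the leaves.
    """
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), ch) for ch, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)   # smallest frequency
        f2, _, b = heapq.heappop(heap)   # second smallest
        # The parent's frequency is the sum of its two children.
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (a, b)))
    return heap[0][2]  # root node: its weight is the sum of all leaves
```

For example, `build_huffman_tree({"a": 5, "b": 2, "c": 1, "d": 1})` places the most frequent character, "a", closest to the root and the rare "c" and "d" deepest in the tree.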

Encode: Ileana Streinu Create a table with all characters and their probabilities

Make characters leaf-nodes of a tree.

Combine the two smallest weights repeatedly…

…until all nodes are accounted for and we have a single root. The tree is full because every parent has two children. To encode, start from the root; as you head down to the target letter, use 0 for a left turn and 1 for a right turn.
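Assuming the tree is represented as nested (left, right) tuples with single-character strings at the leaves (an illustrative representation, not from the slides), the root-to-leaf walk that assigns 0 for left and 1 for right can be sketched as:

```python
def assign_codes(node, prefix="", table=None):
    """Walk from the root: append '0' for a left turn, '1' for a right turn.

    A leaf's accumulated path becomes that character's code.
    """
    if table is None:
        table = {}
    if isinstance(node, str):           # leaf: a single character
        table[node] = prefix or "0"     # degenerate one-character alphabet
    else:
        left, right = node
        assign_codes(left, prefix + "0", table)
        assign_codes(right, prefix + "1", table)
    return table
```

For the tiny tree `(("b", "c"), "a")` this yields `{"b": "00", "c": "01", "a": "1"}`.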

Final tree representation of coding map for “Ileana Streinu”

EXAMPLE What message does 1101110110111001000111 decode to?

1101110110111001000111
110 – E
111 – A
011 – T
1001 – U
000 – N
Reading the bits left to right and matching codes gives: E A T T U N A.
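Because this is a prefix code (no codeword is a prefix of another), decoding is a simple left-to-right scan; a minimal sketch using the code table above:

```python
# Code table from the slide; prefix-freeness makes decoding unambiguous.
codes = {"110": "E", "111": "A", "011": "T", "1001": "U", "000": "N"}

def decode(bits, codes):
    """Scan bits left to right, emitting a character whenever the buffer
    matches a codeword, then starting a fresh buffer."""
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in codes:
            out.append(codes[buf])
            buf = ""
    return "".join(out)

print(decode("1101110110111001000111", codes))  # -> EATTUNA
```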

What’s the benefit?
Huffman encoding done with 22 bits: 1101110110111001000111
ASCII coding done with 49 bits (7 characters × 7 bits): 1000101 1000001 1010100 1010100 1010101 1001110 1000001
(49 − 22) / 49 ≈ 55% savings in space
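Checking the arithmetic: 22 Huffman bits against 49 ASCII bits saves (49 − 22) / 49 of the space, roughly 55%.

```python
huffman_bits = len("1101110110111001000111")  # 22 bits
ascii_bits = 7 * 7                            # 7 characters at 7 bits each
savings = 1 - huffman_bits / ascii_bits
print(f"{savings:.0%}")  # -> 55%
```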

Complexity
Assume n items (distinct characters).
Build a priority queue (using the Build-Heap procedure) to identify the two least-frequent objects: O(n).

Build the Huffman Tree
Since we have n leaves, we perform a merging operation on two nodes n − 1 times, and since each round of heap operations (extract the two minimum nodes, then add their parent) is O(log n), Huffman’s algorithm is O(n log n).

Encoding using the Huffman Tree
Traversing the tree from root to leaf costs time proportional to the code length: O(log n) when the tree is balanced, though a skewed Huffman tree can have depth up to n − 1 in the worst case.

Real-Life Application of Huffman Codes: GNU gzip Data Compression
An Internet standard for data compression. A gzip file consists of:
- a short header
- a number of compressed “blocks”
- an 8-byte trailer

Compressed “Blocks”
Three kinds of compressed “blocks”: stored, static, and dynamic.
Static and dynamic blocks use an alphabet that is encoded using Huffman encoding.
http://www.daylight.com/meetings/mug2000/Sayle/gzip.html
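DEFLATE, the format gzip compresses with, pairs LZ77 matching with Huffman coding, and Python’s standard zlib module exposes it directly; a quick round-trip sketch (the sample data is made up for illustration):

```python
import zlib

# Repetitive input compresses well under LZ77 + Huffman coding.
data = b"abracadabra " * 100
packed = zlib.compress(data, 9)          # 9 = maximum compression level
assert zlib.decompress(packed) == data   # lossless: round-trip is exact
print(len(data), "->", len(packed), "bytes")
```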