MA/CSSE 473 Day 31 Student questions Data Compression Minimal Spanning Tree Intro.

DATA COMPRESSION
More important than ever … This presentation repeats my CSSE 230 presentation.

Data (Text) Compression: letter frequencies
YOU SAY GOODBYE. I SAY HELLO. HELLO, HELLO. I DON'T KNOW WHY YOU SAY GOODBYE, I SAY HELLO.
There are 90 characters altogether. How many total bits are in the ASCII representation of this string? We can get by with fewer bits per character (a custom code). How many bits per character? How many for the entire message? Do we need to include anything else in the message? How do we represent the table? 1. a count; 2. the ASCII code for each character. How can we do better?
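The arithmetic behind these questions can be checked with a few lines of Java (a sketch, not course code): the message has 20 distinct symbols, so a fixed-length custom code needs ceil(log2(20)) = 5 bits per character.

```java
public class FixedLengthCode {
    public static void main(String[] args) {
        // The 90-character message from the slide.
        String msg = "YOU SAY GOODBYE. I SAY HELLO. HELLO, HELLO. "
                   + "I DON'T KNOW WHY YOU SAY GOODBYE, I SAY HELLO.";
        int n = msg.length();                           // 90 characters
        long distinct = msg.chars().distinct().count(); // 20 distinct symbols
        // A fixed-length custom code needs ceil(log2(distinct)) bits per character.
        int bitsPerChar = 32 - Integer.numberOfLeadingZeros((int) distinct - 1);
        System.out.println(n * 8);            // ASCII, 8 bits/char: 720 bits
        System.out.println(n * bitsPerChar);  // fixed-length custom code: 450 bits
    }
}
```

So even before Huffman, a custom fixed-length code cuts the message from 720 to 450 bits (plus whatever we need to transmit the table).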

Compression algorithm: Huffman encoding
Named for David Huffman:
–Invented while he was a graduate student at MIT.
–Huffman never tried to patent an invention from his work. Instead, he concentrated his efforts on education. In Huffman's own words, "My products are my students."
Principles of variable-length character codes:
–Less-frequent characters have longer codes.
–No code can be a prefix of another code.
We build a tree (based on character frequencies) that can be used to encode and decode messages.
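The prefix property is what makes decoding unambiguous: scanning bits left to right, the first complete codeword we match is always the right one, with no backtracking. A minimal sketch with a made-up three-symbol code (the table below is illustrative, not from the slides):

```java
import java.util.Map;

public class PrefixDecode {
    // Decode a bit string using a prefix-free code: since no codeword is a
    // prefix of another, a greedy left-to-right scan never backtracks.
    static String decode(String bits, Map<String, Character> code) {
        StringBuilder out = new StringBuilder();
        StringBuilder cur = new StringBuilder();
        for (char b : bits.toCharArray()) {
            cur.append(b);
            Character c = code.get(cur.toString());
            if (c != null) {           // matched a complete codeword
                out.append(c);
                cur.setLength(0);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical prefix-free code: the frequent 'A' gets the short codeword.
        Map<String, Character> code = Map.of("0", 'A', "10", 'B', "11", 'C');
        System.out.println(decode("10011", code)); // prints BAC
    }
}
```

If "0" were also a prefix of another codeword, the greedy scan above would break; the Huffman tree guarantees this never happens because characters live only at the leaves.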

Variable-length Codes for Characters
Assume that we have some routines for packing sequences of bits into bytes and writing them to a file, and for unpacking bytes into bits when reading the file.
–Weiss has a very clever approach: BitOutputStream and BitInputStream, whose methods writeBit and readBit allow us to logically read or write a bit at a time.
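The idea behind such a bit stream can be sketched in a few lines (this is my own minimal in-memory version of the writeBit idea, packing bits MSB-first; it is not Weiss's actual BitOutputStream):

```java
import java.io.ByteArrayOutputStream;

// Minimal sketch of bit packing: buffer bits MSB-first, emit a byte every 8 bits.
public class BitPacker {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private int buffer = 0;   // bits accumulated so far
    private int count = 0;    // how many bits are in the buffer

    public void writeBit(int bit) {
        buffer = (buffer << 1) | (bit & 1);
        if (++count == 8) {           // a full byte: flush it
            out.write(buffer);
            buffer = 0;
            count = 0;
        }
    }

    public byte[] toByteArray() {     // pad the last partial byte with zeros
        byte[] done = out.toByteArray();
        if (count == 0) return done;
        byte[] all = new byte[done.length + 1];
        System.arraycopy(done, 0, all, 0, done.length);
        all[done.length] = (byte) (buffer << (8 - count));
        return all;
    }

    public static void main(String[] args) {
        BitPacker p = new BitPacker();
        for (int bit : new int[]{1, 0, 1, 1}) p.writeBit(bit);
        // bits 1011 padded to 10110000 = 0xB0
        System.out.printf("%02X%n", p.toByteArray()[0] & 0xFF); // prints B0
    }
}
```

A matching readBit would do the reverse: refill an 8-bit buffer from the next byte and hand back one bit per call.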

A Huffman code: the HelloGoodbye message
Draw part of the tree. Decode a "message".

Build the tree for a smaller message
Character frequencies: I:1, R:1, N:2, O:3, A:3, T:5, E:8.
Start with a separate tree for each character (in a priority queue). Repeatedly merge the two lowest (total) frequency trees and insert the new tree back into the priority queue. Use the Huffman tree to encode NATION. Huffman codes are provably optimal among all single-character codes.
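This merging procedure can be sketched with java.util.PriorityQueue (the Node class here is my own shorthand, not the course code). With these frequencies the codeword lengths come out to 2 bits for T and E, 3 for N, O, and A, and 4 for I and R, so NATION takes 18 bits:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class HuffmanDemo {
    // A tree node: a leaf holds a character; an internal node holds two subtrees.
    static final class Node implements Comparable<Node> {
        final int weight; final char ch; final Node left, right;
        Node(char ch, int weight) { this.ch = ch; this.weight = weight; left = right = null; }
        Node(Node l, Node r) { ch = '\0'; weight = l.weight + r.weight; left = l; right = r; }
        boolean isLeaf() { return left == null; }
        public int compareTo(Node o) { return Integer.compare(weight, o.weight); }
    }

    static Node build(Map<Character, Integer> freq) {
        PriorityQueue<Node> pq = new PriorityQueue<>();
        freq.forEach((c, f) -> pq.add(new Node(c, f)));
        while (pq.size() > 1)                 // merge the two lightest trees
            pq.add(new Node(pq.poll(), pq.poll()));
        return pq.poll();
    }

    static void codes(Node n, String prefix, Map<Character, String> table) {
        if (n.isLeaf()) table.put(n.ch, prefix);
        else { codes(n.left, prefix + "0", table); codes(n.right, prefix + "1", table); }
    }

    public static void main(String[] args) {
        Map<Character, Integer> freq =
            Map.of('I', 1, 'R', 1, 'N', 2, 'O', 3, 'A', 3, 'T', 5, 'E', 8);
        Map<Character, String> table = new HashMap<>();
        codes(build(freq), "", table);
        int bits = 0;
        for (char c : "NATION".toCharArray()) bits += table.get(c).length();
        System.out.println(bits);  // 18 bits for NATION
    }
}
```

The exact 0/1 labels depend on how ties are broken, but the codeword lengths (and hence the 18-bit total) are the same for every optimal tree on these frequencies.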

What About the Code Table?
When we send a message, the code table can basically be just the list of characters and frequencies.
–Why? Three or four bytes per character: the character itself and the frequency count.
The end of the table is signaled by 0 for both char and count. The tree can be reconstructed from this table. The rest of the file is the compressed message.
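One possible layout for that table (the exact byte widths are my assumption; the slides don't pin them down) is one byte for the character and two for the count, terminated by a zeroed pair:

```java
import java.io.ByteArrayOutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the code-table header described above: (char, count) pairs,
// terminated by a 0 char with a 0 count. Byte widths here are assumptions.
public class CodeTableHeader {
    static byte[] write(Map<Character, Integer> freq) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        freq.forEach((c, n) -> {
            out.write(c);                 // 1 byte: the character
            out.write(n >> 8);            // 2 bytes: the frequency count
            out.write(n & 0xFF);
        });
        out.write(0); out.write(0); out.write(0);   // end-of-table marker
        return out.toByteArray();
    }

    static Map<Character, Integer> read(byte[] header) {
        Map<Character, Integer> freq = new LinkedHashMap<>();
        for (int i = 0; header[i] != 0; i += 3)
            freq.put((char) (header[i] & 0xFF),
                     ((header[i + 1] & 0xFF) << 8) | (header[i + 2] & 0xFF));
        return freq;   // enough to rebuild the same Huffman tree
    }

    public static void main(String[] args) {
        Map<Character, Integer> freq = new LinkedHashMap<>();
        freq.put('E', 8); freq.put('T', 5);
        System.out.println(read(write(freq)));  // prints {E=8, T=5}
    }
}
```

Because the receiver reruns the same deterministic tree-building algorithm on these counts, sender and receiver end up with identical trees without ever transmitting the tree itself.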

Huffman Java Code Overview
This code provides human-readable output to help us understand the Huffman algorithm. We will deal with Huffman at the abstract level; "real" code to do actual file compression is found in Weiss chapter 12. I am confident that you can figure out the other details if you need them. Based on code written by Duane Bailey in his book JavaStructures. A great thing about this example is the use of various data structures (binary tree, hash table, priority queue). I do not want to get caught up in lots of code details in class, so I will give a quick overview; you should read the details of the code on your own.

Some Classes used by Huffman
Leaf: represents a leaf node in a Huffman tree.
–Contains the character and a count of how many times it occurs in the text.
HuffmanTree: each node contains the total weight of all characters in the tree, and is either a leaf node or a binary node with two subtrees that are Huffman trees.
–The contents field of a non-leaf node is never used; we only need the total weight.
–compareTo returns its result based on comparing the total weights of the trees.

Classes used by Huffman, part 2
Huffman: contains main. The algorithm:
–Count character frequencies and build a list of Leaf nodes containing the characters and their frequencies.
–Use these nodes to build a sorted list (treated like a priority queue) of single-character Huffman trees.
–do
  Take the two smallest (in terms of total weight) trees from the sorted list.
  Combine these nodes into a new tree whose total weight is the sum of the weights of the two trees.
  Put this new tree into the sorted list.
 while there is more than one tree left.
The one remaining tree will be an optimal tree for the entire message.

Leaf node class for Huffman Tree
The code on this slide (and the next four slides) produces the output shown on the "A Huffman code: HelloGoodbye message" slide.

Highlights of the HuffmanTree class

Printing a HuffmanTree

Highlights of Huffman class part 1

Remainder of the main() method
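The slides above can be approximated with a sketch of how the described classes (Leaf, HuffmanTree with compareTo, and a Huffman driver with main) might fit together. This is modeled on the descriptions in this presentation, not Bailey's actual code, and it prints each character's codeword in human-readable form:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class Leaf {
    final char ch; final int count;   // the character and how often it occurs
    Leaf(char ch, int count) { this.ch = ch; this.count = count; }
}

class HuffmanTree implements Comparable<HuffmanTree> {
    final int weight;                 // total weight of all characters in the tree
    final Leaf leaf;                  // non-null only at a leaf
    final HuffmanTree left, right;    // subtrees of a binary node
    HuffmanTree(Leaf l) { weight = l.count; leaf = l; left = right = null; }
    HuffmanTree(HuffmanTree l, HuffmanTree r) {
        weight = l.weight + r.weight; leaf = null; left = l; right = r;
    }
    public int compareTo(HuffmanTree o) { return Integer.compare(weight, o.weight); }

    void print(String prefix) {       // print each character's codeword
        if (leaf != null) System.out.println(leaf.ch + " -> " + prefix);
        else { left.print(prefix + "0"); right.print(prefix + "1"); }
    }
}

public class Huffman {
    public static void main(String[] args) {
        String msg = "HELLOGOODBYE";
        Map<Character, Integer> freq = new HashMap<>();   // count frequencies
        for (char c : msg.toCharArray()) freq.merge(c, 1, Integer::sum);

        PriorityQueue<HuffmanTree> pq = new PriorityQueue<>();
        freq.forEach((c, n) -> pq.add(new HuffmanTree(new Leaf(c, n))));
        while (pq.size() > 1)                             // merge the two lightest trees
            pq.add(new HuffmanTree(pq.poll(), pq.poll()));
        pq.poll().print("");
    }
}
```

Here a java.util.PriorityQueue stands in for the sorted list described earlier; both give the same result because each step only needs the two smallest trees.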

Summary
The Huffman code is provably optimal among all single-character codes for a given message. Going farther:
–Look for frequently occurring sequences of characters and make codes for them as well.
–Compression for specialized data (such as sound, pictures, video): okay to be "lossy" as long as a person seeing/hearing the decoded version can barely see/hear the difference.