3.3 Fundamentals of data representation


3.3 Fundamentals of data representation
3.3.8 Data compression (Huffman coding) – Lesson

Objectives Be able to: Analyse a piece of plain ASCII text in order to determine the frequency of specific characters occurring. Transform examples of uncompressed ASCII text into a compressed format using the Huffman coding method.

Objectives Be able to: Convert a specific example of compressed text into ASCII characters using a given Huffman tree. Describe conditions where the Huffman coding method of compression will provide optimal benefits.

Introduction Continuing the theme of data compression… In the previous lesson, we looked at why it is beneficial to be able to compress digital files, and we looked at one specific method of carrying out data compression: RLE (run-length encoding). Today, we are going to look at another method: Huffman coding.

How often do different characters occur in text? This method works by recording the frequencies with which characters appear in text. Some characters appear far more often than others. Which do you think occur most frequently? How about less frequently? Try Quiz 1. Optional – demonstrate the different frequencies of different characters using your own document – the longer the better.
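As an illustration (not part of the original lesson materials), counting character frequencies is a one-liner in Python using the standard library:

```python
from collections import Counter

text = "an easy example"
freq = Counter(text)  # counts every character, including spaces

# the most common characters will receive the shortest Huffman codes
print(freq.most_common(3))  # [('a', 3), ('e', 3), (' ', 2)]
```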

Focus on Huffman coding There are two steps when using Huffman's technique:
1. Create a Huffman tree – this gives each character used a unique code.
2. Encode the character sequence into a binary stream.

Huffman coding – worked example Look at this sentence: an easy example What do you notice? For the sake of simplicity, we have chosen an example containing only lower-case letters, with no capital letters or punctuation. We must, however, take note of spaces.

Huffman coding – worked example Now create a table showing the frequency of occurrence for each character in "an easy example". We only need to show characters that are actually present.

Character  Frequency
a          3
e          3
space      2
n          1
s          1
y          1
x          1
m          1
p          1
l          1

Huffman coding – worked example Now we need to create the graphical node representation used by the Huffman method. To begin this process, we have to start with the lowest frequencies at the top of the list – the actual alphabetical order does not matter. The Huffman tree generated allows you to assign a unique code to each character used. The wider the range of characters used, the bigger the Huffman tree!

Huffman coding – worked example At the start, the characters are placed in order of occurrence, from top to bottom. Where the frequency is the same, the order does not matter. ( ) indicates the space character. Step 1 The top two entries' frequencies are added together and a new node is inserted. Please note: for each step, refer to the separate printed notes for this exercise; the diagrams used here are reproduced from there.

Huffman coding – worked example Step 2 Same as Step 1 with 1 (n) and 1 (p).

Huffman coding – worked example Step 3 Same as Step 2 with 1 (s) and 1 (x).

Huffman coding – worked example Step 4 Look what happens to 1 (y).

Huffman coding – worked example Step 5 The top two nodes combine and move to the end of the list, as their combined value exceeds that of the other elements.

Huffman coding – worked example Step 6 The pattern continues…

Huffman coding – worked example Step 7 More of the same…

Huffman coding – worked example Step 8 The objective is to end up with one single node.

Huffman coding – worked example Step 9 We made it! The value of the left-hand node should be equal to the number of characters in the original string. an easy example = 15 characters Success!
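The repeated "combine the two lowest-frequency nodes" process in steps 1–9 maps naturally onto a priority queue. Below is a minimal sketch in Python (not from the lesson materials; `build_huffman_tree` is an assumed name). Note that ties between equal frequencies may be broken in a different order than in the printed diagrams, so the exact codes can differ, though the code lengths remain optimal:

```python
import heapq
from collections import Counter
from itertools import count

def build_huffman_tree(text):
    """Repeatedly merge the two lowest-frequency nodes, as in steps 1-9."""
    order = count()  # tie-breaker so entries with equal frequency compare cleanly
    # heap entries are (frequency, tie-breaker, tree); a tree is either a
    # single character or a (left, right) pair of subtrees
    heap = [(f, next(order), ch) for ch, f in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(order), (left, right)))
    return heap[0][0], heap[0][2]

total, tree = build_huffman_tree("an easy example")
print(total)  # 15 – the root's value matches the number of characters
```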

Huffman coding – worked example Step 10 Put ‘1’ on the upward directions and ‘0’ on the downward directions. How would we denote the ‘n’? Down one, up one, up one, up one = 0111

Huffman coding – worked example The completed tree gives each character its binary code:

a      11
e      10
space  001
y      0001
m      0100
l      0101
p      0110
n      0111
s      00001
x      00000

Full binary representation = 11 0111 001 10 11 00001 0001 001 10 00000 11 0100 0110 0101 10

Decoding with the Huffman method – worked example Our encoded text would be received as follows… 110111001101100001000100110000001101000110010110 You must be in possession of the Huffman tree to decode the text! We have seen how to create the Huffman tree, which gives us the binary encoding scheme specific to the text being compressed; it is equally important to show how the tree is used to recover the text from its binary encoding.

Decoding with the Huffman method – worked example First character Start at the beginning of the string and follow the bits down the tree until a character is reached: 110111001101100001000100110000001101000110010110 Following ‘1’ then ‘1’ takes us to a.

Decoding with the Huffman method – worked example Second character Move to the next set of digits: 110111001101100001000100110000001101000110010110 = an

Decoding with the Huffman method – worked example Third character Continue the previous process – this time we find a space character… 110111001101100001000100110000001101000110010110 = an (followed by the space)

Decoding with the Huffman method – worked example Fourth character 110111001101100001000100110000001101000110010110 = an e Continue until no binary digits remain.
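The decoding walk above can be sketched in Python (an illustration, not from the lesson; `decode` is an assumed name). Because Huffman codes are prefix-free, we can scan the bit stream left to right and emit a character the moment a complete code matches:

```python
codes = {
    'a': '11',   'e': '10',   ' ': '001',  'y': '0001',  'm': '0100',
    'l': '0101', 'p': '0110', 'n': '0111', 's': '00001', 'x': '00000',
}
# invert the table so we can look up characters by their bit pattern
decode_table = {code: ch for ch, code in codes.items()}

def decode(bits):
    result, buffer = [], ''
    for bit in bits:
        buffer += bit
        if buffer in decode_table:   # a complete code has been read
            result.append(decode_table[buffer])
            buffer = ''
    return ''.join(result)

print(decode("110111001101100001000100110000001101000110010110"))
# → an easy example
```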

Huffman coding – conclusion Look at the table. You can see that the characters at the top generally: appear most frequently; have short binary representations. This is the advantage of Huffman coding.

a      11
e      10
space  001
y      0001
m      0100
l      0101
n      0111
p      0110
s      00001
x      00000

Conventional ASCII uses eight bits to represent a character; Huffman coding can reduce that to as little as two bits.
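To quantify the saving for this example (a back-of-the-envelope check, not part of the slides), we can weight each code length by its character's frequency:

```python
freqs = {'a': 3, 'e': 3, ' ': 2, 'n': 1, 's': 1,
         'y': 1, 'x': 1, 'm': 1, 'p': 1, 'l': 1}
codes = {'a': '11',   'e': '10',   ' ': '001',  'y': '0001',  'm': '0100',
         'l': '0101', 'p': '0110', 'n': '0111', 's': '00001', 'x': '00000'}

# total bits = sum over characters of (frequency x code length)
total_bits = sum(freqs[ch] * len(codes[ch]) for ch in freqs)
average = total_bits / sum(freqs.values())
print(total_bits, average)  # 48 bits in total, 3.2 bits per character (versus 8)
```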

Huffman coding – conclusion Note: the Huffman coding method will incur some ‘overhead’ when used. What do you think this is? It is necessary to transmit a representation of the encoding ‘tree’ along with the encoded data. Each Huffman tree is unique, as it is dependent upon the piece of text which it is used to encode. The Huffman method works best when there is a good distribution of frequently occurring ‘common’ characters.

To round things off… Complete Quiz 2 and Quiz 3