ISNE101 – Introduction to Information Systems and Network Engineering


ISNE101 – Introduction to Information Systems and Network Engineering Week 2

Recap
• Counting in Base 2 - Binary
• Introduction to Encoding
• Morse Code
• Bits, Bytes and Binary Coding

Remember Morse?
E is ".", T is "-", but Q is "--.-". Common letters get a short (quick!) code, while less common letters get a longer code.
• All symbols m1 ... mn forming the set M have probabilities of occurrence P(mi) such that P(m1) + ... + P(mn) = 1.
• Infrequently occurring symbols can be assigned a long code word, while short code words are reserved for frequent symbols.

Encoding Objectives
• Each codeword corresponds to exactly one symbol.
• Decoding should not require any look-ahead.
– This is known as the 'prefix' property.

Prefix Property
• Symbols: A, B, C
• Codes: 1, 2, 12
• Message: 12
• Is it 'AB'? Is it 'C'?
In Morse code, how do we know "--.-" is Q and not "TTET"?

Prefix Property
• Symbols: A, B, C
• Codes: 1, 22, 12
• Message: 1222
• Read in 1: is it an A?
• Read in 2: or was it a C?
• Read in 2: should it be AB?
• Read in 2: ah, finally we can conclude it was CB.
Because "1" is a prefix of "12", decoding needed look-ahead all the way to the end of the message (see the sketch below).
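A minimal Python sketch (my illustration, not from the slides) shows why the prefix property matters: a decoder that emits a symbol as soon as its buffer matches a codeword needs no look-ahead, but it only works when no codeword is a prefix of another.

    def greedy_decode(message, code_table):
        """Emit a symbol as soon as the buffered digits match a codeword.
        Safe only for prefix-free codes, where no look-ahead is needed."""
        inverse = {code: sym for sym, code in code_table.items()}
        out, buffer = [], ""
        for digit in message:
            buffer += digit
            if buffer in inverse:
                out.append(inverse[buffer])
                buffer = ""
        return "".join(out), buffer   # a non-empty buffer means decoding got stuck

    # The slide's non-prefix code: "1" is a prefix of "12", so the decoder
    # grabs A too early and misparses "1222" (the correct parse is CB).
    print(greedy_decode("1222", {"A": "1", "B": "22", "C": "12"}))  # ('AB', '2')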

Code Optimisation
• The code for one symbol should never be longer than the code for a less likely symbol:
– if P(mi) < P(mj) then L(mi) >= L(mj)
• There should be no unused short codes, either as stand-alone encodings or as prefixes of longer codes.
– 01, 000, 001, 100, 101 is not ideal, as 11 is not used.
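The "no unused short codes" condition can be checked mechanically with the Kraft sum (my addition, not from the slides): a binary prefix-free code is complete, wasting no codewords, exactly when the sum of 2^-length over all codewords equals 1.

    def kraft_sum(codes):
        """Sum of 2**-len(c) over all codewords of a binary code.
        For a prefix-free code, a sum below 1 means some short
        codeword is unused and the code wastes bits."""
        return sum(2.0 ** -len(c) for c in codes)

    print(kraft_sum(["01", "000", "001", "100", "101"]))        # 0.75: "11" is unused
    print(kraft_sum(["01", "000", "001", "100", "101", "11"]))  # 1.0: complete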

Huffman Coding
Huffman coding is a method for choosing a representation for each symbol, resulting in a prefix-free code.
– The bit string representing one symbol is never a prefix of the bit string representing any other symbol.
The most common symbols are expressed using shorter strings of bits than less common symbols.

Huffman Coding
Huffman creates a "Heap" based on the frequencies of each symbol.
What is a "Heap"?
    A heap is a special kind of Binary Tree!
Great! So what is a "Binary Tree"?
    It's a tree where each node has at most 2 children...
Hmmm... and what is a "Tree"?
    OK, let's simplify!

A Tree

A Binary Tree
A tree where each node has 0, 1 or 2 children.

A Heap
A binary tree where the root node has the highest value, and every parent's value is greater than its children's.
[Diagram: an example heap with values 12, 4, 8, 3, 1]
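The slide's picture is a max-heap (largest value at the root). Standard libraries often provide min-heaps instead; Python's heapq is one, and a min-heap is exactly what Huffman construction needs, since it repeatedly extracts the two lowest frequencies. A quick illustration (my addition) with the diagram's values:

    import heapq

    h = []
    for v in [12, 4, 8, 3, 1]:
        heapq.heappush(h, v)   # heapq maintains a MIN-heap
    print(heapq.heappop(h))    # 1: the smallest value sits at the root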

Huffman Coding
Begins by constructing a heap based on the frequencies of each member of the set to be encoded. Each member is a leaf node, with parent nodes being the sum of their children.
• Take the set (with corresponding occurrence frequencies out of 120):
• A(10) B(15) C(5) D(15) E(20) F(5) G(15) H(30) I(5)
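A runnable sketch of this construction (my own illustration, not the lecture's code), using Python's heapq as the priority queue and the frequencies above:

    import heapq
    import itertools

    freqs = {"A": 10, "B": 15, "C": 5, "D": 15, "E": 20,
             "F": 5, "G": 15, "H": 30, "I": 5}

    # Queue entries are (frequency, tiebreak, tree); a tree is either a
    # symbol or a (left, right) pair. The counter breaks frequency ties
    # so tuple comparison never has to compare two trees.
    counter = itertools.count()
    heap = [(f, next(counter), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)   # the two least frequent subtrees...
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(counter), (t1, t2)))  # ...are merged

    total, _, tree = heap[0]
    print(total)   # 120: the root's value is the sum of all frequencies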

Huffman's Heap
[Diagram: the completed tree for the set above, in which each parent holds the sum of its children and the root holds 120]

Huffman Coding
Each letter's code is then read from its position relative to the root: 0 for each left branch, 1 for each right branch.
A = 000    B = 010    C = 0010
D = 011    E = 111    F = 00110
G = 110    H = 10     I = 00111
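Reading the codes off the tree is a simple traversal. The sketch below (continuing my earlier illustration) does this and computes the average code length. Ties among equal frequencies mean the exact codewords can differ from the table above, but every Huffman tree for these frequencies gives the same average: just under 3 bits per symbol, versus 4 bits for a fixed-length code over 9 symbols.

    def assign_codes(tree, prefix=""):
        """Label edges 0 (left) and 1 (right) and collect leaf codewords."""
        if isinstance(tree, str):               # leaf: a single symbol
            return {tree: prefix or "0"}
        left, right = tree
        codes = assign_codes(left, prefix + "0")
        codes.update(assign_codes(right, prefix + "1"))
        return codes

    codes = assign_codes(tree)                  # 'tree' from the previous sketch
    avg = sum(freqs[s] * len(c) for s, c in codes.items()) / 120
    print(avg)                                  # ~2.96 bits per symbol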

Creating the Heap?
• Based on frequencies from a reference corpus, such as the British National Corpus?
• Based on frequencies within the specific text (or image, etc.) to be compressed?
    This is the standard approach to Huffman.
• What if we don't know the frequencies in advance?
    Adaptive Huffman.
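For the standard approach, counting frequencies in the text itself takes one line in most languages; a small Python illustration (the sample text is my own, purely hypothetical):

    from collections import Counter

    text = "this is an example for huffman encoding"   # hypothetical input
    freqs = Counter(text)
    print(freqs.most_common(3))   # the most frequent symbols get the shortest codes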