United International University
Department of Computer Science and Engineering
Presented By
Presenters (Team Athena): Jannatul Ferdaus, Saimoom Safayet Akash, Roushan Ara Hanif, Reshma Khatun, Muktadir Hossain, Sandip Kumar Kabiraz
Presentation on Lossless Data Compression: Huffman and Shannon-Fano
Data Compression
In computer science, data compression is the technique of reducing the number of binary digits required to represent data.
Why Data Compression?
To save the memory space occupied by files.
To reduce cost by using minimal memory storage devices, especially for databases.
To handle data more effectively.
For faster data transfer.
Highly effective for web services.
Types of data compression
Lossless
Lossy
Lossy Data Compression
"Lossy" compression is a data encoding method that compresses data by discarding part of it.
Lossless Data Compression
Lossless data compression is a class of data compression algorithms that allows the exact original data to be reconstructed from the compressed data.
Example of Lossy data compression
Example of Lossless data compression
Lossless: Huffman and Shannon-Fano
In today's presentation we are going to focus on lossless compression methods.
Creators of Shannon-Fano Coding
Claude Elwood Shannon (1916-2001)
Born: April 30, 1916, Petoskey, Michigan, United States.
Died: February 24, 2001 (aged 84), Medford, Massachusetts, United States.
Residence: United States.
Nationality: American.
Fields: Mathematics and electronic engineering.
Institutions: Bell Laboratories, Massachusetts Institute of Technology, Institute for Advanced Study.
Known for: Information theory, Shannon-Fano coding, Shannon-Hartley law, Nyquist-Shannon sampling theorem, noisy-channel coding theorem, Shannon switching game, Shannon number, Shannon index, Shannon's source coding theorem, Shannon's expansion, Shannon-Weaver model of communication, Whittaker-Shannon interpolation formula.
Notable awards: IEEE Medal of Honor, Kyoto Prize, Harvey Prize (1972).
Robert Mario Fano
Born: 1917, Turin, Italy.
Citizenship: United States.
Fields: Computer science, information theory.
Institutions: Massachusetts Institute of Technology.
Known for: Shannon-Fano coding; founder of Project MAC.
Thesis: Theoretical Limitations on the Broadband Matching of Arbitrary Impedances (1947).
Notable awards: Shannon Award, 1976; IEEE Fellow, 1954.
History of Shannon-Fano Coding
Shannon-Fano Coding
In the field of data compression, Shannon-Fano coding is a technique for constructing a prefix code based on a set of symbols and their probabilities (estimated or measured). It is suboptimal in the sense that it does not achieve the lowest possible expected codeword length, as Huffman coding does.
Main Features
Statistical compression method.
Two-pass compression method.
Semi-adaptive compression method.
Asymmetric compression method.
The resulting compression is worse than Huffman coding's.
Optimality of the output is not guaranteed.
We are going to discuss the process of Shannon-Fano coding in four steps: Frequency Count, Sorting, Code Generation, and Code Conversion.
Now, it's time to focus on the frequency count.
ENGINEERING

Character  Count
E          3
N          3
G          2
I          2
R          1
After sorting in descending order:

Character  Count
E          3
N          3
G          2
I          2
R          1
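The frequency-count and sorting steps above can be sketched in Python (a sketch: the word and counts come from the slides, while `collections.Counter` is a standard-library choice the slides do not prescribe):

```python
from collections import Counter

# Count character frequencies in the sample word from the slides.
freqs = Counter("ENGINEERING")

# Sort by descending count, as the Shannon-Fano method requires.
table = sorted(freqs.items(), key=lambda kv: -kv[1])
print(table)  # [('E', 3), ('N', 3), ('G', 2), ('I', 2), ('R', 1)]
```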
Now Code Generation
Code generation proceeds by repeatedly splitting the sorted list into two parts of nearly equal total count, appending 0 to the left part and 1 to the right part:

Split 1: {E, N} vs {G, I, R}  ->  E: 0, N: 0, G: 1, I: 1, R: 1
Split 2: {E} vs {N}           ->  E: 00, N: 01
Split 3: {G} vs {I, R}        ->  G: 10, I: 11, R: 11
Split 4: {I} vs {R}           ->  I: 110, R: 111

Index  Character  Count  Code
1      E          3      00
2      N          3      01
3      G          2      10
4      I          2      110
5      R          1      111
Code Replacement
ENGINEERING -> 0001101100100001111100110
(Source.txt -> Code.txt)
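The replacement step is a straight table lookup; a minimal sketch using the code table derived above:

```python
# Shannon-Fano codes for ENGINEERING, as derived on the previous slides.
codes = {"E": "00", "N": "01", "G": "10", "I": "110", "R": "111"}

# Replace each character with its code and concatenate.
encoded = "".join(codes[ch] for ch in "ENGINEERING")
print(encoded)  # 0001101100100001111100110
```

The result matches the bit string shown on the slide.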
Code Conversion
The bit string in Code.txt is packed into bytes, producing unreadable characters (the slide shows "-!ó0") in Compress.txt.
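The conversion step groups the bit string into 8-bit chunks and stores each chunk as one byte; a sketch (zero-padding the final partial byte is one common convention, not something the slide states):

```python
def bits_to_bytes(bits: str) -> bytes:
    # Pad the bit string to a multiple of 8 so the last group forms a full byte.
    padded = bits + "0" * (-len(bits) % 8)
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))

packed = bits_to_bytes("0001101100100001111100110")
print(len(packed))  # 4 bytes instead of 25 '0'/'1' characters
```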
Algorithm of Shannon-Fano Coding
1. For a given list of symbols, develop a corresponding list of probabilities or frequency counts so that each symbol's relative frequency of occurrence is known.
2. Sort the list of symbols by frequency, with the most frequently occurring symbols at the left and the least common at the right.
3. Divide the list into two parts, with the total frequency count of the left part as close to the total of the right part as possible.
4. Assign the binary digit 0 to the left part of the list and 1 to the right part. The codes for symbols in the first part will all start with 0, and the codes in the second part will all start with 1.
5. Recursively apply steps 3 and 4 to each of the two halves, subdividing groups and adding bits to the codes until each symbol has become a corresponding code leaf of the tree.
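The steps above can be sketched as a short recursive function (a sketch under the slides' description; the linear split search is one straightforward way to find the most balanced division):

```python
def shannon_fano(symbols):
    """Assign Shannon-Fano codes to a list of (symbol, count) pairs
    already sorted by descending count."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(count for _, count in symbols)
    # Step 3: find the split where the two parts' totals are closest.
    split, best_diff = 1, total
    for i in range(1, len(symbols)):
        left = sum(count for _, count in symbols[:i])
        diff = abs(total - 2 * left)
        if diff < best_diff:
            best_diff, split = diff, i
    # Steps 4-5: prefix 0 onto the left part's codes, 1 onto the right's.
    result = {s: "0" + c for s, c in shannon_fano(symbols[:split]).items()}
    result.update({s: "1" + c for s, c in shannon_fano(symbols[split:]).items()})
    return result

table = [("E", 3), ("N", 3), ("G", 2), ("I", 2), ("R", 1)]
print(shannon_fano(table))  # {'E': '00', 'N': '01', 'G': '10', 'I': '110', 'R': '111'}
```

On the ENGINEERING table this reproduces exactly the codes generated on the earlier slides.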
HUFFMAN CODING
David Albert Huffman (August 9, 1925 – October 7, 1999)
Claude Elwood Shannon and Robert Mario Fano
Awards and Honors
1955: The Louis E. Levy Medal
1973: The W. Wallace McDowell Award
1981: Charter recipient of the Computer Pioneer Award
1998: A Golden Jubilee Award
1999: The IEEE Richard W. Hamming Medal
Mechanism of Huffman Coding
Frequency Count

Character  Count
E          3
N          3
G          2
I          2
R          1
HUFFMAN TREE CONSTRUCTION
Starting from the sorted frequencies R:1, G:2, I:2, E:3, N:3, the two lowest-weight nodes are merged repeatedly:

Step 1: merge R:1 and G:2 into an internal node of weight 3.
Step 2: merge I:2 and the node of weight 3 into a node of weight 5.
Step 3: merge E:3 and N:3 into a node of weight 6.
Step 4: merge the nodes of weight 5 and 6 into the root of weight 11.
CODE GENERATION
Reading 0 for each left branch and 1 for each right branch of the tree gives:
I: 00, R: 010, G: 011, E: 10, N: 11
CODE REPLACEMENT
E: 10, N: 11, G: 011, I: 00, R: 010 (N must be 11, not 01, for the code to stay prefix-free and match the tree above)
ENGINEERING -> 10 11 011 00 11 10 10 010 00 11 011
CODE CONVERSION
Each 8-bit group of the binary string is interpreted as a decimal value and stored as the corresponding ASCII character (the slide shows values 133, 150, 52 and a trailing NULL).
Algorithm of Huffman Coding
1. Create a leaf node for each symbol and add it to a priority queue, ordered by frequency of occurrence.
2. While there is more than one node in the queue:
   a. Remove the two nodes of lowest probability or frequency from the queue.
   b. Prepend 0 and 1 respectively to any code already assigned to these nodes.
   c. Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities.
   d. Add the new node to the queue.
3. The remaining node is the root node and the tree is complete.
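The queue-based algorithm above can be sketched with Python's `heapq` as the priority queue (a sketch: heap tie-breaking is an implementation choice, so individual bit patterns may differ from the slides' tree, but the code lengths and total encoded length come out the same):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code table for `text` using a min-heap."""
    freqs = Counter(text)
    # Each heap entry: (frequency, tie-breaker, {symbol: code-so-far}).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        lo_count, _, lo_codes = heapq.heappop(heap)
        hi_count, _, hi_codes = heapq.heappop(heap)
        # Prepend 0 to codes in the lighter subtree, 1 to the heavier one.
        merged = {s: "0" + c for s, c in lo_codes.items()}
        merged.update({s: "1" + c for s, c in hi_codes.items()})
        heapq.heappush(heap, (lo_count + hi_count, tie, merged))
        tie += 1
    return heap[0][2]

codes = huffman_codes("ENGINEERING")
encoded = "".join(codes[ch] for ch in "ENGINEERING")
print(sorted(len(c) for c in codes.values()), len(encoded))  # [2, 2, 2, 3, 3] 25
```

As on the slides, the three frequent letters get 2-bit codes, the two rare ones get 3-bit codes, and ENGINEERING encodes to 25 bits.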
Shannon-Fano or Huffman: which one is better?
Let’s see some practical implementations
Our First Test Case: “I failed in some subjects in exam, but my friend passed in all. Now he is an engineer in Microsoft and I am the owner of Microsoft.” Most inspiring words from Bill Gates! ;) Let's see what happens if we apply Huffman and Shannon-Fano.
Huffman: characters before compression: 132; after compression: 78; data compression: 41%.
Shannon-Fano: characters before compression: 132; after compression: 70; data compression: 47%.
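The percentages on these slides follow the usual space-savings formula; a one-liner to check the first test case's figures:

```python
def savings(before: int, after: int) -> int:
    # Space savings as a rounded percentage: 1 - compressed/original.
    return round((1 - after / before) * 100)

print(savings(132, 78), savings(132, 70))  # 41 47
```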
Second Test Case: ‘Merry had a little lamp’
Huffman: characters before compression: 23; after compression: 11; data compression: 53%.
Shannon-Fano: characters before compression: 23; after compression: 12; data compression: 48%.
We see that for small amounts of text, the compression rates of the two algorithms do not differ much.
What will happen if we take much bigger text files, like a novel?
Third Test Case: Othello, by Shakespeare
Huffman: data compression: 34%.
Shannon-Fano: data compression: 17%.
Fourth Test Case: ‘A Study in Scarlet’, by Sir Arthur Conan Doyle
Huffman: data compression: 30%.
Shannon-Fano: data compression: 20%.
Fifth Test Case: On the Origin of Species, by Charles Darwin
Huffman: data compression: 32%.
Shannon-Fano: data compression: 18%.
Sixth Test Case: Politics, by Aristotle
Huffman: data compression: 32%.
Shannon-Fano: data compression: 21%.
Seventh Test Case: Massive Letter Repetition
ABASSAMM MAMMSSS BBBBAAAA MMMSSSS IIIISSS
Huffman: characters before compression: 41; after compression: 13; data compression: 69%.
Shannon-Fano: characters before compression: 41; after compression: 14; data compression: 66%.
At a glance
Q/A