Lesson Objectives Aims You should know about: 1.3.1:

(a) Lossy vs lossless compression.
(b) Run length encoding and dictionary coding for lossless compression.

Compression
Means: reducing file size.
Because: it used to be down to storage limitations; now it is down to a need to make network transfers quicker.

Compression
The level of compression is measured as a ratio between 0 and 1 (compressed size divided by original size).
1 = original file size.
0 = the best compression tool ever: the delete key.
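As a rough illustration of the ratio described on this slide, the sketch below compresses some deliberately repetitive data and divides the compressed size by the original size. The helper name `compression_ratio` and the sample data are made up for this example; `zlib` is Python's standard-library compressor and stands in for any of the tools a real user might pick.

```python
import zlib


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Compressed size divided by original size (1.0 means no saving)."""
    return len(compressed) / len(original)


# Hypothetical sample: highly repetitive data, so it should compress well.
data = b"AAAAAAAAAA" * 1000
packed = zlib.compress(data)
ratio = compression_ratio(data, packed)

print(f"{len(data)} bytes -> {len(packed)} bytes, ratio {ratio:.3f}")
```

For very small or random-looking input the ratio can come out slightly above 1, because the compressed format carries a little overhead of its own.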

Compression software and algorithms
7ZIP, RAR, ZIP, TAR (on its own this only bundles files together; the GZ step does the compressing), TAR.GZ …

Example

How compression is achieved
Virtually all data contains “unnecessary” bits.
By removing redundant information, file size (total number of bits) can be reduced.
We’re basically chucking bits in the bin, however…

Where can the data to lose come from?
Consider a video at 30fps.
Does the background change?
What actually moves in that second of video?
It may actually be possible to throw away 90% of the data because it’s duplicated!

Lossy
Lossy compression will reduce file size, but some data will also be lost.
The original file/quality cannot be restored.
The file size/quality is a trade-off: how much loss is acceptable?
Examples: JPG, MP3

An example - Bitmap

File Format Differences
BMP: 644 KB
PNG: 9.05 KB (lossless!)
JPG: 30.6 KB
GIF: 7.71 KB

Side By Side BMP JPG

Lossy Compression
Why lose some data?
It can be acceptable: sounds you can’t hear, unnoticeable colour loss, etc.
It could provide a much smaller file size.
Why not lose data?
When loss would be unacceptable, e.g. text documents or original sound recordings.

Lossless
Data is compressed but the file can be restored to its original quality.
Methods:
Run length encoding
Dictionary coding

Run length encoding
There is a lot of repetition in data, such as sequences of pixels in an image that are the same colour, or text which has repeated letters/words.

Run length encoding simply counts up the repeats and replaces each run with:
A flag symbol
The character/pixel
The number of times it’s repeated
No data is lost, but file size is reduced.
No point if the encoded data is the same length as the actual data!
No good in situations where there is little repetition (think of a checkerboard design).
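The flag / character / count idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the flag symbol `#`, the `MIN_RUN` threshold and the function names are invented here, and the input is assumed to contain neither the flag symbol nor any digits (otherwise decoding would be ambiguous).

```python
import re
from itertools import groupby

FLAG = "#"    # assumed flag symbol; must not appear in the input
MIN_RUN = 4   # flag + character + count costs 3 symbols, so shorter runs are left alone


def rle_encode(text: str) -> str:
    """Replace each long run of one character with flag + character + count."""
    out = []
    for char, group in groupby(text):
        run = len(list(group))
        if run >= MIN_RUN:
            out.append(f"{FLAG}{char}{run}")
        else:
            out.append(char * run)  # too short to be worth encoding
    return "".join(out)


def rle_decode(encoded: str) -> str:
    """Expand every flag + character + count triple back into the full run."""
    pattern = re.compile(re.escape(FLAG) + r"(.)(\d+)", re.DOTALL)
    return pattern.sub(lambda m: m.group(1) * int(m.group(2)), encoded)


row = "A" * 6 + "B" * 3 + "C" * 8 + "D"   # e.g. one row of pixels
packed = rle_encode(row)
print(packed, len(row), "->", len(packed))

checker = "BWBWBWBW"                       # checkerboard: no runs at all
print(rle_encode(checker) == checker)
```

The checkerboard row comes back unchanged, which is the "little repetition" case from the slide: there are no runs to count, so nothing is saved.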

Dictionary Encoding
Think of a 5000 word essay. Just think of that. Mmm. Essays.
You didn’t use 5000 unique words, did you?
This means there is clearly repetition.

A dictionary of unique elements is built from the document.
Then a series of pointers is created, each pointing to the correct entry in the dictionary.
Every time a word is repeated, data is saved.
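A minimal word-level sketch of the dictionary-and-pointers idea is shown below. The function names and the sample sentence are made up for illustration, and splitting on single spaces is a simplifying assumption; real dictionary coders usually work on byte sequences rather than whole words.

```python
def dict_encode(text: str) -> tuple[list[str], list[int]]:
    """Build a dictionary of unique words plus a list of pointers into it."""
    dictionary: list[str] = []
    index: dict[str, int] = {}   # word -> position in the dictionary
    pointers: list[int] = []
    for word in text.split(" "):
        if word not in index:
            index[word] = len(dictionary)
            dictionary.append(word)
        pointers.append(index[word])
    return dictionary, pointers


def dict_decode(dictionary: list[str], pointers: list[int]) -> str:
    """Follow each pointer back to its word to rebuild the original text."""
    return " ".join(dictionary[p] for p in pointers)


text = "the cat sat on the mat the end"
dictionary, pointers = dict_encode(text)
print(dictionary)
print(pointers)
print(dict_decode(dictionary, pointers) == text)
```

Each repeat of "the" costs only one small number instead of the whole word, which is where the saving comes from.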

Huffman Encoding
Look at how frequently each letter appears in the English language:

In ASCII or UTF-8 a minimum of 8 bits is used to encode each character, yet this creates a lot of redundant data.
Huffman encoding rearranges this and assigns the shortest codes to the most common letters, e.g. E = 0.

As a result, the most common characters get the shortest codes, potentially as short as 1 bit!
This can drastically reduce the size of the data.
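To make the idea concrete, here is a small sketch of how Huffman codes can be built with a priority queue. The function name `huffman_codes` and the sample text are invented for this example, and the sample is deliberately skewed so that E dominates; with real English letter frequencies E would get a short code, but not one as short as a single bit.

```python
import heapq
from collections import Counter


def huffman_codes(text: str) -> dict[str, str]:
    """Return a bit-string code per symbol; frequent symbols get shorter codes."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}   # a single symbol still needs one bit
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two lightest groups, prefixing one with 0 and the other with 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


text = "E" * 8 + "T" * 3 + "A" * 2 + "O"   # hypothetical, heavily skewed sample
codes = huffman_codes(text)
bits = sum(len(codes[ch]) for ch in text)
print(codes)
print(f"{bits} bits with Huffman codes vs {8 * len(text)} bits at 8 bits per character")
```

The exact 0/1 patterns depend on how ties are broken, but the code lengths do not: the most frequent symbol always ends up with the shortest code.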

Summary
Lossy compression reduces the size of a file but results in data being lost.
Lossless compression reduces the file size without losing any data.
Run-length encoding replaces a sequence of repeated characters with a flag character, followed by the character itself and the number of times it is repeated.
In dictionary coding a dictionary of commonly occurring sequences of characters is created. In the text these sequences are replaced by pointers to the relevant place in the dictionary. The references are shorter than the words they replace, so the file is reduced in size.

Review/Success Criteria
