Multimedia – Data Compression


1 Multimedia – Data Compression
Dr. Lina A. Nimri Lebanese University Faculty of Economic Sciences and Business Administration 1st branch

2 Why compress data? Nowadays, the computing power of processors increases more quickly than storage capacity, and much more quickly than network bandwidth, since raising bandwidth requires enormous changes in the telecommunication infrastructure. To compensate, it is usual to reduce the size of the data by exploiting the computing power of processors rather than by increasing storage and transmission capacities.

3 What is data compression?
Compression consists of reducing the physical size of blocks of information. A compressor uses an algorithm to optimize the data, using considerations suited to the type of data being compressed; a decompressor is then needed to reconstruct the original data using the inverse of the algorithm used for compression. The compression method depends essentially on the type of data to be compressed: an image is not compressed in the same way as an audio file.

4 What is data compression?
A compression that does not lead to loss of information is lossless; a compression that leads to loss of information is lossy. Compression can be quantified by the compression factor, that is, the number of bits in the compressed image divided by the number of bits in the original image. The compression ratio, which is often used, is the inverse of the compression factor; it is usually expressed as a percentage. Finally, the compression gain, also expressed as a percentage, equals 1 minus the compression factor: gain = 1 - compression factor.
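To make the three definitions concrete, here is a minimal Python sketch (the function name and the 1000-bit/250-bit example are ours, not from the slides):

def compression_stats(original_bits: int, compressed_bits: int):
    """Compute the three quantities defined above."""
    factor = compressed_bits / original_bits   # compression factor
    ratio = original_bits / compressed_bits    # compression ratio (inverse of the factor)
    gain = 1 - factor                          # compression gain
    return factor, ratio, gain

# Hypothetical example: a 1000-bit image compressed down to 250 bits
print(compression_stats(1000, 250))   # (0.25, 4.0, 0.75), i.e. a gain of 75%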

5 Types of compressions and methods
Physical and Logical Compression: Physical compression acts directly on the data; it looks for redundancy from one bit pattern to another, without interpreting what the data represent. Logical compression, on the other hand, is carried out by logical reasoning, substituting the information with equivalent information.

6 Types of compressions and methods
Symmetrical and Asymmetrical Compression: In the case of symmetrical compression, the same method is used to compress and to decompress the data, so the same amount of work is needed for each of these operations. It is this type of compression that is generally used in data transmission. Asymmetrical compression requires more work for one of the two operations. Since data is typically compressed once but decompressed many times, one usually seeks algorithms for which compression is slower than decompression. Algorithms that compress faster than they decompress can instead be preferable for data files that are seldom accessed (kept for security reasons, for example), because they still produce compact files.

7 Lossy compression Lossy compression, as opposed to lossless compression, eliminates some information in order to achieve the best possible compression ratio while keeping a result as close as possible to the original data. This is the case, for example, with certain image or sound compression methods, such as the MP3 format. Since this type of compression removes information contained in the data to be compressed, these are called irreversible compression methods. Executable files, for example, cannot be compressed this way, because they need to preserve their integrity in order to run: it is not conceivable to roughly reconstruct a program by omitting some bits and adding others. Multimedia data (audio, video), on the other hand, can tolerate a certain level of degradation without the sensory organs (eye, eardrum, etc.) perceiving any significant loss.

8 Adaptive, semi-adaptive and non-adaptive encoding
Certain compression algorithms are based on dictionaries prepared in advance for a specific type of data: these are non-adaptive encoders. The frequency of letters in a text file, for example, depends on the language in which it is written. An adaptive encoder adapts to the data it has to compress; it does not start out with a dictionary already prepared for a given type of data. A semi-adaptive encoder builds its dictionary from the data to be compressed: it goes through the file once to build the dictionary, then compresses the file in a second pass.

9 RLE Compression The RLE compression method (Run-Length Encoding, sometimes written RLC for Run-Length Coding) is used by many image formats (BMP, PCX, TIFF). It is based on the repetition of consecutive elements.

10 RLE Compression: basics
The basic principle consists in coding a first element giving the number of repetitions of a value, followed by the value to be repeated. Thus, according to this principle, the string "AAAAAHHHHHHHHHHHHHH" compresses to "5A14H". The compression gain is (19-5)/19, that is, approximately 73.7%. On the other hand, for the string "CORRECTLY", where characters are hardly ever repeated, the result of the compression is "1C1O2R1E1C1T1L1Y": compression proves very expensive here, with a negative gain of (9-16)/9, that is, about -78%!
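A minimal Python sketch of this basic principle (our own illustration, not the exact encoder of any particular image format):

from itertools import groupby

def rle_encode(text: str) -> str:
    """Code each run of identical characters as <count><value>."""
    return "".join(f"{len(list(group))}{char}" for char, group in groupby(text))

print(rle_encode("AAAAAHHHHHHHHHHHHHH"))  # 5A14H  (19 chars -> 5 chars)
print(rle_encode("CORRECTLY"))            # 1C1O2R1E1C1T1L1Y  (9 chars -> 16 chars!)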

11 RLE Compression: compression rules
Actually, RLE compression is governed by particular rules that allow compression to be carried out when it pays off and the string to be left as it is when compression would waste space. These rules are the following (a sketch of an encoder applying them follows below):
- If three or more elements are repeated consecutively, the RLE method (count, then value) is used;
- if not, a control character (00) is inserted, followed by the number of elements of the uncompressed string and then the string itself;
- if the number of elements of the string is odd, a control character (00) is added at the end as padding.
Finally, specific control sequences are defined in order to code: an end of line (00 00), the end of the image (00 01), and a displacement of the pointer over the image by XX columns and YY rows in the reading direction (00 02 XX YY).
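Here is a hedged Python sketch of an encoder applying these rules to a single image row (modeled on BMP-style RLE8; treating 1- and 2-byte literal chunks as short runs is our assumption, and the end-of-image and displacement sequences are left to the caller):

def rle8_encode_row(row: bytes) -> bytes:
    """Encode one image row following the rules above (runs, literals, padding)."""
    out = bytearray()
    i = 0
    while i < len(row):
        # measure the run of identical bytes starting at i (a count fits in one byte)
        run = 1
        while i + run < len(row) and row[i + run] == row[i] and run < 255:
            run += 1
        if run >= 3:
            out += bytes([run, row[i]])                  # rule 1: <count><value>
            i += run
        else:
            # collect a literal chunk until the next run of 3 identical bytes
            j = i
            while j < len(row) and j - i < 255 and \
                  not (j + 2 < len(row) and row[j] == row[j + 1] == row[j + 2]):
                j += 1
            chunk = row[i:j]
            if len(chunk) >= 3:
                out += bytes([0x00, len(chunk)]) + chunk  # rule 2: 00 <n> <literals>
                if len(chunk) % 2:
                    out.append(0x00)                      # rule 3: pad odd literals
            else:
                for b in chunk:                           # 1- or 2-byte chunks: short runs
                    out += bytes([1, b])
            i = j
    out += bytes([0x00, 0x00])                            # end-of-line control sequence
    return bytes(out)

print(rle8_encode_row(b"\x05\x05\x05\x05\x01\x02\x02\x03\x03\x03").hex(" "))
# 04 05 00 03 01 02 02 00 03 03 00 00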

12 RLE Compression: compression rules
Thus, RLE compression makes no sense except for data with many consecutive repeated elements, in particular images with large uniform areas. This method however has the advantage of not being very difficult to implement. There are alternatives in which the image is encoded by blocks of pixels, in rows, or even in zigzag.

13 RLE Compression: compression rules [figure]

14 Huffman coding In 1952, David Huffman proposed a statistical method
It allows a binary code word to be assigned to the various symbols to be compressed (pixels or characters, for example). The length of each code word is not identical for all symbols: the most frequent symbols (those that appear most often) are coded with short code words, while the rarest symbols receive longer binary codes. The expression Variable-Length Code (VLC) is used for this type of coding; since no code word is the prefix of another, the coded stream can be decoded without ambiguity. Thus, the final succession of variable-length code words is on average shorter than the one obtained with constant-length coding.

15 Huffman coding: algorithm
A bottom-up approach:
1. Initialization: put all symbols on a list sorted according to their frequency counts.
2. Repeat until the list has only one symbol left:
(1) From the list pick the two symbols with the lowest frequency counts. Form a Huffman subtree that has these two symbols as child nodes and create a parent node.
(2) Assign the sum of the children's frequency counts to the parent and insert it into the list such that the order is maintained.
(3) Delete the children from the list.
3. Assign a codeword for each leaf based on the path from the root.
A Python sketch of this procedure follows below.
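A compact Python sketch of this bottom-up procedure (a minimal sketch using heapq; the insertion-order tie-breaking is our choice, so the exact codewords can differ from a given textbook tree even though the average code length stays optimal):

import heapq
from collections import Counter

def huffman_codes(text: str) -> dict:
    """Build a Huffman tree bottom-up and return the codeword of each symbol."""
    counts = Counter(text)
    # heap items are (frequency, tie-breaker, tree); a tree is a symbol or a pair
    heap = [(f, i, sym) for i, (sym, f) in enumerate(counts.items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # the two lowest-frequency subtrees...
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, i, (left, right)))  # ...merge under a parent
        i += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):          # internal node: extend the path
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                                # leaf: the path is the codeword
            codes[tree] = prefix or "0"
    walk(heap[0][2], "")
    return codes

print(huffman_codes("COMMENT_CA_MARCHE"))
# frequent letters (M, C) get short codes, rare ones (R, H) longer codes
# (the exact bit patterns depend on tie-breaking)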

16 Huffman coding The Huffman coder creates an ordered tree from all the symbols and their frequencies of appearance. The branches are built recursively, starting with the least frequent symbols. Example: the word "wikipedia". The frequencies of the letters are:
Letter:    i  a  d  e  k  p  w
Frequency: 3  1  1  1  1  1  1
In binary, the Huffman-coded word occupies 24 bits instead of the 72 bits needed with the ASCII code.

17 Huffman coding Consider the following sentence: "COMMENT_CA_MARCHE".
The appearance frequencies of the letters are the following:
Letter:    M  C  A  E  _  O  N  T  R  H
Frequency: 3  3  2  2  2  1  1  1  1  1
The construction of the tree is done by initially ordering the symbols by frequency of appearance. The two symbols with the lowest appearance frequency are successively removed from the list and attached to a node whose weight is equal to the sum of the frequencies of the two symbols. The symbol with the lower weight is assigned to branch 1, the other to branch 0, and so on, considering each node formed as a new symbol, until a single parent node, called the root, is obtained. The code of each symbol then corresponds to the succession of branch codes along the path from the root to that character. Thus, the "deeper" inside the tree a symbol is, the longer its code word will be. The corresponding codes are such that the codes for the most frequent characters are short and those for the least frequent symbols are long. Compression based on this type of coding yields good compression ratios, particularly for monochrome images (faxes, for example); it is notably used in the ITU-T T.4 and T.6 recommendations.

18 Huffman coding "this is an example of a Huffman tree"

19 Huffman coding Coding Tree for “HELLO” using the Huffman Algorithm
In Fig. 7.5, new symbols P1, P2, P3 are created to refer to the parent nodes in the Huffman coding tree. The contents of the sorted list at each step are:
After initialization: L H E O
After iteration (a): L P1 H
After iteration (b): L P2
After iteration (c): P3

20 Huffman coding: variants
There are three variants of the Huffman algorithm, each of which defines a method for creating the tree.
Static: each byte has a code predefined by the software. The tree does not need to be transmitted for decompression, but the compression is not well suited to all types of files (e.g. for a text in French, in which the frequency of occurrence of "e" is very high, this letter should have a very short code).

21 Static Huffman Coding Uses codes predefined according to the frequencies of letters in a given language. Example:
Letter:    Z  K  M   C   U   D   L   E
Frequency: 2  7  24  32  37  42  42  120

22 Static Huffman Coding [Figure: the letters placed as leaves of the tree]

23 Static Huffman Coding

24 Static Huffman Coding Decode 1011001110111101
Letter  Freq  Code    Bits
C       32    1110    4
D       42    101     3
E       120   0       1
K       7     111101  6
L       42    110     3
M       24    11111   5
U       37    100     3
Z       2     111100  6

25 Static Huffman Coding Coding DEED gives 101 0 0 101, i.e. 10100101.
Decoding 1011001110111101, reading from left to right, gives 101=D, 100=U, 1110=C, 111101=K: DUCK.
Expected cost in bits per letter: (1*120 + 3*121 + 4*32 + 5*24 + 6*9)/306 = 785/306 = 2.57, where each term is a code length multiplied by the combined frequency of the letters with that length, in a message of 306 characters (see p. 183).
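A small Python sketch of this left-to-right decoding, using the code table above (because the code is prefix-free, the bit string can be consumed greedily with no separators):

# code table from the slide above
CODES = {"E": "0", "U": "100", "D": "101", "L": "110",
         "C": "1110", "M": "11111", "Z": "111100", "K": "111101"}
DECODE = {code: letter for letter, code in CODES.items()}

def decode(bits: str) -> str:
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in DECODE:          # a complete codeword has been read
            out.append(DECODE[buf])
            buf = ""
    return "".join(out)

print("".join(CODES[c] for c in "DEED"))   # 10100101
print(decode("1011001110111101"))          # DUCK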

26 Huffman coding: variants
semi-adaptive: the file is first read to count the occurrences of each byte; the tree is then constructed according to the weight of each byte and remains the same until the end of the compression. The tree must be transmitted for the file to be decompressed.
adaptive: this method provides the best compression ratios, because the tree is constructed dynamically as the stream is being compressed. Its main disadvantage is that the tree must be modified continuously, which makes compression rather slow. In return, the compression always fits the actual data: the file type does not need to be known before compression, the file does not need to be read beforehand, and no table of symbol frequencies has to be transmitted or stored.

27 Adaptive Huffman Coding Statistics are gathered and updated dynamically as the data stream arrives.

28 Adaptive Huffman Coding
Initial code: assigns symbols some initially agreed codes, without any prior knowledge of the frequency counts. Update tree: constructs an adaptive Huffman tree. It basically does two things: (a) it increments the frequency counts for the symbols (including any new ones); (b) it updates the configuration of the tree. The encoder and decoder must use exactly the same initial code and update tree routines.

29 Adaptive Huffman Coding
Nodes are numbered in order from left to right, bottom to top. The numbers in parentheses indicate the count. The tree must always maintain its sibling property, i.e., all nodes (internal and leaf) are arranged in the order of increasing counts. If the sibling property is about to be violated, a swap procedure is invoked to update the tree by rearranging the nodes. When a swap is necessary, the farthest node with count N is swapped with the node whose count has just been increased to N + 1.

30 Adaptive Huffman Coding

31 Adaptive Huffman Coding: example
This example illustrates the implementation in more detail: we show exactly which bits are sent, as opposed to simply stating how the tree is updated. An additional rule: if any character/symbol is to be sent for the first time, it must be preceded by a special symbol, NEW. The initial code for NEW is 0, and the count of NEW is always kept at 0 (it is never increased). Initial code assignment for AADCCDD using adaptive Huffman coding.
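To make the idea concrete, here is a deliberately naive but runnable demonstration that reuses the huffman_codes sketch from slide 15 (rebuilding the whole table at each step and encoding NEW as a tuple are our simplifications; a real adaptive coder sends NEW's current code followed by the raw symbol, and updates the tree in place with the swap procedure):

def naive_adaptive_encode(text: str) -> list:
    """Illustration only: before coding each symbol, rebuild the code table
    from the symbols seen so far; the decoder can mirror this exactly."""
    seen = ""
    out = []
    for c in text:
        if c not in seen:
            out.append(("NEW", c))              # first occurrence: announce the symbol
        else:
            out.append(huffman_codes(seen)[c])  # code from the history-built tree
        seen += c                               # both sides update their history
    return out

print(naive_adaptive_encode("AADCCDD"))
# with this implementation's tie-breaking:
# [('NEW', 'A'), '0', ('NEW', 'D'), ('NEW', 'C'), '11', '10', '11']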

32 Tree Construction AADCCDD

33 Tree Construction

34 Adaptive Huffman Coding
Sequence of symbols and codes sent to the decoder: it is important to emphasize that the code for a particular symbol changes during the adaptive Huffman coding process. For example, after AADCCDD, when the character D overtakes A as the most frequent symbol, its code changes from 101 to 0.

35 Huffman Coding: properties
Unique prefix property: no Huffman code is a prefix of any other Huffman code; this precludes any ambiguity in decoding. Optimality: Huffman coding is a minimum-redundancy code, proved optimal for a given data model (i.e., a given, accurate probability distribution): the two least frequent symbols have code words of the same length, differing only in the last bit, and symbols that occur more frequently have shorter Huffman codes than symbols that occur less frequently.

36 Huffman Coding: utilisation
The coding is independent of the data being compressed: it simply codes a sequence of bits in the most compact form. Almost all compressors use this coding as a second compression stage, re-compressing with Huffman coding what was already compressed by another technique. This is the case for JPEG, MPEG, gzip and WinZip.

37 Dictionary-based Coding
History: Abraham Lempel and Jacob Ziv created the LZ77 compressor in 1977. This compressor was then used for archiving (the ZIP, ARJ and LHA formats use it). In 1978 they created the LZ78 compressor, specialized in the compression of images (or of any binary file). In 1984, Terry Welch of Unisys modified it for use in hard-drive controllers; the initial of his surname was added to the LZ abbreviation, yielding LZW. LZW is a very fast algorithm both for compression and for decompression, based on the multiple occurrences of character sequences in the string to be encoded. Its principle consists in substituting patterns with an index code, progressively building a dictionary.

38 Dictionary-based Coding
LZW works on bits and not on bytes, so it does not depend on the way the processor codes information. It is one of the most popular algorithms and is used in particular in the TIFF and GIF formats. A patent-free alternative to the LZW method is the LZ77 algorithm, on which the compression used in PNG images is based. LZW uses fixed-length codes to represent variable-length strings of symbols/characters that commonly occur together, e.g., words in English text. The LZW encoder and decoder build up the same dictionary dynamically while receiving the data. LZW places longer and longer repeated entries into the dictionary, then emits the code of an element, rather than the string itself, whenever the element is already in the dictionary.

39 LZW Compression: algorithm
Construction of the dictionary: the dictionary is initialized with the 256 values of the ASCII table. The file to be compressed is split into strings of bytes; each of these strings is compared with the dictionary and added to it if not found there.
BEGIN
  s = next input character;
  while not EOF {
    c = next input character;
    if s + c exists in the dictionary
      s = s + c;
    else {
      output the code for s;
      add string s + c to the dictionary with a new code;
      s = c;
    }
  }
  output the code for s;
END
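A direct Python transcription of this pseudocode (a sketch; the dictionary is seeded with just A, B, C to match the example on the next slides, whereas a real implementation would seed all 256 byte values):

def lzw_compress(text: str, alphabet: str = "ABC") -> list:
    """LZW encoder: emit the code of the longest dictionary match, then
    extend the dictionary with that match plus the next character."""
    dictionary = {ch: i + 1 for i, ch in enumerate(alphabet)}  # {'A': 1, 'B': 2, 'C': 3}
    s = text[0]
    out = []
    for c in text[1:]:
        if s + c in dictionary:
            s = s + c                                  # keep growing the match
        else:
            out.append(dictionary[s])                  # emit the longest match
            dictionary[s + c] = len(dictionary) + 1    # new entry, next free code
            s = c
    out.append(dictionary[s])                          # flush the final match
    return out

print(lzw_compress("ABABBABCABABBA"))   # [1, 2, 4, 5, 2, 3, 4, 6, 1]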

40 LZW compression for string: “ABABBABCABABBA”
Let's start with a very simple dictionary (also referred to as a "string table"), initially containing only 3 characters, with codes as follows:
code  string
1     A
2     B
3     C
Now if the input string is "ABABBABCABABBA", the LZW compression algorithm works as follows:

41 LZW compression for string: “ABABBABCABABBA”
s    c    output  code  string
                  1     A
                  2     B
                  3     C
A    B    1       4     AB
B    A    2       5     BA
A    B
AB   B    4       6     ABB
B    A
BA   B    5       7     BAB
B    C    2       8     BC
C    A    3       9     CA
A    B
AB   A    4       10    ABA
A    B
AB   B
ABB  A    6       11    ABBA
A    EOF  1
The output codes are: 1 2 4 5 2 3 4 6 1. Instead of sending 14 characters, only 9 codes need to be sent (compression ratio = 14/9 = 1.56).

42 LZW Decompression: algorithm (simple version)
During decompression, the algorithm rebuilds the dictionary symmetrically as it reads the codes; the dictionary thus does not need to be stored or transmitted.
BEGIN
  s = NIL;
  while not EOF {
    k = next input code;
    entry = dictionary entry for k;
    output entry;
    if (s != NIL)
      add string s + entry[0] to the dictionary with a new code;
    s = entry;
  }
END
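And a matching Python transcription of the decoder (a sketch under the same 3-character seeding; the corner case where k is not yet in the dictionary, handled by fuller versions of LZW, does not arise in this example and is omitted here, as in the slide's simple version):

def lzw_decompress(codes: list, alphabet: str = "ABC") -> str:
    """LZW decoder (simple version): rebuild the dictionary while decoding."""
    dictionary = {i + 1: ch for i, ch in enumerate(alphabet)}
    s = None
    out = []
    for k in codes:
        entry = dictionary[k]
        out.append(entry)
        if s is not None:
            dictionary[len(dictionary) + 1] = s + entry[0]   # same rule as the encoder
        s = entry
    return "".join(out)

print(lzw_decompress([1, 2, 4, 5, 2, 3, 4, 6, 1]))   # ABABBABCABABBA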

43 LZW decompression for string: “ABABBABCABABBA"
Input codes to the decoder are 1 2 4 5 2 3 4 6 1. The initial string table is identical to the one used by the encoder.
s    k    entry/output  code  string
                        1     A
                        2     B
                        3     C
NIL  1    A
A    2    B             4     AB
B    4    AB            5     BA
AB   5    BA            6     ABB
BA   2    B             7     BAB
B    3    C             8     BC
C    4    AB            9     CA
AB   6    ABB           10    ABA
ABB  1    A             11    ABBA
A    EOF
Apparently, the output string is "ABABBABCABABBA": a truly lossless result!

