
1 compress! From a theoretical viewpoint...
block Huffman codes achieve the best efficiency.

  symbol  prob.  codeword
  A       0.8    0
  B       0.2    1

  block  prob.   codeword
  AAA    0.512   0
  AAB    0.128   100
  ABA    0.128   101
  ABB    0.032   11100
  BAA    0.128   110
  BAB    0.032   11101
  BBA    0.032   11110
  BBB    0.008   11111

L_1 = 1.0 bit for one symbol (the single-symbol code)
L_3 = 2.184 bits for three symbols, i.e. L_3/3 = 0.728 bit per symbol
lim_{n→∞} L_n/n = H(X) ≈ 0.722
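Below is a minimal Python sketch (an addition, not from the slides) that builds the block Huffman code with a small heap-based routine and checks the numbers above; `huffman` and its tie-breaking details are my own choices, and any optimal Huffman code gives the same average length 2.184.

```python
import heapq
from itertools import product

def huffman(probs):
    """Return {symbol: codeword} for a probability table."""
    # heap entries: (probability, unique tie-breaker, {symbol: codeword suffix})
    heap = [(q, i, {s: ""}) for i, (s, q) in enumerate(probs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        q0, _, c0 = heapq.heappop(heap)        # merge the two least probable nodes
        q1, i, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (q0 + q1, i, merged))
    return heap[0][2]

p = {"A": 0.8, "B": 0.2}
blocks = {"".join(w): p[w[0]] * p[w[1]] * p[w[2]] for w in product("AB", repeat=3)}
code = huffman(blocks)
L3 = sum(blocks[w] * len(code[w]) for w in blocks)
print(L3, L3 / 3)   # 2.184 and 0.728, approaching H(X) ≈ 0.722
```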

2 problem of block Huffman codes
From a practical viewpoint, block Huffman codes have some problems:

  a large table is needed for encoding and decoding
    (the three-symbol block code above already needs eight entries)
    → run-length Huffman codes, arithmetic codes
  the probabilities must be known in advance
    → Lempel-Ziv codes

today: these three coding techniques

3 run-length Huffman code
1/3 run-length Huffman code

a coding scheme which is good for “biased” sequences; we focus on a binary
information source with alphabet {A, B} and P(A) ≫ P(B), as used for data
compression in facsimile systems

4 run and run-length: run = a sequence of consecutive identical symbols
A B B A A A A A B A A A B
= a run of A of length 1, then length 0, then length 5, then length 3
  (each run of As is terminated by a B)

The message is recovered if the lengths of the runs are given.
→ encode the lengths of the runs, not the pattern itself
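As a small illustration (mine, not the slides'), a few lines of Python that recover the run lengths from a message; `runs_of_A` is a hypothetical helper name, and it assumes the message ends in B.

```python
def runs_of_A(message):
    """Lengths of the A-runs terminated by each B (assumes message ends in B)."""
    lengths, count = [], 0
    for s in message:
        if s == "A":
            count += 1
        else:              # a B closes the current run of As
            lengths.append(count)
            count = 0
    return lengths

print(runs_of_A("ABBAAAAABAAAB"))   # [1, 0, 5, 3]
```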

5 upper-bound the run-length
small problem? ... there can be very, very, very long runs
→ put an upper bound on the run length: run-length limited (RLL) coding

upper bound = 3:
  run length      1   2   3    4    5    6      7      ...
  representation  1   2   3+0  3+1  3+2  3+3+0  3+3+1  ...
  (“3+” means “three or more As”; a run of exactly 3 is “3+ then a run of 0”)

ABBAAAAABAAAB = one A followed by B, zero As followed by B, three or more As,
two As followed by B, three or more As, zero As followed by B

6 run-length Huffman code
... is a Huffman code defined to encode the lengths of runs;
effective when the symbol probabilities are biased, e.g. P(A) = 0.9, P(B) = 0.1:

  run length  block pattern  prob.   codeword
  0           B              0.1     10
  1           AB             0.09    110
  2           AAB            0.081   111
  3 or more   AAA            0.729   0

ABBAAAAABAAAB:  1, 0, 3+, 2, 3+, 0  ⇒ 110 10 0 111 0 10  (12 bits)
AAAABAAAAABAAB: 3+, 1, 3+, 2, 2     ⇒ 0 110 0 111 111    (11 bits)
AAABAAAAAAAAB:  3+, 0, 3+, 3+, 2    ⇒ 0 10 0 0 111       (8 bits)
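A sketch of the whole encoder in Python (my code, using the codeword table reconstructed above); like the slide's examples, it assumes the message ends in B.

```python
# codeword table from this slide: run lengths 0, 1, 2, "3 or more"
CODE = {0: "10", 1: "110", 2: "111", "3+": "0"}

def rl_encode(message, bound=3):
    """Tokenize the A-runs (capped at `bound`) and emit their codewords."""
    tokens, count = [], 0
    for s in message:
        if s == "A":
            count += 1
            if count == bound:       # "3 or more": emit 3+ and keep counting
                tokens.append("3+")
                count = 0
        else:                        # a B terminates the current short run
            tokens.append(count)
            count = 0
    return tokens, "".join(CODE[t] for t in tokens)

print(rl_encode("ABBAAAAABAAAB"))
# ([1, 0, '3+', 2, '3+', 0], '110100111010') (12 bits for 13 symbols)
```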

7 comparison: P(A) = 0.9, P(B) = 0.1
the entropy of X: H(X) = −0.9 log₂0.9 − 0.1 log₂0.1 = 0.469 bit

code 1: a naive Huffman code; average codeword length = 1 bit/symbol
  symbol  prob.  codeword
  A       0.9    0
  B       0.1    1

code 2: blocked (3 symbols); average codeword length = 1.661 bits / 3 symbols ≈ 0.554 bit/symbol
  block  prob.   codeword      block  prob.   codeword
  AAA    0.729   0             BAA    0.081   1110
  AAB    0.081   100           BAB    0.009   1011
  ABA    0.081   110           BBA    0.009   11110
  ABB    0.009   1010          BBB    0.001   11111

8 comparison (cont’d)

code 3: run-length Huffman code (upper bound = 8)

  run length  0    1     2      3      4      5      6      7+
  prob.       0.1  0.09  0.081  0.073  0.066  0.059  0.053  0.478
  codeword    110  1000  1001   1010   1011   1110   1111   0

consider a typical batch of n runs...
  before encoding: (0.1n)×1 + (0.09n)×2 + ⋯ + (0.478n)×7 ≈ 5.215n symbols (A or B)
  after encoding:  (0.1n)×3 + (0.09n)×4 + ⋯ + (0.478n)×1 ≈ 2.466n bits (0 or 1)
→ the average codeword length per symbol = 2.466/5.215 ≈ 0.47 bit

RLL is a small trick, but it makes full use of the Huffman coding technique.
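The 5.215n / 2.466n bookkeeping can be checked with a few lines of Python (an added verification, not part of the slides):

```python
pA = 0.9
probs = {k: pA ** k * (1 - pA) for k in range(7)}   # run length 0..6
probs["7+"] = pA ** 7                                # run of 7 or more
bits = {0: 3, 1: 4, 2: 4, 3: 4, 4: 4, 5: 4, 6: 4, "7+": 1}  # codeword lengths

# run of length k covers k As plus the terminating B; a 7+ run covers 7 As
symbols = sum(probs[k] * (k + 1) for k in range(7)) + probs["7+"] * 7
cost = sum(probs[k] * bits[k] for k in probs)
print(symbols, cost, cost / symbols)
# ≈ 5.217, ≈ 2.465, ≈ 0.47 (the slide rounds to 5.215n and 2.466n)
```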

9 2/3 arithmetic code

a coding scheme which does not use a translation table:
  table lookup is replaced by “on-the-fly” computation
  + no translation table is needed
  − slightly more complicated computation is needed
it is proved that the average codeword length → H(X)
→ a coding scheme which is advantageous for implementation

10 preliminary: n-th order extended source with P(A) = p, P(B) = 1 − p
we encode one of the 2^n patterns in {A, B}^n

p = 0.7, n = 3: the 8 data patterns w_0, ..., w_7 in dictionary order
  P(w_i) : the probability that w_i occurs
  A(w_i) : the accumulation of the probabilities before w_i,
           A(w_i) = Σ_{j<i} P(w_j) = A(w_{i-1}) + P(w_{i-1})

  i       0      1      2      3      4      5      6      7
  w_i     AAA    AAB    ABA    ABB    BAA    BAB    BBA    BBB
  P(w_i)  0.343  0.147  0.147  0.063  0.147  0.063  0.063  0.027
  A(w_i)  0      0.343  0.490  0.637  0.700  0.847  0.910  0.973

In words: A and B occur with probabilities p and 1 − p, and we encode a sequence
of n symbols, so there are 2^n possible sequences in total. Order them, for
example in dictionary order, and name them w_0, ..., w_{2^n−1}; then P(w_i) is
the probability that w_i occurs, and A(w_i) is the sum of P(w_j) over all j < i.
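A short Python sketch (added, variable names mine) that reproduces the table:

```python
from itertools import product

p = 0.7
patterns = ["".join(w) for w in product("AB", repeat=3)]   # dictionary order
P = {w: p ** w.count("A") * (1 - p) ** w.count("B") for w in patterns}

A, acc = {}, 0.0
for w in patterns:                  # A(w_i) = sum of P(w_j) over j < i
    A[w] = acc
    acc += P[w]

for w in patterns:
    print(w, round(P[w], 3), round(A[w], 3))
# AAA 0.343 0.0, AAB 0.147 0.343, ..., BBB 0.027 0.973
```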

11 illustration of probabilities
the 8 data patterns define a partition of the interval [0, 1):

  AAA occupies [0, 0.343), AAB [0.343, 0.490), ABA [0.490, 0.637),
  ABB [0.637, 0.700), BAA [0.700, 0.847), BAB [0.847, 0.910),
  BBA [0.910, 0.973), and BBB [0.973, 1); the width of each piece is P(w_i).

w_i occupies the interval I_i = [A(w_i), A(w_i) + P(w_i))
  (left end = A(w_i), size = P(w_i)); for example, ABB occupies
  [A(ABB), A(BAA)), since A(BAA) = A(ABB) + P(ABB).

basic idea: represent w_i by a value x ∈ I_i
problem to solve: we need a translation between w_i and x

12 about the translation
two directions of the translation:
  [encode] the translation from w_i to x
  [decode] the translation from x to w_i
...use recursive computation instead of a static table:
the interval of a parent pattern w (size P(w), left end A(w)) is divided between
its two children wA and wB (“a parent’s land is divided and inherited by its
two children”). For example, A = [0, 0.700) is divided into AA = [0, 0.490) and
AB = [0.490, 0.700), and AB into ABA and ABB with sizes 0.147 and 0.063.

13 [encode] the translation from 𝑤𝑖 to 𝑥
recursively determine P(·) and A(·) for prefixes of w_i:
  P(ε) = 1, A(ε) = 0   (ε is the null string)
  P(wA) = P(w)·p,        A(wA) = A(w)
  P(wB) = P(w)·(1 − p),  A(wB) = A(w) + P(w)·p

Starting at the root ε, i.e. the whole interval [0, 1), traverse the tree toward
w_i, computing P(·) (the size of the current interval) and A(·) (its left end)
on the fly with the rules above; values are computed only when they are needed.

example (p = P(A) = 0.7): the interval of ABB?
  P(ε) = 1,        A(ε) = 0
  P(A) = 0.7,      A(A) = 0
  P(AB) = 0.21,    A(AB) = 0.49
  P(ABB) = 0.063,  A(ABB) = 0.637
→ ABB inherits the interval [0.637, 0.700)
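The recursion is only a few lines in Python (an added sketch; `interval` is a hypothetical name):

```python
def interval(w, p=0.7):
    """Return (A(w), A(w) + P(w)), the interval occupied by pattern w."""
    P, A = 1.0, 0.0              # the null string covers [0, 1)
    for s in w:
        if s == "A":             # left child: same left end, smaller size
            P = P * p
        else:                    # right child: starts where the A-part ends
            A = A + P * p
            P = P * (1 - p)
    return A, A + P

print(interval("ABB"))           # ≈ (0.637, 0.700)
```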

14 [encode] the translation from w_i to x (cont’d)
We know the interval I_i; which x ∈ I_i should we choose?
x should have the shortest binary representation.
→ choose x = A(w_i) + P(w_i), but trimmed to ⌈−log₂ P(w_i)⌉ fractional places

The interval contains infinitely many points, and we want the x whose binary
representation is as short as possible. Keeping the first ⌈−log₂ P(w_i)⌉ bits of
the binary fraction of the upper end A(w_i) + P(w_i) = A(w_{i+1}) works: the
interval has size P(w_i) ≥ 2^{−⌈−log₂ P(w_i)⌉}, so A(w_i) and A(w_{i+1}) already
differ within the kept bits, and the trimmed value still falls inside I_i,
distinguishing w_i from its neighbors.

the length of x ≈ −log₂ P(w_i) ... almost ideal! The average codeword length of
the arithmetic code is evaluated on the next slides, and it achieves almost the
same efficiency as Huffman codes.
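A self-contained Python sketch (added) of the whole encoding step, combining the interval recursion with the trimming rule; `encode` is my name for it, and the floor/format details are one way to realize "trimming":

```python
import math

def encode(w, p=0.7):
    """Interval of w, then its upper end trimmed to ceil(-log2 P(w)) bits."""
    P, A = 1.0, 0.0
    for s in w:                          # interval recursion, as before
        if s == "A":
            P *= p
        else:
            A, P = A + P * p, P * (1 - p)
    k = math.ceil(-math.log2(P))         # number of fraction bits to keep
    top = math.floor((A + P) * 2 ** k)   # trim the upper end A(w) + P(w)
    return format(top, f"0{k}b"), top / 2 ** k

print(encode("ABB"))   # ('1011', 0.6875); 0.6875 lies in [0.637, 0.700)
```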

15 choice of 𝑥 (sketch in decimal notation)
Find the x ∈ [lower, upper) that is shortest in decimal: round off digits of the
upper end, but not too many. For instance, with an upper end 0.1265..., try
0.126, then 0.12, and stop before the rounded value leaves the interval.

# of fraction places that x must have
  = the most significant nonzero place of (upper − lower)
  = ⌈−log₁₀ (the size of the interval)⌉

16 [decode] the translation from 𝑥 to 𝑤𝑖
given x, determine the leaf node whose interval contains x;
almost the same as the first half of the encoding translation:
compute the threshold, compare, and move to the left or right child.

example (p = 0.7, x = 0.600):
  P(ε) = 1, A(ε) = 0;       threshold A(B) = 0.700:   0.600 < 0.700 → go to A
  P(A) = 0.7, A(A) = 0;     threshold A(AB) = 0.490:  0.600 ≥ 0.490 → go to AB
  P(AB) = 0.21, A(AB) = 0.49; threshold A(ABB) = 0.637: 0.600 < 0.637 → go to ABA
  P(ABA) = 0.147, A(ABA) = 0.49
0.600 is contained in the interval of ABA ... decoding completed
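The decoder is the same loop with a comparison (an added Python sketch; `decode` is my name):

```python
def decode(x, n, p=0.7):
    """Find the n-symbol pattern whose interval contains x."""
    P, A, w = 1.0, 0.0, ""
    for _ in range(n):
        threshold = A + P * p        # boundary between children wA and wB
        if x < threshold:            # x falls in the interval of wA
            w, P = w + "A", P * p
        else:                        # x falls in the interval of wB
            w, A, P = w + "B", threshold, P * (1 - p)
    return w

print(decode(0.600, 3))   # ABA: 0.600 lies in [0.490, 0.637)
```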

17 performance, summary

an n-symbol pattern w with probability P(w) is encoded to a codeword of length
⌈−log₂ P(w)⌉, so the average codeword length per symbol is

  (1/n) Σ_{w ∈ V^n} P(w) ⌈−log₂ P(w)⌉ ≈ (1/n) Σ_{w ∈ V^n} −P(w) log₂ P(w) = H(X)

almost optimum coding, without using a translation table
however... we need much computation with good precision (→ use approximation?)
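As an added numeric check (not on the slide): grouping the patterns by their number of Bs, the per-symbol length can be evaluated exactly, and it creeps down toward H(X) ≈ 0.881 as n grows.

```python
import math

p = 0.7
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)   # ≈ 0.881 bit
for n in (3, 12, 48):
    # all patterns with k Bs share the probability p^(n-k) * (1-p)^k
    avg = sum(math.comb(n, k) * p ** (n - k) * (1 - p) ** k
              * math.ceil(-math.log2(p ** (n - k) * (1 - p) ** k))
              for k in range(n + 1)) / n
    print(n, round(avg, 3))   # ≈ 0.976 for n = 3, decreasing toward H(X)
print(round(H, 3))
```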

18 3/3 Lempel-Ziv codes

a coding scheme which does not need the probability distribution:
  the encoder learns the statistical behavior of the source
  the translation table is constructed in an adaptive manner
  works well even for information sources with memory
→ we don’t have to know the probabilities of the messages in advance

19 probability in advance?
so far, we have assumed that the probabilities of the symbols are known...
in the real world, the symbol probabilities are often not known in advance

scan the data twice?
  first scan ... count the numbers of symbol occurrences
  second scan ... Huffman coding
→ delays the encoding operation, and transmitting the translation table adds overhead

20 Lempel-Ziv algorithms
for information sources whose symbol probabilities are not known...

  LZ77 ... lha, gzip, zip, zoo, etc.
  LZ78 ... compress, arc, stuffit, etc.
  LZW  ... GIF, TIFF, etc.

they work well for almost any information source → universal coding

21 LZ77: proposed by A. Lempel and J. Ziv in 1977
represent a substring of the data by a reference to a substring which has occurred previously

algorithm overview:
  process the data from the beginning
  partition the data into blocks in a dynamic manner
  represent each block by a three-tuple (i, l, x):
    “rewind i symbols, copy l symbols, and append the symbol x”

22 encoding example of LZ77

consider encoding ABCBCDBDCBCD:

  block  history                                      codeword
  A      first occurrence                             (0, 0, A)
  B      first occurrence                             (0, 0, B)
  C      first occurrence                             (0, 0, C)
  BCD    “BC” = the 2 symbols starting 2 back, + D    (2, 2, D)
  BD     “B” = the 1 symbol starting 3 back, + D      (3, 1, D)
  CBCD   “CBCD” = the 4 symbols starting 6 back, + *  (6, 4, *)

(* marks the end of the data)
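A Python sketch of this greedy encoder (my code; the slides do not give an implementation). It always takes the longest previous match, preferring the smallest rewind distance on ties, and appends '*' as the end marker:

```python
def lz77_encode(data):
    """Encode data into (i, l, x) three-tuples; '*' marks the end."""
    data = data + "*"
    out, pos = [], 0
    while pos < len(data):
        best_i, best_l = 0, 0
        for i in range(1, pos + 1):            # candidate rewind distances
            l = 0
            while (pos + l < len(data) - 1     # never match the '*' itself
                   and data[pos - i + l] == data[pos + l]):
                l += 1                         # l may exceed i (overlap is fine)
            if l > best_l:
                best_i, best_l = i, l
        out.append((best_i, best_l, data[pos + best_l]))
        pos += best_l + 1
    return out

print(lz77_encode("ABCBCDBDCBCD"))
# [(0,0,'A'), (0,0,'B'), (0,0,'C'), (2,2,'D'), (3,1,'D'), (6,4,'*')]
```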

23 decoding example of LZ77

decode (0, 0, A), (0, 0, B), (0, 0, C), (2, 2, D), (3, 1, D), (6, 4, *)
→ A, B, C, “BC”+D, “B”+D, “CBCD”+* = ABCBCDBDCBCD

possible problem:
  a large block is good, because we can copy more symbols, but
  a large block is bad, because a codeword then contains large integers
... the trade-off degrades the performance.
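The matching decoder (an added sketch) copies one symbol at a time, which also handles copies that overlap themselves:

```python
def lz77_decode(codewords):
    """Replay 'rewind i symbols, copy l symbols, append x' for each tuple."""
    data = []
    for i, l, x in codewords:
        for _ in range(l):
            data.append(data[-i])    # -i still points i symbols back
        data.append(x)
    return "".join(data[:-1])        # drop the '*' end marker

codes = [(0,0,'A'), (0,0,'B'), (0,0,'C'), (2,2,'D'), (3,1,'D'), (6,4,'*')]
print(lz77_decode(codes))            # ABCBCDBDCBCD
```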

24 LZ78 proposed by A. Lempel and J. Ziv in 1978
represent each block by a two-tuple (b, x):
“copy the block which appeared b blocks before, and append the symbol x”
(b = 0 means that the block is the single new symbol x)

25 encoding example of LZ78

consider encoding ABCBCBCDBCDE:

  block #  block  history                     codeword
  1        A      first occurrence            (0, A)
  2        B      first occurrence            (0, B)
  3        C      first occurrence            (0, C)
  4        BC     “B” = 2 blocks back, + C    (2, C)
  5        BCD    “BC” = 1 block back, + D    (1, D)
  6        BCDE   “BCD” = 1 block back, + E   (1, E)
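A Python sketch of this LZ78 variant (my code; `lz78_encode` is a hypothetical name). Note that b counts blocks backwards from the current one, and the sketch assumes the data ends with a fresh symbol:

```python
def lz78_encode(data):
    """Encode data into (b, x) two-tuples over a growing block dictionary."""
    blocks, out, pos = [], [], 0
    while pos < len(data):
        # find the longest earlier block matching the upcoming symbols
        best = -1                                  # index into blocks
        for j, blk in enumerate(blocks):
            if data.startswith(blk, pos) and (best < 0 or len(blk) > len(blocks[best])):
                best = j
        matched = "" if best < 0 else blocks[best]
        b = 0 if best < 0 else len(blocks) - best  # how many blocks back
        x = data[pos + len(matched)]               # the appended symbol
        blocks.append(matched + x)                 # register the new block
        out.append((b, x))
        pos += len(matched) + 1
    return out

print(lz78_encode("ABCBCBCDBCDE"))
# [(0,'A'), (0,'B'), (0,'C'), (2,'C'), (1,'D'), (1,'E')]
```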

26 decoding example of LZ78

decode (0, A), (0, B), (0, C), (2, C), (1, D), (1, E)
→ A, B, C, B+C, BC+D, BCD+E = ABCBCBCDBCDE

advantage over LZ77: a large block is good, because we can copy more symbols,
and here nothing goes wrong with large blocks
→ the performance is slightly better than LZ77

27 summary of LZ algorithms
in LZ algorithms, the translation table is constructed adaptively, which handles
  information sources with unknown symbol probabilities
  information sources with memory
LZW is also a good case study of intellectual property (知的財産):
UNISYS, CompuServe, the GIF format, ...

28 summary of today’s class
Huffman codes are good, but sometimes not practical...
  run-length Huffman code: simple but effective for certain types of sources
  arithmetic code: not so practical, but has strong back-up from theory
  LZ codes: practical, practical, practical

