# Modeling Delta Encoding of Compressed Files

S.T. Klein, T.C. Serebro, D. Shapira


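Delta encoding example: S = The Prague Stringology Club, T = The Prague Stringology Conference 06, Δ = (1,24)onferenc(3,2)06. Each pair (p, ℓ) copies ℓ characters of S starting at position p, and all other characters are taken literally.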
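To make the notation concrete, here is a minimal Python sketch that rebuilds T from S and Δ. It assumes (our representation, not the slides') that Δ is a list whose items are either `(p, ℓ)` tuples, read as "copy ℓ characters of S starting at 1-based position p", or literal strings; `apply_delta` is a hypothetical helper name.

```python
def apply_delta(source, delta):
    """Rebuild the target from source and a delta of copy pairs and literal strings."""
    parts = []
    for item in delta:
        if isinstance(item, tuple):
            pos, length = item  # copy `length` characters of source from 1-based `pos`
            parts.append(source[pos - 1 : pos - 1 + length])
        else:
            parts.append(item)  # literal characters taken from the delta itself
    return "".join(parts)


S = "The Prague Stringology Club"
delta = [(1, 24), "onferenc", (3, 2), "06"]
print(apply_delta(S, delta))  # The Prague Stringology Conference 06
```

The pair (1,24) supplies "The Prague Stringology C" and (3,2) supplies "e ", so only the ten literal characters have to be stored.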

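Compressed differencing goal: create a delta file of S and T without decompressing the compressed files. Three settings are distinguished, where E(·) denotes a compressed file:
- Delta encoding: from S and T, compute Δ(S,T).
- Semi compressed differencing: from E(S) and T, compute Δ(S,T).
- Full compressed differencing: from E(S) and E(T), compute Δ(S,T).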

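LZW compression (+ denotes concatenation; the dictionary initially holds all single characters):
- STR = first input character
- WHILE there are input characters:
  - C = next input character
  - IF STR + C is in the dictionary, THEN STR = STR + C
  - ELSE: output the code for STR; add STR + C to the dictionary; STR = C
- output the code for STR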

Example: S = abccbaaabccba, E(S) = 1233219571, where codes 1, 2 and 3 stand for a, b and c.
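The LZW loop above can be sketched in Python as follows. The initial code assignment a = 1, b = 2, c = 3 is inferred from the example (it is the assignment that reproduces E(S)); the function name `lzw_encode` is illustrative, and the alphabet is passed in explicitly as an assumption.

```python
def lzw_encode(text, alphabet):
    """LZW-encode text; codes 1..len(alphabet) are the single characters."""
    if not text:
        return []
    dictionary = {ch: code for code, ch in enumerate(alphabet, start=1)}
    codes = []
    current = text[0]                      # STR = first input character
    for ch in text[1:]:                    # C = next input character
        if current + ch in dictionary:     # STR + C already known: extend it
            current += ch
        else:
            codes.append(dictionary[current])               # output the code for STR
            dictionary[current + ch] = len(dictionary) + 1  # add STR + C
            current = ch
    codes.append(dictionary[current])      # output the code for the last STR
    return codes


print(lzw_encode("abccbaaabccba", "abc"))  # [1, 2, 3, 3, 2, 1, 9, 5, 7, 1]
```

Applied to the target string ccbbabccbabccbba used later, the same function yields 3 3 2 2 1 2 4 7 9 5 7, i.e. E(T).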

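Semi Compressed Differencing Algorithm:
- construct the trie of E(S)
- i ← 1
- while i ≤ u, where u = |T|:
  - P ← T[i..u]
  - starting at the root, traverse the trie using P
  - when the traversal cannot continue, at a node v (not necessarily a leaf):
    - k ← depth of v in the trie
    - output the pair (position in S corresponding to v, k); a match of length 1 is output as the character itself
  - i ← i + k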

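Example: E(S) = 1233219571, T = ccbbabccbabccbba, Δ(S,T) = (3,2) b (5,2) (9,3) (5,2) (9,3) b (5,2), where each pair (p, ℓ) copies ℓ characters of S starting at position p and b is a literal character.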
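A minimal Python sketch of the semi compressed algorithm, under these assumptions: the trie is rebuilt from the codewords of E(S) alone (as an LZW decoder would), each node stores the first character, depth and starting position in S of its string, and a match of length 1 is emitted as a literal character, as in the example. `TrieNode`, `build_trie` and `semi_compressed_delta` are hypothetical names.

```python
class TrieNode:
    """Node of the LZW trie of S; pos is the 1-based start of the node's string in S."""

    def __init__(self, first, depth, pos=None):
        self.first = first    # first character of the node's string
        self.depth = depth    # length of the node's string
        self.pos = pos        # None for the single-character nodes
        self.children = {}    # next character -> TrieNode


def build_trie(s_codes, alphabet):
    """Rebuild the LZW dictionary of S as a trie, using only the codewords E(S)."""
    root = TrieNode("", 0)
    table = {}
    for code, ch in enumerate(alphabet, start=1):
        root.children[ch] = table[code] = TrieNode(ch, 1)
    start, prev, prev_start = 1, None, 0   # start: where cw's string begins in S
    for cw in s_codes:
        if prev is not None:
            # New entry: prev's string + first character of cw's string (KwKwK if cw is new).
            k = table[cw].first if cw in table else prev.first
            prev.children[k] = TrieNode(prev.first, prev.depth + 1, prev_start)
            table[len(table) + 1] = prev.children[k]
        prev, prev_start = table[cw], start
        start += prev.depth
    return root


def semi_compressed_delta(s_codes, alphabet, t):
    """Greedily parse the plain file t against the trie of E(S)."""
    root = build_trie(s_codes, alphabet)
    delta, i = [], 0                      # i is 0-based here (the slides use 1-based)
    while i < len(t):
        node, k = root, 0
        while i + k < len(t) and t[i + k] in node.children:
            node = node.children[t[i + k]]
            k += 1
        if k <= 1:                        # length-1 (or unknown) match: literal character
            delta.append(t[i])
            k = 1
        else:
            delta.append((node.pos, k))   # (position in S, length)
        i += k
    return delta


print(semi_compressed_delta([1, 2, 3, 3, 2, 1, 9, 5, 7, 1], "abc", "ccbbabccbabccbba"))
# [(3, 2), 'b', (5, 2), (9, 3), (5, 2), (9, 3), 'b', (5, 2)]
```

Note that S itself is never materialized: the positions come from accumulating the lengths of the decoded codewords.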

Full Compressed Differencing Algorithm:

1. construct the trie of E(S)
2. flag ← 0 // output character k
3. counter ← 1 // position in T
4. input oldcw from E(T)
5. while oldcw ≠ NULL // still processing E(T)
   - 5.1 input cw from E(T)
   - 5.2 node ← Dictionary[oldcw]
   - 5.3 if Dictionary[cw] ≠ NULL
     - 5.3.1 k ← first character of the string corresponding to Dictionary[cw]
   - 5.4 else
     - 5.4.1 k ← first character of the string corresponding to node
   - 5.5 if (node has a child k) and (cw ≠ NULL)
     - 5.5.1 output (pos + flag, len − flag) corresponding to child k of node
     - 5.5.2 flag ← 1
   - 5.6 else
     - 5.6.1 output (pos + flag, len − flag) corresponding to node
     - 5.6.2 create a new child of node corresponding to k
     - 5.6.3 flag ← 0
   - 5.7 pos of child k of node ← counter
   - 5.8 oldcw ← cw
   - 5.9 counter ← counter + len − flag

Example: S = abccbaaabccba, E(S) = 1233219571, E(T) = 33221247957, so T = ccbbabccbabccbba.

The trace below follows the algorithm step by step. A pair <p, ℓ> points into S and a pair (p, ℓ) points into the part of T already reconstructed; new dictionary entries of E(T) are written as code (position in T, length, first character).

| Step | oldcw | cw | k | Output | New entry | flag after | T covered so far |
|---|---|---|---|---|---|---|---|
| 1 | 3 | 3 | c | <3,2> | 4 (1,2,c) | 1 | cc |
| 2 | 3 | 2 | b | <5,1> | 5 (2,2,c) | 1 | ccb |
| 3 | 2 | 2 | b | nothing | 6 (3,2,b) | 0 | ccb |
| 4 | 2 | 1 | a | <5,2> | 7 (4,2,b) | 1 | ccbba |
| 5 | 1 | 2 | b | <2,1> | 8 (5,2,a) | 1 | ccbbab |
| 6 | 2 | 4 | c | <3,1> | 9 (6,2,b) | 1 | ccbbabc |
| 7 | 4 | 7 | b | (2,1) | 10 (7,3,c) | 0 | ccbbabcc |
| 8 | 7 | 9 | b | (4,2) | 11 (9,3,b) | 0 | ccbbabccba |
| 9 | 9 | 5 | c | <9,3> | 12 (11,3,b) | 1 | ccbbabccbabcc |
| 10 | 5 | 7 | b | (3,1) | 13 (13,3,c) | 0 | ccbbabccbabccb |
| 11 | 7 | NULL | b | (4,2) | none | 0 | ccbbabccbabccbba |

Step 3 outputs nothing: node b has no child b, and len − flag = 1 − 1 = 0 because that b was already emitted in step 2. The resulting delta file is Δ(S,T) = <3,2><5,1><5,2><2,1><3,1>(2,1)(4,2)<9,3>(3,1)(4,2).
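The following Python sketch runs steps 1 to 5.9 on the example and reproduces the trace above. It rests on assumptions the slides leave implicit: `Dictionary` is T's own LZW code table mapping E(T) codewords to nodes of the shared trie; after step 5.7, child k of node becomes the entry for the next codeword of E(T) (entries 4 to 13 of the trace suggest this); no child is created for the last codeword; and a length-1 output from a single-character node, which never happens in this example, would be written as a literal character. Pairs are tagged "S" for <p, ℓ> and "T" for (p, ℓ). All class and function names are hypothetical.

```python
class TrieNode:
    """Trie node; pos is a 1-based start position in the file named by src."""

    def __init__(self, first, depth, pos=None, src="S"):
        self.first = first    # first character of the node's string
        self.depth = depth    # length of the node's string
        self.pos = pos        # None for the single-character nodes
        self.src = src        # "S" or "T": the file that pos points into
        self.children = {}    # next character -> TrieNode


def build_trie(s_codes, alphabet):
    """Step 1: rebuild the LZW trie of S from its codewords E(S) alone."""
    root = TrieNode("", 0)
    table = {}
    for code, ch in enumerate(alphabet, start=1):
        root.children[ch] = table[code] = TrieNode(ch, 1)
    start, prev, prev_start = 1, None, 0
    for cw in s_codes:
        if prev is not None:
            # New entry: prev's string + first character of cw's string (KwKwK if cw is new).
            k = table[cw].first if cw in table else prev.first
            prev.children[k] = TrieNode(prev.first, prev.depth + 1, prev_start)
            table[len(table) + 1] = prev.children[k]
        prev, prev_start = table[cw], start
        start += prev.depth
    return root


def full_compressed_delta(s_codes, t_codes, alphabet):
    root = build_trie(s_codes, alphabet)
    # Assumed: Dictionary is T's LZW code table, pointing into the shared trie.
    dictionary = {code: root.children[ch] for code, ch in enumerate(alphabet, start=1)}
    delta, flag, counter = [], 0, 1
    for idx, oldcw in enumerate(t_codes):
        cw = t_codes[idx + 1] if idx + 1 < len(t_codes) else None     # 5.1
        node = dictionary[oldcw]                                      # 5.2
        k = dictionary[cw].first if cw in dictionary else node.first  # 5.3-5.4
        if cw is not None and k in node.children:                     # 5.5
            out, new_flag = node.children[k], 1
        else:                                                         # 5.6
            out, new_flag = node, 0
            if cw is not None:
                node.children[k] = TrieNode(node.first, node.depth + 1)
        length = out.depth - flag    # skip the character already emitted when flag = 1
        if length > 0:
            if out.pos is None:      # assumption: single characters go out literally
                delta.append(out.first)
            else:
                delta.append((out.src, out.pos + flag, length))
        if cw is not None:
            child = node.children[k]
            child.pos, child.src = counter, "T"      # 5.7: now points into T
            dictionary[len(dictionary) + 1] = child  # assumed: next code of E(T)
        counter += node.depth        # 5.9: counter + len - flag equals |oldcw's string|
        flag = new_flag
    return delta


print(full_compressed_delta([1, 2, 3, 3, 2, 1, 9, 5, 7, 1],
                            [3, 3, 2, 2, 1, 2, 4, 7, 9, 5, 7], "abc"))
# [('S', 3, 2), ('S', 5, 1), ('S', 5, 2), ('S', 2, 1), ('S', 3, 1),
#  ('T', 2, 1), ('T', 4, 2), ('S', 9, 3), ('T', 3, 1), ('T', 4, 2)]
```

Because step 5.7 overwrites a node's position with a position in T, the same trie node can yield an S-pointer early on (cc gives <3,2> in step 1) and a T-pointer later (cc gives (2,1) in step 7).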

Combination of Pairs: If two consecutive ordered pairs point into the same file and have the form (p, ℓ₁) and (p + ℓ₁, ℓ₂), we combine them into a single ordered pair (p, ℓ₁ + ℓ₂). In the example, <2,1><3,1> (characters 2 and 3 of S = abccbaaabccba) becomes <2,2>, and (3,1)(4,2) becomes (3,3); (2,1)(4,2) cannot be combined, since 2 + 1 ≠ 4.
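A minimal Python sketch of this rule, assuming a delta represented as a list of tagged tuples `("S", p, ℓ)` for <p, ℓ> and `("T", p, ℓ)` for (p, ℓ), plus literal strings; `combine_pairs` is a hypothetical name. Merging into the last kept item lets runs of three or more adjacent pairs collapse into one.

```python
def combine_pairs(delta):
    """Merge (src, p, l1), (src, p + l1, l2) into (src, p, l1 + l2); keep literals as-is."""
    merged = []
    for item in delta:
        prev = merged[-1] if merged else None
        if (isinstance(item, tuple) and isinstance(prev, tuple)
                and prev[0] == item[0]              # both point into the same file
                and prev[1] + prev[2] == item[1]):  # item continues where prev ends
            merged[-1] = (prev[0], prev[1], prev[2] + item[2])
        else:
            merged.append(item)
    return merged


delta = [("S", 3, 2), ("S", 5, 1), ("S", 5, 2), ("S", 2, 1), ("S", 3, 1),
         ("T", 2, 1), ("T", 4, 2), ("S", 9, 3), ("T", 3, 1), ("T", 4, 2)]
print(combine_pairs(delta))
# [('S', 3, 3), ('S', 5, 2), ('S', 2, 2), ('T', 2, 1), ('T', 4, 2), ('S', 9, 3), ('T', 3, 3)]
```

Applied to the full example delta, the rule also merges <3,2><5,1> into <3,3> (S[3..5] = ccb), reducing ten pairs to seven.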

Encoding the delta file: the delta file consists of (pos, len) pairs in S, (pos, len) pairs in T, characters, and flags.

Experiments: with S = xfig.3.2.1 and T = xfig.3.2.2, |T| = 812K, |Gzip(T)| = 325K, |LZW(T)| = 497K, and |Δ(S,T)| ≈ 3K.
