Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda Kyushu University, Japan SPIRE Cartagena, Colombia.

Presentation transcript:


Outline
- Background
- LZ78 Factorization
- Straight Line Programs (SLP)
- Algorithms
- LZ78 factorization using suffix trees
- SLP to LZ78
- Improvements

Background
This work: LZ78 factorization of grammar compressed strings.
Compressed String Processing (CSP):
- Compress the string for storage, but do not decompress all of it when using it.
- Can be faster than processing the uncompressed text, by exploiting regularities identified by compression.
- Regard compression as generic preprocessing.
[Figure: a big string is turned into a compressed representation, which is then processed directly for pattern matching, edit distance, pattern mining, etc.]

LZ78 Factorization [Ziv & Lempel '78]
The LZ78 factorization of a string S is a factorization $S = f_1 f_2 \cdots f_m$ where $f_i$ is the longest prefix of $f_i \cdots f_m$ such that $f_i = f_j c$ for some $0 \le j \le i - 1$ and some character $c$ (let $f_0 = \varepsilon$).
Example: S = alabaralalabarda$
- $f_1$ = (0, a)
- $f_2$ = (0, l)
- $f_3$ = (1, b)
- $f_4$ = (1, r)
- $f_5$ = (1, l)
- $f_6$ = (5, a)
- $f_7$ = (0, b)
- $f_8$ = (4, d)
- $f_9$ = (1, $)
Each pair (j, c) stands for $f_i = f_j c$.
Computable in $O(N \log \sigma)$ time and $O(m)$ space.
[Figure: LZ78 trie of S, with one node per factor.]
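The definition on this slide can be tried out with a short Python sketch. It is only an illustration, not the authors' implementation: the names `lz78_factorize` and `expand` are made up here, and the trie is stored in a hash map, so lookups take expected constant time rather than the O(log σ) per character assumed on the slide.

```python
def lz78_factorize(s):
    """Return the LZ78 factorization of s as (j, c) pairs, meaning f_i = f_j + c."""
    trie = {}    # (parent factor index, character) -> child factor index
    pairs = []
    node = 0     # index 0 is the trie root, i.e. the empty factor f_0
    for ch in s:
        child = trie.get((node, ch))
        if child is not None:
            node = child                   # still inside the trie: keep matching
        else:
            pairs.append((node, ch))       # longest previous factor + one new character
            trie[(node, ch)] = len(pairs)  # the new factor gets index len(pairs)
            node = 0
    if node != 0:
        # The input ended inside the trie, so the last factor repeats an earlier one.
        pairs.append((node, ""))
    return pairs


def expand(pairs):
    """Turn (j, c) pairs back into the list of factor strings."""
    factors = [""]
    for j, c in pairs:
        factors.append(factors[j] + c)
    return factors[1:]


print(expand(lz78_factorize("aabaababaabab")))
# ['a', 'ab', 'aa', 'b', 'aba', 'abab']
```

The second string is the one used in the later suffix tree slides; it has m = 6 factors and its longest factor has length 4.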

Straight Line Programs
A straight line program (SLP) is a CFG in Chomsky normal form that derives a single string.
It can efficiently model the outputs of many compression algorithms: REPAIR, SEQUITUR, LZ78, etc.
Example SLP, n = 7:
- $X_1 = a$
- $X_2 = b$
- $X_3 = X_1 X_2$
- $X_4 = X_1 X_3$
- $X_5 = X_4 X_3$
- $X_6 = X_4 X_5$
- $X_7 = X_6 X_5$
[Figure: derivation tree of $X_7$, which derives S = aabaababaabab.]
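To make the example SLP concrete, here is a minimal Python sketch. The dictionary encoding (`RULES`), and the helper names `slp_lengths`, `slp_expand` and `char_at`, are assumptions made for this illustration. `char_at` walks down the derivation tree, so it costs time proportional to the height of the grammar; it is a simple stand-in, not the O(log N) random access structure of Bille et al. cited later in the talk.

```python
# Example SLP from the slide: a terminal rule is a 1-character string,
# a binary rule is a pair (left variable, right variable).
# Rules must be listed bottom-up (children before parents).
RULES = {1: "a", 2: "b", 3: (1, 2), 4: (1, 3), 5: (4, 3), 6: (4, 5), 7: (6, 5)}


def slp_lengths(rules):
    """Length of the string derived by each variable, computed bottom-up."""
    length = {}
    for var, rhs in rules.items():
        length[var] = 1 if isinstance(rhs, str) else length[rhs[0]] + length[rhs[1]]
    return length


def slp_expand(rules, root):
    """Fully decompress the string derived by root (takes O(N) space)."""
    text = {}
    for var, rhs in rules.items():
        text[var] = rhs if isinstance(rhs, str) else text[rhs[0]] + text[rhs[1]]
    return text[root]


def char_at(rules, length, var, i):
    """Return character i (0-based) of the expansion of var without expanding it."""
    while True:
        rhs = rules[var]
        if isinstance(rhs, str):
            return rhs
        left, right = rhs
        if i >= length[left]:
            var, i = right, i - length[left]   # position falls in the right child
        else:
            var = left                         # position falls in the left child


LENGTH = slp_lengths(RULES)
print(slp_expand(RULES, 7), LENGTH[7])   # aabaababaabab 13
```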

Problem: SLP to LZ78
Input: SLP. Output: LZ78 factorization (trie).
Example input: $X_1 = a$, $X_2 = b$, $X_3 = X_1 X_2$, $X_4 = X_1 X_3$, $X_5 = X_4 X_3$, $X_6 = X_4 X_5$, $X_7 = X_6 X_5$.
[Figure: the LZ78 trie produced as output.]
Why "re-compress" a compressed representation?
- Convert the representation: some CSP algorithms require a specific compression.
- Re-compress an SLP modified by ad-hoc edits: dynamic compressed texts.
- Compute the Normalized Compression Distance [Li et al. 2004]: clustering & classification without decompression, using $C_{LZ78}(x)$, $C_{LZ78}(y)$, $C_{LZ78}(xy)$ computed from the SLPs of x and y.
"Computer Scientist Make Sleeping Files Walk in their Sleep!"

Our Results
Algorithms to compute LZ78 from SLP:

Algorithm                        Time                        Space
Direct (uncompressed)            $O(N \log \sigma)$          $O(m)$
Decompress + Direct              $O(N \log \sigma)$          $O(n + m)$
SLP (partial decompressions)     $O(nN^{1/2} + m \log N)$    $O(nN^{1/2} + m)$
SLP + Doubling                   $O(nL + m \log N)$          $O(nL + m)$
SLP + Redundancy Reduction       $O(N_\alpha + m \log N)$    $O(N_\alpha + m)$

- $N$: length of the uncompressed string S
- $\sigma$: alphabet size
- $n$: size of the SLP representing S
- $m$: number of LZ78 factors ($O(N / \log N)$ for constant $\sigma$)
- $L$: length of the longest LZ78 factor
- $N_\alpha = N - \alpha \le N$, where $\alpha \ge 0$ is a quantity that represents the amount of redundancy in the string that is captured by the SLP

LZ78 Factorization using a Suffix Tree

Suffix Tree & LZ78
The LZ78 trie can be superimposed on the suffix tree.
[Figure: suffix tree of S = aabaababaabab, with the LZ78 trie of S drawn on top of it.]

LZ78 Factorization on Suffix Tree
Build the LZ78 trie on top of the suffix tree ST of S. Nodes of ST corresponding to the LZ78 trie are marked.
For the current position i, the next factor is a prefix of S[i:N]:
- Find the node in ST corresponding to S[i:N].
- Find the longest prefix of S[i:N] in the LZ78 trie: O(1) time by dynamic nearest marked ancestor (nma) queries [Westbrook '92].
- Make a new node of the LZ78 trie on ST: O(1) time by a level ancestor (la) query on ST [Berkman & Vishkin '94].
- Compute the next position $i \leftarrow i + |f_i|$.
LZ78 factorization takes $O(m)$ time, given a suffix tree preprocessed for nma & la queries.
[Figure: suffix tree of S = aabaababaabab with the marked LZ78 trie nodes and the current position i.]

SLP to LZ78

Our algorithm: SLP to LZ78
Key Observation: for any string of length N, the length of any LZ78 factor $f_i$ satisfies $|f_i| \le c_N = (2N + 1/4)^{1/2} - 1/2 = O(N^{1/2})$.
Main Idea: we only need a suffix tree that contains all distinct substrings of S with length at most $c_N$.
- Build a generalized suffix tree (GST) from a set of substrings of S that contain all distinct length-$c_N$ substrings of S.
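The slide states the bound on $c_N$ without a derivation. One standard way to obtain it, given here as a reconstruction rather than as the authors' own argument, uses the fact that the LZ78 trie is prefix-closed: if some factor has length $\ell$, then each of its proper prefixes of length $1, \dots, \ell - 1$ is itself an earlier, distinct factor.

```latex
\begin{align*}
N \;\ge\; \sum_{k=1}^{\ell} k \;=\; \frac{\ell(\ell+1)}{2}
  &\;\Longrightarrow\; \ell^2 + \ell - 2N \le 0 \\
  &\;\Longrightarrow\; \ell \;\le\; \frac{-1 + \sqrt{1 + 8N}}{2}
     \;=\; \sqrt{2N + \tfrac{1}{4}} - \tfrac{1}{2} \;=\; c_N .
\end{align*}
```

For the running example with $N = 13$ this gives $\sqrt{26.25} - 0.5 \approx 4.62$, so no factor is longer than 4, which agrees with the value $c_N = 4$ used in the example slides.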

Important Concept: Stabbing
$X_i$ stabs an interval [u:v] of S when it is the shortest variable that derives the interval (any interval is stabbed by a unique variable).
SLP: $X_1 = a$, $X_2 = b$, $X_3 = X_1 X_2$, $X_4 = X_1 X_3$, $X_5 = X_4 X_3$, $X_6 = X_4 X_5$, $X_7 = X_6 X_5$.
Example: aaba at [9:12] is stabbed by $X_5$.
[Figure: derivation tree of S = aabaababaabab with the occurrence of $X_5$ that stabs [9:12] highlighted.]
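The stabbing variable of an interval can be found by walking down the derivation tree until the interval straddles the boundary between the two children. The Python sketch below illustrates this on the example SLP; the encoding `RULES` and the function names are assumptions made for the illustration, and positions are 1-based and inclusive as on the slide.

```python
# Example SLP from the slide, listed bottom-up.
RULES = {1: "a", 2: "b", 3: (1, 2), 4: (1, 3), 5: (4, 3), 6: (4, 5), 7: (6, 5)}


def slp_lengths(rules):
    """Length of the string derived by each variable."""
    length = {}
    for var, rhs in rules.items():
        length[var] = 1 if isinstance(rhs, str) else length[rhs[0]] + length[rhs[1]]
    return length


def stabbing_variable(rules, length, root, u, v):
    """Return the index of the variable that stabs the interval [u:v] (1-based, inclusive)."""
    var, offset = root, 0   # offset = number of characters of S before this occurrence of var
    while True:
        rhs = rules[var]
        if isinstance(rhs, str):
            return var      # reached a leaf, which only happens when u == v
        left, right = rhs
        mid = offset + length[left]   # last position covered by the left child
        if mid >= v:
            var = left                # interval lies entirely in the left child
        elif u > mid:
            var, offset = right, mid  # interval lies entirely in the right child
        else:
            return var                # interval crosses the boundary: var stabs it


LENGTH = slp_lengths(RULES)
print(stabbing_variable(RULES, LENGTH, 7, 9, 12))   # 5
```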

Substrings stabbed by $X_i$
All length-q substrings stabbed by $X_i$ are contained in a string $t_i(q)$ of length at most $2(q - 1)$.
Any length-q substring of S is stabbed by some unique variable $X_i$, and therefore is a substring of some $t_i(q)$.
So the set $\{ t_i(c_N) : |X_i| \ge c_N, 1 \le i \le n \}$ contains all distinct length-$c_N$ substrings of S.
[Figure: $X_i$ with children $X_{l(i)}$ and $X_{r(i)}$; $t_i(q)$ straddles the boundary between the two children, reaching $q - 1$ characters into each side.]
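The strings $t_i(q)$ can be computed without decompressing S. The sketch below assumes, consistently with the example values given on the later example slide, that $t_i(q)$ is the last $q - 1$ characters of the left child followed by the first $q - 1$ characters of the right child (or the whole child if it is shorter). It keeps only a length-$(q-1)$ prefix and suffix per variable, so the work is $O(nq)$. The encoding `RULES` and the name `boundary_strings` are made up for this illustration.

```python
# Example SLP from the slides, listed bottom-up.
RULES = {1: "a", 2: "b", 3: (1, 2), 4: (1, 3), 5: (4, 3), 6: (4, 5), 7: (6, 5)}


def boundary_strings(rules, q):
    """Return {i: t_i(q)} for every variable X_i with |X_i| >= q (requires q >= 2)."""
    if q < 2:
        raise ValueError("q must be at least 2")
    k = q - 1
    pre, suf, length, t = {}, {}, {}, {}
    for var, rhs in rules.items():
        if isinstance(rhs, str):
            pre[var] = suf[var] = rhs
            length[var] = 1
            continue
        left, right = rhs
        length[var] = length[left] + length[right]
        # Only the first and last k characters of each variable are ever stored.
        pre[var] = (pre[left] + pre[right])[:k]
        suf[var] = (suf[left] + suf[right])[-k:]
        if length[var] >= q:
            # Every length-q substring crossing the boundary lies inside this string.
            t[var] = suf[left] + pre[right]
    return t


print(boundary_strings(RULES, 4))
# {5: 'aabab', 6: 'aabaab', 7: 'babaab'}
```

For S = aabaababaabab and q = 4, every length-4 substring of S occurs in at least one of these three strings.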

LZ78 Factorization from SLP
Algorithm:
1. Compute $\{ t_i(c_N) : |X_i| \ge c_N, 1 \le i \le n \}$.
2. Build the generalized suffix tree (GST) for the strings $\{ t_i(c_N) : |X_i| \ge c_N, 1 \le i \le n \}$.
3. Run the LZ78 factorization algorithm using the GST.
Steps 1 and 2 take $O(n c_N)$ time and space.

Example
- $N = 13$, $c_N = 4$, $n = 7$
- $\{ t_5(4), t_6(4), t_7(4) \}$ = { aabab, aabaab, babaab }
[Figure: derivation tree of S = aabaababaabab with the boundary strings marked.]

GST & LZ78 Factors
The LZ78 trie of S = aabaababaabab is superimposed on the GST of $\{ t_5(4), t_6(4), t_7(4) \}$, where $t_5(4)$ = aabab, $t_6(4)$ = aabaab, $t_7(4)$ = babaab.
[Figure: GST of $\{ t_5(4), t_6(4), t_7(4) \}$ with the LZ78 trie of S drawn on top of it.]

LZ78 Factorization on GST
Running example: S = aabaababaabab, $c_N = 4$, GST of $t_5(4)$ = aabab, $t_6(4)$ = aabaab, $t_7(4)$ = babaab.
For the current position i, the next factor is a prefix of S[i:N]:
- Find the node in the GST corresponding to S[i:N]: $O(\log N)$ time with random access on the SLP [Bille et al. 2011].
- Find the longest prefix of S[i:N] in the LZ78 trie: O(1) time with dynamic nma queries.
- Make a new node for the LZ78 trie on the GST.
- Compute the next position $i \leftarrow i + |f_i|$.
LZ78 factorization can be computed in $O(m \log N)$ time, given the GST preprocessed for nma & la queries and the SLP preprocessed for random access queries.
[Figure: animation over several slides showing the derivation tree of S, the GST, and the LZ78 trie growing as position i advances.]

Summary of Basic Algorithm

Algorithm                  Time                        Space
Direct (uncompressed)      $O(N \log \sigma)$          $O(m)$
Decompress + Direct        $O(N \log \sigma)$          $O(n + m)$
SLP                        $O(n c_N + m \log N)$       $O(n c_N + m)$

Here $c_N = O(N^{1/2})$.
Extreme cases:
- If the string is compressible, $n = O(\log N)$ and $m = O(N^{1/2})$, so $O(n c_N + m \log N) = O(N^{1/2} \log N) = o(N)$.
- If the string is not compressible, $n, m = O(N)$ and $O(n c_N + m \log N) = O(N^{1.5})$.
Can we do better than just reverting to decompress & process?

(1) Improving the $n c_N$ term to $nL \le n c_N$
Let L denote the length of the longest LZ78 factor of S.
- We built the GST for distinct substrings of length at most $c_N$, but we actually only need substrings of length at most L.
- However, L is not known beforehand.
Doubling technique:
- Assume L = 2 and run the algorithm.
- If the LZ78 trie expands beyond the GST, set $L \leftarrow 2L$, rebuild the GST and the LZ78 trie, and continue.
- Total time complexity for rebuilding: $\sum_{i=1}^{\log L} O(n 2^i + m) = O(nL + m \log L)$.
Result: $O(n c_N + m \log N)$ time and $O(n c_N + m)$ space improve to $O(nL + m \log N)$ time and $O(nL + m)$ space.
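The control flow of the doubling technique can be sketched in Python on an uncompressed string. This is a simplification under stated assumptions: the depth bound stands in for the depth of the GST, each round restarts the factorization from scratch rather than continuing, and the names `DepthExceeded`, `lz78_bounded` and `lz78_doubling` are hypothetical.

```python
class DepthExceeded(Exception):
    """Raised when a factor would grow longer than the current guess for L."""


def lz78_bounded(s, max_len):
    """LZ78 factorization of s that gives up as soon as a factor exceeds max_len."""
    trie, pairs = {}, []
    node, depth = 0, 0
    for ch in s:
        depth += 1
        if depth > max_len:
            raise DepthExceeded   # the LZ78 trie would expand beyond the indexed depth
        child = trie.get((node, ch))
        if child is None:
            pairs.append((node, ch))
            trie[(node, ch)] = len(pairs)
            node, depth = 0, 0
        else:
            node = child
    if node != 0:
        pairs.append((node, ""))  # input ended inside the trie
    return pairs


def lz78_doubling(s):
    """Guess L = 2 and double it until the factorization fits within the bound."""
    bound, rounds = 2, 0
    while True:
        rounds += 1
        try:
            return lz78_bounded(s, bound), bound, rounds
        except DepthExceeded:
            bound *= 2            # L <- 2L, then rebuild and rerun


pairs, bound, rounds = lz78_doubling("aabaababaabab")
print(len(pairs), bound, rounds)   # 6 4 2
```

Because the guesses grow geometrically, the final guess is below 2L and the work over all rounds is dominated by the last one, which is where the $O(nL)$ term on the slide comes from.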

(2) Improving the $n c_N$ term to $N_\alpha \le N$
We can replace the GST with the suffix tree of a trie for $q = c_N$.
Lemma [Goto et al. CPM 2012]: Given an SLP for string S, the set of length-q substrings of S can be represented as paths in a reverse trie of size $N_\alpha = N - \alpha(q) \le N$, where
$\alpha(q) = \sum_{i : |X_i| \ge q} (vOcc(X_i) - 1)(|t_i(q)| - (q - 1)) \ge 0$
and $vOcc(X_i)$ is the number of times $X_i$ occurs in the derivation tree. The trie can be computed in time linear in its size.
Lemma [Shibuya 2003]: The suffix tree of a reverse trie can be constructed in linear time.
Note that $N_\alpha = O(n c_N)$.
Result: $O(n c_N + m \log N)$ time and $O(n c_N + m)$ space improve to $O(N_\alpha + m \log N)$ time and $O(N_\alpha + m)$ space.
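The quantity $\alpha(q)$ is easy to evaluate directly from the grammar. The Python sketch below does so for the example SLP with q = 4; the encoding `RULES` and the name `redundancy` are assumptions of this illustration, and $|t_i(q)|$ is taken to be $\min(q-1, |X_{l(i)}|) + \min(q-1, |X_{r(i)}|)$, which matches the example strings but is not spelled out on the slide.

```python
# Example SLP from the slides, listed bottom-up (children before parents).
RULES = {1: "a", 2: "b", 3: (1, 2), 4: (1, 3), 5: (4, 3), 6: (4, 5), 7: (6, 5)}


def redundancy(rules, root, q):
    """Return (N_alpha, alpha, vocc) for the SLP rooted at root and substring length q."""
    length = {}
    for var, rhs in rules.items():
        length[var] = 1 if isinstance(rhs, str) else length[rhs[0]] + length[rhs[1]]

    # vOcc: number of occurrences of each variable in the derivation tree,
    # propagated top-down (parents are visited before their children).
    vocc = dict.fromkeys(rules, 0)
    vocc[root] = 1
    for var in reversed(list(rules)):
        rhs = rules[var]
        if not isinstance(rhs, str):
            vocc[rhs[0]] += vocc[var]
            vocc[rhs[1]] += vocc[var]

    alpha = 0
    for var, rhs in rules.items():
        if isinstance(rhs, str) or q > length[var]:
            continue   # only variables with |X_i| >= q contribute
        t_len = min(q - 1, length[rhs[0]]) + min(q - 1, length[rhs[1]])
        alpha += (vocc[var] - 1) * (t_len - (q - 1))
    return length[root] - alpha, alpha, vocc


n_alpha, alpha, vocc = redundancy(RULES, 7, 4)
print(n_alpha, alpha, vocc[5])   # 11 2 2
```

Only $X_5$ occurs more than once among the variables of length at least 4, so $\alpha(4) = (2 - 1)(5 - 3) = 2$ and $N_\alpha = 13 - 2 = 11$.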

Example: Trie of size $N_\alpha$ for q = 4
We can aggregate all $t_i(q)$ into a trie of size at most the text size.
- $\sum |t_i(q)|$: 17
- Text size: 13
- Trie size: 11
[Figure: derivation tree of S = aabaababaabab with the strings aabab, aab and bab that are aggregated into the trie.]

Summary
Showed an algorithm for SLP → LZ78 factorization:
- at least as fast as naïve decompress & process
- better when the string is compressible

Algorithm                        Time                        Space
Direct (uncompressed)            $O(N \log \sigma)$          $O(m)$
Decompress + Direct              $O(N \log \sigma)$          $O(n + m)$
SLP (partial decompressions)     $O(nN^{1/2} + m \log N)$    $O(nN^{1/2} + m)$
SLP + Doubling                   $O(nL + m \log N)$          $O(nL + m)$
SLP + Redundancy Reduction       $O(N_\alpha + m \log N)$    $O(N_\alpha + m)$

- $N$: length of the uncompressed string S
- $\sigma$: alphabet size
- $n$: size of the SLP representing S
- $m$: number of LZ78 factors ($O(N / \log N)$ for constant $\sigma$)
- $L$: length of the longest LZ78 factor
- $N_\alpha = N - \alpha(c_N) \le N$