Succinct Data Structures for Permutations, Functions and Suffix Arrays. Ian Munro, University of Waterloo. Joint work with F. Fich, M. He, J. Horton, A. López-Ortiz, S. Srinivasa Rao, Rajeev Raman, Venkatesh Raman. How do we encode a permutation (or its generalization, a function, or its specialization, a suffix array) in a small amount of space and still perform queries in constant time?
Permutations: a Shortcut Notation. Let P be a simple array giving π: P[i] = π[i]. Also let B[i] be a pointer t positions back in (the cycle of) the permutation: B[i] = π^{-t}[i]. But define B only for every t-th position in a cycle (t is a constant; ignore cycle-length “round-off”). So array representation P = [ x x 3 x 2 x 10 1]
Representing Shortcuts. In a cycle there is a B every t positions, but these positions can be in arbitrary order. Which i's have a B, and how do we store it? Keep a bit vector over all positions: 0 indicates no B, 1 indicates a B. Rank on this vector gives the position of B[“i”] in the B array. So: π(i) and π^{-1}(i) in O(1) time and (1+ε) n lg n bits. Theorem: Under a pointer machine model with space (1+ε)n references, we need time 1/ε to answer π and π^{-1} queries; i.e. this is as good as it gets.
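The shortcut scheme on the last two slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and method names are invented here, positions are 0-indexed, a plain prefix-sum array stands in for a constant-time rank structure, and every cycle (even a very short one) gets at least one back pointer, which costs slightly more than the n/t pointers the slides aim for.

```python
class ShortcutPermutation:
    """pi stored as an array P, plus a back pointer at every t-th cycle position.

    Hypothetical illustration; names and layout are not from the slides.
    """

    def __init__(self, perm, t):
        n = len(perm)
        self.P = list(perm)
        self.mark = [0] * n              # bit vector: 1 where a back pointer exists
        back = {}
        seen = [False] * n
        for start in range(n):
            if seen[start]:
                continue
            cycle = []
            j = start
            while not seen[j]:           # collect one cycle in pi-order
                seen[j] = True
                cycle.append(j)
                j = self.P[j]
            marked = cycle[::t]          # every t-th position along the cycle
            for k, pos in enumerate(marked):
                self.mark[pos] = 1
                back[pos] = marked[k - 1]    # k = 0 wraps to the last marked position
        # Prefix sums stand in for an O(1) rank structure on the bit vector.
        self.rank = [0] * (n + 1)
        for j in range(n):
            self.rank[j + 1] = self.rank[j] + self.mark[j]
        # B holds the back pointers, ordered by the position that owns them.
        self.B = [back[j] for j in range(n) if self.mark[j]]

    def pi(self, i):
        return self.P[i]

    def pi_inv(self, i):
        j = i
        while not self.mark[j]:          # walk forward to the next back pointer
            j = self.P[j]
        j = self.B[self.rank[j]]         # jump about t positions back, behind i
        while self.P[j] != i:            # walk forward until we stand just before i
            j = self.P[j]
        return j
```

With t chosen near 1/ε, the B array has about n/t = εn entries, which is where the (1+ε) n lg n bits come from, and each inverse query walks roughly 2t positions.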
Getting n lg n Bits: an Aside. This is the best we can do for O(1) operations. But using Benes networks: a 1-Benes network is a 2-input/2-output switch; an (r+1)-Benes network … join tops to tops. [Figure: r-Benes network]
A Benes Network Realizing a Permutation. [Figure: a Benes network realizing an example permutation]
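As a rough check on why this aside is titled "n lg n bits", the switches in the recursive construction can be counted. The sketch below assumes n = 2^r inputs and the standard construction in which an (r+1)-Benes network is two r-Benes networks placed between one input column and one output column of 2^r switches each; S(r) is notation introduced here for the number of switches, not something from the slides.

```latex
S(1) = 1, \qquad S(r+1) = 2\,S(r) + 2 \cdot 2^{r}
\;\Longrightarrow\;
S(r) = (2r-1)\,2^{r-1} = n \lg n - \tfrac{n}{2} \qquad (n = 2^{r}).
```

Storing one bit per switch setting therefore takes n lg n - n/2 bits, which is within O(n) bits of the information-theoretic minimum lg(n!) = n lg n - n lg e + O(lg n).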
What can we do with it? Divide into blocks of lg lg n gates … encode their actions in a word. Take advantage of the regularity of the addressing mechanism, and also modify the approach to avoid the power-of-2 issue. We can trace a path in time O(lg n / lg lg n). This is the best time we are able to get for π and π^{-1} in minimum space. Observe: this method “violates” the pointer machine lower bound by using “micropointers”.
Back to the main track: Powers of π. Consider the cycles of π: (2 6 8)( )(4 1 7). Keep a bit vector to indicate the start of each cycle: ( ). Ignoring parentheses, view this as a new permutation, ψ. Note: ψ^{-1}(i) is the position containing i … So we have ψ and ψ^{-1} as before. Use ψ^{-1}(i) to find i, then the bit vector (rank, select) to find π^k or π^{-k}.
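The cycle-layout idea for powers can be illustrated as follows. This is a minimal sketch with invented names: ψ^{-1} is stored as an explicit array rather than through the shortcut structure, and a sorted list of cycle start positions searched with `bisect` stands in for the bit vector with rank and select, so the sketch is not constant time.

```python
from bisect import bisect_right


class PowerPermutation:
    """Cycles of pi written one after another (psi), plus the cycle start positions."""

    def __init__(self, perm):
        n = len(perm)
        self.psi = []        # the cycles of pi, concatenated, parentheses dropped
        self.starts = []     # positions in psi where a cycle begins
        seen = [False] * n
        for s in range(n):
            if seen[s]:
                continue
            self.starts.append(len(self.psi))
            j = s
            while not seen[j]:
                seen[j] = True
                self.psi.append(j)
                j = perm[j]
        self.starts.append(n)            # sentinel: end of the last cycle
        self.psi_inv = [0] * n           # psi_inv[i] = position of i in psi
        for pos, value in enumerate(self.psi):
            self.psi_inv[value] = pos

    def power(self, i, k):
        """Return pi^k(i); k may be negative."""
        pos = self.psi_inv[i]                        # where i sits in the layout
        c = bisect_right(self.starts, pos) - 1       # "rank": which cycle
        lo, hi = self.starts[c], self.starts[c + 1]  # "select": cycle boundaries
        return self.psi[lo + (pos - lo + k) % (hi - lo)]
```

Because each cycle occupies a contiguous range of ψ, moving k steps along the cycle is just modular arithmetic inside that range, which is why π^k and π^{-k} cost no more than π and π^{-1}.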
Functions. Now consider arbitrary functions [n] → [n]. “A function is just a hairy permutation”: all tree edges lead to a cycle.
Challenges here. Essentially write down the components in a convenient order and use the n lg n bits to describe the mapping (as per permutations). To get f^k(i): find the level ancestor (k levels up) in a tree, or go up to the root and apply f the remaining number of steps around a cycle.
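The two cases for f^k(i) can be sketched as below. This is an illustration only, with invented function names and plain word-sized tables rather than a succinct encoding; in particular, the level-ancestor step is done by walking k edges, which is a naive stand-in and not constant time.

```python
from collections import deque


def build_tables(f):
    """Precompute depth-to-cycle, cycle root, and cycle layout for f: [n] -> [n]."""
    n = len(f)
    indeg = [0] * n
    for y in f:
        indeg[y] += 1
    # Peel off tree nodes (leaves first); whatever is never peeled lies on a cycle.
    queue = deque(v for v in range(n) if indeg[v] == 0)
    order = []
    on_cycle = [True] * n
    while queue:
        v = queue.popleft()
        on_cycle[v] = False
        order.append(v)
        indeg[f[v]] -= 1
        if indeg[f[v]] == 0:
            queue.append(f[v])
    depth = [0] * n               # distance from a node up to its cycle
    root = list(range(n))         # the cycle node its tree hangs from
    for v in reversed(order):     # parents are handled before their children
        depth[v] = depth[f[v]] + 1
        root[v] = root[f[v]]
    cyc_id = [-1] * n
    cyc_pos = [0] * n
    cycles = []
    for s in range(n):
        if on_cycle[s] and cyc_id[s] == -1:
            cyc = []
            j = s
            while cyc_id[j] == -1:
                cyc_id[j] = len(cycles)
                cyc_pos[j] = len(cyc)
                cyc.append(j)
                j = f[j]
            cycles.append(cyc)
    return depth, root, cyc_id, cyc_pos, cycles


def power(f, tables, i, k):
    """Return f^k(i) for k >= 0."""
    depth, root, cyc_id, cyc_pos, cycles = tables
    if k < depth[i]:
        for _ in range(k):        # naive stand-in for a level-ancestor query
            i = f[i]
        return i
    r = root[i]                   # go up to the root, which is on a cycle
    cyc = cycles[cyc_id[r]]
    return cyc[(cyc_pos[r] + k - depth[i]) % len(cyc)]
```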
Level Ancestors. There are several level ancestor techniques using O(1) time and O(n) WORDS. Adapt Bender & Farach-Colton to work in O(n) bits. But going the other way …
f^{-k} is a set. Moving down the tree requires care: f^{-3}( ) = ( ). The trick: report all nodes on a given level of a tree in time proportional to the number of nodes, and don't waste time on trees with no answers.
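To make "f^{-k} is a set" concrete, here is a deliberately naive sketch (the function name is invented here). It expands preimages one level at a time, so it does exactly what the slide warns against: it can spend time on subtrees that contribute no answers. It is meant only to pin down what the query returns, not how the succinct structure answers it.

```python
def inverse_power(f, i, k):
    """Return the sorted list of all x with f^k(x) == i, for k >= 0."""
    preimages = [[] for _ in f]
    for x, y in enumerate(f):
        preimages[y].append(x)
    frontier = [i]
    for _ in range(k):
        # Preimage sets of distinct nodes are disjoint, so no duplicates arise.
        frontier = [x for y in frontier for x in preimages[y]]
    return sorted(frontier)
```

For example, with f = [1, 2, 0, 0, 3] (so 0 → 1 → 2 → 0 is a cycle, 3 maps to 0 and 4 maps to 3), the nodes two steps before 0 are 1 (around the cycle) and 4 (down the tree).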
Final Function Result. Given an arbitrary function f: [n] → [n], with an n lg n + O(n) bit representation we can compute f^k(i) in O(1) time and f^{-k}(i) in time O(1 + size of answer).
Back to Text … And Suffix Arrays. Text T[1..n] over (a,b)*# (a < # < b)
Ascending to Max. M is a permutation, so M^{-1} is its inverse; i.e. M^{-1}[i] says where i is in M. Ascending-to-Max: for all 1 ≤ i ≤ n-2, (i) M^{-1}[i] < M^{-1}[n] and M^{-1}[i+1] < M^{-1}[n] ⇒ M^{-1}[i] < M^{-1}[i+1]; (ii) M^{-1}[i] > M^{-1}[n] and M^{-1}[i+1] > M^{-1}[n] ⇒ M^{-1}[i] > M^{-1}[i+1]. [Figure: an OK example and a NO example]
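Non-Nesting. Non-Nesting: for all 1 ≤ i, j ≤ n-1 with M^{-1}[i] < M^{-1}[j]: if either M^{-1}[i] < M^{-1}[i+1] and M^{-1}[j] < M^{-1}[j+1], or M^{-1}[i] > M^{-1}[i+1] and M^{-1}[j] > M^{-1}[j+1], then M^{-1}[i+1] < M^{-1}[j+1]. [Figure: an OK example and a NO example]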
Characterization Theorem for Suffix Arrays on Binary Texts. Theorem: Ascending to Max & Non-nesting ⇔ Suffix Array. Corollary: clean method of breaking SA into segments. Corollary: linear-time algorithm to check whether SA is valid.
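The characterization can be tried out directly on small inputs. The sketch below makes several assumptions of its own: the function names are invented, the character order is taken to be a < # < b, and non-nesting is read as "for i, j ≤ n-1 with M^{-1}[i] < M^{-1}[j], if suffixes i and j both ascend or both descend to their successors, then M^{-1}[i+1] < M^{-1}[j+1]". It is a quadratic-time check written for clarity, not the linear-time algorithm the corollary refers to.

```python
def suffix_array(text):
    """Naive suffix array with 1-indexed positions; order assumed a < # < b."""
    order = {"a": 0, "#": 1, "b": 2}
    n = len(text)
    return sorted(range(1, n + 1), key=lambda i: [order[c] for c in text[i - 1:]])


def is_binary_suffix_array(M):
    """Check ascending-to-max and non-nesting for a permutation M of 1..n."""
    n = len(M)
    inv = [0] * (n + 2)                      # inv[i] = where i is in M (1-indexed)
    for where, pos in enumerate(M, start=1):
        inv[pos] = where
    # Ascending-to-max, for 1 <= i <= n-2.
    for i in range(1, n - 1):
        if inv[i] < inv[n] and inv[i + 1] < inv[n] and not inv[i] < inv[i + 1]:
            return False
        if inv[i] > inv[n] and inv[i + 1] > inv[n] and not inv[i] > inv[i + 1]:
            return False
    # Non-nesting, for 1 <= i, j <= n-1 (quadratic scan).
    for i in range(1, n):
        for j in range(1, n):
            if inv[i] < inv[j]:
                both_up = inv[i] < inv[i + 1] and inv[j] < inv[j + 1]
                both_down = inv[i] > inv[i + 1] and inv[j] > inv[j + 1]
                if (both_up or both_down) and not inv[i + 1] < inv[j + 1]:
                    return False
    return True
```

Under this reading, `is_binary_suffix_array(suffix_array("abaaabbaaabaabb#"))` holds, while the permutation `[2, 1, 3]`, which is not the suffix array of any binary text of length 3, is rejected by the ascending-to-max test.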
Cardinality Queries. T = a b a a a b b a a a b a a b b #. Remember the lengths of the longest runs of a's and of b's. SA (broken by runs, but not stored explicitly): 8 3 | | |16 | |6 14. B_a is a bit vector: if SA^{-1}[i-1] is in an “a” section, store 1 in B_a[SA^{-1}[i]], else 0. Create a rank structure on B_a, and similarly on B_b (note these are reversed except at #).
Algorithm Count(T,P), with m = |P| and n_a the number of a's in T:
  s ← 1; e ← n; i ← m
  while i > 0 and s ≤ e do
    if P[i] = a then s ← rank_1(B_a, s-1) + 1; e ← rank_1(B_a, e)
    else s ← n_a + 2 + rank_1(B_b, s-1); e ← n_a + 1 + rank_1(B_b, e)
    i ← i-1
  return max(e-s+1, 0)
Time: O(length of query).
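The counting loop can be run on the example text with a short sketch. The assumptions are labeled in the code: the function names are invented, the order a < # < b is assumed, the suffix array is built naively and only used to fill the bit vectors, prefix-sum arrays stand in for rank_1 on B_a and B_b, and the offset n_a + 2 in the "b" branch is a reconstruction (one more than the n_a + 1 used for e, mirroring the "a" branch).

```python
def build_index(text):
    """text is over {a, b} and ends with a unique '#'; order assumed a < # < b."""
    order = {"a": 0, "#": 1, "b": 2}
    n = len(text)
    # Naive suffix sorting: fine for a small illustration, not for real inputs.
    sa = sorted(range(n), key=lambda i: [order[c] for c in text[i:]])
    rank_a = [0] * (n + 1)   # rank_a[j] = number of 1s in B_a[1..j]
    rank_b = [0] * (n + 1)   # rank_b[j] = number of 1s in B_b[1..j]
    for j, start in enumerate(sa, start=1):
        prev = text[start - 1] if start > 0 else None   # character before this suffix
        rank_a[j] = rank_a[j - 1] + (prev == "a")
        rank_b[j] = rank_b[j - 1] + (prev == "b")
    return n, text.count("a"), rank_a, rank_b


def count(index, pattern):
    """Number of occurrences of pattern (over {a, b}) in the indexed text."""
    n, n_a, rank_a, rank_b = index
    s, e = 1, n
    i = len(pattern)
    while i > 0 and s <= e:
        if pattern[i - 1] == "a":
            s = rank_a[s - 1] + 1
            e = rank_a[e]
        else:
            # n_a suffixes start with 'a' and one with '#', so 'b' suffixes
            # begin at rank n_a + 2 (assumed reconstruction of the garbled offset).
            s = n_a + 2 + rank_b[s - 1]
            e = n_a + 1 + rank_b[e]
        i -= 1
    return max(e - s + 1, 0)
```

For the slide's text, `count(build_index("abaaabbaaabaabb#"), "aab")` gives 3, matching a direct scan (occurrences start at positions 4, 9 and 12). The pattern is consumed right to left, one rank pair per character, which is where the O(length of query) bound comes from.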
Listing Queries. Complex methods. Key idea: for queries of length at least d, index every d-th position, for T and for T reversed. So we have matches for T[i..n] and T[1..i-1]. View these as points in 2-space (Ferragina & Manzini and Grossi & Vitter). Do a range query (Alstrup et al). A variety of results follow.
General Conclusion. Interesting, and useful, combinatorial objects can be stored succinctly … (lower bound) + o(lower bound) bits, so that natural queries are performed in O(1) time (or at least very close). This can make the difference between using them and not …