Succinct Data Structures for Permutations, Functions and Suffix Arrays


Succinct Data Structures for Permutations, Functions and Suffix Arrays Ian Munro University of Waterloo Joint work with F. Fich, M. He, J. Horton, A. López-Ortiz, S. Srinivasa Rao, Rajeev Raman, Venkatesh Raman. How do we encode a permutation (or its generalization, a function, or its specialization, a suffix array) in a small amount of space and still perform queries in constant time?

Permutations: a Shortcut Notation Let P be a simple array giving π; P[i] = π[i]. Also have B[i] be a pointer t positions back in (the cycle of) the permutation; B[i] = π^{-t}[i] .. but only define B for every t-th position in a cycle. (t is a constant; ignore cycle-length “round-off”.) So the array representation is P = [8 4 12 5 13 x x 3 x 2 x 10 1] over positions 1 2 3 4 5 6 7 8 9 10 11 12 13; its defined entries form the single cycle 2 4 5 13 1 8 3 12 10 (and back to 2).

Representing Shortcuts In a cycle there is a B every t positions … but these positions can appear in arbitrary order. Which i’s have a B, and how do we store it? Keep a bit vector over all positions: 0 indicates no B, 1 indicates a B. Rank then gives the position of B[i] in the B array. So: π(i) and π^{-1}(i) in O(1) time and (1+ε)n lg n bits. Theorem: Under a pointer machine model with space (1+ε)n references, we need time 1/ε to answer π and π^{-1} queries; i.e. this is as good as it gets.
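
To make the shortcut idea concrete, here is a small Python sketch. It is not the paper's succinct encoding: the back pointers sit in a dictionary rather than in an array indexed by rank over the marked bit vector, and the slide's unspecified positions 6, 7, 9, 11 are filled with fixed points just so the example runs. It marks every t-th element of each cycle and answers π^{-1}(i) in O(t) steps.

def build_shortcut(P, t):
    # P is 1-indexed (P[0] unused). Mark every t-th element along each cycle and
    # record a back pointer back[j] = pi^{-t}(j) for each marked element j.
    n = len(P) - 1
    marked = [0] * (n + 1)
    back = {}            # a succinct version stores these compacted via rank over 'marked'
    seen = [False] * (n + 1)
    for i in range(1, n + 1):
        if seen[i]:
            continue
        cycle, j = [], i
        while not seen[j]:
            seen[j] = True
            cycle.append(j)
            j = P[j]
        if len(cycle) >= t:                       # short cycles get no shortcuts ("round-off")
            for s in range(0, len(cycle), t):
                marked[cycle[s]] = 1
                back[cycle[s]] = cycle[(s - t) % len(cycle)]
    return marked, back

def pi_inverse(P, marked, back, i):
    # Walk forward until a marked element (at most t-1 steps), jump back t positions,
    # then walk forward again until the element mapping to i: O(t) steps in total.
    j = i
    while not marked[j]:
        if P[j] == i:                             # predecessor found on the way (short cycles end here)
            return j
        j = P[j]
    j = back[j]
    while P[j] != i:
        j = P[j]
    return j

# Slide example: the defined entries of P form the cycle (2 4 5 13 1 8 3 12 10);
# the x's at positions 6, 7, 9, 11 are filled in as fixed points here.
P = [None, 8, 4, 12, 5, 13, 6, 7, 3, 9, 2, 11, 10, 1]
marked, back = build_shortcut(P, t=3)
assert all(pi_inverse(P, marked, back, P[i]) == i for i in range(1, len(P)))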

Getting n lg n Bits: an Aside This is the best we can do for O(1) operations. But using Benes networks: a 1-Benes network is a single 2-input/2-output switch; an (r+1)-Benes network adds a column of switches on each side, joining the top outputs of the input switches to one r-Benes network and the bottom outputs to the other (and symmetrically on the output side). The slide's figure shows the 8-input case, with inputs 1 2 3 4 5 6 7 8 routed through two r-Benes boxes to outputs 3 5 7 8 1 6 4 2.

A Benes Network The slide shows a Benes network realizing the permutation (3 5 7 8 1 6 4 2).

What can we do with it? Divide the network into blocks of lg lg n gates … encode each block's action in a single word, taking advantage of the regularity of the addressing mechanism, and also modify the approach to avoid the power-of-2 issue. Can trace a path in time O(lg n / lg lg n). This is the best time we are able to get for π and π^{-1} in minimum space. Observe: this method “violates” the pointer machine lower bound by using “micropointers”.
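
A rough Python sketch of the Benes-network idea, under simplifying assumptions: n is a power of two, wires are 0-indexed, and no word-packing of gate blocks is done, so tracing a path here costs O(lg n) per query rather than the O(lg n / lg lg n) described above. The point it illustrates is that one stored set of switch settings answers both π and π^{-1}, by tracing forwards or backwards.

def route(perm):
    # Compute switch settings of a Benes network realizing perm (len(perm) a power of two).
    n = len(perm)
    if n == 2:
        return perm[0] == 1                       # one 2x2 switch: True means "crossed"
    half = n // 2
    inv = [0] * n
    for i, p in enumerate(perm):
        inv[p] = i
    side_in, side_out = [None] * n, [None] * n    # 0 = top subnetwork, 1 = bottom
    for start in range(n):                        # the standard "looping" assignment
        if side_in[start] is not None:
            continue
        i, s = start, 0
        while side_in[i] is None:
            side_in[i] = s
            side_out[perm[i]] = s
            o2 = perm[i] ^ 1                      # partner output must come from the other side
            side_out[o2] = 1 - s
            side_in[inv[o2]] = 1 - s
            i = inv[o2] ^ 1                       # its partner input is forced back to side s
    in_sw = [side_in[2 * j] == 1 for j in range(half)]
    out_sw = [side_out[2 * j] == 1 for j in range(half)]
    sub = [[None] * half, [None] * half]
    for i in range(n):
        sub[side_in[i]][i // 2] = perm[i] // 2    # permutation induced inside each half
    return (in_sw, route(sub[0]), route(sub[1]), out_sw)

def forward(net, i):                              # pi(i): trace from the input side
    if isinstance(net, bool):
        return i ^ int(net)
    in_sw, top, bottom, out_sw = net
    s = (i % 2) ^ int(in_sw[i // 2])
    port = forward(top if s == 0 else bottom, i // 2)
    return 2 * port + (s ^ int(out_sw[port]))

def backward(net, o):                             # pi^{-1}(o): trace from the output side
    if isinstance(net, bool):
        return o ^ int(net)
    in_sw, top, bottom, out_sw = net
    s = (o % 2) ^ int(out_sw[o // 2])
    port = backward(top if s == 0 else bottom, o // 2)
    return 2 * port + (s ^ int(in_sw[port]))

perm = [2, 4, 6, 7, 0, 5, 3, 1]                   # the slide's permutation (3 5 7 8 1 6 4 2), 0-indexed
net = route(perm)
assert all(forward(net, i) == perm[i] for i in range(8))
assert all(backward(net, perm[i]) == i for i in range(8))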

Back to the main track: Powers of π Consider the cycles of π: (2 6 8)(3 5 9 10)(4 1 7). Keep a bit vector to indicate the start of each cycle: (2 6 8 3 5 9 10 4 1 7). Ignoring parentheses, view this as a new permutation ψ. Note: ψ^{-1}(i) is the position containing i … so we have ψ and ψ^{-1} as before. Use ψ^{-1}(i) to find i's position, then the bit vector (rank, select) to find π^k or π^{-k}.
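
A small Python sketch of this cycle trick. It is a toy version: ψ, its inverse pos, and the cycle starts are kept as plain arrays with binary search over the starts, standing in for the succinct ψ, ψ^{-1} and the rank/select bit vector. Both π^k and π^{-k} come out of the same formula, since a negative k simply walks the cycle backwards.

def cycle_rep(pi):
    # pi is 1-indexed (pi[0] unused). Write the cycles out one after another into psi,
    # remember where each cycle starts, and keep pos = psi^{-1}.
    n = len(pi) - 1
    psi, starts, seen = [None], [], [False] * (n + 1)
    for i in range(1, n + 1):
        if not seen[i]:
            starts.append(len(psi))
            j = i
            while not seen[j]:
                seen[j] = True
                psi.append(j)
                j = pi[j]
    pos = [0] * (n + 1)
    for p in range(1, n + 1):
        pos[psi[p]] = p
    return psi, pos, starts

def pi_power(rep, i, k):
    # pi^k(i) for any integer k (k < 0 gives pi^{-|k|}(i)).
    from bisect import bisect_right
    psi, pos, starts = rep
    p = pos[i]
    c = bisect_right(starts, p) - 1               # which cycle position p falls in
    start = starts[c]
    end = starts[c + 1] if c + 1 < len(starts) else len(psi)
    return psi[start + (p - start + k) % (end - start)]

# The slide's permutation, with cycles (2 6 8)(3 5 9 10)(4 1 7):
pi = [None, 7, 6, 5, 1, 9, 8, 4, 2, 10, 3]
rep = cycle_rep(pi)
assert pi_power(rep, 2, 2) == 8                   # pi^2(2) = pi(6) = 8
assert pi_power(rep, 2, -1) == 8                  # pi^{-1}(2) = 8, since pi(8) = 2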

Functions Now consider arbitrary functions [n]→[n]. “A function is just a hairy permutation”: all tree edges lead to a cycle.

Challenges here Essentially write down the components in a convenient order and use the n lg n bits to describe the mapping (as per permutations). To get f^k(i): find the level ancestor (k levels up) in a tree, or go up to the root and apply f the remaining number of steps around a cycle.
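
A Python sketch of the forward direction, under these assumptions: binary lifting stands in for the O(1) level-ancestor structure (so a tree query costs O(lg n) here, not O(1)), the arrays are plain rather than succinct, and f^{-k} listing (the harder direction on the next slides) is not implemented.

from collections import deque

def preprocess(f):
    # f is a 0-indexed list with f[i] in [0, n). Peel nodes of in-degree 0 to find the
    # cycles; everything peeled off forms the trees hanging from them.
    n = len(f)
    indeg = [0] * n
    for v in f:
        indeg[v] += 1
    order, q = [], deque(i for i in range(n) if indeg[i] == 0)
    on_cycle = [True] * n
    while q:
        u = q.popleft()
        on_cycle[u] = False
        order.append(u)
        indeg[f[u]] -= 1
        if indeg[f[u]] == 0:
            q.append(f[u])
    # Record each cycle as a list, plus each cycle node's cycle id and offset.
    cyc_id, cyc_pos, cycles = [-1] * n, [0] * n, []
    for i in range(n):
        if on_cycle[i] and cyc_id[i] == -1:
            c, j = [], i
            while cyc_id[j] == -1:
                cyc_id[j], cyc_pos[j] = len(cycles), len(c)
                c.append(j)
                j = f[j]
            cycles.append(c)
    # depth = distance up to the cycle, root = first cycle node reached.
    depth, root = [0] * n, list(range(n))
    for u in reversed(order):                     # parents (closer to the cycle) come first
        depth[u] = depth[f[u]] + 1
        root[u] = root[f[u]]
    # Binary-lifting table: up[j][i] = f^(2^j)(i).
    up = [f[:]]
    for _ in range(max(1, (n - 1).bit_length())):
        prev = up[-1]
        up.append([prev[prev[i]] for i in range(n)])
    return cycles, cyc_id, cyc_pos, depth, root, up

def f_power(struct, i, k):
    cycles, cyc_id, cyc_pos, depth, root, up = struct
    if k >= depth[i]:                             # reach the cycle, then walk the rest around it
        c = cycles[cyc_id[root[i]]]
        return c[(cyc_pos[root[i]] + k - depth[i]) % len(c)]
    j = 0                                         # still inside the tree: ancestor k levels up
    while k:
        if k & 1:
            i = up[j][i]
        k >>= 1
        j += 1
    return i

f = [1, 2, 3, 1, 2, 0]                            # cycle 1 -> 2 -> 3 -> 1; paths 5 -> 0 -> 1 and 4 -> 2 hang off it
s = preprocess(f)
assert f_power(s, 5, 3) == 2 and f_power(s, 4, 2) == 3 and f_power(s, 5, 1) == 0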

Level Ancestors There are several level ancestor techniques using O(1) time and O(n) WORDS. Adapt Bender & Farach-Colton to work in O(n) bits But going the other way …

f^{-k} is a set Moving down the tree requires care: f^{-3} of a node is the set of all nodes three levels below it in its tree (illustrated on the slide). The trick: report all nodes on a given level of a tree in time proportional to the number of such nodes, and don’t waste time on trees with no answers.

Final Function Result Given an arbitrary function f: [n]→[n], with an n lg n + O(n) bit representation we can compute f^k(i) in O(1) time and f^{-k}(i) in time O(1 + size of answer).

Back to Text … and Suffix Arrays Text T[1..n] over (a,b)*# with a < # < b. There are 2^{n-1} such texts; which of the n! permutations are valid suffix arrays? Over positions 1 2 3 4 5 6 7 8: SA = 4 7 5 1 8 3 6 2 is the suffix array of T = a b b a a b a #, with SA^{-1} = 4 8 6 1 3 7 2 5; but M = 4 7 1 5 8 2 3 6 isn’t a valid suffix array .. why?
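
To check the slide's example, here is a naive sketch: O(n² lg n) sorting, nothing succinct, 1-indexed positions, with the suffix order determined by a < # < b as in the slides.

def suffix_array(T, order="a#b"):
    rank = {c: i for i, c in enumerate(order)}
    n = len(T)
    sa = sorted(range(1, n + 1), key=lambda i: [rank[c] for c in T[i - 1:]])
    inv = [0] * (n + 1)                           # inv[i] = position of suffix i in sa
    for p, suf in enumerate(sa, 1):
        inv[suf] = p
    return sa, inv

sa, inv = suffix_array("abbaaba#")
assert sa == [4, 7, 5, 1, 8, 3, 6, 2]             # SA from the slide
assert inv[1:] == [4, 8, 6, 1, 3, 7, 2, 5]        # SA^{-1} from the slide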

Ascending to Max M is a permutation, so M^{-1} is its inverse; i.e. M^{-1}[i] says where i is in M. Ascending-to-Max: for all 1 ≤ i ≤ n-2, M^{-1}[i] < M^{-1}[n] and M^{-1}[i+1] < M^{-1}[n] ⇒ M^{-1}[i] < M^{-1}[i+1]; M^{-1}[i] > M^{-1}[n] and M^{-1}[i+1] > M^{-1}[n] ⇒ M^{-1}[i] > M^{-1}[i+1]. Example: 4 7 5 1 8 3 6 2 OK; 4 7 1 5 8 2 3 6 NO.

Non-Nesting Non-Nesting: for all 1 ≤ i, j ≤ n-1 with M^{-1}[i] < M^{-1}[j]: M^{-1}[i] < M^{-1}[i+1] and M^{-1}[j] < M^{-1}[j+1] ⇒ M^{-1}[i+1] < M^{-1}[j+1]; M^{-1}[i] > M^{-1}[i+1] and M^{-1}[j] > M^{-1}[j+1] ⇒ M^{-1}[i+1] < M^{-1}[j+1]. Example: 4 7 5 1 8 3 6 2 OK; 4 7 1 5 8 2 3 6 NO.

Characterization Theorem for Suffix Arrays on Binary Texts Theorem: Ascending-to-Max & Non-Nesting ⇔ valid Suffix Array. Corollary: a clean method of breaking SA into segments. Corollary: a linear-time algorithm to check whether SA is valid.
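
The two conditions translate directly into a checker. A minimal sketch (the non-nesting test below is the obvious O(n²) double loop; the linear-time check of the corollary needs more care):

def is_valid_suffix_array(M):
    # M is a candidate suffix array, a permutation of positions 1..n; inv = M^{-1}.
    n = len(M)
    inv = [0] * (n + 1)
    for p, suf in enumerate(M, 1):
        inv[suf] = p
    # Ascending-to-Max
    for i in range(1, n - 1):
        if inv[i] < inv[n] and inv[i + 1] < inv[n] and not inv[i] < inv[i + 1]:
            return False
        if inv[i] > inv[n] and inv[i + 1] > inv[n] and not inv[i] > inv[i + 1]:
            return False
    # Non-Nesting
    for i in range(1, n):
        for j in range(1, n):
            if inv[i] < inv[j]:
                if inv[i] < inv[i + 1] and inv[j] < inv[j + 1] and not inv[i + 1] < inv[j + 1]:
                    return False
                if inv[i] > inv[i + 1] and inv[j] > inv[j + 1] and not inv[i + 1] < inv[j + 1]:
                    return False
    return True

assert is_valid_suffix_array([4, 7, 5, 1, 8, 3, 6, 2])          # "OK" on the slide
assert not is_valid_suffix_array([4, 7, 1, 5, 8, 2, 3, 6])      # "NO" on the slide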

Cardinality Queries T = a b a a a b b a a a b a a b b #
Remember the lengths of the longest run of a's and of the longest run of b's.
SA (broken by runs, but not stored explicitly): 8 3 | 9 4 12 | 1 10 5 13 | 16 | 7 2 11 15 | 6 14
Bit vector Ba: if SA^{-1}[i-1] falls in an “a” section, store 1 in Ba[SA^{-1}[i]], else 0.
Ba = 0 0 1 1 0 0 1 1 1 0 0 1 1 0 1 1
Create a rank structure on Ba, and similarly on Bb (note these are essentially complements of each other, except at #).
Algorithm Count(T, P)   (here na is the number of a's in T)
  s ← 1; e ← n; i ← m
  while i > 0 and s ≤ e do
    if P[i] = a then s ← rank1(Ba, s-1) + 1; e ← rank1(Ba, e)
    else s ← na + 2 + rank1(Bb, s-1); e ← na + 1 + rank1(Bb, e)
    i ← i - 1
  return max(e - s + 1, 0)
Time: O(length of the query).
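
A runnable sketch of Count under simplifying assumptions: the suffix array is built naively just to derive the two bit vectors (the real structure never stores it), rank1 is a plain prefix-sum array standing in for an o(n)-bit rank structure, and only the slide's alphabet is handled, with patterns over {a, b}. Ba[j] is set by looking at the character in front of suffix sa[j], which is the same test as the slide's SA^{-1} formulation.

def build_count_index(T):
    order = {'a': 0, '#': 1, 'b': 2}              # a < # < b, as on the slides
    n = len(T)
    sa = sorted(range(1, n + 1), key=lambda i: [order[c] for c in T[i - 1:]])
    Ba, Bb = [0] * (n + 1), [0] * (n + 1)         # 1-indexed bit vectors
    for j, suf in enumerate(sa, 1):
        if suf > 1:                               # Ba[j] = 1 iff suffix suf-1 starts with 'a'
            Ba[j] = 1 if T[suf - 2] == 'a' else 0
            Bb[j] = 1 if T[suf - 2] == 'b' else 0
    Ra, Rb = [0] * (n + 1), [0] * (n + 1)         # prefix sums play the role of rank1
    for j in range(1, n + 1):
        Ra[j] = Ra[j - 1] + Ba[j]
        Rb[j] = Rb[j - 1] + Bb[j]
    return Ra, Rb, T.count('a'), n

def count(index, P):
    Ra, Rb, na, n = index
    s, e = 1, n
    for c in reversed(P):                         # backward search: one rank step per pattern symbol
        if s > e:
            break
        if c == 'a':
            s, e = Ra[s - 1] + 1, Ra[e]
        else:
            s, e = na + 2 + Rb[s - 1], na + 1 + Rb[e]
    return max(e - s + 1, 0)

idx = build_count_index("abaaabbaaabaabb#")       # the slide's text T
assert count(idx, "ab") == 4 and count(idx, "bb") == 2 and count(idx, "aba") == 2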

Listing Queries Complex methods. Key idea: for queries of length at least d, index every d-th position .. for T and for T reversed, so we have matches for T[i..n] and for T[1..i-1]. View these as points in 2-space (Ferragina & Manzini; Grossi & Vitter). Do a range query (Alstrup et al.). A variety of results follow.

General Conclusion Interesting, and useful, combinatorial objects can be: stored succinctly … in lower bound + o(lower bound) space, so that natural queries are performed in O(1) time (or at least very close). This can make the difference between using them and not …