
1 Succinct Indexes for Strings, Binary Relations and Multi-labeled Trees
  Jérémy Barbay, Meng He, J. Ian Munro, University of Waterloo
  S. Srinivasa Rao, IT University of Copenhagen

2 Background: Succinct Data Structures
  - What are succinct data structures
    - Jacobson 1989
  - Why succinct data structures
    - Large data sets in modern applications: textual, genomic, spatial or geometric
    - An implementation: Delpratt et al. 2006
  - Succinct integrated encodings
    - Main data and auxiliary data structures

3 Our Problem: Succinct Indexes
  - Use of the concept in previous work
    - Compact PAT trees: Clark & Munro 1996
    - Lower bounds: Demaine & López-Ortiz 2001; Miltersen 2005
    - Upper bounds: Sadakane & Grossi 2006
  - Definition of succinct indexes in data structure design
    - ADT: primitive access operators
    - Succinct index: more powerful operators

4 Succinct Integrated Encodings
  [Diagram: Auxiliary Data Structures + Main Data, together supporting Navigational Operations]

5 Succinct Indexes
  [Diagram: Succinct Index + Main Data, together supporting Navigational Operations]

6 Succinct Indexes vs. Integrated Encodings
  - Maximizing the freedom of the encoding of the main data
  - Allowing incremental design
  - Supporting implicit data

7 Strings: Definitions
  - Notation
    - Alphabet: [σ] = {1, 2, …, σ}
    - String: S[1..n]
  - Operations
    - string_access(x): S[x]
    - string_rank(α, x): number of occurrences of α in S[1..x]
    - string_select(α, r): position of the r-th occurrence of α in S

8 Strings: An Example
  S = a a b a c c c d a d d a b b b c
  - string_access(8) = d
  - string_rank(a, 8) = 3
  - string_select(b, 3) = 14
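
The three operations can be checked against this example with a direct, linear-time implementation. This is only a reference sketch of the definitions, with no attempt at succinctness; the function names mirror the slide, and the 1-based positions follow its notation.

```python
def string_access(s, x):
    """Return S[x], using the slides' 1-based positions."""
    return s[x - 1]


def string_rank(s, alpha, x):
    """Count the occurrences of alpha in S[1..x]."""
    return sum(1 for c in s[:x] if c == alpha)


def string_select(s, alpha, r):
    """Return the 1-based position of the r-th alpha, or None if there is none."""
    seen = 0
    for pos, c in enumerate(s, start=1):
        if c == alpha:
            seen += 1
            if seen == r:
                return pos
    return None


S = "aabacccdaddabbbc"
print(string_access(S, 8), string_rank(S, "a", 8), string_select(S, "b", 3))
# prints: d 3 14
```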

9 Strings: Previous Results
  - Succinct integrated encodings
    - Wavelet trees: Grossi et al. 2003
      - Space: nH_0 + o(n)∙lg σ bits
      - Time: O(lg σ) for all three operations
    - Golynski et al. 2006
      - Space: n(lg σ + o(lg σ)) bits
      - Time: O(lglg σ) for string_access and string_rank, O(1) for string_select
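
To see where the O(lg σ) time of wavelet trees comes from, here is a minimal pointer-based sketch. It is an illustration only, not the structure of Grossi et al.: the class name `WaveletTree` is made up for this note, symbols are assumed to be integers from [σ], each bitmap keeps a plain prefix-sum array in place of a succinct rank structure (so the space bound above is not met), and select is omitted. Each query walks one root-to-leaf path, halving the alphabet range at every level.

```python
class WaveletTree:
    """Pointer-based wavelet tree over integer symbols (illustrative, not space-efficient)."""

    def __init__(self, seq, lo=None, hi=None):
        if lo is None:
            lo, hi = min(seq), max(seq)
        self.lo, self.hi = lo, hi
        if lo == hi:
            return  # leaf: every symbol routed here equals lo
        mid = (lo + hi) // 2
        # Bit i is 0 if the i-th symbol belongs to the left half of the alphabet.
        self.bits = [0 if c <= mid else 1 for c in seq]
        # prefix[i] = number of 1 bits among the first i bits.
        self.prefix = [0]
        for b in self.bits:
            self.prefix.append(self.prefix[-1] + b)
        self.left = WaveletTree([c for c in seq if c <= mid], lo, mid)
        self.right = WaveletTree([c for c in seq if c > mid], mid + 1, hi)

    def access(self, x):
        """Return the symbol at 1-based position x."""
        node = self
        while node.lo != node.hi:
            ones = node.prefix[x]
            if node.bits[x - 1] == 0:
                x, node = x - ones, node.left   # position among the 0 bits
            else:
                x, node = ones, node.right      # position among the 1 bits
        return node.lo

    def rank(self, alpha, x):
        """Count the occurrences of alpha among the first x symbols."""
        if not self.lo <= alpha <= self.hi:
            return 0
        node = self
        while node.lo != node.hi:
            mid = (node.lo + node.hi) // 2
            ones = node.prefix[x]
            if alpha <= mid:
                x, node = x - ones, node.left
            else:
                x, node = ones, node.right
        return x


# The string from slide 8, with a, b, c, d mapped to 1, 2, 3, 4.
S = ["abcd".index(c) + 1 for c in "aabacccdaddabbbc"]
wt = WaveletTree(S)
print(wt.access(8), wt.rank(1, 8))
# prints: 4 3
```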

10 Strings: Our Results
  - Succinct indexes
    - ADT string_access: f(n, σ) time
    - Space: n∙o(lg σ) bits
  - Operations
    - string_rank: O(lglg σ lglglg σ (f(n, σ) + lglg σ))
    - string_select: O(lglglg σ (f(n, σ) + lglg σ))
    - Other operations: negations
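
The separation between the ADT and the index can be sketched as follows. This is a toy stand-in, not the index from the talk: `SampledRankIndex`, its `block` parameter, and the sampling scheme are assumptions made for illustration, and the sampled counters take far more than n∙o(lg σ) bits. What it does show is the interface: the index never sees how the string is stored and reaches it only through a string_access callback.

```python
from collections import Counter


class SampledRankIndex:
    """Answers rank/select using only an access callback (hypothetical sketch)."""

    def __init__(self, access, n, block=4):
        self.access = access  # ADT: access(x) returns S[x], 1-based
        self.n = n
        self.block = block
        # samples[j] holds the symbol counts of S[1..j*block].
        self.samples = [Counter()]
        running = Counter()
        for x in range(1, n + 1):
            running[access(x)] += 1
            if x % block == 0:
                self.samples.append(running.copy())

    def rank(self, alpha, x):
        """Occurrences of alpha in S[1..x]: one sample plus a short scan."""
        j = x // self.block
        count = self.samples[j][alpha]
        for p in range(j * self.block + 1, x + 1):
            if self.access(p) == alpha:
                count += 1
        return count

    def select(self, alpha, r):
        """Smallest x with rank(alpha, x) >= r, found by binary search."""
        if r < 1 or self.rank(alpha, self.n) < r:
            return None
        lo, hi = 1, self.n
        while lo < hi:
            mid = (lo + hi) // 2
            if self.rank(alpha, mid) >= r:
                hi = mid
            else:
                lo = mid + 1
        return lo


S = "aabacccdaddabbbc"
index = SampledRankIndex(lambda x: S[x - 1], len(S))
print(index.rank("a", 8), index.select("b", 3))
# prints: 3 14
```

Because the callback is the only coupling, the same index would work unchanged if the main data were compressed or computed on the fly, at the cost of whatever time f(n, σ) each access takes.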

11 Binary Relations: Definitions
  - Notation
    - Binary relation: R ⊆ [n] × [σ]
    - Number of objects: n; number of labels: σ
    - Number of object-label pairs: t
  - Operations
    - object_access(x, r): r-th label associated with x
    - label_access(x, α): whether x is associated with α
    - label_rank(α, x): number of objects labeled α up to object x
    - label_select(α, r): r-th object labeled α

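12 Binary Relations: An Example
  Relation matrix (rows: labels 1..σ; columns: objects 1..n; here σ = 4, n = 5):

             object:  1  2  3  4  5
    label 1           0  1  0  1  0
    label 2           0  0  0  1  0
    label 3           1  0  1  1  0
    label 4           1  1  0  0  1

  - object_access(1, 2) = 4
  - label_access(2, 3) = false
  - label_rank(3, 4) = 3
  - label_select(4, 3) = 5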
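
The four answers on this slide can be reproduced with a naive implementation over an explicit matrix. The sketch assumes that the 0/1 entries of the slide's figure are read row by row as a 4 × 5 matrix whose rows are labels 1..4 and whose columns are objects 1..5; that reading is an inference, chosen because it is consistent with all four stated answers.

```python
# Rows are labels 1..4, columns are objects 1..5 (assumed reading of the figure).
M = [
    [0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
]


def label_access(x, alpha):
    """Whether object x is associated with label alpha."""
    return M[alpha - 1][x - 1] == 1


def object_access(x, r):
    """The r-th label associated with object x, or None if there is none."""
    labels = [a for a in range(1, len(M) + 1) if label_access(x, a)]
    return labels[r - 1] if 1 <= r <= len(labels) else None


def label_rank(alpha, x):
    """Number of objects labeled alpha among objects 1..x."""
    return sum(M[alpha - 1][:x])


def label_select(alpha, r):
    """The r-th object labeled alpha, or None if there is none."""
    seen = 0
    for obj, bit in enumerate(M[alpha - 1], start=1):
        seen += bit
        if bit and seen == r:
            return obj
    return None


print(object_access(1, 2), label_access(2, 3), label_rank(3, 4), label_select(4, 3))
# prints: 4 False 3 5
```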

13 Binary Relations: Previous Results
  - Succinct integrated encodings
    - Barbay et al. 2006
      - Space: t(lg σ + o(lg σ)) bits
      - Time: O(lglg σ) for object_access, label_rank and label_access, O(1) for label_select

14 Binary Relations: Our Results
  - Succinct indexes
    - ADT object_access: f(n, σ, t) time
    - Space: t∙o(lg σ) bits
  - Time
    - label_rank and label_access: O(lglg σ lglglg σ (f(n, σ, t) + lglg σ))
    - label_select: O(lglglg σ (f(n, σ, t) + lglg σ))

15 Multi-labeled Trees: Definitions
  - Notation
    - Number of nodes: n
    - Number of labels: σ
    - Number of node-label pairs: t
  - Operations
    - α-descendant
    - α-child
    - α-ancestor

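16 Multi-labeled Trees: An Example
  [Figure: a tree on nodes 1 to 11, each node carrying a set of labels from {a, b, c, d}. Label sets shown: {a, c, d}, {c, d}, {a}, {a, c}, {a, b}, {b, d}, {a, b}, {b}, {c}, {c, d}, {b, c, d}]
  - Node 2 is a c-ancestor of node 6
  - Node 6 is a b-descendant of node 2
  - Node 10 is a d-child of node 8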
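
The three statements suggest how the α-operators are meant: y is an α-ancestor (α-descendant, α-child) of x if y is an ancestor (descendant, child) of x and carries label α. The sketch below encodes that reading as predicates. The tree and label sets in it are hypothetical, made up for this note, since the shape of the slide's tree cannot be recovered from the transcript; the function names are likewise illustrative.

```python
# Hypothetical tree: parent of each node (the root has parent None) and label sets.
parent = {1: None, 2: 1, 3: 1, 4: 2, 5: 2, 6: 4}
labels = {1: {"a"}, 2: {"a", "c"}, 3: {"b"}, 4: {"c"}, 5: {"b", "d"}, 6: {"b"}}


def ancestors(x):
    """Proper ancestors of x, nearest first."""
    out = []
    p = parent[x]
    while p is not None:
        out.append(p)
        p = parent[p]
    return out


def is_alpha_ancestor(y, x, alpha):
    """True if y is an ancestor of x and y carries label alpha."""
    return y in ancestors(x) and alpha in labels[y]


def is_alpha_descendant(y, x, alpha):
    """True if y is a descendant of x and y carries label alpha."""
    return x in ancestors(y) and alpha in labels[y]


def is_alpha_child(y, x, alpha):
    """True if y is a child of x and y carries label alpha."""
    return parent[y] == x and alpha in labels[y]


print(is_alpha_ancestor(2, 6, "c"), is_alpha_descendant(6, 2, "b"), is_alpha_child(5, 2, "d"))
# prints: True True True
```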

17 Multi-labeled Trees: Previous Results
  - Labeled trees
    - Geary et al. 2004
    - Ferragina et al. 2005
    - Barbay et al. 2006
  - Multi-labeled trees
    - Barbay et al. 2006

18 Multi-labeled Trees: Our Approach
  - Traversal orders
    - Preorder
    - DFUDS order
  - Ordinal trees: DFUDS
    - Benoit et al. 1999 & 2005
    - Jansson et al. 2007
  - 2 binary relations
    - Nodes in preorder & labels
    - Nodes in DFUDS order & labels
  [Figure: example tree with its nodes numbered in preorder and in DFUDS order]
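
A plausible reason for pairing these two orders is that each makes one kind of neighborhood contiguous: the descendants of a node are consecutive in preorder, and the children of a node are consecutive in DFUDS order, so a query restricted to descendants or children becomes a range query on the corresponding binary relation. The sketch below checks both properties on a hypothetical tree. It assumes the convention that DFUDS order lists the root first and then, visiting nodes in preorder, lists all children of each visited node together; the tree, the function names, and this reading of the slide are assumptions, not taken from the talk.

```python
# Hypothetical ordinal tree: ordered children of each node (ids follow preorder).
children = {1: [2, 6], 2: [3, 5], 3: [4], 4: [], 5: [], 6: [7], 7: []}


def preorder(root):
    """Nodes in preorder (iterative depth-first traversal)."""
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(reversed(children[v]))
    return order


def dfuds_order(root):
    """Assumed convention: the root, then the children of each node in preorder."""
    order = [root]
    for v in preorder(root):
        order.extend(children[v])
    return order


def subtree_size(v):
    return 1 + sum(subtree_size(c) for c in children[v])


pre = preorder(1)
dfs = dfuds_order(1)
pre_pos = {v: i for i, v in enumerate(pre)}


def descendants(v):
    """Proper descendants of v: a contiguous slice of the preorder sequence."""
    i = pre_pos[v]
    return pre[i + 1 : i + subtree_size(v)]


def children_range(v):
    """Children of v: a contiguous slice of the DFUDS-order sequence."""
    kids = children[v]
    if not kids:
        return []
    i = dfs.index(kids[0])
    return dfs[i : i + len(kids)]


print(pre, dfs, descendants(2), children_range(2))
# prints: [1, 2, 3, 4, 5, 6, 7] [1, 2, 6, 3, 5, 4, 7] [3, 4, 5] [3, 5]
```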

19 Multi-labeled Trees: Our Results
  - Succinct indexes
    - ADT: node_label(x, r)
    - Supporting α-child/descendant queries: t∙o(lg σ) bits
    - Supporting α-child/descendant/ancestor queries: t∙(lg ρ + o(lg ρ) + o(lg σ)) bits (ρ: recursivity)
    - Supporting α-child/descendant/ancestor queries of node x after another node y

20 Applications
  - Compressed succinct encodings
    - Strings
      - Space: nH_k + o(n lg σ) bits
      - Operations
        - string_access: O(1)
        - string_rank: O((lglg σ)^2 lglglg σ)
        - string_select: O(lglg σ lglglg σ)
      - First high-order entropy-compressed encoding supporting rank/select efficiently
    - Other data structures
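
The rank and select times on this slide are consistent with the general bounds of slide 10 once string_access takes constant time. Assuming f(n, σ) = O(1), as the string_access bound here states, the substitution works out as:

```latex
\begin{align*}
\text{string\_rank:}\quad
  & O\bigl(\lg\lg\sigma \cdot \lg\lg\lg\sigma \cdot (f(n,\sigma) + \lg\lg\sigma)\bigr) \\
  &= O\bigl(\lg\lg\sigma \cdot \lg\lg\lg\sigma \cdot (1 + \lg\lg\sigma)\bigr)
   = O\bigl((\lg\lg\sigma)^{2}\,\lg\lg\lg\sigma\bigr) \\[4pt]
\text{string\_select:}\quad
  & O\bigl(\lg\lg\lg\sigma \cdot (f(n,\sigma) + \lg\lg\sigma)\bigr) \\
  &= O\bigl(\lg\lg\lg\sigma \cdot (1 + \lg\lg\sigma)\bigr)
   = O\bigl(\lg\lg\sigma \cdot \lg\lg\lg\sigma\bigr)
\end{align*}
```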

21 Applications (Continued)
  - High-order entropy-compressed text indexes for large alphabets
    - Notation: n: text size; σ: alphabet size; m: pattern length; occ: number of occurrences
    - Our results
      - Space: nH_k + o(n lg σ) bits
      - Pattern searching: O(m lglg σ + occ lg^{1+ε} n lglg σ)
    - Previous results: a lg σ factor instead of lglg σ, or incompressible

22 Conclusions
  We showed the importance of succinct indexes in the design of succinct data structures by designing:
  - A succinct representation of multi-labeled trees that supports efficient retrieval of ancestors / children / descendants by label
  - The first high-order entropy-compressed representation of strings supporting rank/select
  - High-order entropy-compressed text indexes for large alphabets

23 Conclusions (Continued)
  The concept of succinct indexes is useful in designing succinct data structures … it maximizes the freedom of the encoding of the main data and leads to a rich choice of design tradeoffs.

24 Thank you!

