Improvements on the Range-Minimum-Query Problem
Johannes Fischer, Volker Heun
Universität München, Institut für Informatik
Introduction
Given: array A of size n
Task: preprocess A such that $\mathrm{RMQ}_A(l,r) = \operatorname{argmin}_{l \le i \le r} A[i]$ can be answered efficiently
[Figure: array A with positions 1 to 11 and two example queries (l,r); min ⇒ return 1 and min ⇒ return 5]
Break ties to the left
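As a baseline for the definition above, here is a minimal Python sketch of a query answered without any preprocessing, in O(r-l) time. The name `rmq_naive` and the 0-based positions are choices made for this sketch; the slides use 1-based positions.

```python
def rmq_naive(A, l, r):
    """Return the position of the leftmost minimum of A[l..r] (0-based, inclusive)."""
    best = l
    for i in range(l + 1, r + 1):
        if A[i] < A[best]:  # strict '<' keeps the leftmost minimum on ties
            best = i
    return best


A = [3, 1, 4, 1, 5]
whole = rmq_naive(A, 0, 4)   # two minima (positions 1 and 3): the left one wins
suffix = rmq_naive(A, 2, 4)
```

Everything that follows in the talk is about getting the same answers in O(1) time after O(n) preprocessing.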
Applications: Lowest Common Ancestors (LCA)
[Figure: example tree with nodes B to K on levels 1 to 3; the nodes are listed in the array A = I D B E J F K C G H, next to an array H of levels (3, 2, 1)]
Applications: Longest common extensions of strings (LCE)
For suffixes $t_{i..n}$ and $t_{i'..n}$ return $\max\{k : t_{i..i+k-1} = t_{i'..i'+k-1}\}$
[Figure: text with "abba x" starting at position i and "abba z" starting at position j]
⇒ RMQs on the LCP-table of the suffix array
Other applications:
  Document Retrieval (Muthukrishnan SODA'02)
  Suffix links in ESA (Abouelhoda et al. WABI'02)
  Maximum-Sum Queries (Chen/Chao ISAAC'04)
  …
⇒ basic ingredient!
Previous Results for RMQ
Berkman/Vishkin FOCS'89: preprocessing O(n), query time O(1)
Rediscovered & simplified by Bender/Farach-Colton (LATIN'00)
Reduction chain: RMQ ➾ LCA (via Cartesian Tree) ➾ ±1RMQ (via Euler Tour), solved with the 4-Russians Trick
cf. suffix array vs. suffix tree: text ➾ suffix tree ➾ suffix array
Cartesian Tree
Cartesian Tree for A[1,n]:
  Root: minimal element of A[1,n], at position i
  Left child: Cartesian Tree for A[1,i-1]
  Right child: Cartesian Tree for A[i+1,n]
[Figure: Cartesian Tree of an example array A with positions 1 to 11; annotation: $O(n^2)$]
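The recursive definition translates directly into code. The sketch below is a literal transcription, assuming 0-based positions, ties broken to the left, and a nested-tuple representation chosen only for illustration. Each call scans its whole range for the minimum, so a sorted input costs quadratic time, which is presumably what the $O(n^2)$ annotation on the slide refers to.

```python
def cartesian_tree_naive(A, lo=0, hi=None):
    """Cartesian Tree of A[lo..hi] as nested tuples (pos, left, right); None if empty."""
    if hi is None:
        hi = len(A) - 1
    if lo > hi:
        return None
    # min() returns the first minimal item, i.e. the leftmost minimum.
    i = min(range(lo, hi + 1), key=lambda j: A[j])
    return (i,
            cartesian_tree_naive(A, lo, i - 1),
            cartesian_tree_naive(A, i + 1, hi))


balanced = cartesian_tree_naive([2, 1, 3])
chain = cartesian_tree_naive([1, 2, 3])  # sorted input: degenerate right chain
```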
The New Algorithm
Overview
Divide A into blocks $B_1,\dots,B_{n/s}$ of size $s = \log(n)/4$
Answer queries separately:
  Long queries that span several blocks (O(1))
  Short in-block-queries (O(1))
Return the position where the overall minimum occurs (O(1))
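Answering Long Queries (B/F-C'00)
Precompute all RMQs that span $2^k$ blocks:
  $M[i][k]$ = position of the minimum in $B_i,\dots,B_{i+2^k-1}$
  [Figure: blocks $B_i,\dots,B_{n/s}$ with the ranges covered by M[i,0], M[i,1], M[i,2], M[i,3]]
Filled in optimal time with dynamic programming
Query: select 2 precomputed (possibly overlapping) ranges covering the interval
Size of M: $n/s \cdot \log(n/s) = O(n/\log n \cdot \log(n/\log n)) = O(n)$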
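The table M and the two-range query can be sketched as follows. This is an illustration under assumptions, not the authors' code: positions and block numbers are 0-based, the table is indexed as `M[k][i]` rather than `M[i][k]`, and the names `build_sparse_table` and `query_blocks` are made up here.

```python
def build_sparse_table(A, s):
    """M[k][i] = position in A of the leftmost minimum of blocks i .. i+2^k-1."""
    n = len(A)
    nb = (n + s - 1) // s  # number of blocks; the last one may be shorter
    # Level 0: position of the minimum inside each single block.
    M = [[min(range(b * s, min((b + 1) * s, n)), key=lambda j: A[j])
          for b in range(nb)]]
    k = 1
    while (1 << k) <= nb:
        prev, half = M[k - 1], 1 << (k - 1)
        cur = []
        for i in range(nb - (1 << k) + 1):
            a, b = prev[i], prev[i + half]  # minima of the two halves
            cur.append(b if A[b] < A[a] else a)  # ties go to the left
        M.append(cur)
        k += 1
    return M


def query_blocks(A, M, i, j):
    """Leftmost-minimum position over the whole blocks i..j (inclusive)."""
    k = (j - i + 1).bit_length() - 1  # largest k with 2^k <= number of blocks
    a, b = M[k][i], M[k][j - (1 << k) + 1]  # two overlapping ranges
    return b if A[b] < A[a] else a


A = [5, 3, 8, 1, 9, 1, 7, 2]
M = build_sparse_table(A, 2)   # four blocks of size 2
pos = query_blocks(A, M, 0, 3)
```

Each level is built from the previous one in constant time per entry, which is the dynamic-programming step named on the slide.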
Answering In-block-queries n/s·s2 =O(n logn) Answering In-block-queries Computing the in-block-queries for all n/s occurring blocks is too much Really necessary? 3 4 2 8 11 10 -5 1 -4 4 4 1 6 1 6 3 5 7 3 5 7 2 2 Fact: B and B‘ have the same answers to all RMQs iff they have the same Cartesian Tree.
Answering In-block-queries
Number of unlabelled binary trees with n nodes: n-th Catalan number $C_n$
$C_n = O(4^n / n^{3/2})$
Theorem: We can store the answers to all in-block-queries in space O(n)
Proof: $O(4^s / s^{3/2}) \cdot s^2 = O(2^{2s} \cdot s^{1/2}) = O(2^{\log(n)/2} \cdot \log^{1/2} n) = O(n^{1/2} \cdot \log^{1/2} n)$
Answering In-block-queries
One problem remains: for each block $B_j$ we need to know its type in time O(s)
Type: mapping t from arrays of size s onto $\{0,\dots,C_s-1\}$ with $t(B) = t(B')$ iff B and B' have the same Cartesian Tree
Straightforward way: build the Cartesian Tree for each block $B_j$ and give the tree a number $0 \le t(B_j) < C_s$
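O(n)-Algo for Cartesian Tree
Let $T_i$ be the Cartesian Tree for B[1,i]
$T_i$ is obtained from $T_{i-1}$ as follows:
  follow the rightmost path of $T_{i-1}$ upwards from its end to the first node x with $B[x] \le B[i]$
  i becomes the right child of x
  the former right subtree y of x (all entries $> B[i]$) becomes the left child of i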
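A minimal sketch of this incremental construction, keeping the rightmost path on a stack. The child-array representation, the 0-based positions and the name `cartesian_tree_linear` are assumptions of this sketch. Every position is pushed once and popped at most once, which gives the linear running time.

```python
def cartesian_tree_linear(B):
    """Return (root, left, right) for the Cartesian Tree of B; -1 means no child."""
    n = len(B)
    left = [-1] * n
    right = [-1] * n
    path = []  # positions on the rightmost path, root at the bottom
    for i in range(n):
        last = -1
        # Walk up the rightmost path past all entries larger than B[i].
        while path and B[path[-1]] > B[i]:
            last = path.pop()
        left[i] = last            # removed subtree y becomes the left child of i
        if path:
            right[path[-1]] = i   # i becomes the right child of x
        path.append(i)
    root = path[0] if path else -1
    return root, left, right


root, left, right = cartesian_tree_linear([3, 4, 2, 8])
```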
Computing the block type
Don't have to calculate the tree!
Just keep the "rightmost path" p on a stack
Compute the sequence of numbers $l_1,\dots,l_s$: $l_i$ = # nodes deleted from p in step i
$l_1,\dots,l_s$ satisfies the "prefix property" $0 \le \sum_{1 \le k \le i} l_k < i$
... because one cannot delete more elements than have been inserted ...
... and each element is removed from p at most once!
Computing the block type
$l_1,\dots,l_s$ with $\sum_{1 \le k \le i} l_k < i$ corresponds to a path from cell (s,s) to cell (0,0)
In step i: go up $l_i$ cells, then go one to the left
[Figure: triangular grid of cells (p,q) with $0 \le p \le q \le 3$]
# paths from cell (p,q) to cell (0,0) given by $C_{p,q} = C_{p-1,q} + C_{p,q-1}$ ("ballot numbers")
$C_{n,n} = C_n$
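The recurrence can be tabulated in a few lines. The slide gives only the recurrence itself, so the boundary conditions used below ($C_{0,0} = 1$ and $C_{p,q} = 0$ whenever p > q or p < 0) are an assumption, taken from the usual definition of ballot numbers. The function name is likewise made up.

```python
def ballot_numbers(s):
    """Table C with C[p][q] = C[p-1][q] + C[p][q-1] for 0 <= p <= q <= s."""
    C = [[0] * (s + 1) for _ in range(s + 1)]  # entries with p > q stay 0
    C[0][0] = 1                                # assumed base case
    for q in range(1, s + 1):
        for p in range(q + 1):
            C[p][q] = C[p][q - 1] + (C[p - 1][q] if p > 0 else 0)
    return C


C = ballot_numbers(4)
diagonal = [C[i][i] for i in range(5)]  # should reproduce the Catalan numbers
```

With these boundary conditions the diagonal matches $C_{n,n} = C_n$ for the first few Catalan numbers 1, 1, 2, 5, 14.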
Computing the block type
[Figure: a path q in the grid]
Paths with greater numbers than path q are at some point above q
⇒ add # paths from the current cell before going upwards
Computing the block type
Precompute ballot numbers up to $s = \log(n)/4$.
For all blocks B_j:
  let S be an empty stack, push(S, -∞)
  q ← s, N ← 0
  for i ← 1,…,s
    while top(S) > B_j[i]
      pop(S)
      N ← N + C_{s-i,q}
      q ← q - 1
    push(S, B_j[i])
  return N
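A runnable Python transcription of this pseudocode for a single block, as a sketch under assumptions. The ballot-number table is built with the base cases $C_{0,0} = 1$ and $C_{p,q} = 0$ for p > q, which the slides do not state. The function names are invented here, and the block is a 0-based Python list while the loop index i stays 1-based as on the slide.

```python
import math


def ballot_numbers(s):
    """C[p][q] = C[p-1][q] + C[p][q-1]; assumed base cases C[0][0] = 1, 0 if p > q."""
    C = [[0] * (s + 1) for _ in range(s + 1)]
    C[0][0] = 1
    for q in range(1, s + 1):
        for p in range(q + 1):
            C[p][q] = C[p][q - 1] + (C[p - 1][q] if p > 0 else 0)
    return C


def block_type(B, C):
    """Number in {0, ..., C_s - 1} identifying the Cartesian Tree of block B."""
    s = len(B)
    stack = [-math.inf]  # sentinel: the stack never becomes empty
    q, N = s, 0
    for i in range(1, s + 1):          # i is 1-based as on the slide
        while stack[-1] > B[i - 1]:    # pop the rightmost path down to B[i]
            stack.pop()
            N += C[s - i][q]           # count the paths skipped by going up
            q -= 1
        stack.append(B[i - 1])
    return N


C3 = ballot_numbers(3)
types = {tuple(B): block_type(B, C3)
         for B in ([1, 2, 3], [1, 3, 2], [2, 3, 1], [2, 1, 3], [3, 1, 2], [3, 2, 1])}
```

For s = 3 the blocks [2, 1, 3] and [3, 1, 2] share a Cartesian Tree and get the same number, while the other four blocks each get their own, so all C_3 = 5 types occur.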
Summary and Outlook
Direct construction algorithm for RMQ:
  no dynamic data structures
  never uses more space than in the end
  not the first… see Alstrup et al. SPAA'02
Our method can be augmented with techniques from Sadakane SODA'02 to give a succinct data structure (2n+o(n) bits) with a direct construction algorithm
Any Questions?