On the Range Maximum-Sum Segment Query Problem

1 On the Range Maximum-Sum Segment Query Problem
Kuan-Yu Chen and Kun-Mao Chao Department of Computer Science and Information Engineering, National Taiwan University, Taiwan 2019/2/21 Chen and Chao

2 An example – Locating GC-rich regions (1)
One reasonable scoring expression to measure the richness of a region is x-p×l, where x is the C+G count of the region, l is the length of the region, and p is a positive ratio constant.
The goal is to design an algorithm to report the region that maximizes the expression x-p×l.

3 An example – Locating GC-rich regions (2)
Let x be the C+G count of the region, and y be the A+T count of the region.
Hence, we have x-p×l = x-p×(x+y) = (1-p)×x - p×y.
Therefore, to calculate the value of x-p×l, one can assign
w(G) = w(C) = 1-p
w(A) = w(T) = -p
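
The weight assignment above can be checked numerically. The following is a minimal sketch; the function names `gc_weights`, `region_score`, and `direct_score` are illustrative and do not come from the slides.

```python
def gc_weights(p):
    """Per-base weights chosen so that a region's weight sum equals x - p*l."""
    return {"G": 1 - p, "C": 1 - p, "A": -p, "T": -p}


def region_score(region, p):
    """Score of a region as the sum of its per-base weights."""
    w = gc_weights(p)
    return sum(w[base] for base in region)


def direct_score(region, p):
    """Score computed directly as x - p*l, with x the C+G count and l the length."""
    x = sum(1 for base in region if base in "GC")
    return x - p * len(region)
```

With these weights, locating a GC-rich region becomes a maximum-sum segment problem on the weight sequence.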

4 The Maximum-Sum Segment
Also called the maximum-sum interval or the maximum-scoring region.
Given a sequence of numbers, the maximum-sum segment is simply the contiguous subsequence having the greatest total sum.
Example: <5, -5.1, 1, 3, -4, 2, 3, -4, 7>, with greatest total sum = 8.
Zero prefix-/suffix-sums are possible: here <1, 3, -4> sums to zero, so both <1, 3, -4, 2, 3, -4, 7> and <2, 3, -4, 7> attain 8.
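
For reference, a single maximum-sum segment can be found with the classic linear scan usually attributed to Kadane. This is a standard technique, not the query structure developed in these slides. The sketch below uses 1-based positions and keeps zero-sum prefixes, which is an arbitrary tie-breaking choice.

```python
def max_sum_segment(a):
    """Return (best_sum, start, end) with 1-based inclusive positions."""
    best_sum, best_start, best_end = a[0], 1, 1
    cur_sum, cur_start = a[0], 1
    for j in range(2, len(a) + 1):
        x = a[j - 1]
        if cur_sum < 0:
            # A negative running sum can only hurt, so restart at j.
            cur_sum, cur_start = x, j
        else:
            cur_sum += x
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, j
    return best_sum, best_start, best_end
```

On the example sequence this returns a sum of 8 for positions 3 through 9.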

5 A Relevant Problem - RMQ
Range Minima (Maxima) Query Problem (also called Discrete Range Searching).
Given a sequence of numbers, by preprocessing the sequence we wish to retrieve the minimum (maximum) value within a given querying interval efficiently.
Example: <5, -5.1, 1, 3, -4, 2, 3, -4, 7>
(Figure: the minimum and the maximum are marked on the sequence.)
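
To make the RMQ interface concrete, here is a sketch based on a sparse table. It is a simpler stand-in: it needs O(n log n) preprocessing rather than the O(n) preprocessing of the RMQ solutions the slides rely on, but it does answer queries in O(1). The class name `SparseTable` and its parameters are illustrative.

```python
class SparseTable:
    """Range minima (or maxima) queries that return an index, 0-based inclusive."""

    def __init__(self, values, want_max=False):
        self.v = list(values)
        self.sign = -1 if want_max else 1
        n = len(self.v)
        # table[t][i] is the index of the extreme value in v[i .. i + 2**t - 1].
        self.table = [list(range(n))]
        k = 1
        while 2 * k <= n:
            prev = self.table[-1]
            row = [self._best(prev[i], prev[i + k]) for i in range(n - 2 * k + 1)]
            self.table.append(row)
            k *= 2

    def _best(self, i, j):
        # Prefer i on ties.
        return i if self.sign * self.v[i] <= self.sign * self.v[j] else j

    def query(self, lo, hi):
        """Index of the extreme value in v[lo..hi]."""
        level = (hi - lo + 1).bit_length() - 1
        left = self.table[level][lo]
        right = self.table[level][hi - (1 << level) + 1]
        return self._best(left, right)
```

A query combines two overlapping power-of-two blocks that together cover the interval, which is why the answer takes constant time.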

6 Range Maximum-Sum Segment Query Problem
Definition: The input is a sequence <a1, a2, …, an> of real numbers which is to be preprocessed.
A query consists of two intervals S and E.
Our goal is to return the maximum-sum segment whose starting index lies in S and whose end index lies in E.
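
The definition can be pinned down with a brute-force reference that simply tries every admissible pair. It is far slower than the approach in these slides and is only meant as a specification to test against. Positions are 1-based, and S and E are given as (low, high) pairs; both conventions are assumptions of this sketch.

```python
def rmsq_brute_force(a, S, E):
    """Return (sum, i, j) maximizing a[i..j] with i in S, j in E, and i <= j."""
    best = None
    for i in range(S[0], S[1] + 1):
        for j in range(max(i, E[0]), E[1] + 1):
            total = sum(a[i - 1:j])
            if best is None or total > best[0]:
                best = (total, i, j)
    return best
```

It returns None when no pair with i <= j exists.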

7 A Nonoverlapping Example
Input Sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
Total sum = 6
(Figure: the starting region and the end region are marked on the sequence.)

8 An Overlapping Example
Input Sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
Total sum = 8
(Figure: the starting region and the end region are marked on the sequence.)

9 Our Results
We propose an algorithm that runs in O(n) preprocessing time and O(1) query time under the unit-cost RAM model.
In fact, we show that RMSQ and RMQ are computationally linearly equivalent.
We show that the RMSQ techniques yield alternative O(n) time algorithms for the following problems:
- The maximum-sum segment with length constraints
- All maximal-sum segments

10 Strategy
Reduce the RMSQ problem to the RMQ problem.
Theorem. If there is a <f(n), g(n)>-time solution for the RMQ problem (preprocessing time f(n), query time g(n)), then there is a <f(n)+O(n), g(n)+O(1)>-time solution for the RMSQ problem.
(Figure: RMSQ reduces to RMQ with O(n) additional preprocessing time and O(1) additional query time.)

11 Computing sum(i,j) in O(1) time
prefix-sum(i) = a1+a2+…+ai, with prefix-sum(0) = 0.
All n prefix sums are computable in O(n) time.
sum(i, j) = prefix-sum(j) - prefix-sum(i-1)
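
A minimal sketch of the prefix-sum table and the constant-time segment sum. The leading 0 entry, which stands for prefix-sum(0), and the function names are conventions of this sketch.

```python
from itertools import accumulate


def prefix_sums(a):
    """P[i] = a1 + ... + ai for i = 0..n, with P[0] = 0."""
    return [0] + list(accumulate(a))


def segment_sum(P, i, j):
    """Sum of a[i..j] (1-based, inclusive) in O(1) time."""
    return P[j] - P[i - 1]
```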

12 Case 1: Nonoverlapping
sum(i, j) = prefix-sum(j) - prefix-sum(i-1)
To maximize sum(i, j), maximize prefix-sum(j) and minimize prefix-sum(i-1).
Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
Prefix-sum sequence: 9, -1, 3, 1, 5, 0, 4, 1, 7, -4, 4, 1, 5, 0, 3
Use range minima/maxima queries on the prefix-sum sequence: find the lowest point for the starting region and the highest point for the end region.
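
Case 1 can be sketched as follows. Plain linear scans with `min` and `max` stand in for the two RMQ lookups, so this version is not constant time per query. The sketch assumes that S = (s1, s2) and E = (e1, e2) are 1-based and satisfy s2 <= e1, so that every start in S is paired validly with every end in E.

```python
from itertools import accumulate


def rmsq_nonoverlapping(a, S, E):
    """Return (sum, i, j) for the best segment with i in S and j in E."""
    (s1, s2), (e1, e2) = S, E
    assert s2 <= e1, "this sketch only handles nonoverlapping regions"
    P = [0] + list(accumulate(a))
    # Lowest point: minimize prefix-sum(i-1) over starting indices i in S.
    i = min(range(s1, s2 + 1), key=lambda t: P[t - 1])
    # Highest point: maximize prefix-sum(j) over end indices j in E.
    j = max(range(e1, e2 + 1), key=lambda t: P[t])
    return P[j] - P[i - 1], i, j
```

Replacing the two scans with an RMQ structure over the prefix sums gives the constant query time stated in the theorem.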

13 Case 2: Overlapping
Some problems may occur.
Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
When the regions overlap, the highest point found may lie to the left of the lowest point found, so the pair does not form a valid segment (negative sum!).

14 Case 2: Overlapping
Divide into 3 possible cases.
Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
Two of the cases are answered by range minima/maxima queries (preprocessing time = f(n), query time = g(n)): find the lowest point in one part and the highest point in the other.
For the remaining case: what should we do?

15 Dealing with the Special Case: Single Range Query
Input Sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
Total sum = 6
Challenge: Can this special case be reduced to the RMQ problem?

16 Reduction Procedure
Step 1. Find a partner for each index.
Step 2. Record the sum of each pair in an array.
Step 3. Retrieve the maximum-sum pair by applying the RMQ techniques.

17 Our First Attempt (1)
Step 1: For each index i, we define the lowest point preceding i as its partner.
(Figure: prefix-sum sequence with index i, the region in which a partner is sought, and the lowest point marked.)

18 Our First Attempt (2)
Step 2: Record sum(partner(i), i) in an array.
(Figure: index i, its lowest point, and the recorded value sum(partner(i), i).)

19 Our First Attempt (3)
Step 3: Apply the RMQ techniques to the array of sum(partner(i), i) values.
Querying an interval of this array retrieves the maximum-sum pair.

20 Bump into Difficulties
What if the partner of an index i goes beyond the querying interval?
Then sum(partner(i), i) needs to be updated.
We might have to update every pair!

21 A Better Partner
How? In the prefix-sum sequence, find the nearest point preceding i that is at least as large as prefix-sum(i); call it left_bound(i).
Then find the lowest point between left_bound(i) and i; this is the new partner(i).
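
One way to compute these partners is sketched below, under assumptions that the slides do not spell out: points are indices 0..n of the prefix-sum sequence with prefix-sum(0) = 0, left_bound(i) defaults to 0 when no earlier point is at least as large, the lowest point is searched over left_bound(i)..i-1, and ties go to the nearest point. The left bounds come from a monotonic stack; the partner itself is found by a plain scan, which an RMQ lookup would replace.

```python
from itertools import accumulate


def better_partners(a):
    """Return (P, left_bound, partner); index 0 of the last two is unused."""
    n = len(a)
    P = [0] + list(accumulate(a))
    left_bound = [0] * (n + 1)
    partner = [0] * (n + 1)
    stack = [0]  # candidate left bounds; prefix sums are non-increasing bottom to top
    for i in range(1, n + 1):
        while stack and P[stack[-1]] < P[i]:
            stack.pop()
        left_bound[i] = stack[-1] if stack else 0
        stack.append(i)
        # Lowest point in left_bound(i)..i-1, nearest one on ties.
        partner[i] = min(range(left_bound[i], i), key=lambda k: (P[k], -k))
    return P, left_bound, partner
```

Under these conventions the segment paired with i is a[partner(i)+1 .. i], with sum P[i] - P[partner(i)].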

22 Why Is It Better? (1)
It remains the best choice.
It saves lots of update steps. It turns out that zero or one point needs to be updated.

23 Why Is It Better? (2) -- Remains the Best
Find the nearest point at least as large as prefix-sum(i): left_bound(i).
Find the lowest point between left_bound(i) and i: partner(i).
(Figure: an impossible region is marked on the prefix-sum sequence.)

24 Why Is It Better? (3) -- Minimal-Maximal Property
Height(partner(i)) < Height(j) < Height(i), for all partner(i) < j < i.
Maximal point i: no one in between is higher than i.
Minimal point partner(i): no one in between is lower than partner(i).
(Figure: the next higher point is also marked.)

25 Why Is It Better? (4) -- Save Some Updates
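No point between partner(i) and i is higher than i.
So when partner(i) lies outside the querying interval, the indices of the querying interval that precede i cannot be the right end of the maximum-sum segment.
(Figure: prefix-sum sequence with the querying interval, i, partner(i), and the next higher point marked.)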

26 Why Is It Better? (5) -- Nesting Property
For two indices i < j, it cannot be the case that partner(i) < partner(j) ≤ i < j.
(Figure: i and j are maximal points; partner(i) and partner(j) are minimal points.)

27 Why Is It Better? (6) -- An example
Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
Nesting property: no overlapping is allowed.

28 When a Query Comes -- Case 1: No Exceeding
The maximum pair (partner(i), i) lies in the querying interval.
Retrieve the maximum pair.
We are done. Output (partner(i), i).

29 When a Query Comes -- Case 2: Exceeding
The maximum pair (partner(i), i) goes beyond the querying interval.
Retrieve the maximum pair (partner(i), i) and update partner(i) to a new partner inside the querying interval.
Indices to the left of i cannot be the right end of the maximum-sum segment.
Retrieve the maximum pair (partner(j), j) among the indices j to the right of i; by the nesting property, partner(j) stays inside the querying interval.
Compare (new_partner(i), i) and (partner(j), j).
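
The two query cases can be put together in a sketch of the single-range query. It rests on assumed conventions that the slides leave to the figures: prefix-sum points are indexed 0..n, partner(i) is the lowest point in left_bound(i)..i-1 with ties going to the nearest point, and the pair for i stands for the segment a[partner(i)+1 .. i]. Linear scans stand in for every RMQ lookup, so the sketch shows the case analysis rather than the O(1) query time.

```python
from itertools import accumulate


def build_pairs(a):
    """Preprocessing: prefix sums P, partners, and pair sums M (index 0 unused)."""
    n = len(a)
    P = [0] + list(accumulate(a))
    partner = [0] * (n + 1)
    stack = [0]
    for i in range(1, n + 1):
        while stack and P[stack[-1]] < P[i]:
            stack.pop()
        left_bound = stack[-1] if stack else 0
        stack.append(i)
        partner[i] = min(range(left_bound, i), key=lambda k: (P[k], -k))
    M = [None] + [P[i] - P[partner[i]] for i in range(1, n + 1)]
    return P, partner, M


def single_range_query(P, partner, M, s, e):
    """Best (sum, start, end) among segments lying inside positions s..e."""
    i = max(range(s, e + 1), key=lambda t: M[t])
    if partner[i] >= s - 1:
        # Case 1: the maximum pair lies inside the querying interval.
        return M[i], partner[i] + 1, i
    # Case 2: re-partner i inside the interval, then compare with the best j > i.
    k = min(range(s - 1, i), key=lambda t: P[t])
    best = (P[i] - P[k], k + 1, i)
    if i < e:
        j = max(range(i + 1, e + 1), key=lambda t: M[t])
        if M[j] > best[0]:
            best = (M[j], partner[j] + 1, j)
    return best
```

At most one pair is re-partnered per query, which matches the claim that zero or one point needs to be updated.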

30 Time Complexity
RMSQ can be reduced to the RMQ problem in O(n) time.
Since under the unit-cost RAM model there is an <O(n), O(1)>-time solution for the RMQ problem, there is an <O(n), O(1)>-time solution for the RMSQ problem.
Preprocessing: O(n). Query: O(1).

31 RMQ → RMSQ
On the other hand, RMQ can be reduced to the RMSQ problem in O(n) time, too.
(Range Maxima Query: between each two adjacent elements, we insert a negative number whose absolute value is larger than both of them, or simply a negative number whose absolute value is larger than every absolute value in the sequence.)
(Figure: an RMQ instance and the two corresponding RMSQ instances.)
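
The simpler of the two constructions can be sketched as follows. The separator value, one more than the largest absolute value, is one choice that satisfies the condition; the brute-force segment search merely stands in for an RMSQ structure, and the function names are illustrative.

```python
def rmq_to_rmsq_instance(values):
    """Interleave a large negative separator between adjacent elements."""
    sep = -(max(abs(v) for v in values) + 1)
    out = []
    for idx, v in enumerate(values):
        if idx > 0:
            out.append(sep)
        out.append(v)
    return out


def range_max_via_rmsq(values, lo, hi):
    """Maximum of values[lo..hi] (1-based) read off a maximum-sum segment."""
    b = rmq_to_rmsq_instance(values)
    # Element t of the original sequence sits at position 2t-1 of b.
    first, last = 2 * lo - 1, 2 * hi - 1
    return max(sum(b[i - 1:j])
               for i in range(first, last + 1)
               for j in range(i, last + 1))
```

Any segment that spans a separator loses more than it can gain, so the best segment in the range is the single largest element.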

32 Use RMSQ Techniques to Solve Two Relevant Problems
1. Finding the maximum-sum segment with length constraints in O(n) time.
- Y.-L. Lin, T. Jiang, K.-M. Chao, 2002
- T.-H. Fan et al., 2003
2. Finding all maximal scoring subsequences in O(n) time.
- W. L. Ruzzo & M. Tompa, 1999

33 Problem 1: The Maximum-Sum Segment with Length Constraints
Lin, Jiang, and Chao [JCSS 2002] and Fan et al. [CIAA 2003] gave O(n)-time algorithms for this problem.
The segment must have length at least L and at most U.

34 Problem 1: Finding the Maximum-Sum Segment with Length Constraints
Length at least L, at most U.
For each index i, find the maximum-sum segment whose starting point lies in [i-U+1, i-L+1] and whose end point is i. Each such search is one RMSQ query.
Runs in O(n) time since each query costs O(1) time.
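
The per-index search can be sketched without an RMSQ structure by keeping a sliding-window minimum over the prefix sums in a monotonic deque. This is a swapped-in standard technique, not the RMSQ-based method of the slides, but it answers the same per-index question in O(n) total time. The function name and the 1-based result convention are assumptions of this sketch.

```python
from collections import deque
from itertools import accumulate


def max_sum_with_length_bounds(a, L, U):
    """Return (sum, start, end) of the best segment with L <= length <= U."""
    n = len(a)
    P = [0] + list(accumulate(a))
    window = deque()  # candidate points k in [i-U, i-L], prefix sums increasing
    best = None
    for i in range(L, n + 1):
        k_new = i - L
        while window and P[window[-1]] >= P[k_new]:
            window.pop()
        window.append(k_new)
        while window[0] < i - U:
            window.popleft()
        k = window[0]  # lowest prefix sum among the allowed points
        total = P[i] - P[k]
        if best is None or total > best[0]:
            best = (total, k + 1, i)
    return best
```

A point k corresponds to the starting index k+1, so the window [i-U, i-L] matches the starting range [i-U+1, i-L+1] above.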

35 Problem 2: All Maximal-Sum Segments
Ruzzo and Tompa [ISMB 1999] gave an O(n)-time algorithm for this problem.
Recursive definition.
(Figure: a segment S with the part L(S) to its left and the part R(S) to its right.)

36 Problem 2: Finding All Maximal Scoring Subsequences
Recursive calls: each call issues one RMSQ query.
(Figure: input sequence with S, L(S), and R(S) marked.)
Runs in O(n) time since each query costs O(1) time.
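
One reading of the recursive calls is sketched below. It assumes that the recursion reports a maximum-sum segment S of the current range when its sum is positive, then recurses on L(S) and R(S). A linear scan replaces the RMSQ query, so this version can take quadratic time. Dropping zero-sum prefixes and keeping the first best segment found are tie-breaking assumptions, and the test sequence is purely illustrative.

```python
def best_segment(a, lo, hi):
    """Maximum-sum segment inside positions lo..hi (1-based) by a linear scan."""
    best = None
    cur_sum, cur_start = 0, lo
    for j in range(lo, hi + 1):
        if cur_sum <= 0:
            cur_sum, cur_start = 0, j  # drop a non-positive prefix
        cur_sum += a[j - 1]
        if best is None or cur_sum > best[0]:
            best = (cur_sum, cur_start, j)
    return best


def maximal_segments(a, lo=1, hi=None, out=None):
    """Collect (start, end, sum) triples from left to right."""
    if hi is None:
        hi = len(a)
    if out is None:
        out = []
    if lo > hi:
        return out
    total, i, j = best_segment(a, lo, hi)
    if total <= 0:
        return out
    maximal_segments(a, lo, i - 1, out)   # recurse on L(S)
    out.append((i, j, total))             # report S
    maximal_segments(a, j + 1, hi, out)   # recurse on R(S)
    return out
```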

