
1 On the Range Maximum-Sum Segment Query Problem
Kuan-Yu Chen and Kun-Mao Chao
Department of Computer Science and Information Engineering, National Taiwan University, Taiwan
2004/12

2 Outline
- Motivation
- Problems arising from bioinformatics applications
- Definition of our research problem (RMSQ)
- Our main idea
- Finding a partner for each index
- Reducing the problem to the Range Minima Query problem (RMQ)
- Conclusions and applications
- Solving three relevant problems in O(n) time

3 Applications to biomolecular sequence analysis
- Locating conserved regions or GC-rich regions
- Assign a real number (also called a score) to each residue
- Look for the maximum-sum or maximum-average segments
- With length constraints or an average lower bound

4 What is a Maximum-Sum Segment?
- Also called maximum-sum intervals or maximum scoring regions
- Given a sequence of numbers, the maximum-sum segment is simply the contiguous subsequence having the greatest total sum.
- A prefix or suffix with zero sum is not allowed.
(Figure: an example segment with total sum = 8.)
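The definition above can be illustrated with a short linear scan. This is a minimal sketch of the classic running-sum scan (often called Kadane's algorithm), not code from the talk; the function name `max_sum_segment` and the 0-based indexing are assumptions made for the example. Restarting whenever the running sum is not positive, and updating only on a strict improvement, keeps zero-sum prefixes and suffixes out of the answer.

```python
def max_sum_segment(a):
    """Return (best_sum, start, end) with 0-based inclusive indices.

    Hypothetical helper for illustration; assumes a is non-empty.
    """
    best_sum, best_start, best_end = a[0], 0, 0
    cur_sum, cur_start = a[0], 0
    for idx in range(1, len(a)):
        if cur_sum <= 0:
            # A non-positive running sum can never help: restart here.
            cur_sum, cur_start = a[idx], idx
        else:
            cur_sum += a[idx]
        if cur_sum > best_sum:  # strict: do not extend by a zero-sum suffix
            best_sum, best_start, best_end = cur_sum, cur_start, idx
    return best_sum, best_start, best_end
```

On ties the scan keeps the first segment it found.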

5 Finding the maximum-sum segment with length constraints
- Lin, Jiang, and Chao [JCSS 2002] and Fan et al. [CIAA 2003] each gave an O(n)-time algorithm for this problem.
- Length at least L, at most U

6 Finding all maximal-sum segments
- Ruzzo and Tompa [ISMB 1999] gave an O(n)-time algorithm for this problem.
- Recursive calls.
(Figure labels: S, R, L.)

7 Finding the longest segment with average constraints
- Wang and Xu [Bioinformatics 2003] gave a linear-time algorithm.

8 Our results
- We propose an algorithm for the RMSQ problem that runs in O(n) preprocessing time and O(1) query time.
- We use the RMSQ techniques we developed to solve the three problems mentioned above in O(n) time.

9 Problem Definition
- Range Maximum-Sum Segment Query (RMSQ) problem
- The input is a sequence of real numbers, which is to be preprocessed. A query consists of two intervals [i, j] and [k, l]. The goal is to return the maximum-sum segment whose starting index lies in [i, j] and whose ending index lies in [k, l].
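To make the query semantics concrete, here is a brute-force reference that simply tries every allowed (start, end) pair. It is only a sketch for checking answers, not the O(1)-query method of the talk; the name `rmsq_bruteforce` and the 1-based inclusive indexing are assumptions chosen to match the slide's notation.

```python
def rmsq_bruteforce(a, i, j, k, l):
    """Best (sum, start, end) with start in [i, j], end in [k, l], start <= end.

    Indices are 1-based and inclusive. Returns None if no pair is valid.
    Quadratic time: a correctness reference only.
    """
    best = None
    for s in range(i, j + 1):
        total = 0
        for e in range(s, l + 1):
            total += a[e - 1]          # total == sum of a[s..e]
            if e >= k and (best is None or total > best[0]):
                best = (total, s, e)
    return best
```

A fast structure for the same problem should return a segment with the same sum as this function on every query.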

10 A Nonoverlapping Example
Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
(Figure: a starting region and an end region that do not overlap; total sum = 6.)

11 An Overlapping Example
Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
(Figure: a starting region and an end region that overlap; total sum = 8.)

12 Main Idea
- Reduce to the RMQ problem
- Theorem. If there is an ⟨f(n), g(n)⟩-time solution for the RMQ problem (preprocessing time f(n), query time g(n)), then there is an ⟨f(n) + O(n), g(n) + O(1)⟩-time solution for the RMSQ problem.
(Figure: RMSQ reduces to RMQ, with labels O(n) and O(1).)

13 A relevant problem - RMQ
- Range Minima Query Problem (also called Discrete Range Searching)
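As a concrete picture of an RMQ structure, the sketch below uses a sparse table. Note the swap: a sparse table needs O(n log n) preprocessing for O(1) queries, so it is a simpler stand-in for the linear-preprocessing RMQ solutions the talk relies on, not the same structure. The class name `SparseTable` and the method `argmin` are hypothetical.

```python
class SparseTable:
    """Index of the minimum of values[lo..hi] (0-based, inclusive) in O(1).

    Stand-in for an O(n)-preprocessing RMQ structure; this one costs
    O(n log n) time and space to build.
    """

    def __init__(self, values):
        self.values = list(values)
        n = len(self.values)
        # table[t][p] = index of the minimum in the window of length 2**t at p
        self.table = [list(range(n))]
        span = 1  # window length of the most recently built row
        while 2 * span <= n:
            prev = self.table[-1]
            row = [self._pick(prev[p], prev[p + span])
                   for p in range(n - 2 * span + 1)]
            self.table.append(row)
            span *= 2

    def _pick(self, x, y):
        # Prefer the left index on ties.
        return x if self.values[x] <= self.values[y] else y

    def argmin(self, lo, hi):
        level = (hi - lo + 1).bit_length() - 1
        # Two windows of length 2**level cover [lo, hi] (they may overlap).
        return self._pick(self.table[level][lo],
                          self.table[level][hi - (1 << level) + 1])
```

Range maxima can be answered with the same class by building it over the negated values.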

14 Cumulative sum
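A small sketch of the cumulative (prefix) sum array may help here. The names `prefix_sums` and `segment_sum` are assumptions for illustration; index 0 holds the empty prefix so that 1-based segment indices line up with the array.

```python
from itertools import accumulate


def prefix_sums(a):
    """c[t] = a[1] + ... + a[t] with c[0] = 0 (1-based view of a)."""
    return [0] + list(accumulate(a))


def segment_sum(c, i, j):
    """Sum of a[i..j] (1-based, inclusive) from the prefix array c."""
    return c[j] - c[i - 1]
```

With the array in hand, any segment sum costs one subtraction, which is what lets range minimum and maximum queries on the prefix array stand in for searches over segments.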

15 Case 1: Nonoverlapping
sum(i, j) = prefix-sum(j) - prefix-sum(i-1)
Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
- Minimize prefix-sum(i-1): find a lowest point for the starting region.
- Maximize prefix-sum(j): find a highest point in the end region.
- Both searches can be reduced to the RMQ problem.
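The nonoverlapping case can be sketched as two independent searches on the prefix-sum array. In this illustration the plain `min` and `max` scans stand in for O(1) RMQ calls, so the sketch shows the reduction rather than the final running time; the function name `rmsq_nonoverlapping` and the precondition j <= k are assumptions.

```python
from itertools import accumulate


def rmsq_nonoverlapping(a, i, j, k, l):
    """Best (sum, start, end), start in [i, j], end in [k, l], with j <= k.

    1-based inclusive indices. Because j <= k, every start is at most
    every end, so the two searches cannot conflict.
    """
    assert 1 <= i <= j <= k <= l <= len(a)
    c = [0] + list(accumulate(a))
    # Lowest prefix-sum(s - 1) for s in [i, j]  -> range minimum query
    low = min(range(i - 1, j), key=c.__getitem__)
    # Highest prefix-sum(e) for e in [k, l]     -> range maximum query
    high = max(range(k, l + 1), key=c.__getitem__)
    return c[high] - c[low], low + 1, high
```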

16 Case 2: Overlapping
Some problems occur in the overlapping case:
Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
(Figure labels: "Find a lowest point here", "Find a highest point here", "Negative Sum!!")

17 Case 2: Overlapping
Divide into 3 possible cases:
Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
(Figure labels, repeated across the cases: "Find a lowest point here", "Find a highest point here")
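The transcript does not spell out the three cases, so the following is one natural decomposition rather than a confirmed reading of the slide. It assumes i <= k <= j <= l and splits the query into (a) start in [i, k-1], end in [k, l]; (b) start in [k, j], end in [j+1, l]; and (c) start and end both in [k, j], which is the single range query. All function names are hypothetical, and the linear scans stand in for O(1) queries.

```python
from itertools import accumulate


def best_pair(c, lo_s, hi_s, lo_e, hi_e):
    """Nonoverlapping subquery on prefix array c; None if a range is empty."""
    if lo_s > hi_s or lo_e > hi_e:
        return None
    low = min(range(lo_s - 1, hi_s), key=c.__getitem__)
    high = max(range(lo_e, hi_e + 1), key=c.__getitem__)
    return c[high] - c[low], low + 1, high


def best_within(c, lo, hi):
    """Single range query: start and end both in [lo, hi], by a linear scan."""
    best, low = None, lo - 1
    for e in range(lo, hi + 1):
        if c[e - 1] < c[low]:
            low = e - 1            # lowest prefix among positions lo-1 .. e-1
        cand = (c[e] - c[low], low + 1, e)
        if best is None or cand[0] > best[0]:
            best = cand
    return best


def rmsq_overlapping(a, i, j, k, l):
    """Best (sum, start, end) for overlapping ranges, assuming i <= k <= j <= l."""
    assert 1 <= i <= k <= j <= l <= len(a)
    c = [0] + list(accumulate(a))
    candidates = [
        best_pair(c, i, k - 1, k, l),      # (a) start before the overlap
        best_pair(c, k, j, j + 1, l),      # (b) end after the overlap
        best_within(c, k, j),              # (c) both inside the overlap
    ]
    return max((x for x in candidates if x is not None), key=lambda t: t[0])
```

Cases (a) and (b) are nonoverlapping queries; only case (c) needs a new idea.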

18 A special case of RMSQ: single range query
Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
(Figure: an example single range query; total sum = 6.)
Challenge: Can this special case be reduced to the RMQ problem?

19 Idea
Step 1. Find a partner for each index.
Step 2. Record the sum of each pair in an array.
Step 3. Reduce to the RMQ problem: retrieve the maximum-sum pair within the query interval.

20 Our First Attempt (1)
Step 1: For each index i, we define the lowest point preceding i as its partner.
(Figure labels: i, partner(i))

21 Our First Attempt (2)
Step 2: Record sum(i, partner(i)) in an array.
(Figure labels: i, partner(i), sum(i, partner(i)))

22 Our First Attempt (3)
Step 3: Apply the RMQ techniques to the array to retrieve the maximum-sum pair.
(Figure labels: i, partner(i), sum(i, partner(i)))

23 Faults
What if an index's partner lies outside the query interval?
(Figure labels: i, partner(i), sum(i, partner(i)), "Needs to be updated", "Worst case")
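The first attempt and its fault can be sketched as follows. The code reads "lowest point preceding i" as the lowest prefix sum before position i, which is an assumption about the figure; the names `first_attempt_partners` and `naive_single_range_query` are hypothetical, and the `max` scan stands in for an RMQ call.

```python
from itertools import accumulate


def first_attempt_partners(a):
    """For each end index e (1-based): partner start and the pair's sum.

    partner[e] is the start whose preceding prefix sum is the lowest
    among all positions before e; pair_sum[e] is sum(partner[e], e).
    Index 0 of both lists is unused.
    """
    c = [0] + list(accumulate(a))
    partner = [None] * (len(a) + 1)
    pair_sum = [None] * (len(a) + 1)
    low = 0
    for e in range(1, len(a) + 1):
        if c[e - 1] < c[low]:
            low = e - 1
        partner[e] = low + 1
        pair_sum[e] = c[e] - c[low]
    return partner, pair_sum


def naive_single_range_query(a, lo, hi):
    """Pick the end in [lo, hi] with the largest recorded pair sum.

    Flawed on purpose: the partner may start before lo.
    """
    partner, pair_sum = first_attempt_partners(a)
    e = max(range(lo, hi + 1), key=pair_sum.__getitem__)
    return pair_sum[e], partner[e], e
```

On the talk's input sequence, querying the interval [5, 9] returns the pair (3, 9) with sum 8, whose start 3 lies outside the interval; the best segment that stays inside [5, 9] is (7, 9) with sum 7. Repairing such entries per query is what makes this attempt too slow in the worst case.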

24 A Better Partner

25 Nesting Property
Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3
Update can be done in O(1) time.
9, -10, 4, -2, 6, -5, 4, -3, 8, -11, 8, -3, 9, -5, 3
Apply RMQ techniques.

26 Use RMSQ Techniques to Solve the Other Two Relevant Problems
1. Finding the maximum-sum segment with length constraints in O(n) time.
- Y.-L. Lin, T. Jiang, K.-M. Chao, 2002
- T.-H. Fan, S. Lee, H.-I. Lu, T.-S. Tsou, 2003
2. Finding all maximal scoring subsequences in O(n) time.
- W. L. Ruzzo & M. Tompa, 1999

27 Maximum-Sum Segment with length constraints
Length at least L, at most U.
Runs in O(n) time since each query costs O(1) time.
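For each end index e, the allowed starts form a window [e-U+1, e-L+1], so the problem is a sequence of n queries with a single-point end range. The sketch below answers those window queries with a monotone deque over the prefix sums instead of the RMSQ structure. That is a deliberate substitution to keep the example short: it also runs in O(n) overall, but it is not the method of the talk. The function name and parameter names are assumptions.

```python
from collections import deque
from itertools import accumulate


def max_sum_with_length_bounds(a, min_len, max_len):
    """Best (sum, start, end), 1-based, with min_len <= length <= max_len.

    Returns None when the sequence is shorter than min_len.
    """
    assert 1 <= min_len <= max_len
    c = [0] + list(accumulate(a))
    window = deque()  # candidate (start - 1) positions, prefix sums increasing
    best = None
    for e in range(min_len, len(a) + 1):
        newest = e - min_len              # largest allowed value of start - 1
        while window and c[window[-1]] >= c[newest]:
            window.pop()                  # dominated by the newer position
        window.append(newest)
        while window[0] < e - max_len:    # segment would be longer than max_len
            window.popleft()
        cand = (c[e] - c[window[0]], window[0] + 1, e)
        if best is None or cand[0] > best[0]:
            best = cand
    return best
```

Each position enters and leaves the deque at most once, which gives the linear total.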

28 All Maximal Scoring Subsequences
Recursive calls.
(Figure labels: S, R, L.)
Runs in O(n) time since each query costs O(1) time.

29 The End
Thank You.

