Efficient Algorithms for Locating the Length-Constrained Heaviest Segments, with Applications to Biomolecular Sequence Analysis. Yaw-Ling Lin, Tao Jiang.


1 Efficient Algorithms for Locating the Length-Constrained Heaviest Segments, with Applications to Biomolecular Sequence Analysis
Yaw-Ling Lin, Dept. of CS & Information Management, Providence University, Taiwan
Tao Jiang, Dept. of CS & Engineering, UC Riverside, USA
Kun-Mao Chao, Dept. of CS & Information Engineering, National Taiwan University, Taiwan

2 Yaw-Ling Lin, Providence, Taiwan
Outline
– Introduction
– Applications to Biomolecular Sequence Analysis
– Maximum Sum Consecutive Subsequence
– Maximum Average Consecutive Subsequence
– Implementation and Preliminary Experiments
– Concluding Remarks

3 Introduction
Two fundamental algorithms in searching for interesting regions in sequences:
– Given a sequence of real numbers of length n and an upper bound U, find a consecutive subsequence of length at most U with the maximum sum: an O(n)-time algorithm.
– Given a sequence of real numbers of length n and a lower bound L, find a consecutive subsequence of length at least L with the maximum average: an O(n log L)-time algorithm.
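The two problems can be written out formally as below. The notation (the sequence A, the sum w and the average μ) is assumed here for illustration, since the transcript gives only the prose statements.

```latex
% Assumed notation: A = <a_1, ..., a_n> is the input sequence of real numbers.
% Requires amsmath for \substack.
\[
  w(i,j) = \sum_{k=i}^{j} a_k , \qquad
  \mu(i,j) = \frac{w(i,j)}{j-i+1} , \qquad 1 \le i \le j \le n .
\]
% Problem 1: maximum-sum segment of length at most U.
\[
  \max_{\substack{1 \le i \le j \le n \\ j-i+1 \le U}} w(i,j)
\]
% Problem 2: maximum-average segment of length at least L.
\[
  \max_{\substack{1 \le i \le j \le n \\ j-i+1 \ge L}} \mu(i,j)
\]
```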

4 Applications to Biomolecular Sequence Analysis (I)
Locating GC-Rich Regions
– Finding GC-rich regions: an important problem in gene recognition and comparative genomics.
– CpG islands (200 to 1400 bp).
– [Huang’94]: O(nL)-time algorithm.
Post-Processing Sequence Alignments
– Comparative analysis of human and mouse DNA: useful in gene prediction in the human genome.
– Mosaic effect: bad inner sequence.
– Normalized local alignment.
– Post-processing locally aligned subsequences.
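As a rough illustration of the GC-rich region problem, the sketch below scores each base as 1 (G or C) or 0 and then searches for the segment of length at least L with the highest average, which is its GC content. It is a simple O(nL) baseline, not the algorithm from the talk: it relies on the observation that a best segment of length at least L can always be found among lengths L to 2L-1, because a longer segment can be split into two pieces of length at least L, one of which has an average no smaller than the whole. The function names and the 0/1 scoring are assumptions made for this example.

```python
def gc_scores(dna):
    """Map each base to 1 if it is G or C, and to 0 otherwise."""
    return [1 if base in "GCgc" else 0 for base in dna]


def max_avg_baseline(scores, L):
    """Return (start, end, average) for a best segment scores[start:end]
    of length at least L, checking only lengths L .. 2L-1 (O(nL) time)."""
    n = len(scores)
    if not 1 <= L <= n:
        raise ValueError("L must satisfy 1 <= L <= len(scores)")
    # prefix[k] holds the sum of the first k scores.
    prefix = [0]
    for s in scores:
        prefix.append(prefix[-1] + s)
    best = None
    for start in range(n - L + 1):
        longest_end = min(n, start + 2 * L - 1)
        for end in range(start + L, longest_end + 1):
            avg = (prefix[end] - prefix[start]) / (end - start)
            if best is None or avg > best[2]:
                best = (start, end, avg)
    return best


if __name__ == "__main__":
    dna = "ATATGCGCGGATAT"
    start, end, gc = max_avg_baseline(gc_scores(dna), L=4)
    print(dna[start:end], gc)
```

Ties are resolved in favour of the first segment found, so the reported region is the leftmost shortest one among equally GC-rich candidates.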

5 Applications to Biomolecular Sequence Analysis (II)
Annotating Multiple Sequence Alignments
– [Stojanovic’99]: conserved regions in biomolecular sequences.
– Numerical scores for columns of a multiple alignment; each column score is adjusted by subtracting an anchor value.
Ungapped Local Alignments with Length Constraints
– Computing the length-constrained segment of each diagonal in the matrix with the largest sum (or average) of scores.
– Applications in motif identification.
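The ungapped local alignment item above can be sketched as follows: each diagonal of the comparison matrix of two sequences yields a list of scores, and a length-constrained maximum-sum search is run on every diagonal. This is only an illustration under stated assumptions: the +1/-1 match/mismatch scoring and all function names are hypothetical, and the per-diagonal search is a plain O(nU) brute force rather than the linear-time method from the talk.

```python
def best_segment_max_sum(scores, U):
    """Brute force: return (sum, start, end) for the nonempty segment
    scores[start:end] of length at most U with the largest sum."""
    best = None
    for start in range(len(scores)):
        total = 0
        for end in range(start + 1, min(len(scores), start + U) + 1):
            total += scores[end - 1]
            if best is None or total > best[0]:
                best = (total, start, end)
    return best


def diagonal_scores(s, t, offset, match=1, mismatch=-1):
    """Scores along the diagonal that pairs s[i] with t[i + offset].
    The match/mismatch values are assumed, not taken from the slides."""
    return [match if s[i] == t[i + offset] else mismatch
            for i in range(len(s)) if 0 <= i + offset < len(t)]


def best_ungapped_alignment(s, t, U):
    """Return (sum, offset, start, end) for the best ungapped local
    alignment of length at most U. start and end index into the score
    list of the chosen diagonal. Assumes s and t are nonempty, U >= 1."""
    best = None
    # Every offset in this range gives a nonempty diagonal.
    for offset in range(-(len(s) - 1), len(t)):
        diag = diagonal_scores(s, t, offset)
        total, start, end = best_segment_max_sum(diag, U)
        if best is None or total > best[0]:
            best = (total, offset, start, end)
    return best


if __name__ == "__main__":
    print(best_ungapped_alignment("ACGTAC", "TTACGTTT", U=4))
```

Replacing `best_segment_max_sum` with a linear-time routine would make the whole scan proportional to the size of the matrix.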

6 Maximum Sum Consecutive Subsequence
[example sequence lost in extraction] is left-negative; [example sequence lost in extraction] is not.
[example partition lost in extraction] is minimal left-negative partitioned.

7 Minimal left-negative partition

8 MLN-partition: linear time
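The slide's own pseudocode did not survive in this transcript, so the following is a minimal sketch of one way to build the partition in linear time. It assumes the usual definitions: a sequence is left-negative when every proper prefix has a sum of at most zero, and a partition is minimal left-negative when every segment is left-negative and every segment except possibly the last has a positive sum. The function and variable names are hypothetical.

```python
def mln_partition(a):
    """Return the minimal left-negative partition of a as a list of
    (start, end) pairs, each denoting the segment a[start:end]."""
    n = len(a)
    end = [0] * n     # end[i]: exclusive end of the segment that starts at i
    weight = [0] * n  # weight[i]: sum of a[i:end[i]]
    # Scan right to left; the segment starting at i absorbs the segments
    # after it for as long as its running sum is not positive.
    for i in range(n - 1, -1, -1):
        end[i] = i + 1
        weight[i] = a[i]
        while end[i] < n and weight[i] <= 0:
            nxt = end[i]
            weight[i] += weight[nxt]
            end[i] = end[nxt]
    # Follow the pointers from position 0 to read off the partition.
    parts = []
    i = 0
    while i < n:
        parts.append((i, end[i]))
        i = end[i]
    return parts


if __name__ == "__main__":
    print(mln_partition([5, -3, 4, -1, 2, -6]))
```

Because `end[i]` depends only on the elements from position i onward, the same arrays also describe the partition of every suffix, which is what a length-constrained search needs.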

9 Max-Sum with LC

10 Analysis of MSLC
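To make the O(n) bound for the length-constrained maximum-sum problem concrete, here is a sketch that uses a different and widely known technique: a sliding-window minimum over prefix sums, kept in a monotonic deque. It is not the left-negative-partition method the talk describes, only an alternative that reaches the same bound. The function name is hypothetical.

```python
from collections import deque


def max_sum_at_most(a, U):
    """Return (best_sum, start, end) for the nonempty segment a[start:end]
    with end - start <= U that has the largest sum, in O(n) time."""
    if not a or U < 1:
        raise ValueError("need a nonempty sequence and U >= 1")
    # prefix[k] holds the sum of the first k elements.
    prefix = [0]
    for x in a:
        prefix.append(prefix[-1] + x)
    best = None
    window = deque()  # candidate start indices, prefix values increasing
    for end in range(1, len(a) + 1):
        start = end - 1
        # Starts with a prefix sum no smaller than the new one are dominated.
        while window and prefix[window[-1]] >= prefix[start]:
            window.pop()
        window.append(start)
        # Drop starts that would make the segment longer than U.
        while window[0] < end - U:
            window.popleft()
        total = prefix[end] - prefix[window[0]]
        if best is None or total > best[0]:
            best = (total, window[0], end)
    return best


if __name__ == "__main__":
    print(max_sum_at_most([5, -3, 4, -1, 2, -6], U=3))
```

Each index enters and leaves the deque at most once, which gives the linear running time.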

11 Max Average Subsequence
[example sequence lost in extraction] is right-skew; [example sequence lost in extraction] is not.
[example partition lost in extraction] is decreasing right-skew partitioned.

12 Decreasing right-skew partition

13 DRS-partition: linear time
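The pseudocode for this slide is also missing from the transcript, so what follows is a minimal sketch of one way to compute the partition in linear time. It assumes the usual definitions: a sequence is right-skew when the average of every proper prefix is at most the average of the remaining suffix, and a decreasing right-skew partition splits the sequence into right-skew segments whose averages strictly decrease from left to right. The function and variable names are hypothetical.

```python
def drs_partition(a):
    """Return the decreasing right-skew partition of a as a list of
    (start, end) pairs, each denoting the segment a[start:end]."""
    n = len(a)
    end = [0] * n    # end[i]: exclusive end of the segment that starts at i
    total = [0] * n  # total[i]: sum of a[i:end[i]]
    # Scan right to left; merge with the following segment while the
    # current average does not exceed the average of that segment.
    for i in range(n - 1, -1, -1):
        end[i] = i + 1
        total[i] = a[i]
        while end[i] < n:
            nxt = end[i]
            len_cur = end[i] - i
            len_nxt = end[nxt] - nxt
            # avg(cur) > avg(nxt), compared by cross-multiplication
            # (both lengths are positive), so no division is needed.
            if total[i] * len_nxt > total[nxt] * len_cur:
                break
            total[i] += total[nxt]
            end[i] = end[nxt]
    parts = []
    i = 0
    while i < n:
        parts.append((i, end[i]))
        i = end[i]
    return parts


if __name__ == "__main__":
    print(drs_partition([1, 3, 2, 5, 0, 4]))
```

As with the left-negative pointers, `end[i]` describes the first segment of the partition of the suffix starting at i, so one pass serves every suffix.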

14 Max-Avg-Seq with LC

15 Locate good-partner

16 Analysis of MaxAvgSeq
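The sketch below shows how right-skew pointers can drive the search for a maximum-average segment of length at least L. For each start position it takes the shortest allowed segment and keeps appending the next right-skew segment while that does not lower the average. This is a simplification, not the algorithm as presented: the "good partner" is located here by a linear scan rather than the faster search that the O(n log L) bound would require, so the sketch does not attain that bound. All names are hypothetical.

```python
def drs_pointers(a):
    """end[i] is the exclusive end of the first segment in the decreasing
    right-skew partition of the suffix a[i:]."""
    n = len(a)
    end = [0] * n
    total = [0] * n
    for i in range(n - 1, -1, -1):
        end[i] = i + 1
        total[i] = a[i]
        while end[i] < n:
            nxt = end[i]
            # Stop once the current average exceeds the next segment's.
            if total[i] * (end[nxt] - nxt) > total[nxt] * (end[i] - i):
                break
            total[i] += total[nxt]
            end[i] = end[nxt]
    return end


def max_avg_at_least(a, L):
    """Return (start, stop, average) for a segment a[start:stop] of
    length at least L with the largest average."""
    n = len(a)
    if not 1 <= L <= n:
        raise ValueError("L must satisfy 1 <= L <= len(a)")
    end = drs_pointers(a)
    prefix = [0]
    for x in a:
        prefix.append(prefix[-1] + x)
    best = None
    for start in range(n - L + 1):
        stop = start + L
        # Append whole right-skew segments while the average of the
        # current segment does not exceed that of the next one.
        while stop < n:
            nxt_stop = end[stop]
            cur_sum, cur_len = prefix[stop] - prefix[start], stop - start
            nxt_sum, nxt_len = prefix[nxt_stop] - prefix[stop], nxt_stop - stop
            if cur_sum * nxt_len > nxt_sum * cur_len:
                break
            stop = nxt_stop
        avg = (prefix[stop] - prefix[start]) / (stop - start)
        if best is None or avg > best[2]:
            best = (start, stop, avg)
    return best


if __name__ == "__main__":
    print(max_avg_at_least([1, 3, 2, 5, 0, 4], L=2))
```

Stopping at the first segment that fails the comparison is safe because later segments have strictly smaller averages, and cutting a right-skew segment short never beats taking it whole or not at all.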

17 Implementation and Preliminary Experiments

18 Implementation and Preliminary Experiments

19 Conclusion
– Finding a max-sum subsequence of length at most U can be done in O(n) time.
– Finding a max-avg subsequence of length at least L can be done in O(n log L) time.

20 Recent Progress
– Lu (CMCT’2002): finding the max-avg subsequence of length at least L on binary (0,1) sequences in O(n) time.
– Goldwasser, Kao, Lu (2002, manuscript): finding the max-avg subsequence of length at least L and at most U on real sequences in O(n) time.
– Tools: finding CpG islands using MAVG (joint work with Huang, X., Jiang, T. and Chao, K.-M.)
  http://deepc2.zool.iastate.edu/aat/mavg/cgdoc.html
  http://deepc2.zool.iastate.edu/aat/mavg/cg.html

21 Future Research
– Best k (nonintersecting) subsequences?
– Normalized local alignment?
– Measurement of goodness?

