Efficient Algorithms for Locating the Length-Constrained Heaviest Segments, with Applications to Biomolecular Sequence Analysis. Yaw-Ling Lin, Tao Jiang, Kun-Mao Chao.

1 Efficient Algorithms for Locating the Length-Constrained Heaviest Segments, with Applications to Biomolecular Sequence Analysis. Yaw-Ling Lin (Dept. of CS & Information Management, Providence Univ., Taiwan), Tao Jiang (Dept. of CS & Engineering, UC Riverside, USA), Kun-Mao Chao (Dept. of CS & Information Engineering, National Taiwan Univ., Taiwan).

2 Yaw-Ling Lin, Providence, Taiwan. Outline: Introduction. Applications to Biomolecular Sequence Analysis. Maximum Sum Consecutive Subsequence. Maximum Average Consecutive Subsequence. Implementation and Preliminary Experiments. Concluding Remarks.

3 Motivation: GC-rich Region

4 Introduction. Two fundamental problems in searching for interesting regions in sequences, and an algorithm for each: (1) Given a sequence of n real numbers and an upper bound U, find a consecutive subsequence of length at most U with the maximum sum; solved by an O(n)-time algorithm. (2) Given a sequence of n real numbers and a lower bound L, find a consecutive subsequence of length at least L with the maximum average; solved by an O(n log L)-time algorithm.
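
The two problem statements above can be pinned down with a brute-force reference. This is only a sketch for checking faster implementations on small inputs, not the algorithms of the talk; the function names `brute_max_sum` and `brute_max_avg` are hypothetical, segments are reported as half-open index pairs `(start, end)`, and integer scores are assumed so that averages can be compared exactly.

```python
from fractions import Fraction


def brute_max_sum(a, U):
    """Best (sum, start, end) over segments a[start:end] with 1 <= length <= U.

    Runs in O(n * U) time; assumes U >= 1 and a is non-empty.
    """
    n = len(a)
    best = None
    for i in range(n):
        total = 0
        for j in range(i, min(n, i + U)):
            total += a[j]
            if best is None or total > best[0]:
                best = (total, i, j + 1)
    return best


def brute_max_avg(a, L):
    """Best (average, start, end) over segments a[start:end] with length >= L.

    Runs in O(n^2) time; assumes integer scores and 1 <= L <= len(a).
    """
    n = len(a)
    best = None
    for i in range(n - L + 1):
        total = sum(a[i:i + L - 1])  # first L-1 elements of the window
        for j in range(i + L - 1, n):
            total += a[j]
            avg = Fraction(total, j - i + 1)
            if best is None or avg > best[0]:
                best = (avg, i, j + 1)
    return best
```

For the sequence `[2, -3, 4, -1, 2, -6]`, the best sum with U = 3 and the best average with L = 2 are both attained by the segment `[4, -1, 2]`.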

5 Applications to Biomolecular Sequence Analysis (I). Locating GC-Rich Regions: – Finding GC-rich regions is an important problem in gene recognition and comparative genomics. – CpG islands (200 ~ 1400 bp). – [Huang’94]: O(nL)-time algorithm. Post-Processing Sequence Alignments: – Comparative analysis of human and mouse DNA is useful in gene prediction in the human genome. – Mosaic effect: bad inner sequence. – Normalized local alignment. – Post-processing locally aligned subsequences.
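
One way to connect GC-rich regions to the max-average problem is to score each base numerically. The sketch below assumes a simple 0/1 encoding (1 for G or C, 0 otherwise), which is an illustrative choice rather than something the slide specifies; `gc_scores` and `gc_fraction` are hypothetical helper names. Under this encoding the average score of a segment equals its GC fraction, so the most GC-rich region of length at least L is a max-average segment.

```python
def gc_scores(dna):
    """Map a DNA string to 0/1 scores: 1 for G or C, 0 for any other symbol."""
    return [1 if base in "GCgc" else 0 for base in dna]


def gc_fraction(dna, start, end):
    """GC fraction of dna[start:end], i.e. the average of its 0/1 scores."""
    scores = gc_scores(dna[start:end])
    return sum(scores) / len(scores)
```

For example, `gc_fraction("ATGCGC", 2, 6)` looks at `GCGC` and gives 1.0, while `gc_fraction("ATGCGC", 0, 4)` gives 0.5.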

6 Applications to Biomolecular Sequence Analysis (II). Annotating Multiple Sequence Alignments: – [Stojanovic’99]: conserved regions in biomolecular sequences. – Numerical scores are assigned to the columns of a multiple alignment; each column score is adjusted by subtracting an anchor value. Ungapped Local Alignments with Length Constraints: – Computing, for each diagonal of the matrix, the length-constrained segment with the largest sum (or average) of scores. – Applications in motif identification.
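
The anchor adjustment matters because, with non-negative column scores, the whole alignment trivially has the largest sum. The sketch below illustrates this with made-up integer scores and an assumed anchor of 5; it uses the standard unconstrained maximum-sum scan (Kadane's algorithm) rather than the length-constrained algorithm of the talk, and `subtract_anchor` and `max_sum_segment` are hypothetical names.

```python
def subtract_anchor(scores, anchor):
    """Shift every column score down by the anchor value."""
    return [s - anchor for s in scores]


def max_sum_segment(a):
    """Unconstrained maximum-sum segment (Kadane's scan).

    Returns (sum, start, end) for the half-open segment a[start:end];
    assumes a is non-empty.
    """
    best_sum, best_start, best_end = a[0], 0, 1
    cur_sum, cur_start = 0, 0
    for idx, x in enumerate(a):
        if cur_sum <= 0:
            # A non-positive running sum cannot help: restart here.
            cur_sum, cur_start = x, idx
        else:
            cur_sum += x
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, idx + 1
    return best_sum, best_start, best_end
```

With raw scores `[9, 2, 8, 9, 1]` the best segment is the entire sequence; after subtracting the anchor the scores become `[4, -3, 3, 4, -4]` and the best segment shrinks to the first four columns.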

7 Maximum Sum Consecutive Subsequence. [The slide's three example sequences were images and are missing from this transcript: one sequence that is left-negative, one that is not, and one that is minimal left-negative partitioned.]
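
Since the slide's examples did not survive, the sketch below spells out the terms under the definitions assumed here: a sequence is left-negative if every proper prefix has a sum that is negative or zero, and a partition is minimal left-negative if every segment is left-negative and every segment except possibly the last has a positive sum. The example sequences in the usage note are illustrative, not recovered from the slide, and the function names are hypothetical.

```python
from itertools import accumulate


def is_left_negative(a):
    """True if every proper prefix of a has sum <= 0 (assumed definition)."""
    prefix_sums = list(accumulate(a))
    return all(s <= 0 for s in prefix_sums[:-1])


def is_mln_partition(segments):
    """True if segments form a minimal left-negative partition (assumed definition).

    Every segment must be left-negative, and every segment except the
    last must have a positive total.
    """
    if not all(is_left_negative(seg) for seg in segments):
        return False
    return all(sum(seg) > 0 for seg in segments[:-1])
```

For instance, `[-4, 1, -2, 3]` is left-negative (its proper prefix sums are -4, -3, -5), `[5, -3, 4]` is not, and `[5] [-3, 4] [-1, 2] [-6]` passes the partition check.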

8 Minimal left-negative partition

9 MLN-partition: linear time
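
The slide's pseudocode is not in the transcript, so the following is a reconstruction sketch of a right-to-left scan that builds a minimal left-negative partition, assuming the definition that each segment has non-positive proper prefix sums and (except possibly the last) a positive total. The name `mln_partition` and the arrays `end` and `total` are hypothetical.

```python
def mln_partition(a):
    """Split a into segments by scanning from right to left.

    end[i] is the index of the last element of the segment that starts
    at i, and total[i] is that segment's sum. A segment whose sum is
    still <= 0 absorbs the segment that follows it.
    """
    n = len(a)
    end = [0] * n
    total = [0] * n
    for i in range(n - 1, -1, -1):
        end[i], total[i] = i, a[i]
        while end[i] < n - 1 and total[i] <= 0:
            nxt = end[i] + 1          # start of the following segment
            total[i] += total[nxt]
            end[i] = end[nxt]
    # Read the partition off the pointers, left to right.
    segments = []
    i = 0
    while i < n:
        segments.append(a[i:end[i] + 1])
        i = end[i] + 1
    return segments
```

Because `end[i]` and `total[i]` are stored for every position, the same pass also yields the partition of every suffix, which is the information a length-constrained search would need.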

10 Max-Sum with LC
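
The slide's own procedure was not captured. As a stand-in, here is a different, well-known linear-time technique for the same problem: prefix sums combined with a monotonic deque that tracks the smallest prefix sum within the last U positions. This is explicitly not the partition-based method of the talk; `max_sum_at_most_deque` is a hypothetical name, and U >= 1 with a non-empty input is assumed.

```python
from collections import deque


def max_sum_at_most_deque(a, U):
    """Maximum-sum segment of length between 1 and U, in O(n) time.

    Returns (sum, start, end) for the half-open segment a[start:end].
    """
    n = len(a)
    prefix = [0] * (n + 1)
    for k, x in enumerate(a):
        prefix[k + 1] = prefix[k] + x

    best = None
    window = deque()  # candidate start indices, prefix values increasing
    for j in range(1, n + 1):
        # Start index j-1 becomes available; drop dominated candidates.
        while window and prefix[window[-1]] >= prefix[j - 1]:
            window.pop()
        window.append(j - 1)
        # Drop starts that would make the segment longer than U.
        while window[0] < j - U:
            window.popleft()
        i = window[0]
        candidate = prefix[j] - prefix[i]
        if best is None or candidate > best[0]:
            best = (candidate, i, j)
    return best
```

Each index enters and leaves the deque at most once, which is where the linear bound comes from.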

11 Analysis of MSLC

12 Max Average Subsequence. [The slide's three example sequences were images and are missing from this transcript: one sequence that is right-skew, one that is not, and one that is decreasing right-skew partitioned.]
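
With the slide's examples missing, the sketch below states the terms under the definitions assumed here: a sequence is right-skew if the average of every proper prefix is at most the average of the remaining suffix, and a partition is decreasing right-skew if every segment is right-skew and the segment averages strictly decrease from left to right. Integer scores are assumed so that `Fraction` gives exact comparisons; the function names and the example sequences are illustrative.

```python
from fractions import Fraction


def average(seg):
    """Exact average of a non-empty list of integers."""
    return Fraction(sum(seg), len(seg))


def is_right_skew(a):
    """True if avg(prefix) <= avg(remaining suffix) for every proper split."""
    return all(average(a[:k]) <= average(a[k:]) for k in range(1, len(a)))


def is_drs_partition(segments):
    """True if all segments are right-skew with strictly decreasing averages."""
    avgs = [average(seg) for seg in segments]
    decreasing = all(x > y for x, y in zip(avgs, avgs[1:]))
    return decreasing and all(is_right_skew(seg) for seg in segments)
```

For example, `[1, 3, 2, 6]` is right-skew, `[5, 1]` is not, and `[5] [1, 3, 2, 6]` is a decreasing right-skew partition with averages 5 and 3.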

13 Decreasing right-skew partition

14 DRS-partition: linear time
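
The slide's pseudocode is not in the transcript; the following is a reconstruction sketch of a right-to-left scan for the decreasing right-skew partition, assuming that a segment is merged with its right neighbor whenever its average is at most the neighbor's. Averages are compared by cross-multiplication to avoid division. The name `drs_partition` and the arrays `end`, `total`, and `width` are hypothetical.

```python
def drs_partition(a):
    """Split a into right-skew segments with strictly decreasing averages.

    end[i], total[i], width[i] describe the segment starting at i:
    its last index, its sum, and its length.
    """
    n = len(a)
    end = [0] * n
    total = [0] * n
    width = [0] * n
    for i in range(n - 1, -1, -1):
        end[i], total[i], width[i] = i, a[i], 1
        while end[i] < n - 1:
            nxt = end[i] + 1  # start of the following segment
            # Stop once avg(current) > avg(next); widths are positive,
            # so cross-multiplying preserves the comparison.
            if total[i] * width[nxt] > total[nxt] * width[i]:
                break
            total[i] += total[nxt]
            width[i] += width[nxt]
            end[i] = end[nxt]
    segments = []
    i = 0
    while i < n:
        segments.append(a[i:end[i] + 1])
        i = end[i] + 1
    return segments
```

As with the left-negative case, the stored pointers describe the partition of every suffix, not just of the whole sequence.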

15 Max-Avg-Seq with LC

16 Locate good-partner
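
The slide's procedure for locating a partner was not captured. The sketch below is a simplified reconstruction: for each start i it takes the mandatory window of length L and then extends it by whole decreasing right-skew segments of the remaining suffix while the next segment's average exceeds the current average. It uses a plain left-to-right scan over those segments, so it does not achieve the O(n log L) bound quoted in the talk and can take quadratic time in the worst case. Integer scores are assumed, and `max_avg_at_least_drs` is a hypothetical name.

```python
from fractions import Fraction


def max_avg_at_least_drs(a, L):
    """Maximum-average segment of length >= L; returns (avg, start, end).

    The segment is the half-open slice a[start:end]. Assumes integer
    scores and 1 <= L <= len(a).
    """
    n = len(a)
    # Right-to-left pass: segment starting at k ends at end[k] and has
    # sum total[k] and length width[k] (decreasing right-skew pointers).
    end, total, width = [0] * n, [0] * n, [0] * n
    for k in range(n - 1, -1, -1):
        end[k], total[k], width[k] = k, a[k], 1
        while end[k] < n - 1:
            nxt = end[k] + 1
            if total[k] * width[nxt] > total[nxt] * width[k]:
                break
            total[k] += total[nxt]
            width[k] += width[nxt]
            end[k] = end[nxt]

    prefix = [0] * (n + 1)
    for k, x in enumerate(a):
        prefix[k + 1] = prefix[k] + x

    best = None
    for i in range(n - L + 1):
        s, w = prefix[i + L] - prefix[i], L   # mandatory window
        nxt = i + L
        # Extend while the next segment's average beats the current one.
        while nxt < n and total[nxt] * w > s * width[nxt]:
            s += total[nxt]
            w += width[nxt]
            nxt = end[nxt] + 1
        avg = Fraction(s, w)
        if best is None or avg > best[0]:
            best = (avg, i, nxt)
    return best
```

On `[2, -3, 4, -1, 2, -6]` with L = 2 this returns the segment `[4, -1, 2]` with average 5/3, matching an exhaustive search.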

17 Analysis of MaxAvgSeq

18 Implementation and Preliminary Experiments

19 Implementation and Preliminary Experiments

20 Conclusion. Finding a max-sum subsequence of length at most U can be done in O(n) time. Finding a max-avg subsequence of length at least L can be done in O(n log L) time.

21 Recent Progress. Lu (CMCT’2002): finding the max-avg subsequence of length at least L on binary (0,1) sequences in O(n) time. Goldwasser, Kao, Lu (WABI’2002): finding the max-avg subsequence of length at least L and at most U on real sequences in O(n) time. Tools: finding CpG islands using MAVG (joint work with Huang, X., Jiang, T. and Chao, K.-M.): http://deepc2.zool.iastate.edu/aat/mavg/cgdoc.html http://deepc2.zool.iastate.edu/aat/mavg/cg.html

22 The Linear-Time Algorithm of Goldwasser, Kao, and Lu (WABI’2002)

23 A new important observation: i < j < g(j) < g(i) implies that density(i, g(i)) is no more than density(j, g(j)). [Figure: the positions i, j, g(j), and g(i) marked along the sequence.]

24 [Figure with the labels i, g(i), j, and g(j); the image was not captured in the transcript.]

25 Searching for all g(i) in linear time

26 Some thoughts. Attacking new problems with new ideas. Collaboration is important for bioinformatics: – Communication. – Work on what you are good at.

27 Future Research. Best k (nonintersecting) subsequences? Normalized local alignment? Measurement of goodness?

