Efficient Algorithms for Locating the Length-Constrained Heaviest Segments, with Applications to Biomolecular Sequence Analysis. Yaw-Ling Lin, Tao Jiang.

Presentation transcript:

Efficient Algorithms for Locating the Length-Constrained Heaviest Segments, with Applications to Biomolecular Sequence Analysis
Yaw-Ling Lin (Dept. of CS & Info. Management, Providence Univ., Taiwan)
Tao Jiang (Dept. of CS & Engineering, UC Riverside, USA)
Kun-Mao Chao (Dept. of CS & Info. Engineering, National Taiwan Univ., Taiwan)

Yaw-Ling Lin, Providence, Taiwan. Slide 2: Outline
Introduction.
Applications to Biomolecular Sequence Analysis.
Maximum Sum Consecutive Subsequence.
Maximum Average Consecutive Subsequence.
Implementation and Preliminary Experiments.
Concluding Remarks.

Slide 3: Introduction
Two fundamental algorithms in searching for interesting regions in sequences:
(1) Given a sequence of real numbers of length n and an upper bound U, find a consecutive subsequence of length at most U with the maximum sum. This is solved by an O(n)-time algorithm.
(2) Given a sequence of real numbers of length n and a lower bound L, find a consecutive subsequence of length at least L with the maximum average. This is solved by an O(n log L)-time algorithm.
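
To pin down the two problem statements, here is a brute-force reference sketch. It is not the O(n) or O(n log L) algorithm of the talk; it simply tries every admissible segment, which is useful for checking faster implementations. The function names and the half-open (start, end) index convention are assumptions made for illustration.

```python
def brute_max_sum(a, U):
    """Return (sum, start, end) for the best nonempty segment a[start:end]
    whose length end - start is at most U. Runs in O(n * U) time."""
    best = None
    for start in range(len(a)):
        total = 0
        for end in range(start + 1, min(start + U, len(a)) + 1):
            total += a[end - 1]
            if best is None or total > best[0]:
                best = (total, start, end)
    return best


def brute_max_avg(a, L):
    """Return (average, start, end) for the best segment a[start:end]
    whose length end - start is at least L. Runs in O(n^2) time."""
    best = None
    for start in range(len(a) - L + 1):
        # Sum of the first L - 1 elements; the loop adds the rest one by one.
        total = sum(a[start:start + L - 1])
        for end in range(start + L, len(a) + 1):
            total += a[end - 1]
            avg = total / (end - start)
            if best is None or avg > best[0]:
                best = (avg, start, end)
    return best
```

Ties are resolved in favor of the first segment found, so other equally good segments may exist.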

Slide 4: Applications to Biomolecular Sequence Analysis (I)
Locating GC-Rich Regions
– Finding GC-rich regions: an important problem in gene recognition and comparative genomics.
– CpG islands (200 ~ 1400 bp).
– [Huang'94]: O(nL)-time algorithm.
Post-Processing Sequence Alignments
– Comparative analysis of human and mouse DNA: useful in gene prediction in human genome.
– Mosaic effect: bad inner sequence.
– Normalized local alignment.
– Post-processing local aligned subsequences.
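
The GC-rich region problem reduces to the maximum-average problem once each base is scored 1 (G or C) or 0 (A or T). The sketch below shows that encoding together with one simple way to reach an O(nL) bound. It assumes, rather than quotes from the cited work, the observation that some optimal segment has length below 2L: a segment of length 2L or more can be cut into two pieces of length at least L, and one piece has an average no smaller than the whole. The function names are hypothetical.

```python
def gc_scores(dna):
    """Score each base: 1 for G or C, 0 otherwise."""
    return [1 if base in ("G", "C") else 0 for base in dna.upper()]


def max_avg_bounded_scan(scores, L):
    """Return (average, start, end) of a best segment scores[start:end]
    with length at least L, scanning only lengths L .. 2L - 1.
    Runs in O(n * L) time using prefix sums."""
    n = len(scores)
    prefix = [0]
    for x in scores:
        prefix.append(prefix[-1] + x)
    best = None
    for start in range(n - L + 1):
        for end in range(start + L, min(start + 2 * L - 1, n) + 1):
            avg = (prefix[end] - prefix[start]) / (end - start)
            if best is None or avg > best[0]:
                best = (avg, start, end)
    return best
```

For example, `max_avg_bounded_scan(gc_scores("ATGCGCAT"), 4)` selects the window covering `GCGC`.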

Slide 5: Applications to Biomolecular Sequence Analysis (II)
Annotating Multiple Sequence Alignments
– [Stojanovic'99]: conserved regions in biomolecular sequences.
– Numerical scores for columns of a multiple alignment; each column score shall be adjusted by subtracting an anchor value.
Ungapped Local Alignments with Length Constraints
– Computing the length-constrained segment of each diagonal in the matrix with the largest sum (or average) of scores.
– Applications in motif identification.
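
The anchor subtraction matters because raw column scores are typically all positive, in which case the heaviest segment by sum is always the whole alignment. A minimal sketch of the idea, using the classic unconstrained maximum-sum scan (Kadane's algorithm) rather than the length-constrained algorithm of the talk; the column scores and anchor value in the usage note are made up for illustration.

```python
def heaviest_segment(scores, anchor):
    """Subtract the anchor from every column score, then return
    (sum, start, end) of a maximum-sum segment adjusted[start:end]."""
    adjusted = [s - anchor for s in scores]
    best_sum, best_start, best_end = adjusted[0], 0, 1
    cur_sum, cur_start = 0, 0
    for k, x in enumerate(adjusted):
        if cur_sum <= 0:
            # A non-positive running sum cannot help; restart here.
            cur_sum, cur_start = x, k
        else:
            cur_sum += x
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, k + 1
    return best_sum, best_start, best_end
```

With hypothetical scores `[2, 9, 8, 1, 7, 3]` and anchor 5, the adjusted scores are `[-3, 4, 3, -4, 2, -2]` and the segment covering columns 1 and 2 is reported; with anchor 0 the whole range is returned.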

Slide 6: Maximum Sum Consecutive Subsequence. [sequence not preserved] is left-negative; [sequence not preserved] is not. [partition not preserved] is minimal left-negative partitioned.

Slide 7: Minimal left-negative partition

Slide 8: MLN-partition: linear time
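
A minimal sketch of one linear-time construction, under assumed definitions: a segment is left-negative when every proper prefix sums to at most 0, and a partition is minimal left-negative when every segment is left-negative and every segment except possibly the last has a positive sum. The code scans right to left and keeps merging the segment starting at i with its right neighbor while its sum is non-positive. Each merge removes one segment, so the total work is O(n). The function name and the example in the usage note are illustrative, not taken from the slides.

```python
def mln_partition(a):
    """Split a into consecutive segments, each left-negative, with every
    segment except possibly the last having a positive sum."""
    n = len(a)
    end = [0] * n    # end[i]: index of the last element of the segment starting at i
    total = [0] * n  # total[i]: sum of the segment starting at i
    for i in range(n - 1, -1, -1):
        end[i] = i
        total[i] = a[i]
        # Absorb the following segment while the sum is still non-positive.
        while total[i] <= 0 and end[i] < n - 1:
            nxt = end[i] + 1
            total[i] += total[nxt]
            end[i] = end[nxt]
    segments = []
    i = 0
    while i < n:
        segments.append(a[i:end[i] + 1])
        i = end[i] + 1
    return segments
```

For instance, `[5, -3, 4, -1, 2, -6]` splits into `[5]`, `[-3, 4]`, `[-1, 2]`, `[-6]`, while `[-4, 1, -2, 3]` stays in one piece because all of its proper prefixes have non-positive sums.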

Slide 9: Max-Sum with LC
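
For comparison, the same O(n) bound for the length-at-most-U maximum-sum problem can be reached by a different, well-known technique. The sketch below is not the MLN-partition algorithm of the talk: it keeps a monotonic deque over prefix sums so that, for every right endpoint j, the smallest prefix sum among the last U positions is available in amortized constant time. It assumes a nonempty input and U >= 1; the function name is hypothetical.

```python
from collections import deque


def max_sum_length_at_most(a, U):
    """Return (sum, start, end) of a maximum-sum nonempty segment
    a[start:end] with end - start <= U, in O(n) time."""
    n = len(a)
    prefix = [0] * (n + 1)
    for k, x in enumerate(a):
        prefix[k + 1] = prefix[k] + x
    best = None
    candidates = deque()  # start indices, with increasing prefix sums
    for j in range(1, n + 1):
        # Index j - 1 becomes a valid start; drop dominated candidates.
        while candidates and prefix[candidates[-1]] >= prefix[j - 1]:
            candidates.pop()
        candidates.append(j - 1)
        # Discard starts that would make the segment longer than U.
        while candidates[0] < j - U:
            candidates.popleft()
        s = prefix[j] - prefix[candidates[0]]
        if best is None or s > best[0]:
            best = (s, candidates[0], j)
    return best
```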

Slide 10: Analysis of MSLC

Slide 11: Max Average Subsequence. [sequence not preserved] is right-skew; [sequence not preserved] is not. [partition not preserved] is decreasing right-skew partitioned.

Slide 12: Decreasing right-skew partition

Slide 13: DRS-partition: linear time
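
A minimal sketch of one linear-time construction, under assumed definitions: a segment is right-skew when the average of every prefix is at most the average of the remaining suffix, and a partition is decreasing right-skew when every segment is right-skew and the segment averages strictly decrease from left to right. Scanning right to left, the segment starting at i absorbs its right neighbor whenever its own average does not exceed the neighbor's. Averages are compared by cross-multiplication to avoid division. The function name and the example are illustrative, not taken from the slides.

```python
def drs_partition(a):
    """Split a into consecutive right-skew segments whose averages
    strictly decrease from left to right."""
    n = len(a)
    end = [0] * n     # end[i]: index of the last element of the segment starting at i
    total = [0] * n   # total[i]: sum of that segment
    length = [0] * n  # length[i]: number of elements in that segment
    for i in range(n - 1, -1, -1):
        end[i], total[i], length[i] = i, a[i], 1
        while end[i] < n - 1:
            nxt = end[i] + 1
            # Stop once avg(current) > avg(next), compared without division.
            if total[i] * length[nxt] > total[nxt] * length[i]:
                break
            total[i] += total[nxt]
            length[i] += length[nxt]
            end[i] = end[nxt]
    segments = []
    i = 0
    while i < n:
        segments.append(a[i:end[i] + 1])
        i = end[i] + 1
    return segments
```

On `[9, 8, 1, 5, 3, 2]` this yields `[9]`, `[8]`, `[1, 5, 3]`, `[2]`, with averages 9, 8, 3, 2.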

Slide 14: Max-Avg-Seq with LC

Slide 15: Locate good-partner

Slide 16: Analysis of MaxAvgSeq

Slide 17: Implementation and Preliminary Experiments

Slide 18: Implementation and Preliminary Experiments (continued)

Slide 19: Conclusion
Finding a max-sum subsequence of length at most U can be done in O(n) time.
Finding a max-avg subsequence of length at least L can be done in O(n log L) time.

Slide 20: Recent Progress
Lu (CMCT'2002): finding the max-avg subsequence of length at least L on binary (0,1) sequences; O(n) time.
Goldwasser, Kao, Lu (2002, manuscript): finding the max-avg subsequence of length at least L and at most U on real sequences; O(n) time.
Tools: finding CpG islands using MAVG (joint work with Huang, X., Jiang, T. and Chao, K.-M.).

Slide 21: Future Research
Best k (nonintersecting) subsequences?
Normalized local alignment?
Measurement of goodness?