Regular Expression Constrained Sequence Alignment Abdullah N. Arslan Assistant Professor Computer Science Department.

Outline
- Sequence alignment
  - Common framework
  - DP solution
  - Why constrained?
- RE constrained sequence alignment
  - Algorithm
- Concluding remarks

Alignment Matrix

Edit Graph

Dynamic Programming Solution
- H_{i,j}: maximum score achieved at (i, j)
- H_{i,j} = 0 whenever i = 0 or j = 0
- Compute H_{n,m} in O(nm) time, O(m) space
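The O(nm) time, O(m) space bound on this slide can be illustrated with a two-row table. The following is a minimal sketch, not the recurrence from the slides (which did not survive in this transcript): the function name and the scoring values (match, mismatch, gap) are assumptions, and the zero boundary follows the slide's statement.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score H[n][m] of aligning a and b, keeping only two rows of H.

    Assumed scoring: +1 match, -1 mismatch, -1 per gap symbol.
    Boundary: H[i][0] = H[0][j] = 0, as stated on the slide.
    """
    prev = [0] * (len(b) + 1)              # row i-1 of H
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)          # row i of H, curr[0] = 0
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                          prev[j] + gap,       # a[i-1] against a gap
                          curr[j - 1] + gap)   # b[j-1] against a gap
        prev = curr                        # only O(m) cells are kept
    return prev[len(b)]


print(alignment_score("ACGT", "AGT"))
```

Only the score is returned here; recovering the alignment itself in linear space needs an extra technique such as divide and conquer.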

DP Solution: Local Alignment
- H_{i,j}: similarity score achieved at (i, j)
- S_{i,j} = 0 whenever i = 0 or j = 0
- Compute max H_{i,j} in O(nm) time, O(m) space
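For local alignment, the usual formulation (Smith-Waterman) clamps every cell at 0 and reports the maximum over all cells. A hedged sketch under assumed scoring values follows; the function name is illustrative and the recurrence is the standard one, not copied from the slides.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Maximum H[i][j] over all cells, with each cell clamped at 0.

    Assumed scoring: +1 match, -1 mismatch, -1 per gap symbol.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0,                   # start a new local alignment
                          prev[j - 1] + sub,
                          prev[j] + gap,
                          curr[j - 1] + gap)
            best = max(best, curr[j])          # the answer is max H[i][j]
        prev = curr
    return best


print(local_alignment_score("TTACGTTT", "GGACGGG"))
```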

Dynamic Programming Formulation: Affine Gap Penalties
- Penalty for a gap of length k is α + (k-1)β, with α the gap-opening and β the gap-extension penalty
- S_{i,j} = F_{i,j} = E_{i,j} = 0 when i = 0 or j = 0
- Compute max H_{i,j} in O(nm) time, O(m) space
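A gap of length k costing an opening penalty plus (k-1) extension penalties is commonly handled with Gotoh's three-table recurrence. The sketch below is one standard way to do it for local alignment, not necessarily the formulation on the slide. The names `alpha` and `beta` and all scoring values are assumptions.

```python
def local_affine_score(a, b, match=2, mismatch=-1, alpha=3, beta=1):
    """Best local alignment score with affine gaps (Gotoh-style).

    A gap of length k costs alpha + (k - 1) * beta.
    e: best score ending with a gap in a (horizontal move)
    f: best score ending with a gap in b (vertical move)
    """
    m = len(b)
    h_prev = [0] * (m + 1)
    f_prev = [0] * (m + 1)
    best = 0
    for i in range(1, len(a) + 1):
        h_curr = [0] * (m + 1)
        f_curr = [0] * (m + 1)
        e = 0
        for j in range(1, m + 1):
            e = max(e - beta, h_curr[j - 1] - alpha)            # extend or open
            f_curr[j] = max(f_prev[j] - beta, h_prev[j] - alpha)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            h_curr[j] = max(0, h_prev[j - 1] + sub, e, f_curr[j])
            best = max(best, h_curr[j])
        h_prev, f_prev = h_curr, f_curr
    return best


# Eight matches (16) minus one gap of length 2 (3 + 1) gives 12.
print(local_affine_score("AAAACCCC", "AAAATTCCCC"))
```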

The Definition of the Constrained LCS Problem
- The constrained LCS (CLCS) problem
  - Given strings S_1, S_2, and P
  - Find an LCS of S_1 and S_2 such that P is a subsequence of this LCS
- Motivation
  - Computing the homology of two biological sequences that have a specific part in common
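The CLCS definition above admits a straightforward three-index dynamic program. This is a minimal sketch in the spirit of the O(nmr) algorithms, with an illustrative function name, and it stores the full table for clarity rather than saving space. Here L[i][j][k] is the length of the longest common subsequence of the first i symbols of S_1 and the first j symbols of S_2 that contains the first k symbols of P as a subsequence.

```python
NEG = float("-inf")  # marks "no feasible subsequence"


def clcs_length(s1, s2, p):
    """Length of the longest common subsequence of s1 and s2 that has p
    as a subsequence, or None if no such common subsequence exists."""
    n, m, r = len(s1), len(s2), len(p)
    L = [[[NEG] * (r + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            for k in range(r + 1):
                if i == 0 or j == 0:
                    L[i][j][k] = 0 if k == 0 else NEG
                    continue
                best = max(L[i - 1][j][k], L[i][j - 1][k])
                if s1[i - 1] == s2[j - 1]:
                    # Take the common symbol without consuming p ...
                    best = max(best, L[i - 1][j - 1][k] + 1)
                    # ... or let it also cover p[k-1].
                    if k > 0 and s1[i - 1] == p[k - 1]:
                        best = max(best, L[i - 1][j - 1][k - 1] + 1)
                L[i][j][k] = best
    result = L[n][m][r]
    return None if result == NEG else result


print(clcs_length("XABC", "ABCX", ""))   # plain LCS "ABC"
print(clcs_length("XABC", "ABCX", "X"))  # the constraint forces "X"
```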

Constrained Sequence Alignment Problems
- Constrained LCS
  - Tsai 2003, O(n^2 m^2 r) time
  - Chin et al. 2004, Arslan and Egecioglu 2004, O(nmr) time
- Edit-distance constrained sequence alignment
  - Arslan and Egecioglu 2004, O(dnmr) time
- Regular-expression constrained sequence alignment
  - Motivation: Comet and Henry 2002, PROSITE patterns
  - This paper

PROSITE patterns as constraints
- PROSITE patterns are
  - Regular expressions with no Kleene closure
  - PROSITE database
  - e.g. [GA]-X(4)-G-K-[ST], ATP/GTP-binding site motif A (P-loop) (PS00017)
- Comet and Henry reward alignments
- Regular expression constrained sequence alignment
  - Find a maximal alignment that includes a given RE
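To see why a PROSITE pattern is a regular expression without Kleene closure, it helps to translate one into ordinary regex syntax. The converter below is a simplified sketch that handles only a subset of the PROSITE syntax (single residues, `x` wildcards, `[...]` sets, `{...}` exclusions and `(n)` or `(n,m)` repeats). The function name is hypothetical and anchors such as `<` and `>` are deliberately not supported.

```python
import re

# One pattern element: a residue, a [set], or an {exclusion},
# optionally followed by (n) or (n,m).
_ELEMENT = re.compile(
    r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?"
)


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into Python regex syntax."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, lo, hi = m.groups()
        if core in ("x", "X"):
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # any residue except these
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    return "".join(parts)


regex = prosite_to_regex("[GA]-X(4)-G-K-[ST]")
print(regex)
print(re.search(regex, "MAGSTDGKSLL").group())
```

Every repeat has a fixed upper bound, so the translated expression never needs `*` or `+`.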

Example: For [GA]-X(4)-G-K-[ST]

Using Edit Graph: e.g. A(C+G)*(S+T)

Automata for A(C+G)*(S+T)
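The automaton figure for this slide is not part of the transcript. As a stand-in, here is a small deterministic automaton for A(C+G)*(S+T) written as a transition table. The state numbering and the names `DELTA`, `START`, `FINALS` and `accepts` are illustrative assumptions, and the slides themselves work with an NFA.

```python
# State 0: start; state 1: after the leading A; state 2: accepting.
DELTA = {
    (0, "A"): 1,
    (1, "C"): 1, (1, "G"): 1,   # (C+G)* loops on state 1
    (1, "S"): 2, (1, "T"): 2,   # (S+T) moves to the accepting state
}
START, FINALS = 0, {2}


def accepts(text):
    """Return True if text is in the language of A(C+G)*(S+T)."""
    state = START
    for symbol in text:
        state = DELTA.get((state, symbol))
        if state is None:        # no transition: reject
            return False
    return state in FINALS


for word in ["AS", "ACGT", "A", "CS"]:
    print(word, accepts(word))
```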

Some Details of Automata Construction
- Build an NFA N equivalent to a given RE R
- Construct from N a new N × N automaton
  - Moves on edit operations (or equivalently on alignment columns)
  - States have weights
- Interested in the weights of the final states after the alignment is complete

Weighted Automaton
- Initial weights are −∞
- Weight of (q_0, q_0) is initially 0
- Update new maximum scores at reachable states
- Weights become −∞ in unreachable states
- What are the maximum weights at the final states?

Computations on Automata

Complexity
- Simulate automata based on DP solution
  - Each step requires examining the transition functions
  - Maintain a list of active (reachable) states
  - Update state weights as alignments are formed
  - Automaton M_{i,j} has the optimum weights
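The simulation described above can be sketched as a DP in which every cell (i, j) keeps a dictionary of weights for the reachable state pairs. This is only an illustration under several assumptions, not the algorithm as presented: it uses a deterministic automaton instead of an NFA, global alignment with linear gap costs, and stores all cells instead of O(m) of them. It also reads the constraint as "some contiguous block of alignment columns spells, on each row with gaps removed, a string accepted by the automaton". All names are hypothetical.

```python
NEG = float("-inf")


def re_constrained_score(a, b, delta, start, finals,
                         match=1, mismatch=-1, gap=-1):
    """Best global alignment score of a and b that contains a region whose
    two rows are both accepted by the DFA (delta, start, finals).
    Returns -inf if no alignment satisfies the constraint."""
    n, m = len(a), len(b)
    before = [[NEG] * (m + 1) for _ in range(n + 1)]   # region not yet opened
    after = [[NEG] * (m + 1) for _ in range(n + 1)]    # region already closed
    inside = [[{} for _ in range(m + 1)] for _ in range(n + 1)]  # (p, q) -> weight
    before[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            moves = []  # (prev i, prev j, column score, symbol of a, symbol of b)
            if i > 0 and j > 0:
                s = match if a[i - 1] == b[j - 1] else mismatch
                moves.append((i - 1, j - 1, s, a[i - 1], b[j - 1]))
            if i > 0:
                moves.append((i - 1, j, gap, a[i - 1], None))
            if j > 0:
                moves.append((i, j - 1, gap, None, b[j - 1]))
            cell = inside[i][j]
            for pi, pj, s, x, y in moves:
                before[i][j] = max(before[i][j], before[pi][pj] + s)
                after[i][j] = max(after[i][j], after[pi][pj] + s)
                # Move the pair automaton on this alignment column.
                for (p, q), w in inside[pi][pj].items():
                    p2 = p if x is None else delta.get((p, x))
                    q2 = q if y is None else delta.get((q, y))
                    if p2 is not None and q2 is not None:
                        cell[(p2, q2)] = max(cell.get((p2, q2), NEG), w + s)
            # The region may open here, in state pair (start, start).
            cell[(start, start)] = max(cell.get((start, start), NEG), before[i][j])
            # The region may close here if both components are final.
            for (p, q), w in cell.items():
                if p in finals and q in finals:
                    after[i][j] = max(after[i][j], w)
    return after[n][m]


# Hypothetical DFA for A(C+G)*(S+T).
DELTA = {(0, "A"): 1, (1, "C"): 1, (1, "G"): 1, (1, "S"): 2, (1, "T"): 2}
print(re_constrained_score("ATCCC", "CCCAT", DELTA, 0, {2}))
```

In the example, the unconstrained optimum would align the three C's, but the constraint forces the two AT substrings into one region, so the surrounding C's must be paid for as gaps.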

Generalizations: Local Alignment & Affine gaps

CONCLUSION
- Introduced the regular expression constrained sequence alignment problem
- Presented an algorithm for the problem
- Future work
  - Generalization of the problem for
    - Multiple sequence alignment
    - Multiple regular expressions as a constraint

Thank You