Regular Expression Constrained Sequence Alignment Abdullah N. Arslan Assistant Professor Computer Science Department.


1 Regular Expression Constrained Sequence Alignment Abdullah N. Arslan Assistant Professor Computer Science Department

2 Outline: Sequence alignment (common framework; DP solution; why constrained?). RE-constrained sequence alignment (algorithm). Concluding remarks.

3 Alignment Matrix

4 Edit Graph

5 Dynamic Programming Solution. H_{i,j}: maximum score achieved at (i, j), where H_{i,j} = 0 whenever i = 0 or j = 0. H_{n,m} is computed in O(nm) time and O(m) space.
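The recurrence itself did not survive in this transcript, so the following is only a minimal sketch of a standard alignment recurrence that fits the stated boundary condition (H = 0 when i = 0 or j = 0) and the O(nm) time, O(m) space bounds. The function name, the scoring values `match`, `mismatch`, and the linear `gap` penalty are assumptions made for illustration, not taken from the slide.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=1):
    """Return H[n][m] while keeping only two rows, so memory is O(m)."""
    prev = [0] * (len(b) + 1)          # row i-1; boundary H[0][j] = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)      # boundary H[i][0] = 0
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + s,   # align a[i-1] with b[j-1]
                          prev[j] - gap,     # a[i-1] against a gap
                          curr[j - 1] - gap) # b[j-1] against a gap
        prev = curr
    return prev[len(b)]
```

With `match=1, mismatch=0, gap=0` the same loop computes the length of a longest common subsequence, which is the special case used by the constrained LCS problem later in the talk.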

6 DP Solution: Local Alignment. H_{i,j}: similarity score achieved at (i, j), where H_{i,j} = 0 whenever i = 0 or j = 0. max H_{i,j} is computed in O(nm) time and O(m) space.
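As a rough illustration of the local variant, the sketch below uses the usual Smith-Waterman style recurrence: every cell is clamped at 0 and the answer is the maximum over all cells rather than the last cell. The scoring parameters and the function name are assumptions; the slide's own formula is not available in this transcript.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=1):
    """Return max over (i, j) of H[i][j], using O(m) space."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 lets a local alignment start fresh at any cell.
            curr[j] = max(0, prev[j - 1] + s, prev[j] - gap, curr[j - 1] - gap)
            best = max(best, curr[j])
        prev = curr
    return best
```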

7 Dynamic Programming Formulation: Affine gap penalties. The penalty for a gap of length k is α + (k-1)β. Boundary conditions: S_{i,j} = F_{i,j} = E_{i,j} = 0 when i = 0 or j = 0. max H_{i,j} is computed in O(nm) time and O(m) space.
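The affine-gap formulas are missing from this transcript. A common way to realize them is the three-table (Gotoh-style) recurrence sketched below, where E and F track alignments ending in a gap and H is the overall best. This is an assumed reconstruction: `gap_open` and `gap_extend` are illustrative names for the two penalty constants, so that a gap of length k costs `gap_open + (k - 1) * gap_extend`.

```python
def affine_local_score(a, b, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Best local alignment score with affine gaps, using O(m) space."""
    m = len(b)
    best = 0
    prev_h = [0] * (m + 1)   # H[i-1][*]
    prev_f = [0] * (m + 1)   # F[i-1][*]
    for i in range(1, len(a) + 1):
        h = [0] * (m + 1)    # H[i][*], boundary H[i][0] = 0
        f = [0] * (m + 1)    # F[i][*]
        e = 0                # E[i][0] = 0
        for j in range(1, m + 1):
            # E: the alignment ends with b[j-1] opposite a gap.
            e = max(e - gap_extend, h[j - 1] - gap_open)
            # F: the alignment ends with a[i-1] opposite a gap.
            f[j] = max(prev_f[j] - gap_extend, prev_h[j] - gap_open)
            s = match if a[i - 1] == b[j - 1] else mismatch
            h[j] = max(0, prev_h[j - 1] + s, e, f[j])
            best = max(best, h[j])
        prev_h, prev_f = h, f
    return best
```

For example, aligning `ACGTTTACG` with `ACGACG` under these defaults pays 3 + 2·1 = 5 for the single gap of length 3 and earns 12 for six matches.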

8 The Definition of the Constrained LCS Problem. The constrained LCS (CLCS) problem: given strings S_1, S_2, and P, find a longest common subsequence of S_1 and S_2 that contains P as a subsequence. Motivation: computing the homology of two biological sequences that have a specific part in common.
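To make the definition concrete, here is a small sketch of a three-index dynamic program in the spirit of the O(nmr) solutions cited in the talk, where r = |P|. It is an illustration written for this transcript, not code from the cited papers; `L[i][j][k]` is an assumed name for the length of the longest common subsequence of the first i and j characters that contains the first k characters of P.

```python
def constrained_lcs_length(s1, s2, p):
    """Length of the longest common subsequence of s1 and s2 that has p
    as a subsequence, or None if no such common subsequence exists."""
    NEG = float("-inf")                      # marks "constraint not satisfiable"
    n, m, r = len(s1), len(s2), len(p)
    L = [[[NEG] * (r + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            for k in range(r + 1):
                if i == 0 or j == 0:
                    # An empty prefix can only satisfy the empty constraint.
                    L[i][j][k] = 0 if k == 0 else NEG
                    continue
                best = max(L[i - 1][j][k], L[i][j - 1][k])
                if s1[i - 1] == s2[j - 1]:
                    best = max(best, L[i - 1][j - 1][k] + 1)
                    if k > 0 and s1[i - 1] == p[k - 1]:
                        # The matched character also covers p[k-1].
                        best = max(best, L[i - 1][j - 1][k - 1] + 1)
                L[i][j][k] = best
    result = L[n][m][r]
    return None if result == NEG else int(result)
```

The three nested loops make the O(nmr) running time visible; the full table is kept here for clarity rather than for space efficiency.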

9 Constrained Sequence Alignment Problems. Constrained LCS: Tsai 2003, O(n^2 m^2 r) time; Chin et al. 2004 and Arslan and Egecioglu 2004, O(nmr) time. Edit-distance constrained sequence alignment: Arslan and Egecioglu 2004, O(dnmr). Regular-expression constrained sequence alignment: motivated by Comet and Henry 2002 (PROSITE patterns); this paper.

10 PROSITE patterns as constraints. PROSITE patterns are regular expressions with no Kleene closure, collected in the PROSITE database, e.g. [GA]-X(4)-G-K-[ST], the ATP/GTP-binding site motif A (P-loop) (PS00017). Comet and Henry reward alignments that match such patterns. Regular expression constrained sequence alignment: find a maximum-scoring alignment that contains a region matching a given RE.
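Because PROSITE patterns use only bounded repetition, they translate directly into ordinary regular expressions. The sketch below handles a subset of the PROSITE syntax (single residues, `x`/`X` wildcards, `[...]` sets, `{...}` exclusions, and `(n)` or `(n,m)` repeat counts); the function name is hypothetical and anchors such as `<` and `>` are deliberately not covered.

```python
import re

# One pattern element: a residue, wildcard, set, or exclusion, plus an optional count.
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into Python `re` syntax."""
    parts = []
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unsupported PROSITE element: " + element)
        core, lo, hi = m.groups()
        if core in ("x", "X"):
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except those listed
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    return "".join(parts)

p_loop = prosite_to_regex("[GA]-X(4)-G-K-[ST]")
hit = re.search(p_loop, "AAGPPPPGKSAA")       # made-up test sequence
```

Here `p_loop` is `[GA].{4}GK[ST]` and `hit.group()` is `GPPPPGKS`.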

11 Example: For [GA]-X(4)-G-K-[ST]

12 Using Edit Graph: e.g. A(C+G)*(S+T)

13 Automata for A(C+G)*(S+T)

14 Some Details of Automata Construction. Build an NFA N equivalent to the given RE R. Construct from N a new N×N automaton that moves on edit operations (or, equivalently, on alignment columns) and whose states have weights. We are interested in the weights of the final states after the alignment is complete.

15 Weighted Automaton. Initial weights are -∞, except that the weight of (q_0, q_0) is initially 0. Update new maximum scores at reachable states. Weights are -∞ in unreachable states. What are the maximum weights at the final states?

16 Computations on Automata

17 Complexity. Simulate the automata based on the DP solution: each step requires examining the transition functions; maintain a list of active (reachable) states; update state weights as alignments are formed; automaton M_{i,j} holds the optimum weights.
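The simulation described above can be sketched as follows. This is an illustrative reading of the slides, not the author's implementation, and it rests on several assumptions: the NFA is epsilon-free and given as a dict `delta[(state, symbol)] -> set of states`; the constraint means that some segment of the alignment pairs a substring of each sequence and both substrings are accepted; the alignment is global with a linear gap penalty; and the labels `"PRE"` and `"POST"` (before and after the constrained segment) are a device of this sketch. A state missing from a cell plays the role of weight -∞.

```python
def re_constrained_score(a, b, delta, start, finals, match=1, mismatch=-1, gap=1):
    """Best score of an alignment of a and b containing a segment whose two
    aligned substrings are both accepted by the NFA; None if impossible."""

    def relax(cell, state, score):
        # Keep only the maximum weight per automaton state.
        if score > cell.get(state, float("-inf")):
            cell[state] = score

    def move(src, dst, x, y, gain):
        # Apply one alignment column (x, y); None stands for the gap symbol.
        for state, w in src.items():
            if state in ("PRE", "POST"):
                relax(dst, state, w + gain)
                continue
            p, q = state
            ps = delta.get((p, x), ()) if x is not None else (p,)
            qs = delta.get((q, y), ()) if y is not None else (q,)
            for p2 in ps:
                for q2 in qs:
                    relax(dst, (p2, q2), w + gain)

    def close(cell):
        # Free moves: open the segment, or close it once both copies accept.
        if "PRE" in cell:
            relax(cell, (start, start), cell["PRE"])
        for state, w in list(cell.items()):
            if isinstance(state, tuple) and state[0] in finals and state[1] in finals:
                relax(cell, "POST", w)

    prev = None
    for i in range(len(a) + 1):
        curr = [dict() for _ in range(len(b) + 1)]   # active states per cell
        for j in range(len(b) + 1):
            cell = curr[j]
            if i == 0 and j == 0:
                cell["PRE"] = 0
            if i > 0 and j > 0:
                s = match if a[i - 1] == b[j - 1] else mismatch
                move(prev[j - 1], cell, a[i - 1], b[j - 1], s)
            if i > 0:
                move(prev[j], cell, a[i - 1], None, -gap)
            if j > 0:
                move(curr[j - 1], cell, None, b[j - 1], -gap)
            close(cell)
        prev = curr
    return prev[len(b)].get("POST")

# Hand-built NFA for A(C+G)*(S+T): 0 --A--> 1, 1 loops on C and G, 1 --S/T--> 2.
NFA = {(0, "A"): {1}, (1, "C"): {1}, (1, "G"): {1}, (1, "S"): {2}, (1, "T"): {2}}
```

Each cell stores only its reachable state pairs, which corresponds to the list of active states mentioned on the slide; only two rows of cells are kept at a time.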

18 Generalizations: Local Alignment & Affine gaps

19 CONCLUSION. Introduced the regular expression constrained sequence alignment problem. Presented an algorithm for the problem. Future work: generalizing the problem to multiple sequence alignment and to multiple regular expressions as a constraint.

20 Thank You

