
Fixed parameter algorithms for protein similarity search under mRNA structure constraints. Joint work by G. Blin, G. Fertin, D. Hermelin, and S. Vialette.


2 Outline
Biological motivation: mRNA molecules; the mRNA to protein process; selenocysteine insertion.
The MRSO problem: the implied structure graph; known results.
Two natural parameters: the parameters; nice edge bipartition; a general algorithm for both parameters.

3 Outline (continued)
The cutwidth parameter: an efficient algorithm for small cutwidth; implications of this algorithm.
Binary similarity functions.
Closing remarks.

4 Biological motivation
mRNA molecules:
Can be viewed as strings over {A,C,G,U}.
Complementary bases (A-U, G-C) may pair, giving the mRNA a folding structure (its secondary structure).
Encode genetic information that is later translated into proteins.

5 Biological motivation
The mRNA → protein process: [figure]

6 Biological motivation
The mRNA → protein process, standard assumption: each codon encodes a single amino acid.
Recently, biologists found that this is not necessarily true: depending on the folding structure of the mRNA, a single codon may encode different amino acids.
Example application: selenocysteine insertion.

7 Biological motivation
Selenocysteine insertion:
Selenocysteine is a rare amino acid that was discovered only recently.
It is encoded by the UGA codon, which usually encodes a stop signal.
The presence of the SECIS element forces the insertion of selenocysteine instead of terminating translation.

8 Biological motivation
Selenocysteine insertion:
In certain cases, modifying existing proteins by inserting the SECIS element results in enhanced proteins.
Is this application only the tip of the iceberg?

9 The MRSO problem
Given a specified secondary structure S and an mRNA sequence R, construct an mRNA sequence R′ whose nucleotides are complementary wherever S requires a base pair, and which is as similar as possible to R.
[Figure: a source sequence R = CGG CGA CUA AAU and a structure S]

10 The MRSO problem (continued)
[Figure: a candidate sequence R′ folded according to S]
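A candidate R′ is feasible only if every base pair required by S joins complementary nucleotides. The sketch below checks exactly that, assuming (hypothetically) that S is encoded as a list of 0-based position pairs and that only the A-U and G-C pairs from slide 4 are allowed; the structure used in the example is made up for illustration.

```python
# Complementary base pairs from slide 4 (A-U, G-C); wobble pairs such as G-U
# are deliberately not allowed in this sketch.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def satisfies(seq: str, structure: list[tuple[int, int]]) -> bool:
    """Return True if every base pair (i, j) of `structure` (0-based
    positions, a hypothetical encoding of S) joins complementary bases."""
    return all((seq[i], seq[j]) in PAIRS for i, j in structure)


# Hypothetical structure: position 0 pairs with 11, position 1 with 10.
S = [(0, 11), (1, 10)]
print(satisfies("CGUCGACUAGCG", S))  # True  (R' from slide 11)
print(satisfies("CGGCGACUAAAU", S))  # False (R from slide 9)
```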

11 The MRSO problem
The score of a solution is given by n similarity functions $f_1,\dots,f_n$, one per codon. Given $f_1,\dots,f_n$, no additional information on the source mRNA sequence R is needed.
Example: for R′ = CGU CGA CUA GCG,
$s(R') = f_1(\mathrm{CGU}) + f_2(\mathrm{CGA}) + f_3(\mathrm{CUA}) + f_4(\mathrm{GCG})$.
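The score is a plain sum over codons. A minimal sketch, assuming each $f_i$ is an ordinary Python function on codon strings; the per-position agreement used for `fs` is a hypothetical similarity chosen only to make the example concrete.

```python
from typing import Callable

Similarity = Callable[[str], float]


def codons(seq: str) -> list[str]:
    """Split an mRNA string into consecutive codons (length must be a multiple of 3)."""
    assert len(seq) % 3 == 0
    return [seq[k:k + 3] for k in range(0, len(seq), 3)]


def score(seq: str, fs: list[Similarity]) -> float:
    """s(R') = f_1(c_1) + ... + f_n(c_n), with c_i the i-th codon of R'."""
    cs = codons(seq)
    assert len(cs) == len(fs), "one similarity function per codon"
    return sum(f(c) for f, c in zip(fs, cs))


# Hypothetical similarity: number of positions agreeing with the i-th codon of R.
R = "CGGCGACUAAAU"
fs = [lambda c, src=src: sum(x == y for x, y in zip(c, src)) for src in codons(R)]
print(score("CGUCGACUAGCG", fs))  # 2 + 3 + 3 + 0 = 8
```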

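12 The MRSO problem: the implied structure graph
The implied structure graph G:
A linear graph with one vertex per codon, in sequence order, and maximum degree 3 (a codon has three nucleotides, each paired at most once).
The complementarity constraints between nucleotides are labeled on the edges of G.
[Figure: a structure S and its implied structure graph G on codons 1–4]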
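Building G from S only needs integer division: nucleotide i belongs to codon ⌊i/3⌋ at offset i mod 3. The sketch below records each base pair as a labelled codon edge (u, a, v, b); this 0-based encoding and the helper names are assumptions, not taken from the slides.

```python
from collections import Counter


def implied_graph(structure: list[tuple[int, int]]) -> list[tuple[int, int, int, int]]:
    """Turn nucleotide pairs (i, j) into labelled codon edges (u, a, v, b):
    nucleotide a of codon u must pair with nucleotide b of codon v (0-based)."""
    edges = []
    for i, j in structure:
        i, j = min(i, j), max(i, j)
        edges.append((i // 3, i % 3, j // 3, j % 3))
    return edges


def degrees(edges):
    """Degree of every codon vertex in the implied structure graph."""
    deg = Counter()
    for u, _, v, _ in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


S = [(0, 11), (1, 10), (4, 7)]  # hypothetical structure on 4 codons
G = implied_graph(S)
print(G)  # [(0, 0, 3, 2), (0, 1, 3, 1), (1, 1, 2, 1)]
```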

13 The MRSO problem
A more formal definition [Backofen et al.’02]: given an implied structure graph G with n vertices and similarity functions $f_1,\dots,f_n$, find an assignment of codons $c_1,\dots,c_n$ to the vertices of G that:
1. maximizes $\sum_{i=1}^{n} f_i(c_i)$;
2. is compatible with respect to G.
The definition can be adapted to different applications, and it also gives a certain degree of combinatorial leverage, as we shall soon see.
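To make the definition concrete, here is an exhaustive baseline that maximizes $\sum_i f_i(c_i)$ over all $64^n$ codon assignments and keeps only compatible ones. It is a reference sketch for tiny instances, assuming labelled edges (u, a, v, b) meaning nucleotide a of codon u pairs with nucleotide b of codon v; it is not one of the algorithms discussed in the talk.

```python
import itertools
import math

CODONS = ["".join(t) for t in itertools.product("ACGU", repeat=3)]  # all 64 codons
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def compatible(assign, edges):
    """Every labelled edge (u, a, v, b) joins complementary nucleotides."""
    return all((assign[u][a], assign[v][b]) in PAIRS for u, a, v, b in edges)


def brute_force_mrso(n, edges, fs):
    """Exhaustive search over all 64^n codon assignments (only for tiny n)."""
    best, best_assign = -math.inf, None
    for assign in itertools.product(CODONS, repeat=n):
        if compatible(assign, edges):
            value = sum(f(c) for f, c in zip(fs, assign))
            if value > best:
                best, best_assign = value, assign
    return best, best_assign


# Hypothetical instance: first base of codon 0 pairs with last base of codon 1;
# f_i counts positions agreeing with the source codon AAA.
fs = [lambda c: c.count("A")] * 2
print(brute_force_mrso(2, [(0, 0, 1, 2)], fs)[0])  # 5
```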

14 The MRSO problem: known results [Backofen et al.’02 and Bongartz’04]
NP-complete (APX-hard) for general implied structure graphs.
Constant-factor approximation algorithms exist, but they cannot handle similarity values of $-\infty$ well.
In P when the implied structure graph G is outer-planar, i.e., when the nodes of G can be permuted so that no two edges of G cross.
[Backofen et al.’02] give an $O(n)$ algorithm for outer-planar implied structure graphs; in this talk we call this algorithm $A_{op}$.

15 Two natural parameters
Let $\delta$ = # degree-3 vertices in G.
Let $\kappa$ = # edge crossings in G.
[Figure: an example implied structure graph on vertices 1–8]
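Both parameters are easy to read off a linear graph. The sketch below assumes edges are plain (u, v) pairs with u < v (edge labels do not matter here) and counts crossings in the fixed left-to-right codon order, where arcs (a, b) and (c, d) cross exactly when a < c < b < d or c < a < d < b. The 6-vertex graph is a hypothetical example.

```python
from collections import Counter
from itertools import combinations


def cross(e, f):
    """Arcs (a, b) and (c, d), with a < b and c < d, cross in the linear order."""
    (a, b), (c, d) = e, f
    return a < c < b < d or c < a < d < b


def parameters(n, edges):
    """Return (delta, kappa): # degree-3 vertices and # crossing edge pairs."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    delta = sum(1 for x in range(n) if deg[x] == 3)
    kappa = sum(1 for e, f in combinations(edges, 2) if cross(e, f))
    return delta, kappa


edges = [(0, 3), (1, 4), (2, 5), (0, 1), (0, 2)]  # hypothetical 6-vertex graph
print(parameters(6, edges))  # (1, 4)
```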

16 Two natural parameters
Modifying the similarity functions: we can modify the similarity functions so that some vertices are assigned specific codons in any feasible solution.
For example, to ensure that the first vertex is assigned AAA, set $f^*_1(\mathrm{AAA}) = f_1(\mathrm{AAA})$ and $f^*_1(C) = -\infty$ for all $C \neq \mathrm{AAA}$.
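The forcing trick is a one-line wrapper around a similarity function. A minimal sketch, with `force` and `f1` as hypothetical names:

```python
import math


def force(f, codon):
    """Return f* with f*(codon) = f(codon) and f*(C) = -inf for every C != codon."""
    return lambda c: f(c) if c == codon else -math.inf


f1 = lambda c: c.count("A")  # hypothetical original similarity function
f1_star = force(f1, "AAA")
print(f1_star("AAA"), f1_star("AAC"))  # 3 -inf
```

Any assignment that gives the first vertex a codon other than AAA now scores $-\infty$, so a maximizer never picks it.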

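17 Two natural parameters: nice edge bipartition
A nice edge bipartition of G splits the edges of G into an upper part and a bottom part such that the upper part induces an outer-planar graph.
[Figure: an example graph on vertices 1–8 with its edges split into an upper part and a bottom part]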

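18 Two natural parameters
A general algorithm:
Enumerate all codon assignments to the endpoints of the bottom edges that are compatible with respect to the bottom part.
Invoke $A_{op}$ on the upper part with each such assignment, enforcing the assignment by modifying the similarity functions.
Time complexity: $O(2^{O(b)} n)$, where b = # bottom edges (at most 2b endpoints with 64 codons each give at most $64^{2b} = 2^{12b}$ assignments).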
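The general algorithm can be sketched as follows. Since $A_{op}$ is not spelled out in the talk, the sketch plugs in `exhaustive_solver`, a hypothetical brute-force stand-in that is exponential and is not the $O(n)$ algorithm of [Backofen et al.’02]; the enumeration over bottom-edge endpoints and the forcing via $-\infty$ similarities follow the slides. The edge encoding (u, a, v, b) is an assumption.

```python
import itertools
import math

CODONS = ["".join(t) for t in itertools.product("ACGU", repeat=3)]
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def compatible(assign, edges):
    return all((assign[u][a], assign[v][b]) in PAIRS for u, a, v, b in edges)


def exhaustive_solver(n, edges, fs):
    """Hypothetical stand-in for A_op: exhaustive search over codons with a
    finite score. It is NOT the O(n) algorithm of Backofen et al.'02."""
    options = [[c for c in CODONS if f(c) > -math.inf] for f in fs]
    best = -math.inf
    for assign in itertools.product(*options):
        if compatible(assign, edges):
            best = max(best, sum(f(c) for f, c in zip(fs, assign)))
    return best


def general_algorithm(n, upper, bottom, fs, solve_upper=exhaustive_solver):
    """Enumerate assignments to the endpoints of bottom edges, force them via
    -inf similarities, and solve the (outer-planar) upper part for each."""
    touched = sorted({u for u, _, _, _ in bottom} | {v for _, _, v, _ in bottom})
    best = -math.inf
    for choice in itertools.product(CODONS, repeat=len(touched)):
        fixed = dict(zip(touched, choice))
        if not compatible(fixed, bottom):
            continue  # violates a bottom-edge constraint
        forced = [
            (lambda c, i=i, want=fixed[i]: fs[i](c) if c == want else -math.inf)
            if i in fixed else fs[i]
            for i in range(n)
        ]
        best = max(best, solve_upper(n, upper, forced))
    return best


fs = [lambda c: c.count("A")] * 3  # hypothetical: agreement with source codons AAA
print(general_algorithm(3, [(0, 1, 1, 1)], [(0, 0, 2, 2)], fs))  # 7
```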

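19 Two natural parameters
The general algorithm can be applied to both of our natural parameters:
Parameter $\kappa$ = # edge crossings in G. Putting one edge of every crossing pair into the bottom part leaves a non-crossing, hence outer-planar, upper part and at most $\kappa$ bottom edges. Time $= O(2^{O(\kappa)} n)$, hence polynomial for $\kappa = O(\lg n)$.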
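One way to obtain such a bipartition is a greedy pass: keep an edge in the upper part unless it crosses an edge already kept there. This is a sketch under the assumption that edges are (u, v) pairs with u < v in the fixed codon order; the talk does not say how the bipartition is computed.

```python
def cross(e, f):
    """Arcs (a, b) and (c, d), with a < b and c < d, cross in the linear order."""
    (a, b), (c, d) = e, f
    return a < c < b < d or c < a < d < b


def crossing_bipartition(edges):
    """Greedy split: an edge goes to the bottom part iff it crosses an edge
    already kept in the upper part. Each bottom edge is charged to a distinct
    crossing pair, so |bottom| <= kappa, and no two upper edges cross."""
    upper, bottom = [], []
    for e in edges:
        (bottom if any(cross(e, f) for f in upper) else upper).append(e)
    return upper, bottom


edges = [(0, 3), (1, 4), (2, 5), (0, 1), (0, 2)]  # hypothetical graph, kappa = 4
upper, bottom = crossing_bipartition(edges)
print(upper, bottom)  # [(0, 3), (0, 1), (0, 2)] [(1, 4), (2, 5)]
```

Non-crossing arcs drawn on one side of the codon line form an outer-planar drawing, which is why the upper part qualifies.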

20 Two natural parameters
The general algorithm can be applied to both of our natural parameters:
Parameter $\delta$ = # degree-3 vertices in G. Every graph with maximum degree 2 is outer-planar, so moving one edge incident to each degree-3 vertex into the bottom part leaves an outer-planar upper part and at most $\delta$ bottom edges. Time $= O(2^{O(\delta)} n)$, hence polynomial for $\delta = O(\lg n)$.
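The degree-based bipartition can be sketched the same way: for each vertex that still has degree 3, move one incident edge to the bottom part. The sketch assumes maximum degree 3 and plain (u, v) edges; which incident edge is moved is an arbitrary choice.

```python
from collections import Counter


def degree_bipartition(n, edges):
    """For each vertex that still has degree 3, move one incident edge from
    the upper part to the bottom part. Assumes maximum degree 3, so afterwards
    the upper part has maximum degree <= 2 and |bottom| <= delta."""
    upper, bottom = list(edges), []
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    for x in range(n):
        if deg[x] >= 3:
            moved = next(f for f in upper if x in f)  # any incident edge works
            upper.remove(moved)
            bottom.append(moved)
            deg[moved[0]] -= 1
            deg[moved[1]] -= 1
    return upper, bottom


edges = [(0, 3), (1, 4), (2, 5), (0, 1), (0, 2)]  # hypothetical graph, delta = 1
print(degree_bipartition(6, edges))  # ([(1, 4), (2, 5), (0, 1), (0, 2)], [(0, 3)])
```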

21 The cutwidth parameter
The cutwidth of G: for $p \in \{1,\dots,n-1\}$, let $E_p$ denote the edges connecting vertices in $\{1,\dots,p\}$ to vertices in $\{p+1,\dots,n\}$, and let $V_p$ denote the vertices of G incident to $E_p$. Let $\lambda$ denote the cutwidth of G; then $\lambda = \max_p |E_p|$.
[Figure: an example graph on vertices 1–8 with the cut at p = 2, showing $E_p$ and $V_p$]
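The definitions of $E_p$, $V_p$ and the cutwidth translate directly into code. This sketch uses the slide's 1-based vertex numbering, plain (u, v) edges with u < v, and the given vertex order; the example graph is hypothetical.

```python
def cut(edges, p):
    """E_p and V_p for the split {1..p} | {p+1..n}; edges are (u, v) with u < v, 1-based."""
    Ep = [(u, v) for u, v in edges if u <= p < v]
    Vp = sorted({x for e in Ep for x in e})
    return Ep, Vp


def cutwidth(n, edges):
    """lambda = max over p in {1, ..., n-1} of |E_p| for the given vertex order."""
    return max((len(cut(edges, p)[0]) for p in range(1, n)), default=0)


edges = [(1, 4), (2, 5), (3, 6), (1, 2), (1, 3)]  # hypothetical graph on 6 vertices
print(cut(edges, 2))       # ([(1, 4), (2, 5), (1, 3)], [1, 2, 3, 4, 5])
print(cutwidth(6, edges))  # 3
```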

22 The cutwidth parameter
Algorithm outline:
Pick any $p \in \{1,\dots,n-1\}$.
For each assignment to $V_p$ that is compatible with $E_p$, recursively find the optimal solutions for the subgraphs of G induced by $\{1,\dots,p\}$ and $\{p+1,\dots,n\}$ under this assignment.
Return the highest-scoring solution found in the previous step.
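A direct, unoptimized sketch of this recursion follows. Assumptions not in the slides: 0-based codon indices, labelled edges (u, a, v, b) with u < v and no base pair inside a single codon, and the midpoint as the split point p (the outline allows any p). Codons on a cut that were already fixed by an enclosing call are reused rather than re-enumerated, which is what "under this assignment" requires.

```python
import itertools
import math

CODONS = ["".join(t) for t in itertools.product("ACGU", repeat=3)]  # all 64 codons
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def solve(lo, hi, edges, fs, fixed):
    """Best total similarity for codons lo..hi-1 (0-based, half-open range),
    using only edges with both endpoints in the range and keeping every codon
    already fixed by an enclosing call."""
    if hi - lo == 1:  # single codon: use its fixed codon, or its best codon
        return fs[lo](fixed[lo]) if lo in fixed else max(fs[lo](c) for c in CODONS)
    p = (lo + hi) // 2  # the outline allows any split point; midpoint chosen here
    inside = [e for e in edges if lo <= e[0] and e[2] < hi]
    cut = [e for e in inside if e[0] < p <= e[2]]  # E_p restricted to this range
    free = sorted({x for u, _, v, _ in cut for x in (u, v)} - set(fixed))
    best = -math.inf
    for choice in itertools.product(CODONS, repeat=len(free)):
        assign = {**fixed, **dict(zip(free, choice))}
        # Only assignments compatible with the cut edges are explored further.
        if all((assign[u][a], assign[v][b]) in PAIRS for u, a, v, b in cut):
            left = solve(lo, p, inside, fs, assign)
            right = solve(p, hi, inside, fs, assign)
            best = max(best, left + right)
    return best


def mrso_by_cuts(n, edges, fs):
    """Optimal MRSO score for n codons under the recursive cut scheme."""
    return solve(0, n, edges, fs, {})


fs = [lambda c: c.count("A")] * 4  # hypothetical: agreement with source codons AAA
print(mrso_by_cuts(4, [(0, 0, 1, 2), (2, 1, 3, 0)], fs))  # 10
```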

23 The cutwidth parameter
Time $= O(2^{O(\lambda)} n)$, hence polynomial for $\lambda = O(\lg n)$.
Theorem [Korach & Solel’93 via Chung & Seymour’89]: any graph G with n vertices, constant treewidth, and bounded maximum degree (as implied structure graphs have) has a vertex ordering under which G has cutwidth $O(\lg n)$.
Theorem [Bodlaender’95]: if G is a chordal graph or a circular-arc graph with constant maximum clique size, then G has constant treewidth. If G is k-outerplanar for some constant k, then G has constant treewidth.
Combining all of the above: MRSO is polynomial-time solvable if G is a chordal graph, a circular-arc graph, or k-outerplanar for constant k. (Since G has maximum degree 3, its cliques have at most 4 vertices, so the clique-size condition holds automatically.)

24 Binary similarity functions
Suppose we are only interested in the number of “correct” codons in a solution. In this case we can restrict ourselves to binary similarity functions, that is, $f_i : \Sigma^3 \to \{0,1\}$ for all i, where $\Sigma = \{A, C, G, U\}$.
Unfortunately, MRSO is NP-hard even when restricted to instances with binary similarity functions.
[Figure: source sequence CGG CGA CUA AAU and target sequence CUA GGA CGG UGA]
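A binary similarity function can be built from a target sequence. Here "correct" is assumed, hypothetically, to mean "equal to the codon at the same position of the target"; the target string is the one shown on the slide.

```python
def binary_similarities(target):
    """Hypothetical choice of 'correct': f_i(C) = 1 iff C equals the i-th codon of `target`."""
    cs = [target[k:k + 3] for k in range(0, len(target), 3)]
    return [lambda c, t=t: int(c == t) for t in cs]


fs = binary_similarities("CUAGGACGGUGA")  # target sequence from the slide
candidate = "CUAGGAAAAUGA"
print(sum(f(candidate[3 * i:3 * i + 3]) for i, f in enumerate(fs)))  # 3
```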

25 Binary similarity functions
MRSO with binary similarity functions is in FPT for the parameter $\ell$ = score of the optimal solution. More precisely, it is solvable in $O(\ell \cdot 2^{9.25\ell} \cdot n)$ time.
Proof sketch:
We can assume w.l.o.g. that for all i there exists a C such that $f_i(C) = 1$.
Any maximal independent set in G has size at least n/4, since G is at most cubic (each chosen vertex excludes at most itself and its three neighbors).
We treat the cases $\ell \le n/4$ and $\ell > n/4$ separately.
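The n/4 bound comes from a greedy argument: scan the vertices, keep any vertex that is not yet blocked, and block it together with its at most three neighbours. A sketch, with the optional `limit` argument (a hypothetical addition) stopping after ℓ vertices as needed in the case $\ell \le n/4$:

```python
def greedy_independent_set(n, edges, limit=None):
    """Scan vertices in order, keep any vertex not yet blocked, and block it
    and its neighbours. With maximum degree 3 each kept vertex blocks at most
    4 vertices, so a full scan returns at least n/4 vertices. `limit` is a
    hypothetical parameter that stops after that many vertices."""
    adj = {x: set() for x in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    chosen, blocked = [], set()
    for x in range(n):
        if limit is not None and len(chosen) == limit:
            break
        if x not in blocked:
            chosen.append(x)
            blocked |= adj[x] | {x}
    return chosen


# The 3-dimensional cube: a cubic graph on 8 vertices.
cube = [(x, x ^ (1 << k)) for x in range(8) for k in range(3) if x < x ^ (1 << k)]
print(greedy_independent_set(8, cube))  # [0, 3, 5, 6]
```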

26 Binary similarity functions
Suppose $\ell \le n/4$: find an independent set of size $\ell$ in $O(\ell)$ time. Since for all i there exists a C such that $f_i(C) = 1$, there is an assignment to this independent set (whose vertices impose no constraints on each other) that guarantees a score of at least $\ell$. Since $f_i \ge 0$ for all i, this assignment can be extended to all vertices of G to obtain an assignment with score at least $\ell$.
Suppose $\ell > n/4$: try all $\ell$-subsets of the vertices of G. Since $n < 4\ell$, there are at most $\binom{4\ell}{\ell} \le 2^{3.25\ell}$ such subsets. Enumerating all $64^{\ell} = 2^{6\ell}$ possible codon assignments for each subset requires $O(2^{6\ell})$ time, giving $2^{3.25\ell} \cdot 2^{6\ell} = 2^{9.25\ell}$ combinations overall.
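The two counting bounds in the case $\ell > n/4$ can be sanity-checked numerically. This is a check for small values, not a proof; the proof uses the standard estimate $\binom{n}{\alpha n} \le 2^{n H(\alpha)}$ with binary entropy H, and $4H(1/4) \approx 3.2451 < 3.25$.

```python
from math import comb, log2


def entropy(q):
    """Binary entropy H(q) in bits."""
    return -q * log2(q) - (1 - q) * log2(1 - q)


def check_bounds(k_max=200):
    """Check C(4k, k) <= 2^(3.25k) and 64^k == 2^(6k) for k = 1..k_max."""
    for k in range(1, k_max + 1):
        assert comb(4 * k, k) <= 2 ** (3.25 * k), k
        assert 64 ** k == 2 ** (6 * k), k
    return True


print(check_bounds())               # True
print(round(4 * entropy(0.25), 4))  # 3.2451
```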

27 Closing remarks
Extending our results:
Finding a practical algorithm for the cutwidth problem restricted to cubic graphs with fixed cutwidth.
More interesting parameters? Hardness results?
Applying our techniques to a similar variant of the problem that has been studied in the literature [Backofen’04].
Thank you!

