Fixed-parameter algorithms for protein similarity search under mRNA structure constraints. Joint work by G. Blin, G. Fertin, D. Hermelin, and S. Vialette.



2 Outline
Biological motivation.
- mRNA molecules.
- The mRNA to protein process.
- Selenocysteine insertion.
The MRSO problem.
- Implied structure graph.
- Known results.
Two natural parameters.
- The parameters.
- Nice edge bipartition.
- A general algorithm for both parameters.

3 Outline
The cutwidth parameter.
- An efficient algorithm for small cutwidth.
- Implications of this algorithm.
Binary similarity functions.
Closing remarks.

4 Biological Motivation
mRNA molecules:
- Can be considered as strings over {A,C,G,U}.
- Complementary bases (A-U, G-C) may pair to form a folding structure (the secondary structure) of the mRNA.
- Encode genetic information that is later translated into proteins.

5 The mRNA → protein process:

6 Biological Motivation
The mRNA → protein process, standard assumption:
- Each codon encodes a single amino acid.
Recently, biologists found that this is not necessarily true:
- Depending on the folding structure of the mRNA, a single codon might encode different amino acids.
- Example application: Selenocysteine insertion.

7 Biological Motivation
Selenocysteine insertion:
- Selenocysteine is a rare amino acid only recently discovered.
- Generated by the UGA codon, which usually encodes a stop signal.
- The presence of the SECIS element forces the generation of Selenocysteine rather than stopping the encoding.

8 Biological Motivation
Selenocysteine insertion:
- In certain cases, modifying existing proteins by inserting the SECIS element results in enhanced proteins.
- Is this application only the tip of the iceberg?

9 The MRSO problem
The MRSO problem:
- Given a specified secondary structure S and an mRNA sequence R, construct an mRNA sequence R’ with complementary nucleotides according to S which is as similar as possible to R.
[Figure: the sequence R = CGGCGACUAAAU together with the structure S.]

10 The MRSO problem
The MRSO problem:
- Given a specified secondary structure S and an mRNA sequence R, construct an mRNA sequence R’ with complementary nucleotides according to S which is as similar as possible to R.
[Figure: a sequence R’ folded according to S.]

11 The MRSO problem
The score of a solution is given by n similarity functions:
- Given f_1,…,f_n, one needs no additional information on the source mRNA sequence R.
Example: for R’ = CGU CGA CUA GCG, s(R’) = f_1(CGU) + f_2(CGA) + f_3(CUA) + f_4(GCG).
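As a rough illustration of the scoring rule on this slide, the sketch below sums one similarity function per codon. The helper names (`split_codons`, `match_count`, `score`) and the toy similarity function (number of positions that agree with a reference codon) are assumptions made for this example, not something the talk specifies.

```python
from typing import Callable, List

Similarity = Callable[[str], float]

def split_codons(seq: str) -> List[str]:
    """Split an mRNA string into consecutive codons (triplets)."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]

def match_count(reference: str) -> Similarity:
    """Toy similarity (assumed): positions where the codon agrees with `reference`."""
    return lambda codon: float(sum(a == b for a, b in zip(codon, reference)))

def score(codons: List[str], fs: List[Similarity]) -> float:
    """s(R') = f_1(c_1) + ... + f_n(c_n)."""
    if len(codons) != len(fs):
        raise ValueError("need exactly one similarity function per codon")
    return sum(f(c) for f, c in zip(fs, codons))

# The toy functions are built from the source sequence R of the earlier slide;
# once they exist, R itself is no longer needed to score a candidate R'.
fs = [match_count(c) for c in split_codons("CGGCGACUAAAU")]
r_prime = split_codons("CGUCGACUAGCG")
total = score(r_prime, fs)
```

With these toy functions the four codons of R’ contribute 2, 3, 3 and 0 matching positions respectively.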

12 The MRSO problem
The implied structure graph:
- A linear graph with maximum degree 3.
- Complementarity constraints between nucleotides are labeled on the edges of G.
[Figure: a structure S and its implied structure graph G.]

13 The MRSO problem
A more formal definition [Backofen et al.’02]:
- Given an implied structure graph G with n vertices and similarity functions f_1,…,f_n, find an assignment of codons c_1,…,c_n to the vertices of G that:
  1. Maximizes Σ_i f_i(c_i).
  2. Is compatible with respect to G.
The definition allows adapting to different applications.
- It also allows a certain degree of combinatorial leverage, as we shall soon see…
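To make the two conditions of the formal definition concrete, here is a minimal brute-force sketch for very small instances. The edge encoding `(i, a, j, b)` (nucleotide `a` of codon `i` must be complementary to nucleotide `b` of codon `j`, all 0-based) and the function names are assumptions for illustration; only the A-U and G-C pairs mentioned earlier in the talk are treated as complementary.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

NUCLEOTIDES = "ACGU"
CODONS = ["".join(t) for t in product(NUCLEOTIDES, repeat=3)]  # all 64 codons
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Assumed encoding of a labeled edge of G: (i, a, j, b).
Edge = Tuple[int, int, int, int]

def is_compatible(codons: Sequence[str], edges: Sequence[Edge]) -> bool:
    """Condition 2: every labeled edge joins complementary nucleotides."""
    return all(COMPLEMENT[codons[i][a]] == codons[j][b] for i, a, j, b in edges)

def brute_force_mrso(fs: List[Callable[[str], float]],
                     edges: Sequence[Edge]) -> Tuple[float, Optional[List[str]]]:
    """Condition 1: maximize the sum of f_i(c_i) over compatible assignments.

    Tries all 64^n assignments, so it is only usable for tiny n.
    """
    best_score, best = float("-inf"), None
    for cand in product(CODONS, repeat=len(fs)):
        if not is_compatible(cand, edges):
            continue
        s = sum(f(c) for f, c in zip(fs, cand))
        if s > best_score:
            best_score, best = s, list(cand)
    return best_score, best

def matches(ref: str) -> Callable[[str], float]:
    """Toy similarity (assumed): number of positions equal to `ref`."""
    return lambda c: sum(x == y for x, y in zip(c, ref))

# Two codons that both "want" to be GGG, but whose first nucleotides must pair.
edges = [(0, 0, 1, 0)]
best_score, best = brute_force_mrso([matches("GGG"), matches("GGG")], edges)
```

In this toy instance the unconstrained optimum of 6 is not reachable, because G does not pair with G; one codon has to give up its first position.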

14 The MRSO problem: known results
[Backofen et al.’02 and Bongartz’04]:
NP-complete (APX-hard) for general implied structure graphs.
- Constant-factor approximation algorithms.
- Cannot handle −∞ well.
In P when the implied structure graph G is outer-planar.
- In other words, when one can permute the nodes of G such that no two edges of G cross.
- [Backofen et al.’02] give an O(n) algorithm for outer-planar implied structure graphs.
- We call this algorithm A_op in this talk.

15 Two natural parameters
- Let δ = # degree-3 vertices in G.
- Let χ = # edge crossings in G.
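Both parameters on this slide can be read off directly from a linear graph. The sketch below assumes vertices 0..n-1 in their linear order and edges drawn as arcs on one side of the line; it counts crossings as pairs of arcs whose endpoints interleave, which is one plausible reading of "# edge crossings". The function names are made up for this example.

```python
from itertools import combinations
from typing import List, Tuple

def count_degree3(n: int, edges: List[Tuple[int, int]]) -> int:
    """Number of vertices of degree exactly 3."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sum(d == 3 for d in deg)

def count_crossings(edges: List[Tuple[int, int]]) -> int:
    """Number of pairs of arcs (a,b), (c,d) with interleaving endpoints.

    Arcs that share an endpoint are not counted as crossing.
    """
    arcs = [tuple(sorted(e)) for e in edges]
    return sum(
        1
        for (a, b), (c, d) in combinations(arcs, 2)
        if a < c < b < d or c < a < d < b
    )

example_edges = [(0, 2), (1, 3), (1, 4), (1, 5)]
degree3 = count_degree3(6, example_edges)
crossings = count_crossings(example_edges)
```

Here vertex 1 is the only degree-3 vertex, and the arc (0,2) crosses each of the three arcs leaving vertex 1.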

16 Two natural parameters
Modifying the similarity functions:
- We can modify the similarity functions so that some vertices are assigned specific codons in any feasible solution.
For example:
- Ensuring the first vertex is assigned AAA: f*_1(AAA) = f_1(AAA), and f*_1(C) = −∞ for all C ≠ AAA.
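The modification described on this slide amounts to wrapping a similarity function. A minimal sketch follows, with the helper name `pin` and the example function `f1` chosen here purely for illustration.

```python
import math
from typing import Callable

Similarity = Callable[[str], float]

def pin(f: Similarity, forced: str) -> Similarity:
    """Return f* with f*(forced) = f(forced) and f*(C) = -inf for all C != forced."""
    def f_star(codon: str) -> float:
        return f(codon) if codon == forced else -math.inf
    return f_star

# Hypothetical f_1 that prefers CGG; after pinning, only AAA has a finite score.
f1: Similarity = lambda codon: 1.0 if codon == "CGG" else 0.0
f1_star = pin(f1, "AAA")
```

Any maximizing solver that is handed `f1_star` instead of `f1` can only reach a finite total score by assigning AAA to the first vertex.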

17 Two natural parameters
Nice edge bipartition of G:
- The upper part induces an outer-planar graph.
[Figure: G with its edges split into an upper part and a bottom part.]

18 Two natural parameters
A general algorithm:
- Enumerate all assignments which are compatible with respect to the bottom part.
- Invoke A_op with each such assignment.
- Time complexity = O(2^O(b) · n), where b = # bottom edges.
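The enumerate-then-invoke scheme above can be sketched as follows. This is only an outline under stated assumptions: edges are encoded as `(i, a, j, b)` (nucleotide `a` of codon `i` pairs with nucleotide `b` of codon `j`), the outer-planar solver A_op is passed in as a callable `solve_upper`, and the stand-in `no_upper_edges` used in the example only handles the degenerate case where the upper part has no edges at all. It is not the algorithm of Backofen et al.

```python
import math
from itertools import product

NUCLEOTIDES = "ACGU"
CODONS = ["".join(t) for t in product(NUCLEOTIDES, repeat=3)]
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def pin(f, forced):
    """f* that scores -inf on every codon except `forced`."""
    return lambda c: f(c) if c == forced else -math.inf

def general_algorithm(fs, bottom_edges, solve_upper):
    """Best score over all assignments compatible with the bottom edges.

    solve_upper(fs) stands in for A_op: it must return the best score of the
    upper (outer-planar) part under the given similarity functions.
    """
    touched = sorted({v for i, _, j, _ in bottom_edges for v in (i, j)})
    best = -math.inf
    # At most 2b touched vertices, so at most 64^(2b) = 2^(12b) candidates.
    for choice in product(CODONS, repeat=len(touched)):
        fixed = dict(zip(touched, choice))
        if any(COMPLEMENT[fixed[i][a]] != fixed[j][b] for i, a, j, b in bottom_edges):
            continue  # not compatible with the bottom part
        pinned = [pin(f, fixed[v]) if v in fixed else f for v, f in enumerate(fs)]
        best = max(best, solve_upper(pinned))
    return best

def no_upper_edges(fs):
    """Trivial stand-in for A_op, valid only when the upper part has no edges."""
    return sum(max(f(c) for c in CODONS) for f in fs)

def matches(ref):
    """Toy similarity (assumed): number of positions equal to `ref`."""
    return lambda c: sum(x == y for x, y in zip(c, ref))

fs = [matches("GGG"), matches("AAA"), matches("GGG")]
result = general_algorithm(fs, [(0, 0, 2, 0)], no_upper_edges)
```

The pinning step reuses the −∞ trick from the previous slide, so the upper-part solver never needs to know about the bottom edges.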

19 Two natural parameters
The general algorithm can be applied to our two natural parameters:
- Parameter χ = # edge crossings in G.
Time = O(2^O(χ) · n), hence polynomial for χ = O(lg n).

20 Two natural parameters
The general algorithm can be applied to our two natural parameters:
- Parameter δ = # degree-3 vertices in G.
- Every graph with maximum degree 2 is outer-planar.
Time = O(2^O(δ) · n), hence polynomial for δ = O(lg n).
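The step from the stated running time to "polynomial" can be spelled out in one line. Below, k stands for either parameter, and a and c are generic placeholder constants (the hidden constant in the exponent and the constant in the logarithmic bound), not values given in the talk.

```latex
\[
k \le c \log_2 n
\quad\Longrightarrow\quad
2^{a k} \cdot n \;\le\; 2^{a c \log_2 n} \cdot n \;=\; n^{a c} \cdot n \;=\; n^{a c + 1}.
\]
```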

21 The cutwidth parameter
The cutwidth of G:
- For p ∈ {1,…,n-1}, let E_p denote the edges connecting vertices from {1,…,p} to {p+1,…,n}, and let V_p denote the vertices of G which are incident to E_p.
- Let κ denote the cutwidth of G. Then κ = max_p |E_p|.
[Figure: an example cut at p = 2, showing E_p and V_p.]
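The definitions of E_p, V_p and the cutwidth translate almost literally into code. The sketch below uses 1-based vertices as on the slide and measures the cutwidth of the given vertex ordering (it does not search over orderings); the function names and the example graph are chosen here for illustration.

```python
from typing import List, Tuple

def cut_edges(edges: List[Tuple[int, int]], p: int) -> List[Tuple[int, int]]:
    """E_p: edges with one endpoint in {1..p} and the other in {p+1..n}."""
    return [(u, v) for u, v in edges if min(u, v) <= p < max(u, v)]

def cut_vertices(edges: List[Tuple[int, int]], p: int) -> List[int]:
    """V_p: vertices incident to some edge of E_p."""
    return sorted({x for e in cut_edges(edges, p) for x in e})

def cutwidth(n: int, edges: List[Tuple[int, int]]) -> int:
    """max over p in {1..n-1} of |E_p|, for the ordering 1, 2, ..., n."""
    return max((len(cut_edges(edges, p)) for p in range(1, n)), default=0)

example = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 4), (2, 5)]
width = cutwidth(5, example)
```

For this example graph the cuts at p = 1, 2, 3, 4 contain 2, 3, 3 and 2 edges.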

22 The cutwidth parameter
Algorithm outline:
- Pick any p ∈ {1,…,n-1}.
- For each assignment for V_p that is compatible with E_p:
  - Recursively find the optimal solution for the subgraphs of G induced by {1,…,p} and {p+1,…,n} under this assignment.
- Return the highest scoring solution found in the previous step.
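The outline above can be turned into a small recursive routine. The following is a sketch under assumptions rather than the authors' implementation: vertices are 0-based, edges are encoded as `(i, a, j, b)` with i < j (nucleotide `a` of codon `i` pairs with nucleotide `b` of codon `j`), the split point is always the middle of the current range, and whole codons are enumerated for the cut vertices, so no attempt is made to match the running time stated in the talk.

```python
import math
from itertools import product

NUCLEOTIDES = "ACGU"
CODONS = ["".join(t) for t in product(NUCLEOTIDES, repeat=3)]
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def solve(fs, edges, lo=None, hi=None, fixed=None):
    """Best score on vertices lo..hi (inclusive) given already fixed codons."""
    if lo is None:
        lo, hi, fixed = 0, len(fs) - 1, {}
    if lo == hi:
        # Every edge at this vertex was checked when an earlier cut separated it.
        if lo in fixed:
            return fs[lo](fixed[lo])
        return max(fs[lo](c) for c in CODONS)
    p = (lo + hi) // 2  # any split point works; the middle is one choice
    cut = [e for e in edges if lo <= e[0] <= p < e[2] <= hi]  # E_p in this range
    free = sorted({v for i, _, j, _ in cut for v in (i, j) if v not in fixed})
    best = -math.inf
    for choice in product(CODONS, repeat=len(free)):
        trial = {**fixed, **dict(zip(free, choice))}
        if any(COMPLEMENT[trial[i][a]] != trial[j][b] for i, a, j, b in cut):
            continue  # assignment for V_p not compatible with E_p
        left = solve(fs, edges, lo, p, trial)
        right = solve(fs, edges, p + 1, hi, trial)
        best = max(best, left + right)
    return best

def matches(ref):
    """Toy similarity (assumed): number of positions equal to `ref`."""
    return lambda c: sum(x == y for x, y in zip(c, ref))

fs = [matches("GGG"), matches("AAA"), matches("GGG")]
best = solve(fs, [(0, 0, 2, 0)])
```

In the example, the single edge forces the first nucleotides of codons 0 and 2 to be complementary, so one of them loses a matching position.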

23 The cutwidth parameter
Time = O(2^O(κ) · n) for cutwidth κ, hence polynomial for κ = O(lg n).
Theorem [Korach&Solel’93 via Chung&Seymour’89]: Any graph G with n vertices, constant treewidth, and constant maximum degree has a vertex ordering under which G has cutwidth O(lg n). (Implied structure graphs have maximum degree 3.)
Theorem [Bodlaender’95]: If G is either a chordal graph or a circular-arc graph with constant maximum clique size, then G has constant treewidth. If G is k-outerplanar for any constant k, then G has constant treewidth.
Combining all the above we get: MRSO is polynomial-time solvable if G is a chordal graph, a circular-arc graph, or k-outerplanar.

24 Binary similarity functions
Suppose we are only interested in the number of “correct” codons in a solution.
- In this case we can restrict ourselves to binary similarity functions. That is, for all i: f_i : Σ^3 → {0,1}, where Σ = {A,C,G,U}.
- Unfortunately, MRSO is NP-hard even when restricted to instances with binary similarity functions.
[Figure: a source sequence CGGCGACUAAAU and a target sequence CUAGGACGGUGA.]

25 Binary similarity functions
MRSO restricted to binary similarity functions is in FPT for parameter σ = score of the optimal solution.
- More precisely, it is solvable in O (   n) time.
Proof sketch:
- We can assume w.l.o.g. that for all i there exists a C such that f_i(C) = 1.
- Any maximal independent set in G is of size at least n/4, since G is at most cubic.
- We prove the cases σ ≤ n/4 and σ > n/4 separately.

26 Binary similarity functions
Suppose σ ≤ n/4:
- Find an independent set of size σ in O(σ) time.
- Since for all i there exists a C such that f_i(C) = 1, there exists an assignment to this independent set which guarantees a score of at least σ.
- Since f_i ≥ 0 for all i, this assignment can be extended to all vertices of G to obtain an assignment with score at least σ.
Suppose σ > n/4:
- Try all σ-subsets of the vertices of G. Since n < 4σ, there are at most (4σ choose σ) such subsets. Enumerating all possible codon assignments for each subset requires O(2^(6σ)) time.
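The n/4 bound used in the first case comes from a simple greedy argument: every chosen vertex rules out itself and at most three neighbours. The sketch below builds a maximal independent set this way; the function name and the cube-graph example are assumptions for illustration, and the routine scans the whole graph rather than stopping after a prescribed number of picks.

```python
from typing import Dict, List, Set

def greedy_independent_set(adj: Dict[int, Set[int]]) -> List[int]:
    """Greedy maximal independent set: take a vertex, discard its neighbours.

    In a graph of maximum degree 3 each pick removes at most 4 vertices,
    so the result has at least n/4 vertices.
    """
    remaining = set(adj)
    chosen = []
    for v in sorted(adj):
        if v in remaining:
            chosen.append(v)
            remaining -= adj[v] | {v}
    return chosen

# Example: the 3-dimensional cube graph, which is 3-regular on 8 vertices.
cube = {v: {v ^ 1, v ^ 2, v ^ 4} for v in range(8)}
picked = greedy_independent_set(cube)
```

Once such a set is available, each of its vertices can be given a codon with f_i(C) = 1 independently of the others, which is the assignment the proof sketch relies on.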

27 Closing remarks
Extending our results:
- Finding a practical algorithm for the cutwidth problem restricted to cubic graphs with fixed cutwidth.
- More interesting parameters? Hardness results?
- Applying our techniques to a similar variant of the problem which has been studied in the literature [Backofen’04].
Thank you!