Flipping letters to minimize the support of a string
Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi (University of Udine)

Outline of talk:
1. Problem definition
2. Parametrized complexity
3. Polynomial cases
4. NP-hardness
5. ILP formulations

1. Problem definition

We are given a string s and a parameter k (e.g., k = 3). The string has a set of k-mers, its support, K(s).

Example (the string itself appears on the slides only as a graphic), built up window by window:
K(s) = { 010, 100, 001, 011 }, so |K(s)| = 4.

By flipping some bits, we could reduce the number of k-mers, obtaining a string s' with
K(s') = { 010, 100, 001 }, so |K(s')| = 3.
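A minimal Python sketch of the support K(s). The slides show the example string only as a graphic; the string used below, "010010011", is an assumption that is merely consistent with the order in which the k-mers appear in the slide animation.

```python
def support(s, k):
    """K(s): the set of distinct k-mers (length-k substrings) of s."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}

# Assumed example for k = 3 (the actual slide string is shown only graphically).
s = "010010011"
print(support(s, 3), len(support(s, 3)))             # 4 distinct 3-mers: 010, 100, 001, 011

# Flipping the last bit removes the 3-mer 011:
s_prime = "010010010"
print(support(s_prime, 3), len(support(s_prime, 3)))  # 3 distinct 3-mers: 010, 100, 001
```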

The Problem:

Ingredients:
- A string s over an alphabet Σ
- A parameter k (the k-mer size)
- A budget B

Objective: change at most B letters of s so that the resulting string s' has as few distinct k-mers as possible; equivalently, find a string s' with d(s, s') <= B (Hamming distance) and the smallest support |K(s')|.

Motivation:
- Real: curiosity-driven (it's a cute combinatorial problem)
- Fictitious: analysis of DNA sequences. E.g., atcgattgatccttta has the 3-mers atc, tcg, cga, gat, …; 3-mers are amino-acid codons, protein complexity relates to the number of codons, and mutations may reduce complexity.

Our results: the problem has several parameters (|s|, |Σ|, k, B), and we study all versions, where some of the parameters may be bounded:
- Polynomial special cases (e.g., for B fixed, or for both k and |Σ| fixed)
- NP-hard special cases (even for k = 2 or |Σ| = 2)

2. Parametrized complexity

The four parameters |s|, |Σ|, k, B can each be bounded or not, giving a grid of cases (shown as a table on the slides).

We can assume:
- k <= |s|
- B <= |s|
- |Σ| <= |s| (we don't need any symbol not already in s)

The grid splits into polynomial cases and NP-hard cases; in particular, the problem is NP-hard already for |Σ| = 2 and for k = 2.

3. Polynomial cases

The case of |Σ| and k fixed:

We start with the following subproblem. SUB(A): given a set of k-mers A, can we correct s within the budget so that it has all of its k-mers in A?

Example: A = { 0100, 1001, 0010, 0001 }, B = 3 (the string s and the construction are shown on the slides only graphically).

Build a layered graph with one layer per position of s, whose nodes are the k-mers of A and whose arcs join k-mers that can occupy consecutive positions (i.e., that overlap in k-1 symbols):
- each path corresponds to a string s' with all its k-mers in A;
- the length of the path is the Hamming distance d(s', s);
- SUB(A) has a solution iff the shortest path is <= B.
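A minimal dynamic-programming sketch of this shortest-path computation for SUB(A), assuming |s| >= k; it illustrates the idea above and is not the authors' implementation. The string in the usage line is hypothetical, since the slides show s only as a graphic.

```python
def sub_a(s, k, A, B):
    """
    SUB(A): can we change at most B letters of s so that every k-mer of the
    resulting string lies in A?  Dynamic programming over the layered graph:
    layer i holds the k-mers of A placed at positions i..i+k-1, consecutive
    k-mers must overlap in k-1 symbols, and the path length is the Hamming
    distance to s.  Returns the minimum distance if it is <= B, else None.
    """
    n = len(s)
    # cost of placing k-mer a over the first window s[0:k]
    dist = {a: sum(x != y for x, y in zip(a, s)) for a in A}
    for i in range(1, n - k + 1):
        new = {}
        for a in A:
            # predecessors: k-mers whose last k-1 symbols equal a's first k-1
            best = min((dist[b] for b in dist if b[1:] == a[:-1]), default=None)
            if best is not None:
                # only the last character of a is new at position i + k - 1
                new[a] = best + (a[-1] != s[i + k - 1])
        if not new:
            return None          # no string of this length has all k-mers in A
        dist = new
    opt = min(dist.values())
    return opt if opt <= B else None

# Slide example: A = {0100, 1001, 0010, 0001}, k = 4, B = 3 (s below is made up).
print(sub_a("0110010010", 4, {"0100", "1001", "0010", "0001"}, 3))   # 1: flip s[1]
```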

- We can solve SUB(A) in polytime O(|A| |Σ| |s|) = O(|s|), since |A| <= |Σ|^k is a constant when |Σ| and k are fixed.
- There are "only" 2^(|Σ|^k) possible subsets A to try (again a constant), so the problem is solved in polytime O(|s|).

The case of B fixed: for B fixed, we can try all possible solutions. There are at most C(|s|, B) = O(|s|^B) choices of positions to change; we can try them all and count the number of distinct k-mers. Since |Σ| <= |s|, the number of ways to rewrite the chosen positions is bounded by |Σ|^B <= |s|^B, so the problem is polynomial for fixed B.
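A brute-force sketch of this enumeration for fixed B (an illustration under the bounds above, not necessarily the authors' exact procedure):

```python
from itertools import combinations, product

def count_kmers(s, k):
    """|K(s)|: the number of distinct k-mers of s."""
    return len({s[i:i + k] for i in range(len(s) - k + 1)})

def min_support_fixed_B(s, k, B):
    """
    Try every set of at most B positions and every way to rewrite them with
    symbols already occurring in s (recall |Sigma| <= |s| may be assumed).
    Time |s|^O(B), i.e. polynomial when B is a constant.
    Returns (smallest support size, one best string).
    """
    alphabet = sorted(set(s))            # no new symbols are ever needed
    best = (count_kmers(s, k), s)        # changing nothing is allowed
    for b in range(1, B + 1):
        for positions in combinations(range(len(s)), b):
            for letters in product(alphabet, repeat=b):
                t = list(s)
                for p, c in zip(positions, letters):
                    t[p] = c
                cand = "".join(t)
                best = min(best, (count_kmers(cand, k), cand))
    return best

# With the assumed example string from before, one flip already helps (4 -> 3 k-mers).
print(min_support_fixed_B("010010011", 3, 1))
```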

4. NP-hardness

- Theorem: the problem is NP-hard even for k = 2.

- Proof: reduction from COMPACT BIPARTITE SUBGRAPH (CBS).
INSTANCE: a bipartite graph G = (U, V; E) and integers n, m.
PROBLEM: does there exist a set X ⊆ U ∪ V such that (i) |X| <= n and (ii) at least m edges of G have both endpoints in X?
(Slide example: a bipartite graph with n = 6, m = 5.)
(Note that CBS is NP-hard because it includes MAX BALANCED BIPARTITE SUBGRAPH, obtained for n = 2t, m = t^2.)

The reduction (slide example: a bipartite graph on the vertices a, b, c, d, e, f, g, h, i, l):

Let Σ = {a, b, c, d, e, f, g, h, i, l} ∪ {α, β}, where α and β are two extra separator symbols, and let B = |E| - m.

IDEA: encode an edge (i, j) as … α i β j α … and make all k-mers of the form αx, xα and αα unavoidable (i.e., insert a LOT of each of them in s).

The only k-mers that can be destroyed are of the form xβ or βx, and this is achieved by "flipping" the β into an α:
α i β j α  →  α i α j α.
This corresponds to removing the edge (i, j).

The k-mers of type xβ or βx which remain define the set X, which covers at least m edges (those not removed).

S is built by concatenating, for each letter, a long block repeating its unavoidable k-mers (α a α a … a α, α b α b … b α, …, α l α l … l α), followed by one gadget per edge of G.
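A hypothetical sketch of this encoding for k = 2. The separator symbols and the exact gadget and padding layout are reconstructed from the slides (which show them only partially), so treat this as an assumption illustrating the idea, not the authors' construction; the separators are written 'A' (for α) and 'B' (for β), and vertices must be single characters.

```python
def encode_cbs(U, V, E, m):
    """
    Reconstructed sketch of the k = 2 reduction from CBS.
    - Padding blocks repeat the k-mers AA, Ax and xA so often that the budget
      B = |E| - m can never destroy all of their occurrences ("unavoidable").
    - Each edge (i, j) becomes the gadget A i B j A; flipping its B into an A
      destroys the k-mers iB and Bj, i.e. "removes" the edge.
    Returns the constructed string s and the flip budget.
    """
    budget = len(E) - m
    copies = 2 * budget + 2                          # one flip kills at most 2 occurrences
    padding = ["A" * (copies + 1)]                   # many AA occurrences
    padding += [("A" + x) * copies + "A" for x in list(U) + list(V)]
    gadgets = ["A" + i + "B" + j + "A" for (i, j) in E]
    return "".join(padding + gadgets), budget

# Toy usage (hypothetical edge list over the slide's vertex names).
s, B = encode_cbs("abcdef", "ghil", [("a", "g"), ("b", "g"), ("e", "i")], m=2)
```

Under this reconstructed layout, the unavoidable k-mers number 2(|U| + |V|) + 1, so asking whether the support can be brought down to 2(|U| + |V|) + 1 + n with at most |E| - m flips mirrors the CBS question with threshold n.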

- With similar reductions, we can also prove: Theorem: the problem is NP-hard even for |Σ| = 2.

5. Integer Linear Programming formulations

Exponential-size formulation (Σ = {0,1}):

Let K be the set of all possible k-mers; e.g., for k = 3, K = { 000, 001, 010, 011, 100, 101, 110, 111 }. Define a 0/1 variable for each k-mer in K and a 0/1 variable for each position of s; the objective minimizes the number of k-mers that occur (the objective and constraints appear on the slides only as formulas).
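One natural way to write such an exponential-size formulation, consistent with the variables just described but reconstructed here as an assumption (the exact objective and constraints are not recoverable from the transcript): y_a = 1 if k-mer a occurs in the corrected string, z_i = 1 if bit i of s is flipped.

\[
\begin{aligned}
\min\;& \sum_{a \in K} y_a\\
\text{s.t.}\;& \sum_{i=1}^{n} z_i \le B,\\
& y_a \ge 1 - \sum_{t:\, s_{p+t} = a_t} z_{p+t} - \sum_{t:\, s_{p+t} \neq a_t} (1 - z_{p+t}) && p = 1,\dots,n-k+1,\; a \in K,\\
& y_a,\, z_i \in \{0,1\},
\end{aligned}
\]

where a = a_0 a_1 \cdots a_{k-1} and t ranges over 0, …, k-1. For each window p and k-mer a, the constraint forces y_a = 1 exactly when the chosen flips turn the window into a; since |K| = 2^k, both the number of variables and the number of constraints are exponential in k.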

An exponential number of variables/constraints → pricing & separation problems. Our pricing & separation strategy is exponential in the general case (polynomial for k fixed). Fractional solutions may lead to effective heuristics (e.g., solving SUB(A) with A extracted from the fractional solution).

Polynomial-size formulation (Σ = {0,1}):

In addition to the per-position flip variables z, there are variables w for pairs of positions i and j, saying "is the k-mer starting at i identical to the k-mer starting at j?". The variables w depend on s and z via linear constraints.

To count only k-mers that have no identical k-mer following them, a position i contributes to the objective iff all k-mers starting after i are different from the k-mer at i; in this way each distinct k-mer is counted exactly once, at its last occurrence.
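A sketch of how these ingredients can be assembled into a polynomial-size ILP for Σ = {0,1}; this is a reconstruction under assumptions, not the slides' formulation (which is only described as "a big boolean formula"). Here \bar{s}_i is the given string (data), s_i the letter of the corrected string, z_i the flip indicator, w_{ij} the equality indicator, and y_i an auxiliary counting variable introduced for this sketch.

\[
\begin{aligned}
\min\;& \sum_{i=1}^{n-k+1} y_i\\
\text{s.t.}\;& \sum_{i=1}^{n} z_i \le B,\\
& s_i = \bar{s}_i (1 - z_i) + (1 - \bar{s}_i)\, z_i && i = 1,\dots,n,\\
& w_{ij} \le 1 - (s_{i+t} - s_{j+t}),\quad w_{ij} \le 1 + (s_{i+t} - s_{j+t}) && 1 \le i < j \le n-k+1,\; t = 0,\dots,k-1,\\
& y_i \ge 1 - \sum_{j > i} w_{ij} && i = 1,\dots,n-k+1,\\
& s_i,\, z_i,\, w_{ij},\, y_i \in \{0,1\}.
\end{aligned}
\]

The w-constraints force w_{ij} = 0 whenever the two windows differ in some position, so minimizing the sum of the y_i counts each distinct k-mer exactly once, at its last occurrence; the formulation has O(|s|^2 k) variables and constraints.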

Remarks on the polynomial-size formulation:
- The formulation (not shown in full) is basically a big boolean formula
- It yields a poor bound compared to the exponential formulation
- It can be improved in many ways (especially via valid cuts)
- At this stage it is not clear which method is best for not-too-small instances
- We will run experiments with variants of both formulations
