# Copyright 2004 Nobuhisa UEDA, Bioinformatics Center, ICR, Kyoto University Topics in Algorithms Introduction to Computational Complexity Theory.



## Quiz

- A set S of strings is given below.
- Find the shortest string s (called a superstring of S) that contains every element of S as a substring. This quiz mimics DNA sequencing.

(Example) S = { ate, half, lethal, alpha, alfalfa }: s = lethalphalfalfate (17 letters) contains ate, half, lethal, alpha, and alfalfa. s′ = atehalflethalphalfalfa is also a superstring of S, but it is 22 letters long.

[Quiz] S = { TCTCTA, CAGTCT, CTCCAAA, GGCAA, TAAGCTCC, TTCTCTC, TCCAAATTCTA, CTTTCT, AACACCTT, CTCCGACC, TTCTATC, TCTATCTC, CTCTGTAACA, CAACAG }

This example is from [Blum 94].
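When S is tiny, brute force solves the quiz exactly. The minimal sketch below uses function names of our own. It first drops any string contained in another. It then tries every ordering of the remaining strings and joins neighbours using their largest overlap. For a substring-free set, the best ordering merged this way is a shortest superstring. The cost is n! orderings, so this only works for a handful of strings.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge(order) -> str:
    """Join the strings in this order, overlapping each neighbouring pair maximally."""
    result = order[0]
    for prev, nxt in zip(order, order[1:]):
        result += nxt[overlap(prev, nxt):]
    return result


def shortest_superstring(strings) -> str:
    """Exact Min-SSP by exhaustive search over all n! orderings (tiny inputs only)."""
    unique = set(strings)
    # A string contained in another string never needs a slot of its own.
    core = sorted(s for s in unique if not any(s != t and s in t for t in unique))
    if not core:
        return ""
    return min((merge(p) for p in permutations(core)), key=len)


S = ["ate", "half", "lethal", "alpha", "alfalfa"]
best = shortest_superstring(S)
print(best, len(best))
```

On the example it reports a superstring of length 17, the same length as lethalphalfalfate. Several orderings tie at that length.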

## Issues in computational complexity theory

- Showing upper/lower bounds on the computational resources required to solve a problem L. The bounds are expressed as functions of the input length. Such bounds are studied for
  - time,
  - (memory) space,
  - …
- Structural complexity among classes of problems. Example: relations among P, NP, E, and EXP, such as P ⊆ NP ⊆ EXP and P ≠ E.

## This talk's main issues (1/2)

- How to deal with hard (time-consuming) problems: what to do when we meet a problem that looks hard.
- Sometimes we cannot find any efficient (polynomial-time) algorithm for a problem. There are two possibilities:
  1. The problem is not hard, and someone else can find an efficient algorithm.
  2. The problem is really hard, and other smart people cannot find one either.

## This talk's main issues (2/2)

- The quiz looks intractable: the number of possible orderings of its 14 strings is 14! = 14 × 13 × ⋯ × 1 = 87,178,291,200.
- However, it is not easy to show that the problem is hard.
  - It is hard to find a needle in a haystack (needle = efficient algorithm).
  - It seems even harder to show that there is no needle in the haystack, because you might simply have missed it.
- No needle? Computational complexity theory provides an answer.
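The count 14! can be checked directly. The throughput figure below is a hypothetical assumption, chosen only to give a sense of scale for exhaustive search.

```python
import math

n = 14                          # number of strings in the quiz instance
orderings = math.factorial(n)   # candidate orderings to try exhaustively
print(f"{orderings:,}")         # 87,178,291,200

# Assumed (hypothetical) speed: one million orderings checked per second.
hours = orderings / 1_000_000 / 3600
print(round(hours, 1))          # 24.2
```

Even at that optimistic assumed rate, one exhaustive run on 14 strings takes about a day. Each extra string multiplies the time again.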

## Key idea

- We have two problems A and B. Given input x, we would like to know whether x ∈ A (or x ∈ B).
- Suppose A is efficiently transformed by f into B such that a ∈ A iff f(a) ∈ B (a: input of A; f: transformation, called a reduction; f(a): input of B).
- This shows that B is harder than (or as hard as) A: A is solvable whenever there is a way to solve B.

(Figure: an algorithm for B answers 'yes' for x1 ∈ B and 'no' for x2 ∉ B. For x3 ∈ A it answers 'yes' on f(x3) ∈ B, and for x4 ∉ A it answers 'no' on f(x4) ∉ B.)
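The shape of a reduction can be sketched with two toy problems that are not from the slides. B asks whether an integer is even. A asks whether a list of integers has an even sum. The reduction f is simply `sum`. All names here are hypothetical, and the point is only that a solver for A is obtained as "apply f, then ask B".

```python
from typing import Callable, List


def decide_B(n: int) -> bool:
    """Toy problem B: is the integer n even?"""
    return n % 2 == 0


def f(xs: List[int]) -> int:
    """Toy reduction from A to B: map a list of integers to its sum.

    It runs in linear time, and xs is in A exactly when f(xs) is in B.
    """
    return sum(xs)


def decide_A(xs: List[int],
             decide_b: Callable[[int], bool] = decide_B,
             reduction: Callable[[List[int]], int] = f) -> bool:
    """Toy problem A: does xs have an even sum? Answered only via B's solver."""
    return decide_b(reduction(xs))


print(decide_A([3, 5, 2]))   # True: 3 + 5 + 2 = 10 is even
print(decide_A([3, 4]))      # False: 3 + 4 = 7 is odd
```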

## Overview

- Intuitive explanation of hard (time-consuming) problems
  - Decision problems / optimization problems
  - Polynomial time
  - Class P, class NP
  - Reductions
  - NP-complete and NP-hard
- Examples
  - Superstring problem
  - Reduction from the traveling salesman problem

## Types of problems (1/2)

- Computational problems roughly fall into two categories:
  - Decision problem (output: yes/no),
  - Optimization problem (output: a solution with maximum/minimum cost).
- Decision problem L
  - Input: a string x
  - Output: 'yes' if x ∈ L, 'no' otherwise.
- Example: L is the set of positive odd numbers, L = {1, 3, 5, …}. For x = 3 the answer is 'yes' since x ∈ L; for x = 4 it is 'no' since x ∉ L.

## Types of problems (2/2)

- Optimization problem M
  - Input: a string x and a cost function f
  - Output: y such that $f(x, y)$ is the maximum (or the minimum)
- Example: maximize $f(x, y) = 2x^2y - xy^2 + 3$. For the input $x = 1$, the output is $y = 1$, with $f(1, 1) = 4$.
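As a worked check of the example, fix the input $x = 1$ and complete the square in $y$:

$$f(1, y) = 2y - y^2 + 3 = 4 - (y - 1)^2 \le 4,$$

Equality holds exactly when $y = 1$, so the output is $y = 1$ and $f(1, 1) = 4$.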

## Examples of problems (1/6)

- Euler cycle problem (ECP): decision problem
  - Input (instance): an undirected graph G = (V, E).
  - Output: 'yes' if G has a cycle that uses each edge of G exactly once, 'no' otherwise.
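ECP can be decided quickly thanks to Euler's classical criterion. A graph has an Euler cycle exactly when every vertex has even degree and all edges lie in one connected component. Below is a minimal sketch of that test. The function name and the edge-list input format are our own choices, and we treat an edge-less graph as 'yes' by convention.

```python
from collections import defaultdict


def has_euler_cycle(edges) -> bool:
    """Decide ECP for an undirected (multi)graph given as a list of (u, v) edges."""
    if not edges:
        return True  # convention: nothing to traverse
    degree = defaultdict(int)
    adj = defaultdict(list)
    for u, v in edges:
        degree[u] += 1          # a self-loop (u, u) adds 2 to u's degree
        degree[v] += 1
        adj[u].append(v)
        adj[v].append(u)
    # Every vertex must have even degree.
    if any(d % 2 for d in degree.values()):
        return False
    # Depth-first search: every vertex touching an edge must be reachable.
    start = edges[0][0]
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(degree)


print(has_euler_cycle([("a", "b"), ("b", "c"), ("c", "a")]))  # True (triangle)
print(has_euler_cycle([("a", "b"), ("b", "c")]))              # False (path)
```

The test runs in time linear in |V| + |E|, which illustrates why ECP is listed in class P.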

## Examples of problems (2/6)

- Shortest superstring problem (SSP): decision problem
  - Input (instance): a set of sequences $S = \{s_1, \ldots, s_n\}$ and an integer threshold $l$.
  - Output: 'yes' if there is a string $s$ such that every $s_i$ is a substring of $s$ and $|s| \le l$; 'no' otherwise.
- Example: $s_1$ = TACGA, $s_2$ = ACCC, $s_3$ = CTAAAG, $s_4$ = GAGC.
  - Threshold 18: 'yes', since TACGACCCTAAAGAGC (length 16) contains every sequence and is shorter than 18.
  - Threshold 10: 'no'.

## Examples of problems (3/6)

- Shortest superstring problem (Min-SSP): optimization problem
  - Input (instance): a set of sequences $S = \{s_1, \ldots, s_n\}$.
  - Output: the shortest string $s$ such that every $s_i$ is a substring of $s$.
- Example: for $s_1$ = TACGA, $s_2$ = ACCC, $s_3$ = CTAAAG, $s_4$ = GAGC, a shortest superstring is TACGACCCTAAAGAGC.

## Examples of problems (4/6)

- Traveling salesman problem (TSP): decision problem
  - Input (instance): n cities (nodes) with the cost of travel between each pair of them, and an integer threshold t.
  - Output: 'yes' if there is a tour that visits all the cities and returns to the starting point with cost at most t; 'no' otherwise.
- Example: four cities a, b, c, d with edge costs 4, 2, 3, 4, 3, 5.
  - Threshold 14: 'yes', since the tour a → b → d → c → a costs 4 + 2 + 3 + 3 = 12 ≤ 14.
  - Threshold 10: 'no'.
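A brute-force decision procedure for this example fixes the start city and tries all (n − 1)! orderings of the rest. The slide's tour a → b → d → c → a fixes four of the six costs (a–b = 4, b–d = 2, d–c = 3, c–a = 3). Assigning 4 to a–d and 5 to b–c is our assumption. Swapping them changes no tour's cost, because every tour uses both of those edges or neither.

```python
from itertools import permutations

# Assumed edge costs for the 4-city example (a-d and b-c assignment is a guess).
COST = {
    frozenset("ab"): 4, frozenset("bd"): 2, frozenset("cd"): 3,
    frozenset("ac"): 3, frozenset("ad"): 4, frozenset("bc"): 5,
}


def tour_cost(tour) -> int:
    """Cost of visiting the cities in order and returning to the first one."""
    return sum(COST[frozenset((u, v))] for u, v in zip(tour, tour[1:] + tour[:1]))


def tsp_decision(cities, t) -> bool:
    """'yes' iff some tour costs at most t; tries all (n-1)! tours from a fixed start."""
    start, rest = cities[0], cities[1:]
    return any(tour_cost((start,) + p) <= t for p in permutations(rest))


cities = ("a", "b", "c", "d")
print(tsp_decision(cities, 14))   # True: the cheapest tour costs 12
print(tsp_decision(cities, 10))   # False
```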

## Examples of problems (5/6)

- Traveling salesman problem (Min-TSP): optimization problem
  - Input (instance): n cities (nodes) with the cost of travel between each pair of them.
  - Output: a tour that visits all the cities and returns to the starting point with the smallest cost.
- Example: for the four cities a, b, c, d with edge costs 4, 2, 3, 4, 3, 5, an optimal tour is a → b → d → c → a (cost 4 + 2 + 3 + 3 = 12).

## Examples of problems (6/6)

- Satisfiability problem (SAT): decision problem
  - Input (instance): a Boolean function f over variables $x_1, \ldots, x_n$, each taking either true (1) or false (0).
  - Output: 'yes' if there is a truth assignment to $x_1, \ldots, x_n$ that satisfies f; 'no' otherwise.
- Example: $f = \neg x_1 \wedge (x_1 \vee x_2 \vee \neg x_3) \wedge (\neg x_1 \vee \neg x_2 \vee x_3 \vee \neg x_4) \wedge (\neg x_2 \vee \neg x_3 \vee \neg x_4) \wedge (\neg x_1 \vee \neg x_3)$.
  The answer is 'yes', since f = T (1) when $x_1$ = F (0), $x_2$ = T (1), $x_3$ = F (0), $x_4$ = F (0).

| $x_1$ | $x_2$ | $x_1 \wedge x_2$ | $x_1 \vee x_2$ | $\neg x_1$ |
|---|---|---|---|---|
| T | T | T | T | F |
| T | F | F | T | F |
| F | T | F | T | T |
| F | F | F | F | T |
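A direct way to decide SAT is to try all $2^n$ truth assignments, which takes time exponential in n. The sketch below encodes the example formula clause by clause as a Python function. The helper name `satisfying_assignment` is our own.

```python
from itertools import product


def f(x1: bool, x2: bool, x3: bool, x4: bool) -> bool:
    """The example formula, one conjunct per line."""
    return ((not x1)
            and (x1 or x2 or not x3)
            and (not x1 or not x2 or x3 or not x4)
            and (not x2 or not x3 or not x4)
            and (not x1 or not x3))


def satisfying_assignment(formula, n_vars: int):
    """Try all 2**n_vars assignments; return the first satisfying one, or None."""
    for values in product([False, True], repeat=n_vars):
        if formula(*values):
            return values
    return None


print(f(False, True, False, False))     # True: the assignment from the slide
print(satisfying_assignment(f, 4))      # (False, False, False, False)
```

The search returns the first satisfying assignment in its enumeration order. That happens to be all-false, which differs from the assignment on the slide; any satisfying assignment justifies 'yes'.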

## Polynomial time

- To simplify the notion of 'hardness', we use polynomial time as the cut-off for efficiency.
- A polynomial p(n) is a function of the form $p(n) = a_k n^k + a_{k-1} n^{k-1} + \cdots + a_0$ for some $k \ge 1$ and constants $a_k, \ldots, a_0$.
- Key properties of polynomials: let p(n) and q(n) be polynomials.
  - The sum p(n) + q(n) is also a polynomial.
  - The composition q(p(n)) is also a polynomial in n.
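A small worked instance shows why closure under composition matters for reductions. Suppose a reduction runs in time p(n); its output then has length at most p(n). Running a q(n)-time algorithm on that output takes at most q(p(n)) steps (for nondecreasing q), so the total is p(n) + q(p(n)). Take, for illustration, $p(n) = n^2 + 1$ and $q(n) = 3n^3$:

$$q(p(n)) = 3(n^2 + 1)^3 = 3n^6 + 9n^4 + 9n^2 + 3,$$
$$p(n) + q(p(n)) = 3n^6 + 9n^4 + 10n^2 + 4,$$

The total is still a polynomial, of degree $\deg q \cdot \deg p = 6$.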

## Turing machine

- An abstract model of computers.
- At each step, based on its current state and the symbol under the head, the Turing machine changes its internal state, overwrites the symbol under the head, and moves the head one cell left or right.

(Figure: one step turns the tape 10011BBBB in state $s_1$ into 11011BBBB in state $s_2$; B is the blank symbol.)
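One step of a Turing machine can be sketched as a table lookup followed by a write and a head move. The head position (cell 1) and the single transition used below are hypothetical, chosen so that the step reproduces the figure. The representation (a Python list as the tape, "B" as blank, moves of −1/+1) is also our own choice.

```python
from typing import Dict, List, Tuple

# delta maps (state, symbol read) -> (new state, symbol to write, head move of -1 or +1).
Delta = Dict[Tuple[str, str], Tuple[str, str, int]]
BLANK = "B"


def step(tape: List[str], head: int, state: str, delta: Delta):
    """Perform one Turing-machine step and return the new (tape, head, state)."""
    new_state, write, move = delta[(state, tape[head])]
    tape = tape.copy()            # keep the caller's tape unchanged
    tape[head] = write            # overwrite the symbol under the head
    head += move                  # move the head one cell left or right
    if head < 0:
        raise ValueError("head moved off the left end of the tape")
    if head == len(tape):
        tape.append(BLANK)        # the tape is unbounded to the right
    return tape, head, new_state


# Hypothetical transition reproducing the figure: in s1, reading 0 at cell 1,
# write 1, move right, and enter s2.
delta = {("s1", "0"): ("s2", "1", +1)}
tape, head, state = step(list("10011BBBB"), 1, "s1", delta)
print("".join(tape), head, state)   # 11011BBBB 2 s2
```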

## Hierarchy in computational complexity theory

(Figure: problems split into undecidable ones, such as the halting problem of Turing machines, and decidable ones. Decidable problems split into tractable = polynomial time (P; e.g., median in time n, sorting in time n log n) and intractable = exponential time (EXP; e.g., time $2^n$). NP lies between P and EXP and contains, e.g., graph isomorphism and the traveling salesman problem. n: input size. Based on a figure in http://www-imai.is.s.u-tokyo.ac.jp/~imai/lecture/quantum_complexity.pdf)

## Well-known classes of decision problems

- P: the set of decision problems solvable by a deterministic Turing machine in polynomial time. ECP ∈ P.
- NP: the set of decision problems solvable by a nondeterministic Turing machine in polynomial time. ECP, TSP, SSP, SAT ∈ NP.
- P ⊆ NP.

## Example of class NP

- TSP ∈ NP, since TSP is solvable in polynomial time by a nondeterministic Turing machine.
  - At each branch, one node is chosen nondeterministically.
  - We suppose that the nondeterministic Turing machine can always select the best choice at each branch; that is, it answers 'yes' if any sequence of choices leads to 'yes'.

(Figure: the tree of choices for the four-city example starting from a; each leaf is a complete tour, with costs such as 12, 14, and 16 compared against the threshold 14.)

## Alternate definition of class NP

- TSP ∈ NP, since TSP is a decision problem that can be defined with a verifier A(x, y) over strings: x is a 'yes' instance if and only if some y makes A(x, y) answer 'yes', where
  - y has length at most $|x|^c$ for some constant c, and
  - A(x, y) is computable by a deterministic Turing machine in time polynomial in |x| + |y|. Since $|y| \le |x|^c$, this is also polynomial in |x|.
- Such a y is usually called a certificate for x.

(Figure: for the four-city example with threshold 14, the certificate abdca makes the polynomial-time verifier A(x, y) answer 'yes'.)
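A TSP verifier A(x, y) has a simple job. It checks that the certificate visits every city exactly once, sums the edge costs along the tour, and compares the total with the threshold. All of this takes time polynomial in the input size. The function name, the cost-table format, and the a–d / b–c cost assignment are our assumptions.

```python
# Assumed edge costs for the 4-city example (a-d and b-c assignment is a guess).
COST = {
    frozenset("ab"): 4, frozenset("bd"): 2, frozenset("cd"): 3,
    frozenset("ac"): 3, frozenset("ad"): 4, frozenset("bc"): 5,
}


def verify_tsp(cost, t, certificate) -> bool:
    """Verifier A(x, y) for TSP, with x = (cost, t) and y = certificate.

    cost maps frozenset({u, v}) to the travel cost between u and v.
    The certificate lists the cities in tour order, e.g. "abdca" or "abdc".
    """
    cities = {c for pair in cost for c in pair}
    tour = list(certificate)
    if len(tour) > 1 and tour[0] == tour[-1]:
        tour = tour[:-1]                     # drop the explicit return to the start
    if len(tour) != len(cities) or set(tour) != cities:
        return False                         # must visit every city exactly once
    total = 0
    for u, v in zip(tour, tour[1:] + tour[:1]):
        edge = frozenset((u, v))
        if edge not in cost:
            return False                     # no road between u and v
        total += cost[edge]
    return total <= t


print(verify_tsp(COST, 14, "abdca"))   # True: cost 12 <= 14
print(verify_tsp(COST, 10, "abdca"))   # False: cost 12 > 10
```

A 'no' from the verifier only rejects this particular certificate. The instance is a 'no' instance when every certificate is rejected.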

## Features of problems in NP (1/2)

- The number of possible solutions grows exponentially with the size of the input.
- Example (SSP, threshold 12): S = { half, alpha, alfalfa }. The six orderings give the superstrings halfalphalfalfa (15), halfalfalpha (12), alphalfalfa (11), alphalfalfahalf (15), alfalfahalfalpha (16), and alfalfalphalf (13).
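The six candidates can be generated by merging each of the 3! orderings with maximal overlaps between neighbours. With n strings there are n! orderings, which is the growth the slide refers to. The helper names are our own.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge(order) -> str:
    """Join the strings in this order, overlapping each neighbouring pair maximally."""
    result = order[0]
    for prev, nxt in zip(order, order[1:]):
        result += nxt[overlap(prev, nxt):]
    return result


S = ["half", "alpha", "alfalfa"]
candidates = [merge(p) for p in permutations(S)]   # 3! = 6 orderings
for s in candidates:
    print(s, len(s))
print(sum(len(s) <= 12 for s in candidates))        # candidates within threshold 12: 2
```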

## Features of problems in NP (2/2)

- Given a certificate (here, a superstring), we can verify any instance in polynomial time.
- Example (SSP, threshold 12): S = { half, alpha, alfalfa }, certificate alphalfalfa (length 11; it contains half, alpha, and alfalfa).
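For SSP, verification reduces to a length check plus one substring test per input string, all polynomial in the input size. A minimal sketch, with a function name of our own:

```python
def verify_ssp(strings, threshold, certificate) -> bool:
    """Verifier for the SSP decision problem.

    Accepts iff the certificate is no longer than the threshold and contains
    every input string as a substring.
    """
    return len(certificate) <= threshold and all(s in certificate for s in strings)


S = ["half", "alpha", "alfalfa"]
print(verify_ssp(S, 12, "alphalfalfa"))       # True: length 11, contains all three
print(verify_ssp(S, 12, "halfalphalfalfa"))   # False: length 15 exceeds 12
```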

## Harder problems (1/3)

- Suppose that problems $L_1$ and $L_2$ are in NP.
- C(x) denotes a certificate for x.

(Figure: verifier $A_1$ answers 'yes' on $x_1 \in L_1$ with $C(x_1)$ and 'no' on $x_2 \notin L_1$ for every y. Verifier $A_2$ answers 'yes' on $x_3 \in L_2$ with $C(x_3)$ and 'no' on $x_4 \notin L_2$ for every y.)

## Harder problems (2/3)

- Suppose that problems $L_1$ and $L_2$ are in NP and C(x) denotes a certificate for x. We construct the following transformation, called a reduction.

(Figure: a reduction f running in polynomial time maps $x_3 \in L_2$ to $f(x_3) \in L_1$, which verifier $A_1$ accepts with certificate $C(f(x_3))$. It maps $x_4 \notin L_2$ to $f(x_4) \notin L_1$, which $A_1$ rejects for every y.)

## Harder problems (3/3)

- Under these assumptions, verifier $A_1$ for $L_1$ can answer 'yes' or 'no' correctly for any instance of $L_2$, after the instance is transformed by f. We say $L_2$ is (polynomial-time) reducible to $L_1$, denoted $L_2 \le_p L_1$.
- If we can construct this reduction, $L_1$ has to be harder than or as hard as $L_2$.
- When a polynomial-time algorithm for $L_1$ is available, it also solves any instance of $L_2$ in polynomial time.

## Cook–Levin theorem

- [Theorem] Every decision problem Q in NP is reducible to SAT.
- Since SAT itself is in NP, SAT is one of the hardest problems in NP. Such a problem is called an NP-complete problem.

(Figure: reductions f and f′ map instances of problems $Q_1$ and $Q_2$ in NP to instances of SAT. A single verifier A for SAT then answers 'yes' or 'no' for all of them.)

## Good property of reductions

- A reduction can be composed of multiple transformations: if $L_3 \le_p L_2$ and $L_2 \le_p L_1$, then $L_3 \le_p L_1$. Instances of $L_3$ are transformed step by step until verifier $A_1$ can decide them.

## NP-complete

- A problem L in NP is NP-complete if
  - Q is reducible to L for every problem Q in NP,
  - or if SAT is reducible to L, since then $Q \le_p \mathrm{SAT} \le_p L$ for every Q in NP,
  - or if some NP-complete problem L′ is reducible to L, since then $Q \le_p L' \le_p L$ for every Q in NP.
- SAT is reducible to many other problems in NP: 3-SAT, Clique, 3-Color, the Hamilton path problem, the traveling salesman problem, …
  - These problems are also among the most intractable problems in NP.

(Figure: a tree of reductions starting from SAT, including 3-SAT, Clique, 3-Color, HamPath, TSP, Indep. set, and Vertex Cover.)

## How to show that a problem L is NP-complete

- It consists of two steps:
  1. Show that the decision problem L is in NP.
  2. Give a reduction from an NP-complete problem Q to L. Then L is as hard as or harder than Q, and by the definition of NP-completeness, every problem Q′ in NP reduces to L (via Q).
- For an optimization problem Max(Min)-L, we say Max(Min)-L is NP-hard if there is a reduction from an NP-complete problem Q to its decision version L.

## Example of reductions (1/9)

- We will see that TSP is reducible to SSP.
  - SSP is as hard as or harder than TSP.
  - SSP is NP-complete, since TSP is NP-complete and TSP ≤ SSP.
- Let x be an instance of TSP with threshold n.
- Let f(x) be the transformed instance of SSP with threshold 3n + 2m + 1.

(Figure: x (TSP) has n vertices, m arcs with cost 1, threshold n, and optimal cost n + k. f(x) (SSP) has n + m strings (a#A, b#B, c#C, d#D, …), threshold 3n + 2m + 1, and optimal length 3n + 2m + k + 1.)

## Example of reductions (2/9)

- Reduction from TSP to SSP
  - Input x of TSP: a graph giving the cost between two nodes (1 if there is an arc, 2 otherwise).
  - Input f(x) of SSP: strings created from the input x of TSP, namely one string per node (a#A, b#B, c#C, d#D, e#E) and one string per arc with cost 1 (AbAc, AcAe, AeAb, BaBc, BcBa, CdCe, CeCd, DbDe, DeDb, EbEc, EcEb).
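The string set can be generated mechanically. The rule below is inferred from the example, not stated on the slide. Each node v yields "v#V". If v's cost-1 arcs go to $w_1, \ldots, w_d$ (taken in a cyclic order we assume), each arc $v \to w_i$ yields $V w_i V w_{i+1}$, with indices wrapping around. This reproduces the 16 strings listed above and the threshold 3n + 2m + 1. Function and parameter names are hypothetical.

```python
def tsp_to_ssp(nodes, arcs):
    """Sketch of the string construction f(x) for a directed graph.

    nodes: one-letter lowercase node names; arcs[v]: the heads of v's cost-1
    arcs, in an assumed cyclic order. Uppercase letters are the node copies.
    """
    strings = []
    for v in nodes:
        V = v.upper()
        strings.append(f"{v}#{V}")               # node string, e.g. "a#A"
        heads = arcs.get(v, "")
        for i, w in enumerate(heads):            # one string per arc v -> w
            w_next = heads[(i + 1) % len(heads)]
            strings.append(f"{V}{w}{V}{w_next}")
    return strings


nodes = "abcde"
arcs = {"a": "bce", "b": "ac", "c": "de", "d": "be", "e": "bc"}
strings = tsp_to_ssp(nodes, arcs)
n, m = len(nodes), sum(len(h) for h in arcs.values())
print(len(strings), 3 * n + 2 * m + 1)   # 16 38
```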

## Example of reductions (3/9)

- x ∈ TSP ⇔ f(x) ∈ SSP
- TSP: the optimal cost is 5, with the tour a → e → c → d → b → a. Here n = 5, m = 11, k = 0.
- SSP: the shortest superstring has length 38, matching 3n + 2m + k + 1 = 3 × 5 + 2 × 11 + 0 + 1 = 38:

a#AeAbAcAe#EcEbEc#CdCeCd#DbDeDb#BaBcBa

The node strings appear in tour order (a#A, e#E, c#C, d#D, b#B), and each node's arc strings sit between its node string and the next one.
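Two facts can be checked mechanically: the superstring above contains all 16 strings of f(x), and its length equals 3n + 2m + k + 1. Checking that no shorter superstring exists is a different matter and is not attempted here.

```python
STRINGS = ["a#A", "b#B", "c#C", "d#D", "e#E",
           "AbAc", "AcAe", "AeAb", "BaBc", "BcBa", "CdCe", "CeCd",
           "DbDe", "DeDb", "EbEc", "EcEb"]
superstring = "a#AeAbAcAe#EcEbEc#CdCeCd#DbDeDb#BaBcBa"

n, m, k = 5, 11, 0                               # values from the slide
print(len(superstring), 3 * n + 2 * m + k + 1)   # 38 38
print(all(s in superstring for s in STRINGS))    # True
```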

## Example of reductions (4/9)

- x ∈ TSP ⇔ f(x) ∈ SSP
- Distance graph: the weight of an arc from string u to string v is the number of characters of u that precede the start of v, when v is overlapped with u as much as possible. Thin line = cost 2, thick line = cost 3, no line = more than 3.
- Examples: BcBa → BaBc has cost 2 (BcBaBc); CeCd → d#D has cost 3 (CeCd#D).
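The arc weight is len(u) minus the maximal suffix-prefix overlap of u and v. The sketch below (helper names are our own) computes it for the slide's examples and for b#B → BaBc. It also builds the merged string u[:d] + v that the weight corresponds to.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def distance(u: str, v: str) -> int:
    """Arc weight u -> v: characters of u before v starts, overlapping maximally."""
    return len(u) - overlap(u, v)


for u, v in [("BcBa", "BaBc"), ("CeCd", "d#D"), ("b#B", "BaBc")]:
    d = distance(u, v)
    print(u, "->", v, d, u[:d] + v)
# BcBa -> BaBc 2 BcBaBc
# CeCd -> d#D 3 CeCd#D
# b#B -> BaBc 2 b#BaBc
```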

## Example of reductions (5/9)

- x ∈ TSP ⇔ f(x) ∈ SSP
- Distance graph with the cost-2 arcs, e.g., b#B → BaBc (b#BaBc) and b#B → BcBa (b#BcBa).

## Example of reductions (6/9)

- x ∈ TSP ⇔ f(x) ∈ SSP
- Distance graph with cost-2 arcs. The sum of the costs of these arcs is 2 × m.
- Example: b#B → BaBc → BcBa gives b#BaBcBa.

## Example of reductions (7/9)

- x ∈ TSP ⇔ f(x) ∈ SSP
- Distance graph with cost-2 arcs, following the tour a → e → c → d → b → a: 3n + 2m + k + 1 = 3 × 5 + 2 × 11 + 0 + 1 = 38.

## Example of reductions (8/9)

- x ∈ TSP ⇔ f(x) ∈ SSP
- Compared with the previous example, node a no longer has an arc to e.
- TSP: the optimal cost is 6, with the tour a → e → c → d → b → a. Here n = 5, m = 10, k = 1.
- SSP: the shortest superstring has length 37, while the threshold is 36: 3n + 2m + k + 1 = 3 × 5 + 2 × 10 + 1 + 1 = 37.
- Strings: a#A, b#B, c#C, d#D, e#E (nodes) and AbAc, AcAb, BaBc, BcBa, CdCe, CeCd, DbDe, DeDb, EbEc, EcEb (arcs).

## Example of reductions (9/9)

- x ∈ TSP ⇔ f(x) ∈ SSP
- Distance graph for the tour a → e → c → d → b → a: there is no cost-1 arc from a to e, so an additional cost arises between "AbAc" and "e#E".

## Results on approximation

- Min-SSP is MAX SNP-hard [Blum 94]. That is, unless P = NP, there is no polynomial-time algorithm for Min-SSP that finds an approximate solution with an arbitrarily small error ratio [Arora 98]. It is hard to efficiently find an arbitrarily good approximate solution for a given instance of Min-SSP.
- On the other hand, several constant-factor (4-, 3-, or 2.5-) approximation algorithms have been developed.
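A simple example of such an algorithm is greedy merging: repeatedly merge the two strings with the largest overlap until one string remains. [Blum 94] shows that this greedy algorithm returns a superstring at most 4 times the optimal length. The sketch below is our own plain-Python rendering, not code from that paper. Ties are broken by sorted order, and each merge costs a quadratic number of overlap computations, so it runs in polynomial time even on the 14-string quiz.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings) -> str:
    """Greedy merging: repeatedly join the pair of strings with the largest overlap."""
    unique = set(strings)
    # Strings contained in others are covered automatically.
    pool = sorted(s for s in unique if not any(s != t and s in t for t in unique))
    if not pool:
        return ""
    while len(pool) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(pool):
            for j, b in enumerate(pool):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        merged = pool[best_i] + pool[best_j][best_k:]
        pool = [s for idx, s in enumerate(pool) if idx not in (best_i, best_j)]
        pool.append(merged)
    return pool[0]


QUIZ = ["TCTCTA", "CAGTCT", "CTCCAAA", "GGCAA", "TAAGCTCC", "TTCTCTC",
        "TCCAAATTCTA", "CTTTCT", "AACACCTT", "CTCCGACC", "TTCTATC",
        "TCTATCTC", "CTCTGTAACA", "CAACAG"]
s = greedy_superstring(QUIZ)
print(len(s), all(t in s for t in QUIZ))
```

The result always contains every input string. How close it comes to the optimum on the quiz is not claimed here.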

## Summary

- NP-complete problems are the most intractable decision problems in NP. No one knows a polynomial-time algorithm for any NP-complete problem.
- A decision problem L is NP-complete if L is in NP and there is a polynomial-time reduction from Q to L, where Q is an NP-complete problem.
- An optimization problem Max-(Min-)L is NP-hard if there is a polynomial-time reduction from Q to L, where Q is an NP-complete problem.

## Reference (1/2)

- Issues in computational complexity theory
  - Textbooks
    - M. R. Garey and D. S. Johnson (1979): Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman.
    - O. Watanabe (1992): Introduction to Computability and Complexity Theory, Kindai-Kagaku-sha (in Japanese).
    - M. Sipser (1996): Introduction to the Theory of Computation, PWS Publishing Company.
    - M. T. Goodrich and R. Tamassia (2002): Algorithm Design: Foundations, Analysis, and Internet Examples, John Wiley and Sons, Inc. Slides on 'NP-completeness' (http://www.algorithmdesign.net/handouts/NPComplete.pdf).
  - Article
    - S. Arora, C. Lund, R. Motwani, M. Sudan, and M. Szegedy (1998): "Proof verification and the hardness of approximation problems", Journal of the ACM, 45(3), pp. 501–555.

## Reference (2/2)

- Shortest superstring problem
  - Textbook
    - D. Gusfield (1997): "Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology", Chapter 16, Cambridge University Press.
  - Article
    - A. Blum, T. Jiang, M. Li, J. Tromp, and M. Yannakakis (1994): "Linear approximation of shortest superstrings", Journal of the ACM, 41(4), pp. 630–647.
