1 Biological Computing – DNA solution Presented by Wooyoung Kim 4/8/09 CSc 8530 Parallel Algorithms, Spring 2009 Dr. Sushil K. Prasad.

Outline
- NP and NP-complete
- Biological computation
- Hamiltonian path problem (HPP)
- Satisfiability problem
- Generalized SAT
- Discussion

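NP and NP-complete
NP vs. NP-complete
- NP problems: decision problems solvable in Non-deterministic Polynomial time; equivalently, a proposed solution can be verified in polynomial time.
- NP-complete: a problem in NP to which every NP problem can be reduced in polynomial time. If one NP-complete problem has an efficient solution, then so do all NP problems.
- No efficient general solution is known for any NP-complete problem.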

Biological computation – Advantages
The speed of any computer is determined by:
1. How many parallel processes it has.
2. How many steps each process can perform per unit time.
Biological computations could potentially have vastly more parallelism.
- Ex: 3 g of water contains roughly $10^{23}$ molecules.
The second factor favors conventional computers, because each step of a biological computer is a laboratory experiment, so only a few steps can be performed per unit time. However, the advantage in parallelism is so large that the slower steps are not a problem.

Biological computation – Disadvantages
Even with this parallelism, the brute-force approach is not always feasible because it can be too inefficient. The biological computer can solve any HPP instance with 70 or fewer edges. In practice, though, there is not a great need for this.

Hamiltonian Path Problem
L.M. Adleman, "Molecular Computation of Solutions to Combinatorial Problems," Science, vol. 266, 1994, pp.
Idea: use DNA to solve the Hamiltonian Path Problem by massively parallel search.

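Hamiltonian Path Problem
Given a directed graph with a designated start vertex $v_{in}$ and end vertex $v_{out}$, decide whether there is a path from $v_{in}$ to $v_{out}$ that enters every vertex exactly once.
[Figure: example directed graph with vertices 0 ($v_{in}$) through 6 ($v_{out}$)]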

Algorithm for HPP
1. Generate random paths through the graph.
2. Keep only those paths that begin with $v_{in}$ and end with $v_{out}$.
3. If the graph has n vertices, keep only those paths that enter exactly n vertices.
4. Keep only those paths that enter all of the vertices of the graph at least once.
5. If any paths remain, say “Yes”; otherwise say “No”.
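The five steps can be imitated in ordinary software. The sketch below is a hypothetical simulation, not Adleman's laboratory protocol. Random walks stand in for random ligation, and list filters stand in for the PCR, gel, and bead steps. The example graph and the `count` and `seed` parameters are illustrative assumptions.

```python
import random

def random_paths(edges, vertices, count, max_len, rng):
    """Step 1: random walks through the graph (stand-in for random ligation)."""
    succ = {v: [w for (u, w) in edges if u == v] for v in vertices}
    paths = []
    for _ in range(count):
        path = [rng.choice(vertices)]
        # Walk a random number of steps, stopping early at a vertex with no out-edges.
        for _ in range(rng.randint(0, max_len - 1)):
            if not succ[path[-1]]:
                break
            path.append(rng.choice(succ[path[-1]]))
        paths.append(path)
    return paths

def adleman_hpp(edges, vertices, v_in, v_out, count=20_000, seed=0):
    rng = random.Random(seed)
    n = len(vertices)
    paths = random_paths(edges, vertices, count, 2 * n, rng)
    # Step 2: keep paths that begin with v_in and end with v_out.
    paths = [p for p in paths if p[0] == v_in and p[-1] == v_out]
    # Step 3: keep paths that enter exactly n vertices.
    paths = [p for p in paths if len(p) == n]
    # Step 4: keep paths that enter every vertex at least once.
    paths = [p for p in paths if set(p) == set(vertices)]
    # Step 5: answer "Yes" if anything survived.
    return bool(paths), paths

# Hypothetical 4-vertex example graph (not the graph Adleman used).
VERTICES = [0, 1, 2, 3]
EDGES = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
found, survivors = adleman_hpp(EDGES, VERTICES, v_in=0, v_out=3)
print(found, survivors[0] if survivors else None)
```

Unlike the DNA version, a software sampler can miss a Hamiltonian path if it draws too few walks. The molecular approach relies on the enormous number of molecules in the tube to make that unlikely.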

Implementing Step 1
1. Generate random paths through the graph.
- Ligation reaction (annealing).
- Each vertex i is encoded by a random 20-base sequence $O_i$.
- Approximately $3 \times 10^{13}$ copies of the associated oligonucleotides (short nucleic acid polymers) were added.
- The oligo for edge 2→3 is the second half of $O_2$ followed by the first half of $O_3$:
Vertex 2 ($O_2$): TATCGGATCG GTATATCCGA
Vertex 3 ($O_3$): GCTATTCGAG CTTAAAGCTA
Edge 2→3 ($O_{2 \to 3}$): GTATATCCGA GCTATTCGAG
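The example sequences can be checked mechanically. This sketch assumes that an edge oligo is the last 10 bases of the source vertex oligo followed by the first 10 bases of the target, which the sequences shown satisfy. Following Adleman's paper, it also assumes that the Watson-Crick complement of each vertex oligo acts as a splint that holds adjacent edge oligos together for ligation. The function names are illustrative.

```python
# Base-by-base Watson-Crick complement. Written left to right, it reads 3'->5',
# which makes it antiparallel to the 5'->3' oligos it pairs with.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

O2 = "TATCGGATCGGTATATCCGA"  # vertex 2
O3 = "GCTATTCGAGCTTAAAGCTA"  # vertex 3

def complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)

def edge_oligo(o_from: str, o_to: str) -> str:
    """Edge i->j: last 10 bases of O_i followed by the first 10 bases of O_j."""
    return o_from[10:] + o_to[:10]

e23 = edge_oligo(O2, O3)
splint3 = complement(O3)  # complement of O_3, which bridges edges into and out of vertex 3
print(e23, splint3)  # GTATATCCGAGCTATTCGAG CGATAAGCTCGAATTTCGAT
```

The first half of the splint pairs with the second half of edge 2→3. This is how a splint joins an edge entering vertex 3 to an edge leaving it.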

Implementing Step 2
2. Keep only those paths that begin with $v_{in}$ ($O_0$) and end with $v_{out}$ ($O_6$).
- The product of Step 1 was amplified by PCR (polymerase chain reaction) using $O_0$ (starting point) and $O_6$ (ending point) as primers.
- As a result, only molecules encoding paths that begin with $v_{in}$ and end with $v_{out}$ are amplified and kept.

Implementing Step 3
3. If the graph has n vertices, keep only those paths that enter exactly n vertices.
- The product of Step 2 was run on an agarose gel.
- The 140 bp band (7 × 20 bp, corresponding to double-stranded (ds) DNA encoding paths that enter exactly seven vertices) was excised and soaked in ddH₂O to extract the DNA.
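A gel separates molecules by length, so Step 3 amounts to a length filter. In this toy model, which is an assumption made for illustration, a path's strand is the concatenation of 20-base vertex codes. This reproduces the 7 × 20 = 140 bp figure for seven vertices. The vertex codes below are hypothetical.

```python
BASES_PER_VERTEX = 20

def path_strand(path, codes):
    """Toy model: concatenate the 20-base code of each vertex on the path."""
    return "".join(codes[v] for v in path)

def gel_select(strands, n_vertices):
    """Keep only strands whose length matches a path through exactly n vertices."""
    target = BASES_PER_VERTEX * n_vertices
    return [s for s in strands if len(s) == target]

# Hypothetical 20-base codes for vertices 0..6.
codes = {v: "ACGT"[v % 4] * BASES_PER_VERTEX for v in range(7)}
paths = ([0, 1, 2, 3, 4, 5, 6], [0, 3, 6], [0, 1, 2, 3, 4, 5, 1, 6])
strands = [path_strand(p, codes) for p in paths]
kept = gel_select(strands, 7)
print([len(s) for s in strands], len(kept))  # [140, 60, 160] 1
```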

Implementing Step 4
4. Keep only those paths that enter all of the vertices of the graph at least once.
- The product of Step 3 was affinity-purified with a biotin-avidin magnetic bead system:
  1. First, single-stranded (ss) DNA was generated from the dsDNA of Step 3.
  2. Then the ssDNA was incubated with the complement of $O_1$ conjugated to magnetic beads. Only those ssDNA molecules containing $O_1$ annealed to the bound probe, and only those were retained.
- This was repeated successively with $O_2$, $O_3$, $O_4$, and $O_5$.
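Each bead extraction keeps exactly the strands that contain a given vertex code, so Step 4 is a chain of substring filters. The sketch below models strands as strings and ignores annealing errors. The 4-base vertex codes are hypothetical and were chosen so that, in these examples, no code appears by accident across a boundary between two codes.

```python
def affinity_purify(strands, vertex_code):
    """One bead step: retain strands containing vertex_code (the bead carries its complement)."""
    return [s for s in strands if vertex_code in s]

def step4(strands, codes_to_check):
    # Repeat the extraction once per vertex to be checked (O_1, ..., O_5 in the slides).
    for code in codes_to_check:
        strands = affinity_purify(strands, code)
    return strands

codes = {1: "ACGT", 2: "TTGA", 3: "CAGC"}
strands = ["ACGTTTGACAGC", "ACGTACGTCAGC"]  # visits 1,2,3 vs. 1,1,3
print(step4(strands, [codes[1], codes[2], codes[3]]))  # ['ACGTTTGACAGC']
```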

Implementing Step 5
5. If any paths remain, say “Yes”; otherwise say “No”.
- The product of Step 4 was amplified by PCR and run on a gel.

Drawbacks
- 7 days of lab work.
- Step 4 (magnetic bead separation) is the most labor-intensive step.
- Possibility of errors:
  - Pseudo-paths
  - Inexact reactions
  - Hairpin loops

Advantages
- The number of different oligonucleotides required should grow linearly with the number of edges, i.e., O(|E|).
- The fastest supercomputer vs. a DNA computer:
  - Speed: $10^6$ op/sec vs. … op/sec
  - Energy: $10^9$ op/J vs. … op/J (in the ligation step)
  - Storage: 1 bit per … nm³ vs. 1 bit per 1 nm³ (video tape vs. molecules)

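Satisfiability problem
SAT consists of a Boolean formula of the form $F = C_1 \wedge C_2 \wedge \cdots \wedge C_m$, where each $C_l$ is a clause of the form $C_l = v_1 \vee v_2 \vee \cdots \vee v_k$, and each $v_i$ is a variable or its negation.
Ex. $F = (x \vee y) \wedge (\neg x \vee \neg y)$.
Problem: find values of the variables so that the formula evaluates to 1. With n variables, there are $2^n$ assignments to search.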
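For contrast, a conventional computer can decide SAT by checking all $2^n$ assignments one at a time. This sketch encodes the example formula as a list of clauses. The clause representation is an illustrative choice, not part of the original slides.

```python
from itertools import product

# A clause is a list of literals (variable index, required value):
# (0, 1) means x, and (0, 0) means "not x".
CLAUSES = [[(0, 1), (1, 1)],   # x or y
           [(0, 0), (1, 0)]]   # not x or not y

def satisfies(assignment, clauses):
    return all(any(assignment[i] == v for i, v in clause) for clause in clauses)

def brute_force_sat(n, clauses):
    # Try all 2**n assignments sequentially.
    return [bits for bits in product((0, 1), repeat=n) if satisfies(bits, clauses)]

print(brute_force_sat(2, CLAUSES))  # [(0, 1), (1, 0)]
```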

Satisfiability problem
[Figure: the graph $G_2$ encoding two-bit numbers]
Graph formulation: Suppose the formula has n variables. The graph has vertices $a_1, \dots, a_{n+1}$ and, for each variable k, an unprimed vertex $x_k$ and a primed vertex $x'_k$, with edges $a_k \to x_k \to a_{k+1}$ and $a_k \to x'_k \to a_{k+1}$. It is constructed so that every path from $a_1$ to $a_{n+1}$ encodes an n-bit binary number. At each stage, a path has exactly two choices: unprimed → 1, primed → 0.
Ex. The path $a_1\, x'\, a_2\, y\, a_3$ encodes 01, that is, x is 0 and y is 1.
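The graph formulation can be checked by enumerating paths. This sketch names the variable vertices $x_1, x'_1, \dots$ (the slides call them x and y for n = 2). It confirms that the paths from $a_1$ to $a_{n+1}$ are exactly the $2^n$ n-bit numbers. The vertex names and function names are illustrative.

```python
def build_gn(n):
    """Adjacency lists for G_n: a_k -> x_k -> a_{k+1} and a_k -> x_k' -> a_{k+1}."""
    edges = {}
    for k in range(1, n + 1):
        a, x, xp, nxt = f"a{k}", f"x{k}", f"x{k}'", f"a{k + 1}"
        edges[a] = [x, xp]
        edges[x] = [nxt]
        edges[xp] = [nxt]
    return edges

def all_paths(edges, start, end):
    """Enumerate every directed path from start to end (G_n is acyclic)."""
    if start == end:
        yield [start]
        return
    for nxt in edges.get(start, []):
        for rest in all_paths(edges, nxt, end):
            yield [start] + rest

def decode(path):
    """Unprimed vertex -> bit 1, primed vertex -> bit 0."""
    return "".join("0" if v.endswith("'") else "1" for v in path if v.startswith("x"))

g2 = build_gn(2)
print(sorted(decode(p) for p in all_paths(g2, "a1", "a3")))  # ['00', '01', '10', '11']
print(decode(["a1", "x1'", "a2", "x2", "a3"]))  # 01
```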

Satisfiability problem
Example:
- Number of variables: n = 2 (x and y)
- Number of clauses: m = 2
- Construct a graph with (n+1) + 2n = 7 nodes and connect them as follows:
[Figure: $G_2$ with nodes $a_1, x, x', a_2, y, y', a_3$]

Satisfiability problem
Graph paths and the SAT problem
After the clause-by-clause extractions, any remaining path from $a_1$ to $a_{n+1}$ assigns 0 or 1 to each variable so that the formula is satisfied. If no path from start to end remains, the formula has no solution (it is unsatisfiable). Using the properties of DNA annealing (Watson-Crick complementary binding), we can construct a graph representing the variables. Then, using test tubes, we either obtain paths (satisfiable) or no paths at all (unsatisfiable).

Satisfiability problem
Assign a random DNA string to each vertex (e.g., of length 8). Then determine the DNA string of each edge from the strings of its endpoints.
[Figure: $G_2$ labeled with 8-base strings: ATTCGGAA, TTACGGGT, GGATTCCA, TATCCCGA, GCTAAGCT, GGCTCGTT, CCCAATTA, CCTTATAG, CCTTCGAT, TCGAAATG, GGCTAATG, CCCACCGA, CCCAGGGT, GCAACCTA, TAATCCTA]
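One rule reproduces several of the edge strings in the figure: the string for edge u → v is the complement of the last half of u followed by the complement of the first half of v, so the edge strand splints u and v together. This rule is inferred from the listed strings rather than stated on the slide. It also ignores strand orientation, although real DNA pairs antiparallel. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)

def edge_string(u: str, v: str) -> str:
    """Edge u->v: complement of the last half of u, then complement of the first half of v."""
    half = len(u) // 2
    return complement(u[half:]) + complement(v[:half])

# Vertex strings as they appear in the figure.
print(edge_string("ATTCGGAA", "TATCCCGA"))  # CCTTATAG
print(edge_string("TTACGGGT", "TAATCCTA"))  # CCCAATTA
```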

Satisfiability problem
In an initial test tube $t_0$, put many copies of the DNA string for each vertex and each edge. Also add a string consisting of the complement of the first half of $a_1$ followed by the complement of the last half of $a_3$, to mark the start and end.
[Figure: vertex and edge strings in $t_0$, including ATTCGGAA, TTACGGGT, GGATTCCA, TATCCCGA, GCTAAGCT, GGCTCGTT, CCCAATTA, and the start/end string TAAGAGGT]

Satisfiability problem
1. Let $t_0$ be an initial test tube containing all the DNA strings of the vertices and edges.
2. The first clause is $x \vee y$. Operate $E(t_0, 1, 1)$, where $E(t, i, a)$ extracts from tube t the strands whose i-th variable equals a, to select the strands in which the first variable x is 1 (patterns 10, 11). Put them in $t_{0-1}$.
3. Put the remainder (patterns 00, 01) into $t'_{0-1}$ and operate $E(t'_{0-1}, 2, 1)$ to select the strands in which the second variable y is 1. Put them in $t_{0-2}$.
4. Pour $t_{0-1}$ and $t_{0-2}$ together to form test tube $t_1$.
5. Now $t_1$ contains the patterns 01, 10, 11, which are exactly the solutions of the first clause.

Satisfiability problem
6. Repeat the same process for the second clause, starting from $t_1$.
7. Since the second clause is $\neg x \vee \neg y$, operate $E(t_1, 1, 0)$ and put the result into $t_{1-1}$.
8. Put the remainder into $t'_{1-1}$ and make $t_{1-2}$ by operating $E(t'_{1-1}, 2, 0)$.
9. Pour $t_{1-1}$ and $t_{1-2}$ into test tube $t_2$.
10. Check whether there is any DNA in the last tube.
11. The satisfying assignments are exactly those in this final test tube.

Test tube   | Operation                              | Values
$t_0$       | initial                                | 00, 01, 10, 11
$t_{0-1}$   | $E(t_0, 1, 1)$                         | 10, 11
$t'_{0-1}$  | Remainder of $t_0$                     | 00, 01
$t_{0-2}$   | $E(t'_{0-1}, 2, 1)$                    | 01
$t_1$       | Pour $t_{0-1}$ and $t_{0-2}$ together  | 01, 10, 11
$t_{1-1}$   | $E(t_1, 1, 0)$                         | 01
$t'_{1-1}$  | Remainder of $t_1$                     | 10, 11
$t_{1-2}$   | $E(t'_{1-1}, 2, 0)$                    | 10
$t_2$       | Pour $t_{1-1}$ and $t_{1-2}$ together  | 01, 10
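The tube procedure in the table can be simulated by treating each test tube as a set of bit strings. This sketch assumes perfect, error-free extractions. `extract` plays the role of E(t, i, a), and `lipton_sat` is an illustrative name.

```python
from itertools import product

def extract(tube, i, a):
    """E(t, i, a): the strands of t whose i-th bit (1-based) equals a, plus the remainder."""
    hit = {s for s in tube if s[i - 1] == str(a)}
    return hit, tube - hit

def lipton_sat(n, clauses):
    # t_0: every n-bit string, i.e. every path from a_1 to a_{n+1}.
    tube = {"".join(bits) for bits in product("01", repeat=n)}
    for clause in clauses:
        remainder, merged = tube, set()
        for i, a in clause:             # one extraction per literal
            hit, remainder = extract(remainder, i, a)
            merged |= hit               # pour the extracted tubes together
        tube = merged
    return tube

# (x or y) and (not x or not y); literal (i, a) means "variable i equals a".
CLAUSES = [[(1, 1), (2, 1)], [(1, 0), (2, 0)]]
print(sorted(lipton_sat(2, CLAUSES)))  # ['01', '10']
```

A clause with k literals costs k extractions. This is why the number of tubes stays proportional to the number of clauses when clause length is bounded.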

Satisfiability problem
- For a general formula with n variables and m clauses, only O(m) test tubes are needed: each clause adds a constant number of test tubes, assuming each clause has a bounded number of literals.
- The last tube is checked to see whether any patterns (paths) from the start vertex to the end vertex remain.

Generalized SAT
- Generalize this to problems that correspond to any Boolean formula.
- Formulas are defined recursively:
  1. Any variable x is a formula.
  2. If F is a formula, then so is $\neg F$.
  3. If F and G are formulas, then so are $F \wedge G$ and $F \vee G$.

Generalized SAT
- Size of a formula, S: the number of operations used to build the formula.
- SAT problem: given a formula, find an assignment of Boolean values to its variables so that the formula is true. This problem is NP-complete.
- Claim: O(S) DNA experiments suffice to solve this SAT problem.
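The recursive definition maps directly onto a small expression tree. This sketch assumes a tuple encoding and counts only the not/and/or operations toward S, with variables counting 0. It checks satisfiability by brute force, which is the baseline the DNA method aims to beat. All names are illustrative.

```python
from itertools import product

# Formulas as nested tuples: ("var", name), ("not", F), ("and", F, G), ("or", F, G).
def evaluate(f, env):
    op = f[0]
    if op == "var":
        return env[f[1]]
    if op == "not":
        return not evaluate(f[1], env)
    if op == "and":
        return evaluate(f[1], env) and evaluate(f[2], env)
    if op == "or":
        return evaluate(f[1], env) or evaluate(f[2], env)
    raise ValueError(f"unknown operator {op!r}")

def size(f):
    """Number of operations (not, and, or); variables count 0 in this sketch."""
    return 0 if f[0] == "var" else 1 + sum(size(g) for g in f[1:])

def variables(f):
    return {f[1]} if f[0] == "var" else set().union(*(variables(g) for g in f[1:]))

def satisfying_assignments(f):
    names = sorted(variables(f))
    envs = (dict(zip(names, bits)) for bits in product((False, True), repeat=len(names)))
    return [env for env in envs if evaluate(f, env)]

x, y = ("var", "x"), ("var", "y")
F = ("and", ("or", x, y), ("or", ("not", x), ("not", y)))
print(size(F), satisfying_assignments(F))  # 5 [{'x': False, 'y': True}, {'x': True, 'y': False}]
```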

Generalized SAT – step 1
1. Construct a contact network for the formula.
- A contact network is a directed graph with a source s and a sink t.
- Each edge is labeled x or $\neg x$ for some variable x.
- Given an assignment, an edge is connected if its label evaluates to 1. For example, the network in the figure evaluates to 1 only if w = 1 or x = y = z = 1.
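Under a fixed assignment, evaluating a contact network is an s-t reachability question over the connected edges. The sketch below uses breadth-first search. The example network is a guess at the figure (one edge labeled w in parallel with a chain labeled x, y, z), and the node names p and q are hypothetical.

```python
from collections import deque

# An edge is (from, to, variable, value): value 1 means label x, value 0 means label not-x.
def conducts(network, env, source="s", sink="t"):
    """True if some directed path of connected edges joins source to sink."""
    adj = {}
    for u, v, var, value in network:
        if env[var] == value:  # the edge is connected when its label evaluates to 1
            adj.setdefault(u, []).append(v)
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        if u == sink:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

# Hypothetical network: edge w from s to t, in parallel with x, y, z in series.
NET = [("s", "t", "w", 1), ("s", "p", "x", 1), ("p", "q", "y", 1), ("q", "t", "z", 1)]
print(conducts(NET, {"w": 0, "x": 1, "y": 1, "z": 1}))  # True
print(conducts(NET, {"w": 0, "x": 1, "y": 0, "z": 1}))  # False
```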

Generalized SAT – step 2
2. Solve the SAT problem for the contact network by deciding:
- whether there is an assignment of values to the variables such that there is a directed path of connected edges from s to t;
- edges with the same label must be consistent, i.e., every occurrence of a variable takes the same value.
- How many DNA experiments? O(S).

Generalized SAT – claims
The result follows from two claims:
1. Given any formula of size S, there is a contact network of size linear in S that computes the same Boolean function, so the network is satisfiable exactly when the formula is.
2. Given any contact network of size S, its SAT problem can be solved in O(S) DNA experiments.

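Generalized SAT – claim 1
Existence of a contact network for a given formula: simple formula.
Using De Morgan's laws, any formula can be placed into a normal form in which negation is applied only to variables. Each literal x or $\neg x$ then becomes a single edge from s to t.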

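Generalized SAT – claim 1
Existence of a contact network for a given formula: general formula. Let G be a network for E and H a network for F.
- The network for $E \wedge F$ connects G and H in series (the sink of G is identified with the source of H).
- The network for $E \vee F$ connects G and H in parallel (their sources are identified, and their sinks are identified).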
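Claim 1 is constructive. A literal becomes one edge, AND becomes series composition, and OR becomes parallel composition, so the network has one edge per literal occurrence and its size is linear in S. This sketch assumes the formula is already in the De Morgan normal form. The tuple encoding, `contact_network`, and the fresh node names n0, n1, ... are illustrative.

```python
import itertools

# Normal-form formulas: ("lit", var, value) with value 1 for x and 0 for not-x,
# ("and", E, F), ("or", E, F). Negations must already be pushed down to the variables.
def build(f, s, t, edges, fresh):
    op = f[0]
    if op == "lit":
        edges.append((s, t, f[1], f[2]))      # a literal is a single labeled edge
    elif op == "and":
        mid = next(fresh)                     # series: G runs s->mid, H runs mid->t
        build(f[1], s, mid, edges, fresh)
        build(f[2], mid, t, edges, fresh)
    elif op == "or":
        build(f[1], s, t, edges, fresh)       # parallel: G and H share s and t
        build(f[2], s, t, edges, fresh)
    else:
        raise ValueError(f"unknown operator {op!r}")
    return edges

def contact_network(f):
    fresh = (f"n{i}" for i in itertools.count())
    return build(f, "s", "t", [], fresh)

def conducts(edges, env, node="s", sink="t"):
    """Depth-first search for a path of edges whose labels are true (the network is acyclic)."""
    if node == sink:
        return True
    return any(conducts(edges, env, v, sink)
               for u, v, var, value in edges if u == node and env[var] == value)

# (x or y) and (not x or not y)
F = ("and", ("or", ("lit", "x", 1), ("lit", "y", 1)),
            ("or", ("lit", "x", 0), ("lit", "y", 0)))
NET = contact_network(F)
print(NET)  # [('s', 'n0', 'x', 1), ('s', 'n0', 'y', 1), ('n0', 't', 'x', 0), ('n0', 't', 'y', 0)]
```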

Generalized SAT – claim 2
Solve the SAT problem for any contact network using O(S) DNA experiments:
- Associate a test tube $P_v$ with each node v of the contact network. The tube $P_s$ at the source starts with all assignments, like $t_0$.
- The test tube $P_t$ associated with the sink t is the "answer".
- Suppose that $v \to u$ is an edge labeled x and that $P_v$ is already constructed. Then construct $P_u$ by the extraction $E(P_v, x, 1)$, or $E(P_v, x, 0)$ if the label is $\neg x$.
- If several edges leave a vertex v, use an amplify step to get multiple copies of test tube $P_v$.
- If several edges enter a vertex v, pour the resulting test tubes together to form $P_v$.
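The tube-per-node procedure can be simulated by processing nodes in topological order, so that every tube is complete before anything is extracted from it. This is a sketch assuming error-free operations. Sets stand in for tubes, which ignores how many copies each tube holds but does not change the yes/no answer. `solve_network` is a hypothetical name, and the example network is the series/parallel network for $(x \vee y) \wedge (\neg x \vee \neg y)$.

```python
from itertools import product
from graphlib import TopologicalSorter  # Python 3.9+

def solve_network(edges, variables, source="s", sink="t"):
    """Simulate one test tube P_v per node; return the assignments left in P_sink."""
    index = {name: i for i, name in enumerate(variables)}
    preds = {}
    for u, v, _, _ in edges:
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)
    tubes = {node: set() for node in preds}
    tubes[source] = set(product((0, 1), repeat=len(variables)))  # every assignment
    for node in TopologicalSorter(preds).static_order():
        for u, v, var, value in edges:
            if u == node:
                # Amplify P_u once per outgoing edge, extract E(P_u, var, value),
                # and pour the result into the tube of the edge's head.
                tubes[v] |= {a for a in tubes[u] if a[index[var]] == value}
    return tubes[sink]

EDGES = [("s", "n0", "x", 1), ("s", "n0", "y", 1), ("n0", "t", "x", 0), ("n0", "t", "y", 0)]
print(sorted(solve_network(EDGES, ["x", "y"])))  # [(0, 1), (1, 0)]
```

The simulation performs one extraction per edge, which matches the O(S) count in claim 2.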

Discussion
- Can we actually build DNA computers?
- All the methods described here assume that every operation is perfect, without error.
- However, the operations are not perfect.
- In the future, DNA-based computers may become a practical means of solving hard problems.

Reference
R.J. Lipton, "DNA Solution of Hard Computational Problems," Science, vol. 268, 1995, pp.
L.M. Adleman, "Molecular Computation of Solutions to Combinatorial Problems," Science, vol. 266, 1994, pp.
R.J. Lipton, "Speeding Up Computations via Molecular Biology," unpublished manuscript, available at