DNA Computing on Surfaces. Anne Condon, Computer Science, UBC; Robert Corn, Chemistry, U. Wisconsin; Max Lagally, Materials Science, U. Wisconsin; Lloyd Smith, Chemistry, U. Wisconsin
Goals. Encode information in DNA strands (Adleman, Science 266:1994). Compute on many strands in parallel: chemical manipulations = logical operations.
“…the number of operations per second … would exceed that of current supercomputers by a thousandfold … remarkable energy efficiency … information density a dramatic improvement over existing storage media” Len Adleman, Science 266:1994
“for certain intrinsically complex problems…where existing electronic computers are very inefficient and where massively parallel searches can be organized to take advantage of the operations that molecular biology currently provides, molecular computation might compete with electronic computation in the near term”
Outline. Background: What is computation? What is DNA? DNA computation (in the biotech industry; in the solution of combinatorial problems). Research on DNA computation. DNA Computing on Surfaces: Models; Experiments. Conclusions.
What is Computation? (very simple view) Input: string over finite alphabet. Process: determine if input satisfies some property. Output: yes or no.
Satisfy a Property: Binary Inputs. Property: set the output of a circuit to 1. [Figure: circuit built from AND, OR and NOT gates, with binary inputs and output 1.]
Satisfy a Property: Non-binary Inputs. Property: set the output of a generalized circuit to a given value. [Figure: generalized circuit whose inputs and output are letters from {A, C, G, T}.]
Simple Parallel Computation. Input: set of strings. Process: independently for each input, determine if it satisfies a common circuit. Output: indicate whether there exists an input satisfying the circuit.
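The input/process/output view of this slide can be sketched in a few lines of Python. This is only an illustration: the three-bit circuit and the names `circuit` and `exists_satisfying` are assumptions made for the example, not part of the talk.

```python
from itertools import product


def circuit(bits):
    """Hypothetical common circuit: (x0 OR x1) AND (NOT x2)."""
    x0, x1, x2 = bits
    return bool((x0 or x1) and not x2)


def exists_satisfying(inputs, predicate):
    """Output of the parallel computation: does any input satisfy the circuit?"""
    return any(predicate(s) for s in inputs)


# Input: the set of all 3-bit strings.
all_inputs = list(product([0, 1], repeat=3))

# Process: each input is tested independently of the others.
satisfying = [s for s in all_inputs if circuit(s)]

# Output: a single yes/no answer.
answer = exists_satisfying(all_inputs, circuit)
```

Each input is checked without reference to any other, which is what allows the checks to run in parallel on many strands at once.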
What is DNA?
“DNA Computation:” Affymetrix Arrays. Input: strings over {A,C,G,T} (represented as the corresponding single-stranded DNA). Photolithography is used to synthesize and array DNA strands on a planar surface.
“DNA Computation:” Affymetrix Arrays. Process: e.g. for each input, test if it approximately matches a given string (i.e. hybridizes to the Watson-Crick complement of the given string).
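A toy model of this matching step, assuming that hybridization succeeds whenever the arrayed strand differs from the Watson-Crick (reverse) complement of the probe in at most `max_mismatches` positions. Real hybridization depends on thermodynamics, so the mismatch threshold and the function names are illustrative assumptions only.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand):
    """Watson-Crick complement, read in the opposite direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def mismatches(a, b):
    """Number of positions at which two equal-length strands differ."""
    if len(a) != len(b):
        raise ValueError("strands must have equal length")
    return sum(x != y for x, y in zip(a, b))


def hybridizes(array_strand, probe, max_mismatches=1):
    """Toy criterion: the arrayed strand is close to the probe's complement."""
    return mismatches(array_strand, reverse_complement(probe)) <= max_mismatches


query = "ACGTTGCC"                 # the given string
probe = reverse_complement(query)  # what is washed over the array
array = ["ACGTTGCC", "ACGTAGCC", "TTTTTTTT"]
hits = [s for s in array if hybridizes(s, probe)]
```

Here the first arrayed strand matches the query exactly, the second differs in one base, and the third differs in six, so only the first two count as approximate matches.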
“DNA Computation:” Affymetrix Arrays Output: fluorescence detection
Adleman’s Hamiltonian Path Experiment. Input: generate random paths. Process: select paths from S to T; select paths with 7 nodes; select paths entering all nodes at least once. Output: “yes” iff a path remains. [Figure: directed graph on nodes S, 1, 2, 3, 4, 5, T.]
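The generate-and-filter procedure on this slide can be simulated in software. In the sketch below the edge list is hypothetical (the edges of the figure are not recoverable from the slide text), random walks stand in for random ligation of edge strands, and the names `random_path` and `hamiltonian_filter` are invented for the example.

```python
import random

# Hypothetical directed graph on the 7 nodes S, 1..5, T.
NODES = ["S", "1", "2", "3", "4", "5", "T"]
EDGES = [("S", "1"), ("1", "2"), ("2", "3"), ("3", "4"), ("4", "5"),
         ("5", "T"), ("S", "3"), ("2", "T"), ("4", "1"), ("3", "5")]


def random_path(edges, rng, max_edges):
    """Random walk along edges; a stand-in for random joining of edge strands."""
    path = list(rng.choice(edges))
    while len(path) - 1 < max_edges:
        successors = [v for (u, v) in edges if u == path[-1]]
        if not successors or rng.random() < 0.2:
            break
        path.append(rng.choice(successors))
    return tuple(path)


def hamiltonian_filter(paths, nodes, start, end):
    """Apply the three selection steps in order."""
    # Select paths from start to end.
    pool = [p for p in paths if p[0] == start and p[-1] == end]
    # Select paths with exactly as many nodes as the graph has.
    pool = [p for p in pool if len(p) == len(nodes)]
    # Select paths entering every node at least once, one node at a time.
    for node in nodes:
        pool = [p for p in pool if node in p]
    return pool


rng = random.Random(0)
paths = {random_path(EDGES, rng, max_edges=10) for _ in range(20000)}
survivors = hamiltonian_filter(paths, NODES, "S", "T")
answer = "yes" if survivors else "no"
```

The per-node loop mirrors the laboratory protocol, in which the selection for each node is a separate step.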
Generate Random Paths. Associate DNA strands with nodes and edges. Join edge strands in test tube to form double-stranded “paths” (hybridization, ligation). Wash to form single-stranded paths. [Figure: path through nodes 2, 3, 4, 5.]
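One way to picture the encoding is the half-and-half scheme sketched below: each node gets a random 20mer, the strand for edge u→v is the second half of u's strand followed by the first half of v's strand, and the complement of a node strand acts as a splint that holds two consecutive edge strands together for ligation. The slide does not spell out these details, so the scheme, the strand length and all names here are assumptions for illustration, and the special handling of start and end nodes is omitted.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand):
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def random_strand(length, rng):
    return "".join(rng.choice("ACGT") for _ in range(length))


def edge_strand(node_seq, u, v):
    """Edge u->v: second half of u's strand, then first half of v's strand."""
    half = len(node_seq[u]) // 2
    return node_seq[u][half:] + node_seq[v][:half]


rng = random.Random(1)
path = ["2", "3", "4", "5"]
node_seq = {node: random_strand(20, rng) for node in path}

# Edge strands along the path 2 -> 3 -> 4 -> 5.
edges = [edge_strand(node_seq, u, v) for u, v in zip(path, path[1:])]

# Ligation joins consecutive edge strands end to end.
ligated = "".join(edges)

# Splints: complements of the interior node strands span each junction.
splints = {node: reverse_complement(node_seq[node]) for node in path[1:-1]}
```

In the ligated strand every interior node sequence appears intact, which is what later allows a path to be fished out by the complement of any node it enters.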
Adleman’s Experiment: Select Paths That Enter Node 2. Attach strand associated with node 2 to beads and introduce to test tube. The paths that enter node 2 hybridize to strands on the beads. Remove beads; wash and detach desired paths.
Biomolecular Computation Research. “Classical” DNA/RNA computation (e.g. search-and-prune). O(1)-biostep computation (e.g. self-assembly of 3-D DNA molecules).
Biomolecular Computation Research. Splicing-based computation. Non-computational applications (e.g. exquisite detection, DNA2DNA computation, DNA nanotechnology, DNA tags).
DNA Computing on Surfaces
DNA Computing on Surfaces. Advantages over “solution phase” chemistry: facile purification steps; reduced interference between strands; easily automated. Disadvantages: loss of information density (2D); lower surface hybridization efficiency; slower surface enzyme kinetics.
DNA Surface Model: Input DNA strands representing the set {0,1}^n are synthesized and subsequently immobilized on a surface in a non-addressed fashion
Encoding of Binary Information in DNA Strands. A strand is composed of words. Each word is a short DNA strand (16mer) representing one or more bits. [Figure: a strand divided into words 1, 2, 3, 4, …, each word carrying bits.]
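A minimal sketch of such an encoding, assuming one bit per word and a separate pair of 16mers for each bit position. The word table is generated at random purely for illustration; it is not the word set used in the experiments and has not been checked for hybridization behavior.

```python
import random

WORD_LENGTH = 16  # each word is a 16mer


def make_word_table(n_bits, rng):
    """Hypothetical table: one distinct random 16mer per (position, bit value)."""
    table, used = {}, set()
    for position in range(n_bits):
        for bit in (0, 1):
            word = "".join(rng.choice("ACGT") for _ in range(WORD_LENGTH))
            while word in used:
                word = "".join(rng.choice("ACGT") for _ in range(WORD_LENGTH))
            used.add(word)
            table[(position, bit)] = word
    return table


def encode(bits, table):
    """Concatenate the word for each bit to form the strand."""
    return "".join(table[(position, bit)] for position, bit in enumerate(bits))


def decode(strand, table):
    """Split the strand into words and look each one up."""
    inverse = {(position, word): bit for (position, bit), word in table.items()}
    words = [strand[i:i + WORD_LENGTH] for i in range(0, len(strand), WORD_LENGTH)]
    return tuple(inverse[(position, word)] for position, word in enumerate(words))


table = make_word_table(4, random.Random(0))
strand = encode((1, 0, 1, 1), table)
```

A 4-bit input becomes a 64-base strand under these assumptions, and `decode(strand, table)` recovers the original bits.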
DNA Word Design Problem. Requirements of a “DNA code”: success in specific hybridization between a DNA code word and its Watson-Crick complement; few false positive signals. Virtually all designs enforce combinatorial constraints on the code words. Applications: information storage and retrieval for DNA computing; molecular bar codes for chemical libraries. Note: combinatorial design can provide a promising set of code words that can then be pruned using experimental methods.
What combinatorial constraints are placed on DNA Codes? Hamming: distance between two code words should be large. Reverse complement: distance between a word and the reverse complement of another word should be large. Also: frame shift, distinct sub-words, forbidden sub-words, …
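The Hamming and reverse-complement constraints can be checked mechanically. The sketch below builds a small code greedily by scanning all words of a given length; the greedy search, the short word length and the distance threshold are assumptions chosen to keep the example fast, and the other constraints listed on the slide are not modeled.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(word):
    return "".join(COMPLEMENT[base] for base in reversed(word))


def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def compatible(word, code, d):
    """True if word keeps distance >= d from every code word and from the
    reverse complement of every code word."""
    return all(
        hamming(word, c) >= d and hamming(word, reverse_complement(c)) >= d
        for c in code
    )


def greedy_code(length, d):
    """Scan all words in lexicographic order, keeping each compatible one."""
    code = []
    for letters in product("ACGT", repeat=length):
        word = "".join(letters)
        if compatible(word, code, d):
            code.append(word)
    return code


code = greedy_code(length=6, d=3)
```

Exhaustive scanning is only feasible for short words; for 16mers the candidate set produced by a combinatorial design would be pruned further, for example experimentally.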
Work on DNA code design. Seeman (1990): de novo design of sequences for nucleic acid structural engineering. Brenner (1997): sorting polynucleotides using DNA tags. Shoemaker et al. (1996): analysis of yeast deletion mutants using a parallel molecular bar-coding strategy. Many other examples in DNA computing.
Word Design Example
DNA Surface Model: Process. MARK strands in which bit j = 0 (or 1): hybridize with Watson-Crick complements of word containing bit j, followed by polymerization. DESTROY unmarked strands: exonuclease degradation. UNMARK strands: wash in distilled water.
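The three operations can be modeled abstractly on a set of bit strings, ignoring all chemistry. The class below is a sketch under that assumption; the name `Surface` and its methods are invented for the example.

```python
from itertools import product


class Surface:
    """Abstract model: strands are bit tuples, marking is set membership."""

    def __init__(self, n):
        # Input: all 2^n strands are present on the surface.
        self.strands = set(product((0, 1), repeat=n))
        self.marked = set()

    def mark(self, j, value):
        """MARK every strand whose bit j equals value."""
        self.marked |= {s for s in self.strands if s[j] == value}

    def destroy(self):
        """DESTROY all strands that are not currently marked."""
        self.strands &= self.marked

    def unmark(self):
        """UNMARK all strands."""
        self.marked.clear()


surface = Surface(3)
surface.mark(0, 1)   # mark strands with bit 0 = 1
surface.destroy()    # unmarked strands (bit 0 = 0) are removed
surface.unmark()
```

After this cycle only the four strands with bit 0 equal to 1 remain, and none of them is marked.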
DNA Surface Model: Output Detect remaining strands (if any) by detaching strands from surface and amplifying using PCR (polymerase chain reaction).
Computational Power of DNA Surface Model Theorem: Any CNFSAT formula of size m can be computed using O(m) mark, unmark and destroy operations. Theorem: Any circuit of size m can be computed using O(m) mark, unmark, destroy, and append operations.
Surface DNA Computation: the Satisfiability Problem. Input: 16 strands. Process: MARK if bit z = 1; MARK if bit w = 1; MARK if bit y = 0; DESTROY; UNMARK; MARK if bit w = 0; … Output: exactly those strands that satisfy the circuit remain on the surface. [Figure: circuit of AND, OR and NOT gates over the inputs z, w, y, x.]
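The clause-by-clause procedure, and the O(m) operation count of the CNF theorem, can be illustrated with a small simulation. Only the first clause below (z = 1, w = 1, y = 0) follows the slide; the remaining clauses are hypothetical, since the full circuit is not recoverable from the slide text, and `solve_cnf` is a name invented for the example.

```python
from itertools import product

VARS = {"w": 0, "x": 1, "y": 2, "z": 3}

# Each clause is a list of (variable, value) literals joined by OR.
# First clause as on the slide; the other two are hypothetical.
CLAUSES = [
    [("z", 1), ("w", 1), ("y", 0)],
    [("w", 0), ("x", 1)],
    [("x", 0), ("y", 1)],
]


def solve_cnf(clauses, n):
    """Return the surviving strands and the number of operations used."""
    strands = set(product((0, 1), repeat=n))  # 2^n strands on the surface
    operations = 0
    for clause in clauses:
        marked = set()
        for var, value in clause:
            # MARK strands that satisfy this literal.
            marked |= {s for s in strands if s[VARS[var]] == value}
            operations += 1
        strands = marked   # DESTROY strands that satisfy no literal
        operations += 1
        operations += 1    # UNMARK before the next clause
    return strands, operations


survivors, operations = solve_cnf(CLAUSES, n=4)
```

Each clause costs one MARK per literal plus one DESTROY and one UNMARK, so the total is linear in the size of the formula; for the three clauses above it is 13 operations.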
DNA Computing on Surfaces: Experiments Students: Tony Frutos, Susan Gillmor, Zhen Guo, Qinghua Liu, Andy Thiel, Liman Wang
MARK Operation: 4-Base Mismatch Word Design
Repeated MARK, DESTROY, UNMARK Operations
Append (DNA Ligase). Hybridize with Cb. Hybridize with Cab, Wb. Ligate; Wash; Hybridize with Cb.
Two-Word Mark and Destroy A. Mark C1a, C1b, C2b B. Ligate; Melt single words C. Destroy; Unmark; Mark C1a, C1b, C2b.
Surface Attachment Chemistry
Word Readout Strategy. PCR amplify words remaining on surface. Detect PCR products on single word readout arrays.
4-Variable SAT Demo. Synthesize; Attach. Cycle: Mark, Destroy, Unmark. Readout.
Conclusions. DNA computing has expanded the notion of what is computation. Solid-phase chemistry is a promising approach to DNA computing. DNA computing will require greatly improved DNA surface attachment chemistries and control of chemical and enzymatic processes. New research problems in combinatorics, complexity theory and algorithms.
Open Problem: DNA Strand Engineering Given a DNA strand, there are polynomial-time algorithms that predict the secondary structure of the strand. Inverse Problem: find an efficient algorithm that, given a desired secondary structure, generates a strand with that structure.
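As a concrete instance of polynomial-time structure prediction, the sketch below uses a simplified Nussinov-style dynamic program that maximizes the number of Watson-Crick base pairs in a nested (non-crossing) structure. The slide does not name a specific algorithm, and practical predictors minimize free energy rather than count pairs, so this should be read as an illustrative assumption; the minimum loop size of 3 is likewise assumed.

```python
from functools import lru_cache

PAIRS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}


def max_base_pairs(strand, min_loop=3):
    """Maximum number of nested Watson-Crick pairs, with at least
    min_loop unpaired bases between the two bases of any pair."""
    n = len(strand)

    @lru_cache(maxsize=None)
    def best(i, j):
        # Too short to hold a pair (also covers empty ranges, j < i).
        if j - i <= min_loop:
            return 0
        # Case 1: base i is left unpaired.
        result = best(i + 1, j)
        # Case 2: base i pairs with some base k, splitting the range in two.
        for k in range(i + min_loop + 1, j + 1):
            if (strand[i], strand[k]) in PAIRS:
                result = max(result, 1 + best(i + 1, k - 1) + best(k + 1, j))
        return result

    return best(0, n - 1) if n else 0


pairs = max_base_pairs("GGGAAACCC")
```

The recursion fills O(n^2) subproblems with O(n) work each, giving cubic time. The inverse problem posed on the slide asks for the opposite direction: a strand whose predicted structure matches a given target.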