Engineering a Scalable Placement Heuristic for DNA Probe Arrays A.B. Kahng, I.I. Mandoiu, P. Pevzner, S. Reda (all UCSD), A. Zelikovsky (GSU)

Outline
– DNA probe arrays and unwanted illumination
– Synchronous array design (2-D placement)
– Asynchronous array design (3-D placement)
– Experimental results
– Extensions
– Conclusions

DNA Probe Arrays
Used in a wide range of genomic analyses
– Gene expression monitoring, SNP mapping, sequencing by hybridization, …
Arrays with up to 1000x1000 probes in commercial use; 10^8 probes envisioned for next-generation arrays
– Highly scalable algorithms required for array design

Simplified DNA Array Flow
[Flow diagram. Input: gene sequences, position of SNPs, etc. Boxes: Probe Selection; Mask Design: Placement & Embedding; Mask Manufacturing; Array Manufacturing; Hybridization Experiment; Analysis of Hybridization Intensities. The boxes are divided between a Soft/Computational Domain and a Hard/Biochemistry Domain.]

Array Manufacturing Process
Very Large-Scale Immobilized Polymer Synthesis:
1. Treat substrate with chemically protected “linker” molecules, creating rectangular array
– Site size = approx. 10x10 microns
2. Selectively expose array sites to light
– Light deprotects exposed molecules, activating further synthesis
3. Flush chip surface with solution of protected A, C, G, T
– Binding occurs at previously deprotected sites
4. Repeat steps 2 & 3 until desired probes are synthesized

Photo-Deprotection Step
Our concern: diffraction → unwanted illumination → yield decrease

Probe Synthesis
[Figure: placed probes CG, AC, CG, AC, ACG, AG, G, C synthesized with nucleotide deposition sequence ACG, using mask M1 for A, mask M2 for C, and mask M3 for G.]

Measuring Unwanted Illumination
Unwanted illumination ∝ border length
[Figure: placed probes CG, AC, CG, AC, ACG, AG, G, C with masks M1 (A), M2 (C), M3 (G) for nucleotide deposition sequence ACG; the border of each mask is marked.]

Synchronous vs. Asynchronous Synthesis
(a) Periodic deposition sequence
(b) Synchronous embedding of CTG
(c) Asynchronous leftmost embedding of CTG
(d) Another asynchronous embedding
[Figure: the deposition sequence drawn in 4-groups, with the three embeddings of CTG shown alongside it.]
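
The two named embeddings can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the period ACTG is only an assumed example of a periodic deposition sequence. Each function returns the deposition step (0-based) at which every nucleotide of the probe is synthesized.

```python
def synchronous_embedding(probe, period="ACTG"):
    """Nucleotide i is deposited inside the i-th repetition of the period."""
    return [i * len(period) + period.index(base) for i, base in enumerate(probe)]


def leftmost_embedding(probe, period="ACTG"):
    """Each nucleotide takes the earliest step that is still available."""
    steps, t = [], 0
    for base in probe:
        # Advance through the periodic sequence until the base matches.
        while period[t % len(period)] != base:
            t += 1
        steps.append(t)
        t += 1  # the next nucleotide must use a strictly later step
    return steps


print(synchronous_embedding("CTG"))  # one nucleotide per 4-group
print(leftmost_embedding("CTG"))     # packed as early as possible
```

With this assumed period, the synchronous embedding of CTG spans three 4-groups, while the leftmost embedding fits into the first one.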

Problem Formulation (Synchronous Case)
Synchronous Array Design (2-D Placement) Problem:
– Minimize placement cost of Hamming graph H (vertices = probes, distance = Hamming)
– on 2-dimensional grid graph G2 (N x N array, edges between distance-1 neighbors)
[Figure: a probe of H mapped to a site of G2.]
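
The objective in this formulation can be written down directly. The sketch below assumes equal-length probes stored in a list of rows; `hamming` and `placement_cost` are illustrative names, not taken from the slides.

```python
def hamming(p, q):
    """Number of positions in which two equal-length probes differ."""
    return sum(a != b for a, b in zip(p, q))


def placement_cost(grid):
    """Sum of Hamming distances over all pairs of adjacent sites in the grid."""
    rows, cols = len(grid), len(grid[0])
    cost = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # right neighbor
                cost += hamming(grid[r][c], grid[r][c + 1])
            if r + 1 < rows:  # lower neighbor
                cost += hamming(grid[r][c], grid[r + 1][c])
    return cost


good = [["AC", "AG"], ["TC", "TG"]]
bad = [["AC", "TG"], ["TC", "AG"]]
print(placement_cost(good), placement_cost(bad))
```

The two toy placements contain the same four probes; only the arrangement changes the cost.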

2-D Placement Lower Bound
Sum, over all probes, of Hamming distances to the 4 closest neighbors in H, minus the weight of the 4N heaviest arcs
[Figure: a probe of H and its closest neighbors, next to the grid G2.]

TSP+1-Threading Placement
Hubbell (1990s)
– Find TSP tour/path over given probes w.r.t. Hamming distance
– Thread TSP path in the grid row by row
Hannenhalli, Hubbell, Lipshutz, Pevzner ’02
– Place the probes according to 1-Threading
– Further decreases total border by 20%

Lexicographical Sorting + 1-Threading
– Radix-sort the probes in lexicographical order
– Thread on the chip
[Figure: example probes sorted lexicographically and threaded on the chip.]
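
A rough sketch of the sort-then-thread idea follows. Two simplifications are assumed: Python's built-in `sorted` stands in for the radix sort (the resulting order is the same for equal-length probes), and a plain serpentine row-by-row threading stands in for 1-threading, whose exact pattern is not described in this transcript. The function names are hypothetical.

```python
def thread_row_by_row(probes, n_cols):
    """Lay a linear order of probes onto the grid, reversing every other row
    so that consecutive probes in the order stay adjacent on the chip."""
    rows = [probes[i:i + n_cols] for i in range(0, len(probes), n_cols)]
    return [row if r % 2 == 0 else row[::-1] for r, row in enumerate(rows)]


def sort_and_thread(probes, n_cols):
    """Lexicographic sort followed by serpentine threading."""
    return thread_row_by_row(sorted(probes), n_cols)


for row in sort_and_thread(["CA", "AT", "GG", "AA", "TG", "AC"], 3):
    print(row)
```

Probes that share a long prefix end up next to each other in the sorted order, which is what keeps the Hamming distance between threaded neighbors low.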

Matching-Based Probe Placement
– Select an independent (mutually nonadjacent) set of placed probes
– Re-embed using optimal perfect matching
– Total cost can only decrease or remain the same
– Runtime: roughly proportional to square of independent set size
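
The matching step can be illustrated on a toy grid. Because the selected sites are mutually nonadjacent, the cost of putting a probe on one of them depends only on fixed neighbors, so the best reassignment is a minimum-cost perfect matching between the selected probes and the vacated sites. The sketch below is an assumption-laden stand-in: it enumerates all permutations, which is only viable for tiny sets, whereas a real implementation would use a polynomial-time assignment solver. All names are hypothetical.

```python
from itertools import permutations


def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))


def neighbor_probes(grid, r, c):
    """Probes on the (up to four) sites adjacent to (r, c)."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < len(grid) and 0 <= cc < len(grid[0]):
            yield grid[rr][cc]


def rematch(grid, sites):
    """Optimally reassign the probes on mutually nonadjacent `sites`."""
    probes = [grid[r][c] for r, c in sites]

    def site_cost(probe, site):
        return sum(hamming(probe, q) for q in neighbor_probes(grid, *site))

    def total(perm):
        return sum(site_cost(probes[p], sites[i]) for i, p in enumerate(perm))

    # The identity permutation is among the candidates, so cost never rises.
    best = min(permutations(range(len(sites))), key=total)
    for i, (r, c) in enumerate(sites):
        grid[r][c] = probes[best[i]]
    return grid


grid = [["CC", "AA", "AA"],
        ["AA", "AA", "CC"],
        ["AA", "CC", "AA"]]
for row in rematch(grid, [(0, 0), (2, 2)]):
    print(row)
```

In the toy grid the two corner probes are swapped, because each one matches the neighbors of the other corner.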

Sliding Window Matching
– There is a trade-off between solution quality and size/overlap of windows
– Iterate SlidingWindowMatching over the chip until improvement drops below 0.1%

Effect of Window Size on Solution Quality
Increased window size/overlap decreases number of conflicts, but increases runtime

Epitaxial Placement Algorithm
Simulates crystal growth
– Start with arbitrary probe placed at center
– Maintain a best probe candidate (i.e., a probe with min number of conflicts to the already placed neighbors) for each border site
– Iteratively fill the border site with minimum increase in border length; give priority to sites with more neighbors filled

Tile- and Row-Epitaxial
Tile-epitaxial
– Divide array into 100x100 tiles
– Run Epitaxial within each tile
– Take into account border of already placed tiles
Row-epitaxial
– Place probes by a fast method, e.g., sort + 1-thread
– Re-place probes row by row, sequentially filling sites within a row
– Assign to each site a probe with min number of conflicts among the unplaced probes from following K rows
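
The row-epitaxial re-placement step might look roughly like the sketch below. It is one possible reading of the slide, under stated assumptions: the initial grid is given, "unplaced probes from following K rows" is taken to mean the not-yet-fixed sites of the current row plus the next K rows, conflicts are counted against the already fixed left and upper neighbors, and ties keep the earliest candidate. The names `row_epitaxial` and `k` are hypothetical.

```python
def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))


def row_epitaxial(grid, k):
    """Re-place probes row by row, choosing for each site the best probe
    among those still unplaced within a lookahead of k rows."""
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            # Neighbors that are already fixed: left and upper sites.
            placed = []
            if c > 0:
                placed.append(grid[r][c - 1])
            if r > 0:
                placed.append(grid[r - 1][c])
            # Candidate sites whose probes are still free to move.
            candidates = [(r, cc) for cc in range(c, cols)]
            candidates += [(rr, cc)
                           for rr in range(r + 1, min(rows, r + k + 1))
                           for cc in range(cols)]

            def conflicts(site):
                return sum(hamming(grid[site[0]][site[1]], q) for q in placed)

            br, bc = min(candidates, key=conflicts)
            # Swap the chosen probe into the current site.
            grid[r][c], grid[br][bc] = grid[br][bc], grid[r][c]
    return grid


for row in row_epitaxial([["AA", "CC"], ["AC", "CA"]], k=1):
    print(row)
```

A larger `k` gives each site more candidates to choose from, at the price of scanning more probes per site.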

2-D Placement Algorithm Comparison: Border Conflict

2-D Placement Algorithm Comparison: Runtime

Problem Formulation (Asynchronous Case)
Asynchronous synthesis:
– Periodic nucleotide deposition sequence, e.g., (ACTG)^p
– Every probe grows asynchronously → Border length = Hamming distance between embedded probes
Asynchronous Array Design (3-D Placement) Problem:
– Minimize placement cost of embedded-probe Hamming graph H (vertices = probes, distance = Hamming between embedded probes)
– on 2-dimensional grid graph G2 (N x N array, edges between neighbors)
[Figure: a probe of H mapped to a site of G2.]

Lower Bound
Sum of distances to 4 closest neighbors minus weight of 4N heaviest arcs
– Distance between two probes of length p = 2p - |Longest Common Subsequence|
Non-tight bound: example with LB = 8 and best placement cost = 10
[Figure: optimum placement of the example probes under nucleotide deposition sequence S = ACTGA.]

Optimal Probe Alignment
Find best alignment of probe w.r.t. embedded neighbors
Dynamic Programming:
– Source-sink paths correspond to feasible embeddings
– O[(probe length) x (deposition sequence length)]
Can be extended to simultaneous alignment of two adjacent probes (2x1) with runtime increase by O(probe length)
[Figure: alignment graph from Source to Sink for an example probe and deposition sequence.]
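
A dynamic program of this kind can be sketched as follows. This is an illustrative formulation rather than the authors' implementation: neighbor embeddings are assumed to be given as sets of deposition steps, a conflict is counted whenever the probe and a neighbor disagree on whether a step is used, and `best_embedding` is a hypothetical name. The table has (probe length + 1) x (deposition length + 1) entries, in line with the stated complexity.

```python
def best_embedding(probe, deposition, neighbor_steps):
    """Return (cost, steps) of an embedding of `probe` into `deposition`
    that minimizes conflicts with the fixed neighbor embeddings."""
    n, m, k = len(probe), len(deposition), len(neighbor_steps)
    inf = float("inf")
    # used_by[t] = number of neighbors synthesizing a nucleotide at step t
    used_by = [sum(t in s for s in neighbor_steps) for t in range(m)]
    # cost[i][j]: best cost with the first i nucleotides placed in steps < j
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    take = [[False] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for j in range(1, m + 1):
        t = j - 1
        for i in range(n + 1):
            # Skipping step t conflicts with every neighbor that uses it.
            best, chosen = cost[i][j - 1] + used_by[t], False
            if i > 0 and probe[i - 1] == deposition[t]:
                # Using step t conflicts with every neighbor that skips it.
                alt = cost[i - 1][j - 1] + (k - used_by[t])
                if alt < best:
                    best, chosen = alt, True
            cost[i][j], take[i][j] = best, chosen
    if cost[n][m] == inf:
        raise ValueError("probe cannot be embedded in the deposition sequence")
    # Trace the source-sink path back to recover the chosen steps.
    steps, i = [], n
    for j in range(m, 0, -1):
        if take[i][j]:
            steps.append(j - 1)
            i -= 1
    return cost[n][m], steps[::-1]


print(best_embedding("CTG", "ACTGACTG", [{5, 6, 7}]))
```

In the example the probe is pulled into the second period to line up with its single neighbor, giving zero conflicts.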

3-D Placement Flows
Simultaneous placement and alignment
– asynchronous epitaxial (slow and low quality)
Synchronous placement followed by in-place probe alignment (analogous to standard for VLSI flow partition)
– using previous DP to do in-place probe alignment
Synchronous placement followed by probe alignment with reshuffle (analogous to feedback loops in VLSI flows)
– asynchronous sliding window matching

Algorithms for In-Place Probe Alignment
Asynchronous re-embedding after 2-dim placement
– Greedy Algorithm
  While there exist probes to re-embed with gain
  – Optimally re-embed the probe with the largest gain
– Batched greedy: speed-up by avoiding recalculations
– Chessboard Algorithm
  While there is gain
  – Re-embed probes in green sites
  – Re-embed probes in red sites
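
The chessboard schedule can be sketched independently of how a single probe is re-embedded. In the hedged example below, embeddings are sets of deposition steps, and the optimal re-embedding is replaced by a simple choice among a given list of feasible embeddings per site (an assumption made to keep the sketch short; the slides use the alignment DP here). Sites of one color are never adjacent, so each can be re-embedded against fixed neighbors. All names are hypothetical.

```python
def chessboard(emb, candidates):
    """emb[r][c]: current embedding (set of steps) of the probe at (r, c).
    candidates[r][c]: feasible embeddings to choose from for that probe."""
    rows, cols = len(emb), len(emb[0])

    def local_cost(steps, r, c):
        # Conflicts with the current embeddings of the adjacent sites.
        return sum(len(steps ^ emb[rr][cc])
                   for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                   if 0 <= rr < rows and 0 <= cc < cols)

    improved = True
    while improved:  # "while there is gain"
        improved = False
        for color in (0, 1):  # the two colors of the chessboard
            for r in range(rows):
                for c in range(cols):
                    if (r + c) % 2 != color:
                        continue
                    best = min(candidates[r][c],
                               key=lambda s: local_cost(s, r, c))
                    # Only strict improvements are accepted, so the total
                    # border decreases and the loop terminates.
                    if local_cost(best, r, c) < local_cost(emb[r][c], r, c):
                        emb[r][c] = best
                        improved = True
    return emb


options = [{1, 2, 3}, {5, 6, 7}]  # two embeddings of CTG in ACTGACTG
print(chessboard([[{1, 2, 3}, {5, 6, 7}]], [[options, options]]))
```

Here two adjacent copies of the same probe start with different embeddings and end up aligned after one pass.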

Comparison of In-Place Probe Alignments
[Table: column headers Chip size, LB, TSP+1Thr, Greedy, Chessboard, 2x1 Chessboard, with %LB and CPU sub-columns.]
Post-placement LB = sum of distances to adjacent probes
– Distance between two probes of length p = 2p - |LCS|
– Useful for assessing quality of algorithms that change probe embeddings but do not change probe placement

3-D vs. 2-D Placement Results
[Table: column headers Chip size, TSP+1Thr, TSP+1Thr + Chessboard, Epitaxial + Chessboard, SyncSWM + Chessboard, AsyncSWM, with Cost and CPU sub-columns.]

3-D Placement Algorithm Comparison: Border Conflict

3-D Placement Algorithm Comparison: Runtime

Practical Extensions
Distance-dependent border conflict weights
– Take into account conflicts between 2-, 3-hop neighbors rather than only immediate neighbors
Position-dependent border conflict weights
– In alignment DP for two sequences, take into account importance of conflicts in the middle of probes: alignment cost has weights on conflicts which depend on conflict position
Polymorphic probes
– Chip contains SNPs, e.g., pairs of probes that differ in a single position; they should be placed together and alignment DP should align them simultaneously

Alignment DP for 2-SNP’s Optimal Embedding of A{C,T}T

Enhanced DNA Array Design Flow
[Flow diagram, built up over several slides. Boxes: Probe Selection; Mask Design: Placement & Embedding; Deposition Mask Design; Test/Control Structure Design; Conflict Map; Probe Pools; Design Rules & Parameters.]

Summary
Contributions:
– Epitaxial placement → reduces by extra 10% over the previously best known method
– Asynchronous placement problem formulation
– Postplacement improvement by extra %
– Lower bounds
– Scalable Placements (1000x1000 in 20min)
Ongoing work
– Comparison on industrial benchmarks
– Experiments with algorithms for extended formulations (SNPs, distance-dependent weights, etc.)
Future Directions
– Design flow enhancements
– Nucleotide deposition sequence design
– Partitioning and integration for manufacturing cost reduction

Thank you!