Engineering a Scalable Placement Heuristic for DNA Probe Arrays A.B. Kahng, I.I. Mandoiu, P. Pevzner, S. Reda (all UCSD), A. Zelikovsky (GSU)

1 Engineering a Scalable Placement Heuristic for DNA Probe Arrays A.B. Kahng, I.I. Mandoiu, P. Pevzner, S. Reda (all UCSD), A. Zelikovsky (GSU)

2 Outline DNA probe arrays and unwanted illumination Synchronous array design (2-D placement) Asynchronous array design (3-D placement) Experimental results Extensions Conclusions

4 DNA Probe Arrays
Used in a wide range of genomic analyses
–Gene expression monitoring, SNP mapping, sequencing by hybridization, …
Arrays with up to 1000x1000 probes in commercial use, 10^8 probes envisioned for next-generation arrays
–Highly scalable algorithms required for array design

5 Simplified DNA Array Flow
[Flowchart: Gene sequences, position of SNPs, etc. → Probe Selection → Mask Design: Placement & Embedding → Mask Manufacturing → Array Manufacturing → Hybridization Experiment → Analysis of Hybridization Intensities. The steps are grouped into a Soft/Computational Domain and a Hard/Biochemistry Domain.]

6 Array Manufacturing Process
Very Large-Scale Immobilized Polymer Synthesis:
1. Treat substrate with chemically protected “linker” molecules, creating rectangular array
–Site size = approx. 10x10 microns
2. Selectively expose array sites to light
–Light deprotects exposed molecules, activating further synthesis
3. Flush chip surface with solution of protected A, C, G, T
–Binding occurs at previously deprotected sites
4. Repeat steps 2 and 3 until desired probes are synthesized

7 Photo-Deprotection Step
Our concern: diffraction → unwanted illumination → yield decrease

8 Probe Synthesis
[Figure: placed probes (CG, AC, CG, AC, ACG, AG, G, C) synthesized with nucleotide deposition sequence ACG. Mask M1 exposes the sites that receive A, mask M2 the sites that receive C, and mask M3 the sites that receive G.]

9 Measuring Unwanted Illumination
[Figure: the same placed probes and masks M1 (A), M2 (C), M3 (G) for nucleotide deposition sequence ACG, with the border between exposed and masked sites marked in each mask.]
Unwanted illumination is measured by border length
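
The border-length measure on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes each site holds an embedded probe written as a string over the deposition steps, with "-" at steps where the site stays masked, and it counts only borders between neighboring sites inside the array. The names `conflicts` and `border_length` and the 2x2 example grid are hypothetical.

```python
from typing import List


def conflicts(a: str, b: str) -> int:
    """Number of deposition steps at which exactly one of two sites is exposed."""
    return sum((x == "-") != (y == "-") for x, y in zip(a, b))


def border_length(grid: List[List[str]]) -> int:
    """Total mask border: conflicts summed over horizontally and vertically adjacent sites."""
    rows, cols = len(grid), len(grid[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # right neighbor
                total += conflicts(grid[r][c], grid[r][c + 1])
            if r + 1 < rows:  # neighbor below
                total += conflicts(grid[r][c], grid[r + 1][c])
    return total


# Hypothetical 2x2 array for deposition sequence ACG (one mask per step).
example = [["AC-", "-CG"],
           ["A-G", "--G"]]
print(border_length(example))
```

Each conflict corresponds to one unit of border in the mask for that step, so the sum over all steps gives the quantity the placement algorithms try to minimize.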

10 Synchronous vs. Asynchronous Synthesis
[Figure, four panels: (a) periodic deposition sequence, divided into 4-groups; (b) synchronous embedding of CTG; (c) asynchronous leftmost embedding of CTG; (d) another asynchronous embedding.]
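
The two embedding styles in panels (b) and (c) can be illustrated as follows. This is a sketch under assumptions: the period of the deposition sequence is taken to be ACGT (the order in the original figure is not recoverable from the transcript), skipped steps are written as "-", and the function names are hypothetical.

```python
def synchronous_embedding(probe: str, period: str = "ACGT") -> str:
    """Embed the i-th nucleotide of the probe inside the i-th period (4-group)."""
    groups = []
    for nucleotide in probe:
        if nucleotide not in period:
            raise ValueError(f"unexpected nucleotide {nucleotide!r}")
        groups.append("".join(ch if ch == nucleotide else "-" for ch in period))
    return "".join(groups)


def leftmost_embedding(probe: str, deposition: str) -> str:
    """Asynchronous embedding that uses the earliest possible step for each nucleotide."""
    out = []
    i = 0  # index of the next probe nucleotide still to be synthesized
    for ch in deposition:
        if i < len(probe) and probe[i] == ch:
            out.append(ch)
            i += 1
        else:
            out.append("-")
    if i < len(probe):
        raise ValueError("probe is not a subsequence of the deposition sequence")
    return "".join(out)


print(synchronous_embedding("CTG"))           # one nucleotide per 4-group
print(leftmost_embedding("CTG", "ACGT" * 3))  # nucleotides packed to the left
```

The synchronous embedding always takes one full period per nucleotide, while the asynchronous one is free to use any step where the deposited nucleotide matches, which is what opens up the extra degree of freedom exploited later in the talk.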

11 Outline DNA probe arrays and unwanted illumination Synchronous array design (2-D placement) Asynchronous array design (3-D placement) Experimental results Extensions Conclusions

12 Problem Formulation (Synchronous Case)
Synchronous Array Design (2-D Placement) Problem:
–Minimize placement cost of Hamming graph H (vertices = probes, distance = Hamming distance)
–on 2-dimensional grid graph G2 (N x N array, edges between distance-1 neighbors)
[Figure: each probe (vertex of H) is mapped to a site (vertex of G2).]

13 2-D Placement Lower Bound
Sum, over all probes, of Hamming distances to the 4 closest neighbors, minus the weight of the 4N heaviest arcs
[Figure: probes in H and sites in G2.]

14 TSP+1-Threading Placement
Hubbell, 1990s
–Find TSP tour/path over given probes w.r.t. Hamming distance
–Thread TSP path in the grid row by row
Hannenhalli, Hubbell, Lipshutz, Pevzner ’02
–Place the probes according to 1-Threading
–Further decreases total border by 20%

15 Lexicographical Sorting + 1-Threading
Radix-sort the probes in lexicographical order
Thread on the chip
[Figure: example probes being sorted and then threaded on the chip.]
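
A rough sketch of the sort-then-thread idea follows. Two simplifications are assumed and should be read as such: Python's built-in `sorted` stands in for the radix sort, and the threading is a plain row-by-row serpentine rather than the 1-threading pattern of Hannenhalli et al., whose exact shape is not given in this transcript. The function name is hypothetical.

```python
from typing import List, Optional


def sort_and_thread(probes: List[str], n_cols: int) -> List[List[Optional[str]]]:
    """Sort probes lexicographically and lay them out in a serpentine path."""
    ordered = sorted(probes)  # stand-in for radix sort on fixed-length probes
    grid: List[List[Optional[str]]] = []
    for start in range(0, len(ordered), n_cols):
        row: List[Optional[str]] = list(ordered[start:start + n_cols])
        row += [None] * (n_cols - len(row))  # pad an incomplete last row
        if (start // n_cols) % 2 == 1:
            row.reverse()  # odd rows run right to left so path neighbors stay adjacent
        grid.append(row)
    return grid


for row in sort_and_thread(["TG", "AA", "CA", "AT", "GG", "AC"], n_cols=3):
    print(row)
```

Lexicographic neighbors share long prefixes, so consecutive probes on the path tend to have small Hamming distance; reversing every other row keeps consecutive probes physically adjacent at the row ends.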

16 Matching Based Probe Placement
Select an independent (mutually nonadjacent) set of placed probes
Reassign the selected probes to the selected sites using an optimal perfect matching
Total cost can only decrease or remain the same
Runtime: roughly proportional to square of independent set size
[Figure: an independent set of five probes before and after re-matching.]
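
The re-matching step can be sketched as below. Because the chosen sites are mutually nonadjacent, the cost of putting a probe on a site depends only on that site's fixed neighbors, so the problem is a pure assignment problem. This sketch assumes the synchronous case (cost = Hamming distance) and solves the assignment by brute force over permutations, which is only viable for a handful of sites; a real implementation would use a polynomial-time matching algorithm. All names are hypothetical.

```python
from itertools import permutations
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def site_cost(grid: List[List[str]], r: int, c: int, probe: str) -> int:
    """Conflicts between `probe` placed at (r, c) and the probes on neighboring sites."""
    total = 0
    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
        if 0 <= rr < len(grid) and 0 <= cc < len(grid[0]):
            total += hamming(probe, grid[rr][cc])
    return total


def rematch(grid: List[List[str]], sites: List[Tuple[int, int]]) -> int:
    """Optimally permute the probes on an independent set of sites; return the gain."""
    for i, (r1, c1) in enumerate(sites):
        for r2, c2 in sites[i + 1:]:
            if abs(r1 - r2) + abs(c1 - c2) <= 1:
                raise ValueError("sites must be distinct and mutually nonadjacent")
    probes = [grid[r][c] for r, c in sites]
    # cost[i][j]: cost of placing probe i on site j (neighbors are outside the set)
    cost = [[site_cost(grid, r, c, p) for r, c in sites] for p in probes]
    n = len(sites)
    best = min(permutations(range(n)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))
    old = sum(cost[i][i] for i in range(n))
    new = sum(cost[i][best[i]] for i in range(n))
    for i in range(n):
        r, c = sites[best[i]]
        grid[r][c] = probes[i]
    return old - new


grid = [["AA", "GG", "CC"],
        ["CC", "GG", "AA"],
        ["AA", "GG", "CC"]]
print(rematch(grid, [(1, 0), (1, 2)]), grid[1])
```

The identity permutation is always among the candidates, which is why the total cost can only decrease or stay the same.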

17 Sliding Window Matching There is a trade-off between solution quality and size/overlap of windows Iterate SlidingWindowMatching over the chip until improvement drops below 0.1%

18 Effect of Window Size on Solution Quality Increased window size/overlap decreases number of conflicts, but increases runtime

19 Epitaxial Placement Algorithm
Simulates crystal growth
–Start with an arbitrary probe placed at the center
–Maintain a best probe candidate (i.e., a probe with the minimum number of conflicts with the already placed neighbors) for each border site
–Iteratively fill the border site with the minimum increase in border length, giving priority to sites with more filled neighbors
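
A simplified sketch of this growth process is given below. Several choices here are assumptions rather than details from the talk: conflicts are Hamming distances (synchronous case), priority is implemented by comparing sites first on the number of filled neighbors and then on added cost, and the best candidate for every border site is recomputed in each iteration instead of being maintained incrementally, so the sketch is far slower than the real algorithm. Names are hypothetical.

```python
from typing import List, Optional


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def epitaxial(probes: List[str], n: int) -> List[List[Optional[str]]]:
    """Grow a placement of n*n probes outward from the center of an n x n grid."""
    if len(probes) != n * n:
        raise ValueError("need exactly n*n probes")
    grid: List[List[Optional[str]]] = [[None] * n for _ in range(n)]
    unplaced = list(probes)
    grid[n // 2][n // 2] = unplaced.pop(0)  # arbitrary seed probe at the center

    def filled_neighbors(r: int, c: int) -> List[str]:
        cells = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
        return [grid[rr][cc] for rr, cc in cells
                if 0 <= rr < n and 0 <= cc < n and grid[rr][cc] is not None]

    while unplaced:
        best = None  # (key, row, col, index into unplaced)
        for r in range(n):
            for c in range(n):
                if grid[r][c] is not None:
                    continue
                nbrs = filled_neighbors(r, c)
                if not nbrs:
                    continue  # not a border site yet
                costs = [sum(hamming(p, q) for q in nbrs) for p in unplaced]
                i = min(range(len(unplaced)), key=costs.__getitem__)
                key = (-len(nbrs), costs[i])  # more filled neighbors first, then cheapest
                if best is None or key < best[0]:
                    best = (key, r, c, i)
        _, r, c, i = best
        grid[r][c] = unplaced.pop(i)
    return grid


for row in epitaxial(["AA", "AC", "CC", "CA", "GG", "GA", "AG", "TT", "TA"], 3):
    print(row)
```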

20 Tile- and Row-Epitaxial
Tile-epitaxial
–Divide array into 100x100 tiles
–Run Epitaxial within each tile
–Take into account border of already placed tiles
Row-epitaxial
–Place probes by a fast method, e.g., sort + 1-thread
–Re-place probes row by row, sequentially filling sites within a row
–Assign to each site a probe with min number of conflicts among the unplaced probes from following K rows
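
The row-epitaxial pass can be sketched as follows, assuming the synchronous case (Hamming conflicts) and a simple left-to-right fill in every row. Here "unplaced probes" are taken to be the probes currently sitting at or after the site being filled, restricted to the current row and the next `lookahead_rows` rows of the initial placement; that reading, and the names used, are assumptions.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def row_epitaxial(grid: List[List[str]], lookahead_rows: int) -> List[List[str]]:
    """Re-place probes site by site, choosing from a window of the next K rows."""
    rows, cols = len(grid), len(grid[0])
    cells = [p for row in grid for p in row]  # row-major copy of the initial placement
    for idx in range(len(cells)):
        r, c = divmod(idx, cols)
        placed = []  # already fixed neighbors: left and above
        if c > 0:
            placed.append(cells[idx - 1])
        if r > 0:
            placed.append(cells[idx - cols])
        end = min(len(cells), (r + 1 + lookahead_rows) * cols)
        best = min(range(idx, end),
                   key=lambda j: sum(hamming(cells[j], q) for q in placed))
        cells[idx], cells[best] = cells[best], cells[idx]  # swap the winner into the site
    return [cells[r * cols:(r + 1) * cols] for r in range(rows)]


start = [["AA", "CC"],
         ["AC", "AA"]]
print(row_epitaxial(start, lookahead_rows=1))
```

Restricting the candidate pool to K rows is what keeps the runtime under control: each site examines at most about (K+1) x (row length) candidates instead of all remaining probes.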

21 2-D Placement Algorithm Comparison: Border Conflict

22 2-D Placement Algorithm Comparison: Runtime

23 Outline DNA probe arrays and unwanted illumination Synchronous array design (2-D placement) Asynchronous array design (3-D placement) Experimental results Extensions Conclusions

24 Problem Formulation (Asynchronous Case)
Asynchronous synthesis:
–Periodic nucleotide deposition sequence, e.g., (ACTG)^p
–Every probe grows asynchronously
Border length = Hamming distance between embedded probes
Asynchronous Array (3-D Placement) Design Problem:
–Minimize placement cost of embedded-probe Hamming graph H (vertices = probes, distance = Hamming distance between embedded probes)
–on 2-dimensional grid graph G2 (N x N array, edges between neighbors)
[Figure: each probe (vertex of H) is mapped to a site (vertex of G2).]

25 Lower Bound
Sum of distances to 4 closest neighbors minus weight of 4N heaviest arcs
–Distance between two probes of length p = 2p - 2|Longest Common Subsequence| (each probe has p - |LCS| nucleotides that cannot share a deposition step with the other probe)
Non-tight bound: example with LB = 8 and best placement cost = 10
[Figure: probes AC, CT, TG, GA on a 2x2 array with nucleotide deposition sequence S = ACTGA, showing the pairwise distances and the optimum placement.]

26 Optimal Probe Alignment
Find the best alignment of a probe with respect to its embedded neighbors
Dynamic Programming:
–Source-sink paths correspond to feasible embeddings
–Runtime O[(probe length) x (deposition sequence length)]
Can be extended to simultaneous alignment of two adjacent probes (2x1), with runtime increased by a factor of O(probe length)
[Figure: DP graph from Source to Sink for embedding a probe in the deposition sequence.]
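
The dynamic program on this slide can be sketched as follows. The representation is an assumption: an embedding is a string over the deposition steps with "-" at skipped steps, and the neighbors' embeddings are held fixed. `dp[i][t]` is the minimum number of conflicts when the first `i` probe nucleotides are embedded within the first `t` deposition steps, which gives the stated O(probe length x deposition length) runtime. The function name is hypothetical.

```python
from typing import List, Tuple


def best_embedding(probe: str, deposition: str, neighbors: List[str]) -> Tuple[int, str]:
    """Return (min conflicts, embedding) of `probe` against fixed neighbor embeddings."""
    k, m = len(probe), len(deposition)
    # Conflicts incurred at step t if the probe uses it / skips it.
    use = [sum(nb[t] == "-" for nb in neighbors) for t in range(m)]
    skip = [len(neighbors) - u for u in use]
    inf = float("inf")
    dp = [[inf] * (m + 1) for _ in range(k + 1)]
    dp[0][0] = 0
    for t in range(1, m + 1):
        for i in range(k + 1):
            dp[i][t] = dp[i][t - 1] + skip[t - 1]  # probe is masked at step t
            if i > 0 and probe[i - 1] == deposition[t - 1]:
                # probe receives its i-th nucleotide at step t
                dp[i][t] = min(dp[i][t], dp[i - 1][t - 1] + use[t - 1])
    if dp[k][m] == inf:
        raise ValueError("probe is not a subsequence of the deposition sequence")
    # Trace one optimal source-sink path back to recover the embedding.
    out, i = ["-"] * m, k
    for t in range(m, 0, -1):
        if (i > 0 and probe[i - 1] == deposition[t - 1]
                and dp[i][t] == dp[i - 1][t - 1] + use[t - 1]):
            out[t - 1] = probe[i - 1]
            i -= 1
    return dp[k][m], "".join(out)


# Both neighbors synthesize CT late, so the optimal embedding follows them.
print(best_embedding("CT", "ACGTACGT", ["-----C-T", "-----C-T"]))
```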

27 3-D Placement Flows
-Simultaneous placement and alignment
 -asynchronous epitaxial (slow and low quality)
-Synchronous placement followed by in-place probe alignment (analogous to the standard partitioning of VLSI flows)
 -using previous DP to do in-place probe alignment
-Synchronous placement followed by probe alignment with reshuffle (analogous to feedback loops in VLSI flows)
 -asynchronous sliding window matching

28 Algorithms for In-Place Probe Alignment
Asynchronous re-embedding after 2-dim placement
–Greedy Algorithm: while there exist probes to re-embed with gain
 –Optimally re-embed the probe with the largest gain
–Batched greedy: speed-up by avoiding recalculations
–Chessboard Algorithm: while there is gain
 –Re-embed probes in green sites
 –Re-embed probes in red sites
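
The chessboard iteration can be sketched as below. Sites of one color have only neighbors of the other color, so all of them can be optimally re-embedded against fixed neighbors in one sweep. The sketch assumes embeddings are strings with "-" at skipped steps, that "green" and "red" correspond to the parity of row + column, and it includes a compact copy of the alignment DP so that the block is self-contained. All names are hypothetical.

```python
from typing import List, Tuple


def conflicts(a: str, b: str) -> int:
    return sum((x == "-") != (y == "-") for x, y in zip(a, b))


def reembed(probe: str, deposition: str, nbrs: List[str]) -> Tuple[int, str]:
    """Alignment DP: min-conflict embedding of `probe` against fixed neighbors."""
    k, m = len(probe), len(deposition)
    use = [sum(nb[t] == "-" for nb in nbrs) for t in range(m)]
    skip = [len(nbrs) - u for u in use]
    inf = float("inf")
    dp = [[inf] * (m + 1) for _ in range(k + 1)]
    dp[0][0] = 0
    for t in range(1, m + 1):
        for i in range(k + 1):
            dp[i][t] = dp[i][t - 1] + skip[t - 1]
            if i > 0 and probe[i - 1] == deposition[t - 1]:
                dp[i][t] = min(dp[i][t], dp[i - 1][t - 1] + use[t - 1])
    out, i = ["-"] * m, k
    for t in range(m, 0, -1):
        if (i > 0 and probe[i - 1] == deposition[t - 1]
                and dp[i][t] == dp[i - 1][t - 1] + use[t - 1]):
            out[t - 1] = probe[i - 1]
            i -= 1
    return dp[k][m], "".join(out)


def chessboard(grid: List[List[str]], deposition: str) -> List[List[str]]:
    """Alternately re-embed all probes on one color, then the other, while there is gain.

    Assumes every entry of `grid` is already a valid embedding in `deposition`.
    """
    rows, cols = len(grid), len(grid[0])
    improved = True
    while improved:
        improved = False
        for color in (0, 1):
            for r in range(rows):
                for c in range(cols):
                    if (r + c) % 2 != color:
                        continue
                    nbrs = [grid[rr][cc]
                            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                            if 0 <= rr < rows and 0 <= cc < cols]
                    old = sum(conflicts(grid[r][c], nb) for nb in nbrs)
                    cost, emb = reembed(grid[r][c].replace("-", ""), deposition, nbrs)
                    if cost < old:  # strict gain, so the loop terminates
                        grid[r][c] = emb
                        improved = True
    return grid


print(chessboard([["-C-T----", "-----C-T"]], "ACGTACGT"))
```

Every accepted re-embedding strictly lowers the total number of conflicts, which is a non-negative integer, so the outer loop stops after finitely many sweeps.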

29 Comparison of In-Place Probe Alignments
Chip size   LB     TSP+1Thr   Greedy          Chessboard      2x1 Chessboard
            %LB    %LB        %LB     CPU     %LB     CPU     %LB     CPU
100         100    152.0      125.7   40      120.5   54      119.4   480
200         100    150.2      126.3   154     120.9   221     119.7   1915
300         100    149.1      126.7   357     121.5   522     121.6   4349
500         100    147.9      127.1   943     121.4   1423    120.2   15990
Post-placement LB = sum of distances to adjacent probes
–Distance between two probes of length p = 2p - 2|LCS|
–Useful for assessing quality of algorithms that change probe embeddings but do not change probe placement

30 Outline DNA probe arrays and unwanted illumination Synchronous array design (2-D placement) Asynchronous array design (3-D placement) Experimental results Extensions Conclusions

31 3-D vs. 2-D Placement Results
Chip size   TSP+1Thr    TSP+1Thr+Chessboard   Epitaxial+Chessboard   SyncSWM+Chessboard   AsyncSWM
            Cost        Cost        CPU       Cost       CPU         Cost        CPU      Cost        CPU
100         554849      439829      113       419069     274         433274      1        417890      875
200         2140903     1723352     1901      1624988    4441        1693658     46       1636658     3676
300         4667882     3801765     12028     ---        ---         3746722     112      3615282     8406
500         12702474    10426237    109648    ---        ---         10049442    302      9686918     22351
1000        ---         ---         ---       ---        ---         38898792    1307     38005039    54501

32 3-D Placement Algorithm Comparison: Border Conflict

33 3-D Placement Algorithm Comparison: Runtime

34 Outline DNA probe arrays and unwanted illumination Synchronous array design (2-D placement) Asynchronous array design (3-D placement) Experimental results Extensions Conclusions

35 Practical Extensions
Distance-dependent border conflict weights
–Take into account conflicts between 2- and 3-hop neighbors rather than only immediate neighbors
Position-dependent border conflict weights
–In the alignment DP for two sequences, take into account the importance of conflicts in the middle of probes: the alignment cost has weights on conflicts which depend on conflict position
Polymorphic probes
–Chip contains SNPs, e.g., pairs of probes that differ in a single position; they should be placed together, and the alignment DP should align them simultaneously

36 Alignment DP for 2-SNPs
Optimal Embedding of A{C,T}T

37 Simplified DNA Array Flow
[Flowchart: Gene sequences, position of SNPs, etc. → Probe Selection → Mask Design: Placement & Embedding → Mask Manufacturing → Array Manufacturing → Hybridization Experiment → Analysis of Hybridization Intensities. The steps are grouped into a Soft/Computational Domain and a Hard/Biochemistry Domain.]

38 Enhanced DNA Array Design Flow Probe Selection Mask Design: Placement & Embedding

39 Enhanced DNA Array Design Flow Probe Selection Mask Design: Placement & Embedding Probe Pools

40 Enhanced DNA Array Design Flow Probe Selection Mask Design: Placement & Embedding Deposition Mask Design Probe Pools

41 Enhanced DNA Array Design Flow Probe Selection Mask Design: Placement & Embedding Deposition Mask Design Probe Pools Design Rules &Parameters

42 Enhanced DNA Array Design Flow Probe Selection Mask Design: Placement & Embedding Deposition Mask Design Conflict Map Probe Pools Design Rules &Parameters

43 Enhanced DNA Array Design Flow Probe Selection Mask Design: Placement & Embedding Deposition Mask Design Test/Control Structure Design Conflict Map Probe Pools Design Rules &Parameters

44 Summary
Contributions:
–Epitaxial placement reduces border conflicts by an extra 10% over the previously best known method
–Asynchronous placement problem formulation
–Post-placement improvement by an extra 15.5-21.8%
–Lower bounds
–Scalable placements (1000x1000 in 20 min)
Ongoing work
–Comparison on industrial benchmarks
–Experiments with algorithms for extended formulations (SNPs, distance-dependent weights, etc.)
Future Directions
–Design flow enhancements
–Nucleotide deposition sequence design
–Partitioning and integration for manufacturing cost reduction

45 Thank you!

