Improved Algorithms for Multiplex PCR Primer Set Selection with Amplification Length Constraints (APBC 2005)

Presentation transcript:

1 Improved Algorithms for Multiplex PCR Primer Set Selection with Amplification Length Constraints
Kishori M. Konwar, Ion I. Mandoiu, Alexander C. Russell, Alexander A. Shvartsman
CS&E Dept., Univ. of Connecticut
APBC 2005

2 Combinatorial Optimization in Bioinformatics
- Fast growing number of applications
  - Sequence alignment
  - DNA sequencing
  - Haplotype inference
  - Pathogen identification
  - …
  - High-throughput assay design
    - Microarray probe selection
    - Microarray quality control
    - Universal tag arrays
    - …
- This talk: Multiplex PCR primer set selection

3 Outline
- Background and problem formulation
- "Potential function" greedy algorithm
- Approximation guarantee
- Experimental results
- Conclusions

4 The Polymerase Chain Reaction
[Figure: target sequence, polymerase, and primers (primer 1, primer 2); repeat 20-30 cycles]

5 Primer Pair Selection Problem
- Given:
  - Genomic sequence around amplification locus
  - Primer length k
  - Amplification upper bound L
- Find: forward and reverse primers of length k that hybridize within a distance of L of each other and optimize amplification efficiency (melting temperature, secondary structure, mis-priming, etc.)
[Figure: forward and reverse primers flanking the amplification locus, each within distance L; strand ends labeled 5' and 3']

6 PCR for SNP Genotyping
- Thousands of SNPs to be genotyped using hybridization methods (e.g., SBE)
- Selective PCR amplification needed to improve accuracy of detection steps
  - Whole-genome amplification not appropriate
- Simultaneous amplification OK → Multiplex PCR

7 Multiplex PCR
- How it works
  - Multiple DNA fragments amplified simultaneously
  - Each amplified fragment still defined by two primers
  - A primer may participate in amplification of multiple targets
- Primer set selection
  - Currently done by time-consuming trial and error
  - An important objective is to minimize the number of primers
    - Reduced assay cost
    - Higher effective concentration of primers → higher amplification efficiency
    - Reduced unintended amplification

8 Primer Set Selection Problem
- Given:
  - Genomic sequences around n amplification loci
  - Primer length k
  - Amplification upper bound L
- Find: minimum size set S of primers of length k such that, for each amplification locus, there are two primers in S hybridizing with the forward and reverse genomic sequences within a distance of L of each other

9 Previous Work on Primer Selection
- Well-studied problem: [Pearson et al.'96], [Linhart & Shamir'02], [Souvenir et al.'03], etc.
- Almost all problem formulations decouple selection of forward and reverse primers
  - To enforce the bound of L on amplification length, select only primers that hybridize within L/2 bases of the desired target
  - In the worst case, this method can increase the number of primers by a factor of $O(n)$ compared to the optimum
- [Pearson et al.'96]: the greedy set cover algorithm gives an $O(\ln n)$ approximation factor for the "decoupled" formulation

10 Previous Work (2)
- [Fernandes & Skiena'02] study primer set selection with uniqueness constraints
- Minimum Multi-Colored Subgraph Problem:
  - Vertices correspond to candidate primers
  - Edge colored by color i between u and v iff the corresponding primers hybridize within a distance of L of each other around the i-th amplification locus
  - Goal is to find a minimum size set of vertices inducing edges of all colors
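The goal stated on this slide can be illustrated with a minimal Python sketch that checks whether a chosen vertex set induces at least one edge of every color. The `(u, v, color)` edge-list representation and the function names are assumptions made for illustration; they are not taken from the talk.

```python
def induced_colors(edges, chosen):
    """Return the colors of all edges whose two endpoints are both in `chosen`.

    edges: iterable of (u, v, color) triples (hypothetical representation).
    """
    chosen = set(chosen)
    return {color for u, v, color in edges if u in chosen and v in chosen}


def induces_all_colors(edges, chosen):
    """True iff `chosen` induces at least one edge of every color in `edges`."""
    all_colors = {color for _, _, color in edges}
    return induced_colors(edges, chosen) == all_colors


# Toy instance: primer "p" pairs with "q" at locus 0 and with "r" at locus 1.
edges = [("p", "q", 0), ("p", "r", 1), ("s", "t", 1)]
print(induces_all_colors(edges, {"p", "q", "r"}))  # True
print(induces_all_colors(edges, {"p", "q"}))       # False
```

In this toy instance the three primers p, q, r serve both loci because p is shared, which is the effect the minimization is after.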

11 The Set Cover Problem
- Given:
  - Universal set $U$ with $n$ elements
  - Family of sets $(S_x,\ x \in X)$ covering all elements of $U$
- Find:
  - Minimum size subset $X'$ of $X$ s.t. $(S_x,\ x \in X')$ covers all elements of $U$

12 Selection w/ Length Constraints
- "Simultaneous set covering" problem:
  - Ground set partitioned into $n$ disjoint sets $S_i$ (one for each target), each with $2L$ elements
  - Goal is to select the minimum number of sets (= primers) covering at least 1/2 of the elements in each partition
[Figure: SNP i with a flanking region of length L on each side]

13 Greedy Setcover Algorithm
- Classical result (Johnson'74, Lovasz'75, Chvatal'79): the greedy setcover algorithm has an approximation factor of $H(n) = 1 + 1/2 + 1/3 + \dots + 1/n < 1 + \ln n$
  - The approximation factor is tight
  - Cannot be approximated within a factor of $(1-\varepsilon)\ln n$ unless $\mathrm{NP} \subseteq \mathrm{DTIME}(n^{O(\log\log n)})$
- Greedy algorithm:
  - Repeatedly pick the set with the most uncovered elements
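The greedy rule above can be sketched in a few lines of Python. The dictionary representation of the set family and the toy instance are assumptions for illustration only.

```python
import math


def greedy_set_cover(universe, sets):
    """Greedy set cover: repeatedly pick the set with the most uncovered elements.

    sets: dict mapping a set name to a Python set of elements (hypothetical layout).
    Returns the list of chosen set names in the order they were picked.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(sets, key=lambda name: len(sets[name] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("the given sets do not cover the universe")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen


def harmonic(n):
    """H(n) = 1 + 1/2 + ... + 1/n, the greedy approximation factor."""
    return sum(1.0 / i for i in range(1, n + 1))


universe = range(1, 7)
sets = {"A": {1, 2, 3, 4}, "B": {1, 2, 5}, "C": {3, 4, 6}, "D": {5, 6}}
print(greedy_set_cover(universe, sets))      # ['A', 'D']
print(harmonic(6) < 1 + math.log(6))         # True
```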

14 Potential Functions
- Set cover
  - $\Phi$ = #uncovered elements
  - Initially, $\Phi = n$
  - For feasible solutions, $\Phi = 0$
- Primer selection with length constraints
  - $\Phi$ = minimum number of elements that must still be covered $= \sum_i \max\{0,\ L - \#\text{covered elements in } S_i\}$
  - Initially, $\Phi = nL$
  - For feasible solutions, $\Phi = 0$
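A minimal Python sketch of the primer-selection potential follows. It counts covered elements per target, so that the empty selection has potential nL and a feasible selection has potential 0, matching the initial and final values on the slide. The `coverage` dictionary, the function names, and the toy numbers are all assumptions for illustration.

```python
def potential(selected, coverage, n_targets, L):
    """Phi(selected) = sum over targets i of max(0, L - #covered elements in S_i).

    coverage: dict primer -> dict target index -> set of covered elements
              (hypothetical layout; each S_i is assumed to have 2L elements).
    """
    total = 0
    for i in range(n_targets):
        covered = set()
        for primer in selected:
            covered |= coverage[primer].get(i, set())
        total += max(0, L - len(covered))
    return total


def gain(primer, selected, coverage, n_targets, L):
    """Delta(primer, selected) = Phi(selected) - Phi(selected + primer)."""
    before = potential(selected, coverage, n_targets, L)
    after = potential(set(selected) | {primer}, coverage, n_targets, L)
    return before - after


# Toy instance with n = 2 targets and L = 3.
coverage = {
    "f": {0: {0, 1}, 1: {0, 1, 2}},
    "r": {0: {3}},
    "g": {0: {1, 4}},
}
print(potential(set(), coverage, 2, 3))        # 6, i.e. n * L
print(potential({"f", "r"}, coverage, 2, 3))   # 0, a feasible selection
print(gain("g", set(), coverage, 2, 3))        # 2
print(gain("g", {"f"}, coverage, 2, 3))        # 1
```

The last two lines show the gain of the same primer shrinking as the selection grows, which is the diminishing-returns behavior the greedy analysis relies on.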

15 General setting
- Potential function $\Phi(X') \ge 0$
  - $\Phi(\{\}) = \Phi_{\max}$
  - $\Phi(X') = 0$ for all feasible solutions
  - $X'' \subseteq X' \Rightarrow \Phi(X'') \ge \Phi(X')$
  - If $\Phi(X') > 0$, then there exists $x$ s.t. $\Phi(X' + x) < \Phi(X')$
  - $X'' \subseteq X' \Rightarrow \Delta(x, X'') \ge \Delta(x, X')$ for every $x$, where $\Delta(x, X') := \Phi(X') - \Phi(X' + x)$
- Objective: find a minimum size set $X'$ with $\Phi(X') = 0$

16 Generic Greedy Algorithm
- Algorithm:
  - $X' \leftarrow \{\}$
  - While $\Phi(X') > 0$:
    - Find $x$ with maximum $\Delta(x, X')$
    - $X' \leftarrow X' + x$
- Theorem: The generic greedy algorithm has an approximation factor of $1 + \ln \Delta_{\max}$
- Corollary: $1 + \ln(nL)$ approximation for PCR primer selection
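The generic greedy loop can be sketched in Python as a function that takes any potential function as a callable. The names `generic_greedy`, `covers`, `NEED`, and `GROUPS`, and the toy threshold potential, are assumptions for illustration and do not come from the talk.

```python
def generic_greedy(candidates, phi):
    """Generic greedy: add the element with maximum decrease in phi until phi is 0.

    candidates: list of hashable elements (the set X).
    phi: callable mapping a frozenset X' to a nonnegative number.
    Returns the selected elements in the order they were chosen.
    """
    selected = frozenset()
    order = []
    while phi(selected) > 0:
        current = phi(selected)
        # Delta(x, X') = phi(X') - phi(X' + x); ties go to the earliest candidate.
        best = max(candidates, key=lambda x: current - phi(selected | {x}))
        if current - phi(selected | {best}) <= 0:
            raise ValueError("no candidate decreases the potential")
        selected = selected | {best}
        order.append(best)
    return order


# Toy threshold potential: each group needs NEED covered elements.
GROUPS = ["t1", "t2"]
NEED = 2
covers = {
    "a": {("t1", 1), ("t1", 2), ("t2", 1)},
    "b": {("t1", 3)},
    "c": {("t2", 2), ("t2", 3)},
    "d": {("t1", 3), ("t2", 2)},
}


def phi(selected):
    covered = set().union(*(covers[x] for x in selected))
    return sum(
        max(0, NEED - sum(1 for group, _ in covered if group == t))
        for t in GROUPS
    )


print(generic_greedy(list(covers), phi))  # ['a', 'c']
```

Passing the potential as a callable keeps the loop identical for set cover and for the length-constrained problem; only `phi` changes.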

17 Proof Sketch (1)
- Let $x_1, x_2, \dots, x_g$ be the elements selected by greedy, in the order in which they are chosen
- Let $x^*_1, x^*_2, \dots, x^*_k$ be the elements of an optimum solution
- Charging scheme: $x_i$ charges to $x^*_j$ a cost of [formula missing in the transcript], where $\delta_{ij} = \Delta(x_i,\ \{x_1, \dots, x_{i-1}\} \cup \{x^*_1, \dots, x^*_j\})$
- Fact 1: Each $x^*_j$ gets charged a total cost of at most $1 + \ln \Delta_{\max}$

18 Proof Sketch (2)
- Fact 2: Each $x_i$ charges at least 1 unit of cost

19 Experimental Setting
- Datasets extracted from NCBI databases, L = 1000
- Dell PowerEdge 2.8GHz Xeon
- Compared algorithms
  - G-FIX: greedy primer cover algorithm [Pearson et al.]
  - MIPS-PT: iterative beam-search heuristic [Souvenir et al.]
    - Restrict primers to L/2 bases around amplification locus
  - G-VAR: naïve modification of G-FIX
    - First selected primer can be up to L bases away
    - Opposite sequence truncated after selecting first primer
  - G-POT: potential function driven greedy algorithm

20 Experimental Results, NCBI tests
G-FIX: Pearson et al.; G-VAR: G-FIX with dynamic truncation; MIPS-PT: Souvenir et al.; G-POT: potential-function greedy.

                  G-FIX              G-VAR              MIPS-PT            G-POT
# Targets   k     #Primers  CPU sec  #Primers  CPU sec  #Primers  CPU sec  #Primers  CPU sec
20          8     7         0.04     7         0.08     8         10       6         0.10
20          10    9         0.03     10        0.08     13        15       9         0.08
20          12    14        0.04     13        0.08     18        26       13        0.11
50          8     13        0.13     15        0.30     21        48       10        0.32
50          10    23        0.22     24        0.36     30        150      18        0.33
50          12    31        0.14     32        0.30     41        246      29        0.28
100         8     17        0.49     20        0.89     32        226      14        0.58
100         10    37        0.37     37        0.72     50        844      31        0.75
100         12    53        0.59     48        0.84     75        2601     42        0.61

21 [Figure: #primers as a percentage of 2n, plotted against n, for l = 8]

22 [Figure: #primers as a percentage of 2n, plotted against n, for l = 10]

23 [Figure: #primers as a percentage of 2n, plotted against n, for l = 12]

24 [Figure: CPU seconds, plotted against n, for l = 10]

25 Conclusions
- Numerous combinatorial optimization problems arise in the area of high-throughput assay design
- Theoretical insights such as approximation results can lead to significant practical improvements
- Choosing the proper problem model is critical to solution efficiency

26 Ongoing Work & Open Problems
- Degenerate primers
- Accurate hybridization model (melting temperature, secondary structure, cross hybridization, …)
  - In-silico MP-PCR simulator
- Partition into multiple multiplexed PCR reactions (Aumann et al. WABI'03)

27 Acknowledgments
- Financial support from UCONN's Research Foundation

28 Integer Program Formulation
- 0/1 variable $x_u$ for every vertex $u$
- 0/1 variable $y_e$ for every edge $e$
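The transcript keeps only the variable declarations of this slide. One standard way to write the minimum multi-colored subgraph problem as a 0/1 program over these variables is sketched below; the color classes $E_i$ (edges of color $i$) and the exact constraint forms are assumptions, not recovered from the slide.

```latex
\begin{align*}
\min \quad & \sum_{u \in V} x_u \\
\text{s.t.} \quad & y_e \le x_u, \quad y_e \le x_v && \text{for every edge } e = (u, v) \\
& \sum_{e \in E_i} y_e \ge 1 && \text{for every color } i = 1, \dots, n \\
& x_u,\ y_e \in \{0, 1\} && \text{for every vertex } u \text{ and edge } e
\end{align*}
```

Under this reading, an edge can be used only if both of its endpoints are selected, and every color needs at least one used edge.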

29 LP-Rounding Algorithm
- Algorithm:
  (1) Solve linear programming relaxation
  (2) Select node $u$ with probability $x_u$
  (3) Repeat step 2 $O(\ln n)$ times and return selected nodes
- Theorem [Konwar et al.'04]: The LP-rounding algorithm finds a feasible solution at most $O(m^{1/2} \ln n)$ times larger than the optimum, where $m$ is the maximum color class size and $n$ is the number of nodes
- For primer selection, $m \le L^2$ ⇒ approximation factor is $O(L \ln n)$
- Better approximation?
  - Unlikely for minimum multi-colored subgraph problem
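Steps (2) and (3) can be sketched in Python, assuming the fractional LP solution from step (1) is already available as a dictionary. The function name `lp_round`, the constant `c` hidden in the $O(\ln n)$ repetition count, and the toy values are assumptions for illustration; the sketch does not solve the LP and does not verify that the returned set is feasible.

```python
import math
import random


def lp_round(x, n_nodes, rng, c=1.0):
    """Randomized rounding of a fractional solution.

    x: dict node -> fractional value in [0, 1], assumed to come from the LP relaxation.
    n_nodes: number of nodes n, used for the ceil(c * ln n) repetition count.
    rng: a random.Random instance, passed in so runs are reproducible.
    c: hypothetical constant for the O(ln n) repetitions.
    Returns the union of the nodes selected over all rounds.
    """
    rounds = max(1, math.ceil(c * math.log(max(n_nodes, 2))))
    selected = set()
    for _ in range(rounds):
        for node, value in x.items():
            # Select this node with probability equal to its fractional value.
            if rng.random() < value:
                selected.add(node)
    return selected


x = {"u": 1.0, "v": 0.0, "w": 0.5}
picked = lp_round(x, n_nodes=3, rng=random.Random(0))
print("u" in picked, "v" in picked)  # True False
```

Nodes with value 1 are always selected and nodes with value 0 never are; repeating the rounding raises the chance that every color ends up with a fully selected edge.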

