CSE280Stefano/Hossein Project: Primer design for cancer genomics.

1 CSE280Stefano/Hossein Project: Primer design for cancer genomics

2 Cancer genomics
In cancers, large genetic changes can occur, including deletions, inversions, and rearrangements of genomes.
In the early stages, only a few cells will carry a given deletion.

3 Polymerase Chain Reaction
PCR is a technique for amplifying and detecting a specific portion of the genome.
Amplification takes place only if the primers are an ‘appropriate’ distance apart (<2 kb).

4 Assaying for Rare Variants
PCR can be used to assay for a given genomic abnormality, even in a heterogeneous population of tumor and normal cells.
[Figure labels: Extract genomic DNA; PCR; Distance too large for amplification; Tumor cell; Detection]
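The idea on slides 3 and 4 can be sketched in a few lines: a forward/reverse pair that is too far apart in normal DNA yields a product only when a deletion between the two primers brings them within range. This is a minimal illustration, not the presenters' code. The function names, the coordinate convention (positions in bp, deletion given as a half-open interval) and the strict 2000 bp cutoff are assumptions chosen to match the "<2 kb" figure above.

```python
MAX_PRODUCT_BP = 2000  # assumed cutoff, taken from the "<2 kb" figure on slide 3


def amplifies(forward_pos, reverse_pos, deletion=None, max_product=MAX_PRODUCT_BP):
    """Return True if the primer pair is close enough to give a PCR product.

    deletion is an optional (start, end) half-open interval removed from the
    genome. It shortens the product only if it lies between the two primers.
    """
    if reverse_pos <= forward_pos:
        return False
    distance = reverse_pos - forward_pos
    if deletion is not None:
        start, end = deletion
        if forward_pos < start and end <= reverse_pos:
            distance -= end - start
    return distance < max_product


def detecting_pairs(forwards, reverses, deletion):
    """Pairs that amplify in deleted (tumor) DNA but not in normal DNA."""
    return [
        (f, r)
        for f in forwards
        for r in reverses
        if amplifies(f, r, deletion) and not amplifies(f, r)
    ]


# Hypothetical primer positions and a hypothetical 9 kb deletion.
pairs = detecting_pairs([0, 1000, 2000], [12000, 13000], (2500, 11500))
print(pairs)
```

Only pairs returned by `detecting_pairs` would distinguish tumor cells from normal cells in this toy model, which is why the spacing of primers around the breakpoint matters.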

5 Primer Approximation Multiplex PCR (PAMP)*
Multiple primers are optimally spaced, flanking a breakpoint of interest
 – Upstream of breakpoint, forward primers
 – Downstream of breakpoint, reverse primers
The primers are run in a multiplex PCR reaction
 – Any pair can form a viable product
[Figure labels: Deletion; Patient B; Patient C]

6 Goal
Input: a collection of primer locations and matrices of primer interactions
 – Forward/Forward, Forward/Reverse, Reverse/Reverse
Identify a subset of primers that do not interact with one another and are unique, while maximizing the covered region.

7 Algorithms for Optimizing the Cost
Preprocessing
 – Determining the pairs of primers that dimerize (edges in the graph)
 – Filtering the primers to ensure “uniqueness”
Simulated annealing
1. Start from an initial candidate set P, generated randomly or greedily.
2. List the neighboring sets P’ and compute [formula missing in transcript]
3. Select step s with a probability proportional to [formula missing in transcript]
4. Decrease the temperature T and go to step 2.
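The four annealing steps can be sketched as below. The slide's own formulas did not survive in the transcript, so the selection weight used here, exp(-ΔC/T) over all neighbors, is the standard Boltzmann choice and is an assumption rather than the presenters' stated rule. The function `anneal`, its parameters, the geometric cooling schedule and the toy cost are likewise hypothetical.

```python
import math
import random


def anneal(initial, neighbors, cost, t_start=2.0, t_end=0.01, cooling=0.9, seed=0):
    """Simulated annealing over sets; returns the best set seen and its cost."""
    rng = random.Random(seed)
    current = frozenset(initial)                 # step 1: initial candidate set P
    best, best_cost = current, cost(current)
    t = t_start
    while t > t_end:
        candidates = neighbors(current)          # step 2: list the neighbors P'
        if not candidates:
            break
        costs = [cost(p) for p in candidates]
        lowest = min(costs)
        # Assumed Boltzmann weights; shifting by the lowest cost avoids overflow.
        weights = [math.exp(-(c - lowest) / t) for c in costs]
        pick = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        current = candidates[pick]               # step 3: weighted random step
        if costs[pick] < best_cost:
            best, best_cost = current, costs[pick]
        t *= cooling                             # step 4: lower the temperature
    return best, best_cost


# Toy instance (hypothetical): four primers on a path of dimerizing pairs.
PRIMERS = range(4)
DIMERS = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3)]}


def toy_cost(p):
    """Penalize dimerizing pairs heavily, reward each selected primer."""
    clashes = sum(1 for e in DIMERS if e <= p)
    return 10 * clashes - len(p)


def toggle_neighbors(p):
    """Neighbors obtained by adding or removing a single primer."""
    return [p ^ {v} for v in PRIMERS]


best, best_cost = anneal(frozenset(), toggle_neighbors, toy_cost)
print(sorted(best), best_cost)
```

Passing `neighbors` and `cost` in as functions keeps the loop independent of how neighbors or the cost are defined, so either neighborhood definition from the deck could be plugged in.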

8 Cost Function
The cost function used takes coverage and dimerization into account.
[Equation not preserved in transcript; term labels: Dimerization, Coverage, Density]

9 Simulated Annealing: Define Neighbors
Approach 1:
 – Set [definition missing in transcript]
 – E is the edge set corresponding to dimerizing pairs
 – Neighbors of P are formed by adding a vertex u to P and removing all vertices dimerizing with u; i.e. [formula missing in transcript]
Approach 2:
 – [definition missing in transcript]
 – No hard constraint on dimerizing pairs.
 – Neighbors of P are obtained by adding or removing one vertex from P.
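The two neighborhood definitions can be written out as follows. This is a minimal sketch: the function names are hypothetical, primers are represented by arbitrary hashable labels, and for Approach 1 it is assumed that u ranges over primers not already in P.

```python
def neighbors_approach1(selected, all_primers, dimer_edges):
    """Approach 1: add a primer u, then drop every selected primer that dimerizes with u."""
    adjacency = {}
    for a, b in dimer_edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    current = frozenset(selected)
    result = []
    for u in all_primers:
        if u in current:
            continue  # assumption: only primers outside P are added
        result.append((current - adjacency.get(u, set())) | {u})
    return result


def neighbors_approach2(selected, all_primers):
    """Approach 2: add or remove exactly one primer, with no hard dimer constraint."""
    current = frozenset(selected)
    return [current ^ {u} for u in all_primers]


# Hypothetical example: "b" dimerizes with both "a" and "c".
primers = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c")]
print(neighbors_approach1({"a", "c"}, primers, edges))
print(neighbors_approach2({"a", "c"}, primers))
```

If P starts free of dimerizing pairs, every neighbor produced by `neighbors_approach1` is also free of them, so the search never leaves the feasible region. `neighbors_approach2` can produce sets that contain dimerizing pairs, so those have to be discouraged through the cost function instead.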

10 ILP Formulation
[symbol missing in transcript]: indicator of primer i being selected.
[symbol missing in transcript]: indicator of candidate primer i being immediately after primer j.
Guaranteed optimality, but intractable for realistic problems
Used here to assess the performance of simulated annealing

11 Bounds and Numerical Results
A Weak Theoretical Upper Bound:
 – Select all primers without dimerization constraints.
 – For any two adjacent primers with distance [condition missing in transcript], reduce the covered region by [amount missing in transcript] bp.

12 Potential Improvements
Improving the cost function formulation
 – Incorporating multiplexing sets
Find an efficient technique to solve the optimization problem.
Improve on the analytical bound
 – Consider the effect of dimerization within the forward/reverse primer set.

13 Pairwise cost function
Measures the total possible number of sites that are uncovered, given all forward and reverse primer combinations.

14 Multiobjective cost function
Taking coverage and multiplexing sets into account
Minimizing both objectives, and resolving the dimerization constraint, given a possible solution containing multiplexing sets S
[Equation not preserved in transcript; term labels: Missed coverage, Sets]

15 Using Fewer Integer Variables
The formulation in the paper uses n^2 auxiliary variables, one for each pair of primers.
 – q_ij = 1 if and only if primers i and j are selected as two consecutive primers in the candidate set.
Complexity of ILP (or IQP) generally grows exponentially with the number of integer variables.
In practice, the distance between two consecutive primers in the solution is not much larger than d; otherwise there would be a large gap in the covered region.
Assume an upper bound g on the distance between consecutive primers.
Introduce a variable q_ij only if l_i – l_j < g.
The average number of variables is reduced to n(1+ρg)
 – ρ is the density of the primers in the initial set.
 – The number of integer variables becomes O(n).
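The saving can be checked on a small example by counting the variables directly. In this sketch, which is not taken from the paper, each primer gets one selection variable and a pair variable is created only for pairs whose locations differ by less than g. The function name and the sample locations are hypothetical, and the full count is taken as n selection variables plus n^2 pair variables.

```python
from bisect import bisect_left


def count_variables(locations, g):
    """Return (reduced, full) integer-variable counts for a gap bound g.

    reduced: n selection variables plus one pair variable per ordered-by-position
             pair whose distance is strictly less than g.
    full:    n selection variables plus n^2 pair variables.
    """
    locs = sorted(locations)
    n = len(locs)
    pair_vars = 0
    for i, li in enumerate(locs):
        # First index after i whose location is at least li + g.
        j_end = bisect_left(locs, li + g, i + 1)
        pair_vars += j_end - (i + 1)
    return n + pair_vars, n + n * n


# Hypothetical primer locations in bp, with g = 300 bp.
reduced, full = count_variables([0, 100, 250, 900, 1000], 300)
print(reduced, full)
```

Because each primer has on average about ρg later primers within distance g, the reduced count is roughly n(1+ρg), which is linear in n when g and ρ are fixed.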

