CSE280Stefano/Hossein Project: Primer design for cancer genomics.



Cancer genomics
In cancers, large genetic changes can occur, including deletions, inversions, and rearrangements of the genome.
In the early stages, only a few cells will carry a given change.

Polymerase Chain Reaction
PCR is a technique for amplifying and detecting a specific portion of the genome.
Amplification takes place if the two primers are an appropriate distance apart (< 2 kb).

Assaying for Rare Variants
PCR can be used to assay for a given genomic abnormality, even in a heterogeneous population of tumor and normal cells.
[Figure: workflow with the labels "Extract genomic DNA", "PCR", "Distance too large for amplification", "Tumor cell", and "Detection".]

Primer Approximation Multiplex PCR (PAMP)
Multiple primers are optimally spaced, flanking a breakpoint of interest.
– Upstream of the breakpoint: forward primers
– Downstream of the breakpoint: reverse primers
The primers are run in a multiplex PCR reaction.
– Any pair can form a viable product.
[Figure labels: Deletion, Patient B, Patient C.]
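To make the PAMP idea concrete, here is a minimal sketch of which primer pairs would produce a product once a deletion brings them close together. It assumes primers are represented by single genomic coordinates and that the only amplification criterion is the < 2 kb distance quoted on the PCR slide. The function name, the coordinates, and the deletion interval are hypothetical and not taken from the slides.

```python
MAX_PRODUCT_BP = 2000  # amplification limit quoted on the PCR slide (< 2 kb)

def amplifying_pairs(forward, reverse, del_start, del_end, max_bp=MAX_PRODUCT_BP):
    """Return (f, r) pairs that amplify once [del_start, del_end) is deleted.

    forward, reverse: coordinates of forward and reverse primers (assumed
    to be plain integers). A pair counts only if the forward primer lies
    upstream of the deletion and the reverse primer lies downstream of it.
    """
    deleted = del_end - del_start
    pairs = []
    for f in forward:
        for r in reverse:
            if f < del_start and r >= del_end:
                # Distance between the primers after the deletion.
                if (r - f) - deleted < max_bp:
                    pairs.append((f, r))
    return pairs

# Hypothetical primer layout and patient deletion.
forward = [0, 1000, 2000, 3000]
reverse = [10000, 11000, 12000, 13000]
print(amplifying_pairs(forward, reverse, 2500, 10500))  # one pair amplifies
print(amplifying_pairs(forward, reverse, 5000, 5000))   # no deletion: no product
```

With no deletion, every forward/reverse pair is too far apart, which matches the "distance too large for amplification" case for normal cells.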

Goal
Input: a collection of primer locations and matrices of primer interactions.
– Forward/Forward, Forward/Reverse, Reverse/Reverse
Identify a subset of primers that do not interact with one another, are unique, and maximize the covered region.

Algorithms for Optimizing the Cost
Preprocessing
– Determine the pairs of primers that dimerize (the edges in the graph).
– Filter the primers to ensure "uniqueness".
Simulated annealing
1. Start from an initial candidate set P, generated randomly or greedily.
2. List the neighboring sets P' and compute [expression not preserved in the transcript].
3. Select step s with a probability proportional to [expression not preserved in the transcript].
4. Decrease the temperature T and go to step 2.
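The four annealing steps can be sketched as follows. Because the expressions on the slide did not survive, this sketch assumes the textbook choice: the quantity computed in step 2 is the cost change between P' and P, and the selection weight in step 3 is exp(-change / T). The geometric cooling schedule, the toy cost, and all names (`anneal`, `toy_cost`, `toy_neighbors`) are illustrative assumptions, not the authors' implementation.

```python
import math
import random

def anneal(initial, neighbors, cost, t0=1.0, cooling=0.95, steps=200, seed=0):
    """Simulated annealing with Boltzmann-weighted neighbor selection (assumed)."""
    rng = random.Random(seed)
    current, best = initial, initial
    temperature = t0
    for _ in range(steps):
        candidates = neighbors(current)               # step 2: list neighbors P'
        if not candidates:
            break
        deltas = [cost(c) - cost(current) for c in candidates]
        low = min(deltas)                             # shift keeps exp() stable
        weights = [math.exp(-(d - low) / temperature) for d in deltas]
        current = rng.choices(candidates, weights=weights, k=1)[0]  # step 3
        if cost(current) < cost(best):
            best = current
        temperature *= cooling                        # step 4: cool down
    return best

# Toy instance (hypothetical): six candidate primers, two dimerizing pairs.
CANDIDATES = range(6)
DIMERS = [(0, 1), (2, 3)]

def toy_cost(selected):
    # Unselected primers stand in for lost coverage; clashes are penalized.
    unselected = len(CANDIDATES) - len(selected)
    clashes = sum(1 for a, b in DIMERS if a in selected and b in selected)
    return unselected + 10 * clashes

def toy_neighbors(selected):
    # Add or remove exactly one primer.
    return [frozenset(selected ^ {u}) for u in CANDIDATES]

best = anneal(frozenset(), toy_neighbors, toy_cost)
print(sorted(best), toy_cost(best))
```

Tracking the best set seen so far is a common safeguard, since the walk itself may move to worse sets while the temperature is high.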

Cost Function
The cost function takes both coverage and dimerization into account.
[Equation not preserved in the transcript; its terms were labeled "Dimerization", "Coverage", and "Density".]

Simulated Annealing: Define Neighbors
Approach 1:
– Set [expression not preserved in the transcript].
– E is the edge set corresponding to dimerizing pairs.
– Neighbors of P are formed by adding a vertex u to P and removing all vertices that dimerize with u, i.e. $P' = (P \setminus \{v : (u, v) \in E\}) \cup \{u\}$.
Approach 2:
– [Expression not preserved in the transcript.]
– No hard constraint on dimerizing pairs.
– Neighbors of P are obtained by adding or removing one vertex from P.
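The two neighborhood definitions can be sketched as below. Primers are assumed to be hashable identifiers, candidate sets are frozensets, and E is a list of undirected pairs. The function names are hypothetical.

```python
def neighbors_independent(selected, candidates, edges):
    """Approach 1: add a vertex u and drop every vertex that dimerizes with u."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    result = []
    for u in sorted(candidates - selected):
        kept = selected - adjacency.get(u, set())
        result.append(frozenset(kept | {u}))
    return result

def neighbors_toggle(selected, candidates):
    """Approach 2: add or remove exactly one vertex, with no hard constraint."""
    return [frozenset(selected ^ {u}) for u in sorted(candidates)]

candidates = frozenset(range(4))
edges = [(0, 1), (1, 2)]          # hypothetical dimerizing pairs
selected = frozenset({0, 2})

print(neighbors_independent(selected, candidates, edges))
print(neighbors_toggle(selected, candidates))
```

If the starting set contains no dimerizing pair, every neighbor produced by Approach 1 is also free of dimerizing pairs, so dimerization acts as a hard constraint. Approach 2 can step into sets that contain dimers, so it relies on the cost function to penalize them.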

ILP Formulation
[Symbol not preserved in the transcript]: indicator of primer i being selected.
[Symbol not preserved in the transcript]: indicator of candidate primer i being immediately after primer j.
Guaranteed optimality, but intractable for realistic problems.
Used here to assess the performance of simulated annealing.
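The role of an exact method as a yardstick can be illustrated without an ILP solver. The sketch below is a stand-in, not the ILP from the slides: it enumerates every subset of a tiny instance, which gives the true optimum but scales as 2^n. The coverage model is an assumption made for illustration: primers lie on a line and each pair of adjacent selected primers covers the gap between them, capped at d bp. All names and numbers are hypothetical.

```python
from itertools import combinations

def covered_length(positions, d):
    """Assumed coverage model: each adjacent gap contributes at most d bp."""
    pos = sorted(positions)
    return sum(min(b - a, d) for a, b in zip(pos, pos[1:]))

def exact_best(locations, edges, d):
    """Exhaustively find the dimer-free subset with the largest coverage."""
    best_set, best_cov = (), 0
    n = len(locations)
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            chosen = set(subset)
            # Skip subsets that contain a dimerizing pair.
            if any(a in chosen and b in chosen for a, b in edges):
                continue
            cov = covered_length([locations[i] for i in subset], d)
            if cov > best_cov:
                best_set, best_cov = subset, cov
    return best_set, best_cov

locations = [0, 500, 1000, 1500, 2500]   # hypothetical primer coordinates
edges = [(1, 2)]                         # primers 1 and 2 dimerize
print(exact_best(locations, edges, d=1000))
```

On an instance this small the optimum is available instantly, so a heuristic's result can be compared against it directly. The enumeration is what becomes intractable at realistic sizes.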

Bounds and Numerical Results
A weak theoretical upper bound:
– Select all primers, ignoring dimerization constraints.
– For any two adjacent primers with distance [expression not preserved in the transcript], reduce the covered region by [expression not preserved in the transcript] bp.

Potential Improvements
Improve the cost function formulation.
– Incorporate multiplexing sets.
Find an efficient technique to solve the optimization problem.
Improve the analytical bound.
– Consider the effect of dimerization within the forward/reverse primer set.

Pairwise cost function
Measures the total possible number of sites that are uncovered, given all forward and reverse primer combinations.

Multiobjective cost function
Takes coverage and multiplexing sets into account.
Minimizes both objectives while resolving the dimerization constraint, given a possible solution containing multiplexing sets S.
[Equation not preserved in the transcript; its terms were labeled "Missed coverage" and "Sets".]

Using Fewer Integer Variables
The formulation in the paper uses $n^2$ auxiliary variables, one for each pair of primers.
– $q_{ij} = 1$ if and only if primers i and j are selected as two consecutive primers in the candidate set.
The complexity of an ILP (or IQP) generally grows exponentially with the number of integer variables.
In practice, the distance between two consecutive primers in the solution is not much larger than d; otherwise there would be a large gap in the covered region.
Assume an upper bound g on the distance between two consecutive primers.
Introduce a variable $q_{ij}$ only if $l_i - l_j < g$.
The average number of variables is reduced to $n(1 + \rho g)$.
– $\rho$ is the density of the primers in the initial set.
– The number of integer variables becomes $O(n)$.
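The saving can be checked numerically with a small sketch. It assumes distinct primer coordinates and counts only pairs with $0 < l_i - l_j < g$, since primer i is meant to come after primer j. The function name and the evenly spaced layout are hypothetical.

```python
def count_pair_variables(locations, g):
    """Count ordered pairs (i, j) with 0 < l_i - l_j < g (distinct locations assumed)."""
    locs = sorted(locations)
    count = 0
    start = 0
    for i, li in enumerate(locs):
        # Advance the window so that every remaining predecessor is within g.
        while li - locs[start] >= g:
            start += 1
        count += i - start
    return count

# Hypothetical layout: 100 primers, one every 100 bp, so rho = 0.01 per bp.
locations = range(0, 10000, 100)
n = len(locations)
g = 1000

pair_vars = count_pair_variables(locations, g)
print("restricted pair variables:", pair_vars)
print("with n selection variables:", pair_vars + n)
print("estimate n(1 + rho*g):", n * (1 + 0.01 * g))
print("unrestricted n^2:", n * n)
```

For this layout the restricted count stays close to the $n(1 + \rho g)$ estimate and is roughly ten times smaller than $n^2$.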