Computational Molecular Biology Non-unique Probe Selection via Group Testing.

Presentation transcript:

Computational Molecular Biology Non-unique Probe Selection via Group Testing

My T. Thai 2

My T. Thai 5 DNA Microarrays  DNA microarrays are small, solid supports onto which the sequences from thousands of different genes are immobilized, or attached, at fixed locations.  They hold a very large number of genes on a small chip.  A tool for performing large numbers of DNA-RNA hybridization experiments in parallel.

My T. Thai 6 Applications  Quantitative analysis of expression levels of individual genes  The comparison of cell samples from different tissues.  Computational diagnostics.  Qualitative analysis of an unknown sample  Identification of micro-bacterial organisms.  Detection of contamination of biotechnological products.  Identification of viral subtypes.

My T. Thai 7 Unique Probes vs. Non-unique Probes  Unique probes  Gene-specific probes or signature probes.  Difficult to find such probes  Non-unique probes  Hybridize to more than one target.  Difficult to design the test based on non-unique probes

My T. Thai 8 Probe-Target Matrix  12 probe candidates.  4 targets (genes).  For target set S, define P(S) as set of probes reacting to any target in S.  P({1, 2}) = {1, 2, 3, 4, 7, 8, 9, 10, 12}.  P({2, 3}) = {1, 3, 4, 5, 6, 7, 8, 9, 12}.  Symmetric set difference: P({1, 2})∆P({2, 3}) = {2, 5, 6, 10}. Probes that separate two sets.
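
The definitions of P(S) and of separating probes can be sketched in a few lines. The 12 x 4 matrix from the slide is not reproduced in this transcript, so the matrix `H` below is a made-up 5 x 4 example, and the function names `probes_reacting` and `separating_probes` are illustrative only.

```python
# Hypothetical incidence matrix (NOT the slide's 12 x 4 matrix).
# H[i][j] == 1 means probe i+1 hybridizes to target j+1.
H = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]


def probes_reacting(H, S):
    """Return P(S): 1-based indices of probes reacting to any target in S."""
    return {i + 1 for i, row in enumerate(H) if any(row[t - 1] for t in S)}


def separating_probes(H, S, T):
    """Return the symmetric difference of P(S) and P(T)."""
    return probes_reacting(H, S) ^ probes_reacting(H, T)


print(probes_reacting(H, {1, 2}))            # {1, 2, 3}
print(separating_probes(H, {1, 2}, {2, 3}))  # {1, 4}
```

A probe in the symmetric difference gives a different outcome for the two target sets, which is why it separates them.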

My T. Thai 9 Probe-Target Matrix  Non-unique probe for each sequence group  (Figure: probes p1 to p5 against sequences s1 to s9, i.e. targets t1 to t9, arranged in group1, group2 and group3; the matrix entries are not recoverable.)

My T. Thai 10 The Problem  Given a sample with m items and a set of n non-unique probes  Goal: determine the presence or absence of targets

My T. Thai 11 The Approach  3 Steps:  Pre-select suitable probe candidates and compute the probe-target incidence matrix H.  Select a minimal set of probes and compute a suitable design matrix D (a sub-matrix of H).  Decode the result.

My T. Thai 12 An Example  Matrix H and sub-matrix D (figure)

My T. Thai 13 Observation  What property must the sub-matrix D have?  To identify at most d targets without testing errors, D should be either a d-separable matrix or a d-disjunct matrix.  With at most e errors, D should also be e-error-correcting: the Hamming distance between any two unions of at most d columns must be at least 2e + 1.
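
Both properties can be checked by brute force on small matrices. The sketch below assumes rows are probes and columns are targets, and it reads "d-separable" in the "at most d targets" sense (all unions of at most d columns are distinct); the function names are illustrative, and the running time is exponential in d.

```python
from itertools import combinations


def column_supports(M):
    """Return, for each column, the set of row indices where it has a 1."""
    return [frozenset(i for i, row in enumerate(M) if row[j])
            for j in range(len(M[0]))]


def is_d_disjunct(M, d):
    """True if no column is contained in the union of d other columns."""
    cols = column_supports(M)
    n = len(cols)
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for group in combinations(others, min(d, len(others))):
            union = frozenset().union(*(cols[k] for k in group))
            if cols[j] <= union:
                return False
    return True


def is_d_separable(M, d):
    """True if unions of distinct column sets of size at most d all differ."""
    cols = column_supports(M)
    seen = set()
    for size in range(d + 1):
        for group in combinations(range(len(cols)), size):
            union = frozenset().union(*(cols[k] for k in group))
            if union in seen:
                return False
            seen.add(union)
    return True


M = [[1, 1, 0],
     [0, 1, 1]]
print(is_d_separable(M, 1), is_d_disjunct(M, 1))  # True False
```

The small matrix `M` shows that separability is the weaker requirement: its column unions are distinct, yet the first column is contained in the second.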

My T. Thai 14 Non-unique probe selection via group testing  Given a matrix H  Find a submatrix D such that D is d-separable or d-disjunct with the minimum number of rows  (We can easily extend this definition to the error correcting model)

My T. Thai 15 Min-d-DS  Minimum-d-Disjunct Submatrix:  Given a binary matrix M, find a submatrix H with minimum number of rows and the same number of columns such that H is d-disjunct

My T. Thai 16 Complexity  Theorem: Min-d-DS is NP-hard for any fixed d ≥ 1  Proof: By reduction from 3-dimensional matching. (Figure: sets X, Y, Z; elements a, b, c.)

My T. Thai 17 Complexity

My T. Thai 18 Approximation  Pair: (c_0, C), where c_0 is a column and C is a set of d other columns.  Cover: a probe is said to cover a pair (c_0, C) if its entry at c_0 is 1 and its entries at all columns of C are 0.  Greedy approach:  While some pair is still uncovered, at each iteration:  Choose a probe that covers the most uncovered pairs  Approximation ratio: 1 + (d+1) ln n  If NP is not contained in DTIME(n^{log log n}), then no polynomial-time approximation has performance ratio (1-ε) ln n for any ε > 0.
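
The greedy step can be sketched as a set-cover loop. This is a minimal illustration, assuming a pair consists of one column plus d other columns and that a probe (row) covers the pair when it has a 1 in the first column and 0 in the rest; the name `greedy_d_disjunct_rows` is hypothetical, and enumerating all pairs is only practical for small n and d.

```python
from itertools import combinations


def greedy_d_disjunct_rows(M, d):
    """Greedily pick row indices of M until every pair is covered."""
    n = len(M[0])
    pairs = {(c0, others)
             for c0 in range(n)
             for others in combinations([c for c in range(n) if c != c0], d)}

    def covered_by(row):
        # Pairs whose first column is 1 and whose other d columns are 0.
        return {(c0, others) for (c0, others) in pairs
                if row[c0] == 1 and all(row[c] == 0 for c in others)}

    cover = {i: covered_by(row) for i, row in enumerate(M)}
    uncovered = set(pairs)
    chosen = []
    while uncovered:
        # Row covering the most still-uncovered pairs (first one on ties).
        best = max(cover, key=lambda i: len(cover[i] & uncovered))
        gain = cover[best] & uncovered
        if not gain:
            raise ValueError("M itself is not d-disjunct")
        chosen.append(best)
        uncovered -= gain
    return sorted(chosen)


M = [[1, 1, 1],
     [1, 0, 0],
     [0, 1, 0],
     [0, 0, 1],
     [1, 1, 0]]
print(greedy_d_disjunct_rows(M, 1))  # [1, 2, 3]
```

Once every pair is covered, each column has a row where it is 1 and any d other columns are 0, so the selected rows form a d-disjunct sub-matrix.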

My T. Thai 19 Pool Size is 2  Consider the case where each probe hybridizes with exactly 2 targets  The Min-1-separable submatrix problem is then also called minimum test cover  Minimum test cover is APX-complete  Min-d-DS, by contrast, is polynomial-time solvable in this case.

My T. Thai 20 Lemma  Consider a collection C of pools of size at most 2. Let G be the graph with all items as vertices and all pools of size 2 as edges. Then  C gives a d-disjunct matrix if and only if every item not in a singleton pool has degree at least d+1 in G.
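
The lemma turns the d-disjunct test into a degree check. The sketch below assumes items are numbered 0 to n-1 and pools are given as Python sets of size 1 or 2; repeated pools are treated as a single edge, and the function name is illustrative.

```python
from collections import Counter


def disjunct_by_lemma(n_items, pools, d):
    """Apply the lemma: every item outside singleton pools needs degree >= d+1."""
    singles = {v for p in pools if len(p) == 1 for v in p}
    edges = {frozenset(p) for p in pools if len(p) == 2}
    degree = Counter(v for edge in edges for v in edge)
    return all(v in singles or degree[v] >= d + 1 for v in range(n_items))


triangle = [{0, 1}, {1, 2}, {0, 2}]
print(disjunct_by_lemma(3, triangle, 1))  # True: every vertex has degree 2
print(disjunct_by_lemma(3, triangle, 2))  # False: degree 2 < 3
```

For the triangle, each column of the pool-item matrix has two entries and no column is contained in another, which matches the first result; any column is contained in the union of the other two, which matches the second.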

My T. Thai 21 Proof

My T. Thai 23 Theorem  Min-d-DS is polynomial-time solvable in the case that all given pools have size exactly 2  Proof: Given a graph G representing M, finding a minimum d-DS is equivalent to finding a subgraph H with the minimum number of edges such that every vertex has degree at least d + 1 in H.  This is equivalent to maximizing the number of edges in G - H such that every vertex v has degree at most d_G(v) - d - 1 in G - H.

My T. Thai 24 Complexity  Min-1-DS is NP-hard in the case that all given pools have size at most 2.  Proof: By reduction from Vertex Cover  Min-d-DS is MAX SNP-complete in the case that all given pools have size at most 2 for d ≥ 2  Proof:  By reduction from VC-CUBIC  VC-CUBIC: given a cubic graph G, find a minimum vertex cover of G

My T. Thai 25 Approximation  Consists of 2 steps:  Step 1  Compute a minimum solution H of the polynomial-time solvable problem mentioned above  Step 2  Choose all singleton pools at vertices with degree less than d + 1 in H

My T. Thai 26 Approximation Ratio Analysis  Suppose all given pools have size at most 2. Let s be the number of given singleton pools. Then any feasible solution of Min-d-DS contains at least s + (n-s)(d+1)/2 pools.
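
As a quick arithmetic check of this bound, the helper below evaluates s + (n-s)(d+1)/2 and rounds up, since a pool count is an integer. The function name and the sample values are illustrative only.

```python
def min_pools_lower_bound(n, s, d):
    """Lower bound s + (n - s)(d + 1)/2 on the pool count, rounded up."""
    return s + ((n - s) * (d + 1) + 1) // 2


# 10 items, 4 given singleton pools, d = 2: 4 + 6 * 3 / 2 = 13
print(min_pools_lower_bound(10, 4, 2))  # 13
# 5 items, no singleton pools, d = 2: 15 / 2 = 7.5, rounded up to 8
print(min_pools_lower_bound(5, 0, 2))   # 8
```

The (n-s)(d+1)/2 term counts edges: each of the n-s items without a singleton pool needs degree at least d+1, and every edge contributes to two degrees.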

My T. Thai 27 Proof

My T. Thai 28 Theorem  The feasible solution obtained in the above algorithm is a polynomial-time approximation with performance ratio 1+2/(d+1).

My T. Thai 29 Proof  Suppose H contains m edges and k vertices of degree at least d+1.  Suppose an optimal solution contains s* singletons and m* pools of size 2.  Then m ≤ m* and (n-k) - s* ≤ 2m*/(d+1).  Hence (n-k) + m ≤ s* + m* + 2m*/(d+1) ≤ (s* + m*)(1 + 2/(d+1)).

My T. Thai 30 More Challenging  Experimental Errors  False negative:  The pool (probe) contains some positive targets  But the test returns a negative outcome  False positive:  The pool contains only negative targets  But the test returns a positive outcome

My T. Thai 31 An e-Error Correcting Model  Assume that there are at most e errors in testing  (d,e)-disjunct matrix: any column t_j must have at least e + 1 entries not contained in the union of any d other columns.  Theorem: A (d+e)-disjunct matrix without any isolated column is a (d,e)-disjunct matrix
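
The quantity behind the (d,e)-disjunct definition is how many entries of a column fall outside the union of d other columns. The brute-force sketch below returns the smallest such count over all choices, so the definition can be checked by comparing the result with the required threshold; `min_private_entries` is an illustrative name and the search is exponential in d.

```python
from itertools import combinations


def min_private_entries(M, d):
    """Smallest number of 1-entries a column keeps outside any d other columns."""
    n = len(M[0])
    cols = [frozenset(i for i, row in enumerate(M) if row[j]) for j in range(n)]
    best = None
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for group in combinations(others, min(d, len(others))):
            union = frozenset().union(*(cols[k] for k in group))
            private = len(cols[j] - union)
            best = private if best is None else min(best, private)
    return best


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(min_private_entries(identity, 2))      # 1
print(min_private_entries(identity * 2, 2))  # 2
```

Repeating every test, as in `identity * 2`, raises the count, which is the simplest way to buy error tolerance at the cost of more probes.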

My T. Thai 32 Decoding Algorithm with e Errors  Theorem: If the number of errors is at most e, then the number of negative pools containing a positive item is always smaller than the number of negative pools containing a negative item  Algorithm:  Assume there are exactly d positive items  Compute, for each item, the number of negative results containing it, and select the d items with the smallest counts.  Time complexity: O(tn)
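
A minimal sketch of this decoder, assuming rows of `M` are pools, columns are items, and `outcomes[i]` is 1 for a positive test and 0 for a negative one. The example matrix (each single-item test repeated three times) and the injected error are made up for illustration. Counting takes O(tn) time; the sort used here for brevity adds O(n log n).

```python
def decode_exactly_d(M, outcomes, d):
    """Return the d items contained in the fewest negative pools."""
    n = len(M[0])
    neg_counts = [
        sum(1 for row, out in zip(M, outcomes) if out == 0 and row[j])
        for j in range(n)
    ]
    ranked = sorted(range(n), key=lambda j: neg_counts[j])
    return sorted(ranked[:d])


# 12 pools over 4 items: pool i tests item i % 4 only.
M = [[1 if j == i % 4 else 0 for j in range(4)] for i in range(12)]
outcomes = [1 if row[1] else 0 for row in M]  # item 1 is the only positive
outcomes[1] = 0                               # one false negative
print(decode_exactly_d(M, outcomes, 1))       # [1]
```

Despite the false negative, item 1 appears in one negative pool while every negative item appears in three, so it is still ranked first.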

My T. Thai 33 In S(d,n) sample  What if the sample contains at most d positive items (not exactly d)?  The previous theorem still holds; however, the decoding algorithm no longer works  If we still decode on a (d,e)-disjunct matrix, the time complexity is O((n + t)t e ), where t is the number of selected probes

My T. Thai 34 Decoding  Theorem: Suppose testing is done on a (d, 2e)-disjunct matrix H with at most e errors. Then a positive item appears in at most e negative results.  Proof. Since there are at most e errors, a positive item can appear in at most e negative results (all due to errors). A negative item, however, appears in at least 2e + 1 - e = e + 1 > e negative results. Hence an item is positive if and only if it appears in at most e negative results.

My T. Thai 35 Decoding Algorithm  Algorithm: For each item, count the number of negative results containing it. If this number is at most e, the item is positive; otherwise it is negative.  Linear Decoding.
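
A sketch of this threshold decoder follows. The cutoff is taken from the preceding theorem (a positive item appears in at most e negative results, a negative item in at least e + 1); the matrix, outcomes and function name are illustrative assumptions, with rows as pools and outcome 1 meaning a positive test.

```python
def decode_threshold(M, outcomes, e):
    """Return items that appear in at most e negative pools."""
    n = len(M[0])
    positives = []
    for j in range(n):
        neg = sum(1 for row, out in zip(M, outcomes) if out == 0 and row[j])
        if neg <= e:
            positives.append(j)
    return positives


# 12 pools over 4 items: pool i tests item i % 4 only (3 repeats per item).
M = [[1 if j == i % 4 else 0 for j in range(4)] for i in range(12)]
outcomes = [1 if (row[1] or row[2]) else 0 for row in M]  # items 1, 2 positive
outcomes[1] = 0                                           # one false negative
print(decode_threshold(M, outcomes, 1))                   # [1, 2]
```

Unlike the ranking decoder, this version does not need to know the number of positives in advance, and it makes a single pass over the matrix.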