Computational Molecular Biology Non-unique Probe Selection via Group Testing.

1 Computational Molecular Biology Non-unique Probe Selection via Group Testing

2 My T. Thai mythai@cise.ufl.edu 2

5 DNA Microarrays
- DNA microarrays are small, solid supports onto which the sequences from thousands of different genes are immobilized, or attached, at fixed locations.
- They hold a very large number of genes on a small chip.
- They are a tool for performing large numbers of DNA-RNA hybridization experiments in parallel.

6 Applications
- Quantitative analysis of the expression levels of individual genes:
  - Comparison of cell samples from different tissues.
  - Computational diagnostics.
- Qualitative analysis of an unknown sample:
  - Identification of microbial organisms.
  - Detection of contamination in biotechnological products.
  - Identification of viral subtypes.

7 Unique Probes vs. Non-unique Probes
- Unique probes
  - Gene-specific probes, also called signature probes.
  - Such probes are difficult to find.
- Non-unique probes
  - Hybridize to more than one target.
  - Designing the test is more difficult with non-unique probes.

8 Probe-Target Matrix
- 12 probe candidates and 4 targets (genes).
- For a target set S, define P(S) as the set of probes reacting to at least one target in S.
- P({1, 2}) = {1, 2, 3, 4, 7, 8, 9, 10, 12}.
- P({2, 3}) = {1, 3, 4, 5, 6, 7, 8, 9, 12}.
- Symmetric set difference: P({1, 2}) ∆ P({2, 3}) = {2, 5, 6, 10}. These are the probes that separate the two target sets.
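The full 12 × 4 matrix is not reproduced in this transcript, so the sketch below only takes the two probe sets listed on the slide and checks the symmetric difference with Python's built-in set operators. The variable names are illustrative.

```python
# Probe sets P(S) copied from the slide (probes reacting to any target in S).
P_12 = {1, 2, 3, 4, 7, 8, 9, 10, 12}  # P({1, 2})
P_23 = {1, 3, 4, 5, 6, 7, 8, 9, 12}   # P({2, 3})

# Symmetric difference: probes that react to exactly one of the two target sets,
# so their outcomes distinguish the sample {1, 2} from the sample {2, 3}.
separating = P_12 ^ P_23
print(sorted(separating))  # [2, 5, 6, 10]
```

Probes 2 and 10 react only when target 1 is present, while probes 5 and 6 react only when target 3 is present.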

9 Probe-Target Matrix
- Non-unique probes for each sequence group.
(Figure: sequences s1 to s9 arranged in groups 1 to 3, with probes p1 to p5.)
Probe-target matrix (rows: probes; columns: targets t1 to t9):
    t1 t2 t3 t4 t5 t6 t7 t8 t9
p1   1  1  0  0  0  1  0  0  0
p2   0  0  0  0  1  1  1  0  0
p3   1  0  1  0  0  0  0  1  0
p4   0  1  1  1  0  0  0  0  0
p5   0  0  0  0  0  0  1  1  1
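As a small illustration, the sketch below reads the 5 × 9 matrix of this slide (taking its last row as p5) and checks the two properties used later for d = 1: 1-separable (every single target has a distinct, nonempty probe signature) and 1-disjunct (no target's probe set is contained in another's). The helper name `column_supports` is hypothetical.

```python
# Rows p1..p5, columns t1..t9, as read off the slide's figure.
ROWS = {
    "p1": "110001000",
    "p2": "000011100",
    "p3": "101000010",
    "p4": "011100000",
    "p5": "000000111",
}

def column_supports(rows):
    """For each target t_j, the set of probes that hybridize with it."""
    n_targets = len(next(iter(rows.values())))
    return [frozenset(p for p, bits in rows.items() if bits[j] == "1")
            for j in range(n_targets)]

cols = column_supports(ROWS)
n = len(cols)

# 1-separable: each single target gives a distinct, nonempty probe signature,
# so one present target can be identified (and told apart from "none present").
is_1_separable = all(cols) and len(set(cols)) == n

# 1-disjunct: no target's probe set is contained in another target's probe set.
is_1_disjunct = not any(cols[i] <= cols[j]
                        for i in range(n) for j in range(n) if i != j)

print(is_1_separable, is_1_disjunct)  # True False
```

Under this reading, the matrix is 1-separable but not 1-disjunct: for example, t4 reacts only with p4, and {p4} is contained in t2's probe set {p1, p4}.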

10 The Problem
- Given: a sample with m items (targets) and a set of n non-unique probes.
- Goal: determine the presence or absence of each target in the sample.

11 The Approach
Three steps:
1. Pre-select suitable probe candidates and compute the probe-target incidence matrix H.
2. Select a minimal set of probes and compute a suitable design matrix D (a sub-matrix of H).
3. Decode the result.

12 An Example
(Figure: a probe-target matrix H and a selected sub-matrix D.)

13 Observation
- What property should the sub-matrix D have?
- To identify at most d targets without testing errors, D should be either a d-separable or a d-disjunct matrix.
- To tolerate at most e errors, D must also be e-error-correcting: the Hamming distance between the Boolean unions of any two distinct sets of at most d columns must be at least 2e + 1.
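The three properties can be checked by brute force on small matrices. The sketch below is a minimal, exponential-time illustration, not an efficient method. It assumes the "at most d" reading of d-separable (all unions of at most d columns are distinct). The matrix is a list of 0/1 rows, and all function names are hypothetical.

```python
from itertools import combinations

def union(M, cols):
    """Boolean OR (union) of the given columns of a 0/1 matrix M (list of rows)."""
    return tuple(int(any(row[j] for j in cols)) for row in M)

def unions_up_to(M, d):
    """Unions of every set of at most d columns, including the empty set."""
    n = len(M[0])
    return [union(M, S) for r in range(d + 1) for S in combinations(range(n), r)]

def is_d_disjunct(M, d):
    """No column is covered by the union of any d other columns."""
    n = len(M[0])
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for S in combinations(others, min(d, len(others))):
            if not any(row[j] and not any(row[k] for k in S) for row in M):
                return False
    return True

def is_d_separable(M, d):
    """'At most d' variant: unions of distinct sets of <= d columns are all distinct."""
    us = unions_up_to(M, d)
    return len(set(us)) == len(us)

def min_union_distance(M, d):
    """Smallest Hamming distance between unions of two distinct sets of <= d columns."""
    us = unions_up_to(M, d)
    return min(sum(a != b for a, b in zip(u, v)) for u, v in combinations(us, 2))

def is_e_error_correcting(M, d, e):
    return min_union_distance(M, d) >= 2 * e + 1

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(is_d_disjunct(I3, 2), is_d_separable(I3, 2))  # True True
print(min_union_distance(I3 * 3, 1))                 # 3 (three stacked copies of I3)
print(is_d_disjunct([[1, 1], [0, 1]], 1))            # False
```

Stacking three copies of the identity raises the minimum union distance from 1 to 3, which is enough to correct e = 1 error when d = 1.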

14 Non-unique Probe Selection via Group Testing
- Given a matrix H, find a submatrix D with the minimum number of rows such that D is d-separable or d-disjunct.
- (This definition extends easily to the error-correcting model.)

15 Min-d-DS
- Minimum d-Disjunct Submatrix (Min-d-DS): given a binary matrix M, find a submatrix H with the minimum number of rows and the same number of columns such that H is d-disjunct.

16 Complexity
- Theorem: Min-d-DS is NP-hard for any fixed d ≥ 1.
- Proof: Reduction from 3-Dimensional Matching.
(Figure: construction of the reduction, with element sets X, Y, Z and items a, b, c.)

17 Complexity

18 Approximation
- Pair: (c0, C), where c0 is a column and C is a set of d other columns.
- Cover: a probe is said to cover a pair (c0, C) if its entry at c0 is 1 and its entries at all columns in C are 0.
- Greedy approach: while some pair is not yet covered, choose the probe that covers the most uncovered pairs.
- Approximation ratio: 1 + (d+1) ln n.
- If NP ⊄ DTIME(n^{O(log log n)}), then no polynomial-time approximation algorithm achieves performance ratio (1 − ε) ln n for any ε > 0.
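The greedy rule is greedy set cover. The universe is the set of all pairs (c0, C), and each probe is the set of pairs it covers. There are n·C(n−1, d) ≤ n^{d+1} pairs, and the standard greedy set-cover bound of roughly 1 + ln(number of elements) is consistent with the 1 + (d+1) ln n ratio on the slide. The sketch below is a straightforward, unoptimized version of that rule. The function names and the example matrix are hypothetical.

```python
from itertools import combinations

def all_pairs(n, d):
    """Every pair (c0, C): a column c0 and a set C of d other columns."""
    return {(c0, C) for c0 in range(n)
            for C in combinations([c for c in range(n) if c != c0], d)}

def covers(row, pair):
    """A probe (row) covers (c0, C) if it has a 1 at c0 and 0 at every column of C."""
    c0, C = pair
    return row[c0] == 1 and all(row[c] == 0 for c in C)

def greedy_disjunct_rows(M, d):
    """Greedy set cover over pairs; returns chosen row indices,
    or None if some pair is covered by no row (M itself is not d-disjunct)."""
    todo = all_pairs(len(M[0]), d)
    chosen = []
    while todo:
        # max() returns the first row among ties, so ties go to the lowest index.
        best = max(range(len(M)), key=lambda i: sum(covers(M[i], p) for p in todo))
        gained = {p for p in todo if covers(M[best], p)}
        if not gained:
            return None
        chosen.append(best)
        todo -= gained
    return chosen

M = [[1, 0, 0], [1, 1, 1], [0, 1, 0], [0, 0, 1]]
print(greedy_disjunct_rows(M, 1))  # [0, 2, 3]
```

The all-ones probe (row 1) covers no pair, because it never separates one target from another, so greedy never selects it.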

19 Pool Size Is 2
- Consider the case where each probe hybridizes with exactly 2 targets.
- The Min-1-separable submatrix problem is also called the minimum test cover problem.
- The minimum test cover problem is APX-complete.
- In contrast, Min-d-DS is polynomial-time solvable in this case.

20 Lemma
- Consider a collection C of pools of size at most 2. Let G be the graph with all items as vertices and all pools of size 2 as edges. Then C gives a d-disjunct matrix if and only if every item not in a singleton pool has degree at least d + 1 in G.
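To make the lemma concrete, the sketch below compares the degree test against the direct definition of d-disjunct on a tiny example. Pools are represented as Python frozensets of item indices. Both function names and the 4-cycle example are hypothetical.

```python
from itertools import combinations

def disjunct_by_lemma(n, pools, d):
    """Lemma: every item without a singleton pool needs degree >= d + 1 in G."""
    singletons = {v for p in pools if len(p) == 1 for v in p}
    degree = [0] * n
    for p in pools:
        if len(p) == 2:
            for v in p:
                degree[v] += 1
    return all(v in singletons or degree[v] >= d + 1 for v in range(n))

def disjunct_by_definition(n, pools, d):
    """Brute force: for each item v and each set S of d other items,
    some pool contains v but no item of S."""
    for v in range(n):
        others = [u for u in range(n) if u != v]
        for S in combinations(others, min(d, len(others))):
            if not any(v in p and not (p & set(S)) for p in pools):
                return False
    return True

# A 4-cycle on items 0..3, with and without an extra singleton pool {0}.
cycle = [frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0)]]
for pools in (cycle, cycle + [frozenset({0})]):
    for d in (1, 2):
        assert disjunct_by_lemma(4, pools, d) == disjunct_by_definition(4, pools, d)
```

In the 4-cycle every vertex has degree 2. The pools are therefore 1-disjunct but not 2-disjunct, and adding {0} alone does not help, because vertex 1 still has degree 2 < 3 and no singleton.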

21 Proof
(Figure: example 0/1 matrix used in the proof.)

23 Theorem
- Min-d-DS is polynomial-time solvable when all given pools have size exactly 2.
- Proof: Let G be the graph representing M. Finding a minimum d-DS is equivalent to finding a subgraph H of G with the minimum number of edges such that every vertex has degree at least d + 1 in H.
- Equivalently, maximize the number of edges in G − H such that every vertex v has degree at most $d_G(v) - d - 1$ in G − H.
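The equivalence in the proof is a complement argument. The derivation below writes G − H for G with the edges of H removed, and $d_X(v)$ for the degree of v in graph X. This notation is assumed here and does not appear on the slides.

```latex
\begin{aligned}
d_H(v) &= d_G(v) - d_{G-H}(v), \\
d_H(v) \ge d + 1 &\iff d_{G-H}(v) \le d_G(v) - d - 1, \\
|E(H)| &= |E(G)| - |E(G-H)|.
\end{aligned}
```

Minimizing |E(H)| is therefore the same as maximizing |E(G − H)| under the per-vertex upper bounds b(v) = d_G(v) − d − 1. That is a maximum simple b-matching (degree-constrained subgraph) problem, and polynomial-time algorithms for it, going back to Edmonds, are one standard way to complete the argument. The slides do not say which method they rely on. If some vertex has d_G(v) < d + 1, no feasible solution exists.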

24 Complexity
- Min-1-DS is NP-hard when all given pools have size at most 2.
  - Proof: Reduction from Vertex Cover.
- For d ≥ 2, Min-d-DS is MAX SNP-complete when all given pools have size at most 2.
  - Proof: Reduction from VC-CUBIC: given a cubic graph G, find a minimum vertex cover of G.

25 Approximation
The algorithm consists of 2 steps:
- Step 1: Compute a minimum solution H of the polynomial-time solvable problem mentioned above.
- Step 2: Choose the singleton pools at all vertices with degree less than d + 1 in H.
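Step 1 needs a degree-constrained subgraph solver and is not implemented here. The sketch below covers only Step 2 and takes H as a hypothetical, already computed edge list. It assumes a singleton pool is available for every item, and the function name `add_singletons` is illustrative. Its output has m pools of size 2 plus n − k singletons, where k is the number of vertices of degree at least d + 1 in H, which is the count used in the ratio analysis.

```python
def add_singletons(n, h_edges, d):
    """Step 2: keep the pools of size 2 in H and add a singleton pool {v}
    for every vertex v whose degree in H is below d + 1."""
    degree = [0] * n
    for u, v in h_edges:
        degree[u] += 1
        degree[v] += 1
    low = [v for v in range(n) if degree[v] < d + 1]
    return [frozenset(e) for e in h_edges] + [frozenset({v}) for v in low]

# Hypothetical Step 1 output: H is the path 0-1-2-3, with d = 1.
pools = add_singletons(4, [(0, 1), (1, 2), (2, 3)], 1)
print(sorted(sorted(p) for p in pools))  # [[0], [0, 1], [1, 2], [2, 3], [3]]
```

The two path endpoints have degree 1 < 2, so they receive singletons. By the lemma, the resulting 5 pools form a 1-disjunct design.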

26 Approximation Ratio Analysis
- Suppose all given pools have size at most 2, and let s be the number of given singleton pools. Then any feasible solution of Min-d-DS contains at least s + (n − s)(d+1)/2 pools.
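Here is one way to see this bound, assuming d ≥ 1. Suppose a feasible solution uses s' singleton pools (so s' ≤ s) and m' pools of size 2; the symbols s' and m' are introduced here for the derivation. By the lemma, each of the n − s' items without a singleton has degree at least d + 1, and each size-2 pool contributes 2 to the degree sum:

```latex
\begin{aligned}
2m' &\ge (n - s')(d + 1), \\
s' + m' &\ge s' + \frac{(n - s')(d + 1)}{2} \;\ge\; s + \frac{(n - s)(d + 1)}{2}.
\end{aligned}
```

The last step holds because f(x) = x + (n − x)(d+1)/2 is nonincreasing in x when (d + 1)/2 ≥ 1, and s' ≤ s.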

27 Proof

28 Theorem
- The algorithm above runs in polynomial time and returns a feasible solution with performance ratio 1 + 2/(d+1).

29 Proof
- Suppose H contains m edges and k vertices of degree at least d + 1, so the algorithm outputs m pools of size 2 and n − k singleton pools.
- Suppose an optimal solution contains s* singletons and m* pools of size 2.
- Then $m \le m^*$ and $(n-k) - s^* \le \frac{2m^*}{d+1}$, so
  $(n-k) + m \le s^* + m^* + \frac{2m^*}{d+1} \le (s^* + m^*)\left(1 + \frac{2}{d+1}\right)$.

30 More Challenging: Experimental Errors
- False negative: the pool (probe) contains some positive targets, but the test returns a negative outcome.
- False positive: the pool contains only negative targets, but the test returns a positive outcome.

31 An e-Error-Correcting Model
- Assume that there are at most e errors in testing.
- (d,e)-disjunct matrix: every column t_j has at least e + 1 1-entries that are not contained in the union of any d other columns.
- Theorem: A (d+e)-disjunct matrix without any isolated column is a (d,e)-disjunct matrix.
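The (d,e)-disjunct definition can be checked directly by counting, for each column and each set of d other columns, the column's "private" 1-entries. The brute-force sketch below is only practical for small matrices. The function name and the stacked-identity example are hypothetical. With e = 0 it reduces to the ordinary d-disjunct test.

```python
from itertools import combinations

def is_de_disjunct(M, d, e):
    """Every column has at least e + 1 rows with a 1 that lie outside the
    union of any d other columns (M is a list of 0/1 rows)."""
    n = len(M[0])
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for S in combinations(others, min(d, len(others))):
            private = sum(1 for row in M
                          if row[j] == 1 and all(row[k] == 0 for k in S))
            if private < e + 1:
                return False
    return True

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
M = I3 * 3  # three stacked copies: every column has 3 private rows
print(is_de_disjunct(M, 2, 2), is_de_disjunct(M, 2, 3))  # True False
```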

32 Decoding Algorithm with e Errors
- Theorem: When testing with a (d,e)-disjunct matrix, if the number of errors is at most e, then the number of negative pools containing a positive item is always smaller than the number of negative pools containing a negative item.
- Algorithm (assuming there are exactly d positive items): for each item, count the negative results that contain it, and select the d items with the smallest counts.
- Time complexity: O(tn), where t is the number of pools (probes).
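The counting decoder is short enough to sketch directly. The sketch below also uses a hypothetical `simulate` helper, which produces pool outcomes from a chosen set of positives and then flips hand-picked rows to model errors. The example matrix (three stacked 3 × 3 identity matrices) is (1, 2)-disjunct, so it fits the theorem with d = 1 and e = 2.

```python
def simulate(M, positives, flipped_rows=()):
    """Pool outcome = 1 iff it contains a positive item; rows in flipped_rows are erroneous."""
    results = [int(any(row[j] for j in positives)) for row in M]
    for r in flipped_rows:
        results[r] ^= 1
    return results

def decode_exactly_d(M, results, d):
    """Count negative pools containing each item; the d smallest counts are the positives."""
    n = len(M[0])
    neg = [0] * n
    for row, outcome in zip(M, results):  # one pass over the t x n matrix: O(tn)
        if outcome == 0:
            for j in range(n):
                neg[j] += row[j]
    return sorted(sorted(range(n), key=lambda j: neg[j])[:d])

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
M = I3 * 3  # (1, 2)-disjunct: each column has 3 private rows
# Item 1 is positive; row 0 is a false positive and row 4 a false negative (e = 2).
res = simulate(M, positives={1}, flipped_rows=(0, 4))
print(decode_exactly_d(M, res, d=1))  # [1]
```

Even with both errors, item 1 appears in only 1 negative pool, while items 0 and 2 appear in 2 and 3 negative pools.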

33 In S(d,n) Samples
- What if the sample contains at most d positive items (not exactly d)?
- The previous theorem still holds; however, the decoding algorithm no longer works, since the number of items to select is unknown.
- Decoding can still be done on a (d,e)-disjunct matrix, with time complexity $O((n + t)\,t^e)$, where t is the number of selected probes.

34 Decoding
- Theorem: Suppose testing is done on a (d, 2e)-disjunct matrix H with at most e errors. Then a positive item appears in at most e negative results, while a negative item appears in at least e + 1 negative results.
- Proof: A pool containing a positive item can return a negative result only through an error, and there are at most e errors, so a positive item appears in at most e negative results. A negative item has at least 2e + 1 pools that contain none of the (at most d) positive items. These pools return negative results unless they are erroneous, so the item appears in at least 2e + 1 − e = e + 1 > e negative results.

35 Decoding Algorithm
- Algorithm: For each item, count the number of negative results containing it. If this number is at most e, the item is positive; otherwise it is negative.
- This is linear decoding: one pass over the t × n test matrix, O(tn) time.
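Following the theorem on the previous slide, the threshold decoder is sketched below under the assumption of a (d, 2e)-disjunct matrix with at most e errors. The example is hypothetical. Three stacked 3 × 3 identity matrices form a (2, 2)-disjunct matrix, so with d = 2 the decoder tolerates e = 1 error. The results vector is written out by hand: items 0 and 2 are positive, and row 3 is a false negative.

```python
def decode_threshold(M, results, e):
    """An item is declared positive iff at most e negative pools contain it."""
    n = len(M[0])
    neg = [0] * n
    for row, outcome in zip(M, results):
        if outcome == 0:
            for j, bit in enumerate(row):
                neg[j] += bit
    return [j for j in range(n) if neg[j] <= e]

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
M = I3 * 3  # (2, 2)-disjunct, so it tolerates e = 1 error with d = 2
# Items 0 and 2 are positive; row 3 (which contains item 0) is a false negative.
results = [1, 0, 1, 0, 0, 1, 1, 0, 1]
print(decode_threshold(M, results, e=1))  # [0, 2]
```

Item 0 appears in exactly e = 1 negative result. A strict test ("less than e") would therefore misclassify it, which is why the threshold is "at most e".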

