Presentation is loading. Please wait.

Presentation is loading. Please wait.

Deconvoluting BAC-gene Relationships Using a Physical Map Y. Wu 1, L. Liu 1, T. Close 2, S. Lonardi 1 1 Department of Computer Science & Engineering 2.

Similar presentations


Presentation on theme: "Deconvoluting BAC-gene Relationships Using a Physical Map Y. Wu 1, L. Liu 1, T. Close 2, S. Lonardi 1 1 Department of Computer Science & Engineering 2."— Presentation transcript:

1 Deconvoluting BAC-gene Relationships Using a Physical Map Y. Wu 1, L. Liu 1, T. Close 2, S. Lonardi 1 1 Department of Computer Science & Engineering 2 Department of Botany & Plant Sciences

2 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Selective sequencing Many organisms are unlikely to be sequenced in the near future due to the large size and highly repetitive content of their genomesMany organisms are unlikely to be sequenced in the near future due to the large size and highly repetitive content of their genomes Selective sequencing: obtain the sequence of a small set of BAC clones that contain a specific set of genes of interestSelective sequencing: obtain the sequence of a small set of BAC clones that contain a specific set of genes of interest How do we identify these BAC clones? BAC-gene deconvolution problemHow do we identify these BAC clones? BAC-gene deconvolution problem

3 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 An illustration of the problem

4 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 An illustration of the problem

5 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 An illustration of the problem

6 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Hybridization with probes The presence of a gene in a BAC can be determined by an hybridization experiment (e.g., using a unique probe designed from it)The presence of a gene in a BAC can be determined by an hybridization experiment (e.g., using a unique probe designed from it) Given that typically BAC clones and probes could be in the order of tens of thousands, carrying out an experiment for each pair (BAC,probe) is usually unfeasibleGiven that typically BAC clones and probes could be in the order of tens of thousands, carrying out an experiment for each pair (BAC,probe) is usually unfeasible Group testing (or pooling) has to be usedGroup testing (or pooling) has to be used

7 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Hybridization with pools of probes Probes can be arranged into pools for group testing. However, in order to achieve exact deconvolution this strategy could be still unfeasible due to the large number of poolsProbes can be arranged into pools for group testing. However, in order to achieve exact deconvolution this strategy could be still unfeasible due to the large number of pools Question: Can we use a small number of pools (e.g., 1- or 2-decodable pool design) and still achieve accurate deconvolution?Question: Can we use a small number of pools (e.g., 1- or 2-decodable pool design) and still achieve accurate deconvolution?

8 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Dealing with the limitations of pooling Answer: Yes, if one compensates for the lack of information obtained by a weak pooling design with the knowledge of the overlapping structure of the BACsAnswer: Yes, if one compensates for the lack of information obtained by a weak pooling design with the knowledge of the overlapping structure of the BACs In this way, the number of pools required is reduced  less expensive/time- consumingIn this way, the number of pools required is reduced  less expensive/time- consuming

9 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Hybridization data h(b,p)=1 (pool p hybridizes to BAC b) –b must contain at least one of the probes/genes represented by p –positive information h(b,p)=0 (pool p does not hybridize to BAC b) –b cannot contain any of the probes/genes represented by p –negative information

10 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Deconvolution problem Given h(b,p) for all pairs (b,p) the deconvolution problem is to establish a one-to-many assignment between the probes p and the clones b in such a way that it satisfies the value of hGiven h(b,p) for all pairs (b,p) the deconvolution problem is to establish a one-to-many assignment between the probes p and the clones b in such a way that it satisfies the value of h 1.Basic deconvolution: uses only on information obtained from group testing 2.Improved deconvolution: also uses the physical map

11 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Input to the basic deconvolution h p1p1p1p1 p2p2p2p2 p3p3p3p3 p4p4p4p4 b1b1b1b11000 b2b2b2b21100 b3b3b3b30110 b4b4b4b40011 b5b5b5b50001 Hybridization table p i is a pool b j is a BAC u k is a probe/gene

12 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Input to the basic deconvolution h p1p1p1p1 p2p2p2p2 p3p3p3p3 p4p4p4p4 b1b1b1b11 b2b2b2b211 b3b3b3b311 b4b4b4b411 b5b5b5b51 u1u1u1u1 u2u2u2u2 u3u3u3u3 u4u4u4u4 u5u5u5u5 u6u6u6u6 u7u7u7u7 u8u8u8u8 u9u9u9u9 p1p1p1p1111 p2p2p2p2111 p3p3p3p3111 p4p4p4p4111 Pool content table Hybridization table p i is a pool b j is a BAC u k is a probe/gene

13 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Positive information u1u1u1u1 u2u2u2u2 u3u3u3u3 u4u4u4u4 u5u5u5u5 u6u6u6u6 u7u7u7u7 u8u8u8u8 u9u9u9u9 b 1,p 1 111 b 2,p 1 111 b 2,p 2 111 b 3,p 2 111 b 3,p 3 111 b 4,p 3 111 b 4,p 4 111 b 5,p 4 111 p i is a pool b j is a BAC u k is a probe/gene

14 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Negative information u1u1u1u1 u2u2u2u2 u3u3u3u3 u4u4u4u4 u5u5u5u5 u6u6u6u6 u7u7u7u7 u8u8u8u8 u9u9u9u9 b1b1b1b10000000 b2b2b2b200000 b3b3b3b3000000 b4b4b4b400000 b5b5b5b50000000 p i is a pool b j is a BAC u k is a probe/gene

15 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Combining positive & negative u1u1u1u1 u2u2u2u2 u3u3u3u3 u4u4u4u4 u5u5u5u5 u6u6u6u6 u7u7u7u7 u8u8u8u8 u9u9u9u9 b 1,p 1 111 b 2,p 1 111 b 2,p 2 111 b 3,p 2 111 b 3,p 3 111 b 4,p 3 111 b 4,p 4 111 b 5,p 4 111 p i is a pool b j is a BAC u k is a probe/gene

16 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Combining positive & negative u1u1u1u1 u2u2u2u2 u3u3u3u3 u4u4u4u4 u5u5u5u5 u6u6u6u6 u7u7u7u7 u8u8u8u8 u9u9u9u9 b 1,p 1 11 b 2,p 1 111 b 2,p 2 11 b 3,p 2 11 b 3,p 3 11 b 4,p 3 11 b 4,p 4 111 b 5,p 4 11 Each row represents a constraint to be satisfiedEach row represents a constraint to be satisfied If a row contains only one “1”, then the relationship between the BAC and probe is resolved exactlyIf a row contains only one “1”, then the relationship between the BAC and probe is resolved exactly p i is a pool b j is a BAC u k is a probe/gene

17 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Physical map-assisted deconvolution Basic deconvolution is not sufficientBasic deconvolution is not sufficient BACs are assembled into contigs by FPC (a contig is a set of BAC clones)BACs are assembled into contigs by FPC (a contig is a set of BAC clones) We assume the probes are unique  each probe can belong to exactly one contigWe assume the probes are unique  each probe can belong to exactly one contig Contig 1Contig 2

18 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Optimization problem We formulate the following optimization problemWe formulate the following optimization problem The problem is NP-complete (proof in the paper, reduction from 3SAT)The problem is NP-complete (proof in the paper, reduction from 3SAT)

19 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Integer Linear Programming The optimization problem can be solved via integer linear programming (ILP)The optimization problem can be solved via integer linear programming (ILP)

20 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 LP and randomized rounding The ILP is relaxed to the corresponding LP, then the LP is solved exactly (via the GLPK package)The ILP is relaxed to the corresponding LP, then the LP is solved exactly (via the GLPK package) Optimal solution to the LP is mapped to a valid solution to the ILP via randomized roundingOptimal solution to the LP is mapped to a valid solution to the ILP via randomized rounding We prove that our method achieves approximation ratio (1-e -1 )We prove that our method achieves approximation ratio (1-e -1 )

21 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Experimental results on rice genome Whole genome sequence for rice is availableWhole genome sequence for rice is available BAC library and fingerprinting data are available from AGIBAC library and fingerprinting data are available from AGI BAC-end sequences are also available from GenbankBAC-end sequences are also available from Genbank Physical map was built using FPCPhysical map was built using FPC Coordinates of the BAC on the genome were determined by BLASTing BAC-end sequences against the genomeCoordinates of the BAC on the genome were determined by BLASTing BAC-end sequences against the genome

22 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Experimental results on rice genome Rice unigenes are available from NCBIRice unigenes are available from NCBI Unique probes for the unigenes were designed by the Oligospawn softwareUnique probes for the unigenes were designed by the Oligospawn software Experiments focused on chromosome IExperiments focused on chromosome I Probe pools were designed following the shifted transversal design (STD)Probe pools were designed following the shifted transversal design (STD) Dataset: 2,002 probes and 2,629 BACsDataset: 2,002 probes and 2,629 BACs

23 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Experimental results 1-decodable pooling design

24 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Experimental results 2-decodable pooling design

25 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Experimental results

26 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Findings We proposed a new method to solve the BAC-gene deconvolution problem based on integer linear programmingWe proposed a new method to solve the BAC-gene deconvolution problem based on integer linear programming Experimental results show that our method is accurate and effectiveExperimental results show that our method is accurate and effective

27 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Thank you FundingFunding Serdar Bozdag (UC Riverside) for providing the rice data (fingerprinting and hybridization)Serdar Bozdag (UC Riverside) for providing the rice data (fingerprinting and hybridization)

28 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

29 Hybridization with pools of probes Probes can be arranged into pools for group testingProbes can be arranged into pools for group testing In order to achieve exact deconvolution this strategy can be still unfeasibleIn order to achieve exact deconvolution this strategy can be still unfeasible The reason: a BAC may contain several, if not tens of genes  the “decodability” of the pool design has to be high to achieve exact deconvolution  …The reason: a BAC may contain several, if not tens of genes  the “decodability” of the pool design has to be high to achieve exact deconvolution  …

30 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Hybridization with pools of probes …  the pool size has to be small, which implies that the number of pools will be large…  the pool size has to be small, which implies that the number of pools will be large Question: Can we use a low decodability (1- or 2-decodable) pool design and still achieve good deconvolution?Question: Can we use a low decodability (1- or 2-decodable) pool design and still achieve good deconvolution?

31 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Physical map-assisted deconvolution For example, if we knew that BAC and BAC b j are 80% overlapping  if a probe p belongs to BAC, it is very likely that p also belongs to b jFor example, if we knew that BAC b i and BAC b j are 80% overlapping  if a probe p belongs to BAC b i, it is very likely that p also belongs to b j On the other hand, if we knew that BAC and BAC b j are not overlapping  if a probe p belongs to BAC, then it is very unlikely that probe p also belong to BAC b jOn the other hand, if we knew that BAC b i and BAC b j are not overlapping  if a probe p belongs to BAC b i, then it is very unlikely that probe p also belong to BAC b j

32 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Physical map-assisted deconvolution Basic deconvolution step is not sufficientBasic deconvolution step is not sufficient The overlapping structure of the BACs is used to resolve additional relationships between BACs and probesThe overlapping structure of the BACs is used to resolve additional relationships between BACs and probes

33 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Sketch of the algorithm

34 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Perfect physical map Cut the chromosome at the points where a BAC starts or endsCut the chromosome at the points where a BAC starts or ends Let’s call the resulting pieces fragmentsLet’s call the resulting pieces fragments Each fragment is covered by a set of BACsEach fragment is covered by a set of BACs Assume the probes are unique, therefore, each probe can only belong to one fragmentAssume the probes are unique, therefore, each probe can only belong to one fragment f1f1 f2f2 f3f3 f4f4 f5f5

35 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Optimization problem Optimization problem is similarly formulatedOptimization problem is similarly formulated

36 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 ILP

37 Solving the optimization problem The above problem is NP-completeThe above problem is NP-complete It is solved via ILP followed by LP relaxation and randomized roundingIt is solved via ILP followed by LP relaxation and randomized rounding Similar performance guarantee can be provedSimilar performance guarantee can be proved

38 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Sketch of the algorithm

39 Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007 Experimental results


Download ppt "Deconvoluting BAC-gene Relationships Using a Physical Map Y. Wu 1, L. Liu 1, T. Close 2, S. Lonardi 1 1 Department of Computer Science & Engineering 2."

Similar presentations


Ads by Google