Reconstruction of DNA sequencing by hybridization Ji-Hong Zhang, Ling-Yun Wu and Xiang-Sun Zhang Institute of Applied Mathematics,


1 Reconstruction of DNA sequencing by hybridization Ji-Hong Zhang, Ling-Yun Wu and Xiang-Sun Zhang ZHANGroup@aporc.org Institute of Applied Mathematics, AMSS, CAS

2 Bioinformatics: Human Genome Project; large-molecule data in biology, such as DNA and protein; knowledge of mathematics, computer science, information science, physics, system science, management science as well as biology. Genomics: DNA sequencing, gene prediction, sequence alignment.

3 DNA Sequencing …ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT CTAGCTAGACTACGTTTTA TATATATATACGTCGTCGT ACTGATGACTAGATTACAG ACTGATTTAGATACCTGAC TGATTTTAAAAAAATATT…

4 DNA Sequencing (shotgun): target DNA is cut many times at random; forward-reverse linked reads of ~500 bp at a known distance.

5 DNA Sequencing (SBH): DNA array (DNA chip) with 4^3 probes. Target DNA: AAATGCG

6 Sequencing by Hybridization: Hybridize the target to an array containing a spot for each possible k-tuple (k-mer). The spectrum of a sequence is the multiset of all its k-long substrings (k-tuples). Goal: reconstruct the sequence from its spectrum. Pevzner (1989): reconstruction is polynomial. But …
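The definition of the spectrum can be made concrete with a minimal Python sketch. The function name `spectrum` is chosen here for illustration and does not come from the slides; the target AAATGCG and k = 3 are taken from slide 5.

```python
from collections import Counter


def spectrum(sequence: str, k: int) -> Counter:
    """Return the multiset of all k-long substrings (k-tuples) of `sequence`."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Target DNA and probe length from slide 5 (k = 3, so the chip has 4**3 = 64 probes).
target = "AAATGCG"
spec = spectrum(target, 3)
print(sorted(spec.elements()))
```

A `Counter` is used rather than a `set` so that repeated k-tuples keep their multiplicity, matching the multiset definition; a real hybridization experiment typically reports only presence or absence.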

7 Uniqueness of Reconstruction: Different sequences can have the same spectrum, for example {ACT, CTA, TAC} → ACTAC or TACTA. Non-uniqueness probability
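The non-uniqueness example can be checked by brute force. The sketch below enumerates every length-5 sequence over {A, C, G, T} and keeps those whose 3-tuple spectrum equals {ACT, CTA, TAC}; the helper names are illustrative only, and exhaustive enumeration is feasible only for toy lengths like this one.

```python
from collections import Counter
from itertools import product


def spectrum(sequence, k):
    """Multiset of all k-long substrings of `sequence`."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def sequences_with_spectrum(target_spectrum, length, k):
    """Brute-force search: all sequences of the given length with this spectrum."""
    matches = []
    for letters in product("ACGT", repeat=length):
        candidate = "".join(letters)
        if spectrum(candidate, k) == target_spectrum:
            matches.append(candidate)
    return matches


observed = Counter(["ACT", "CTA", "TAC"])
matches = sequences_with_spectrum(observed, length=5, k=3)
print(matches)
```

Besides the two sequences named on the slide, the search also returns CTACT, because the three k-tuples overlap cyclically (ACT → CTA → TAC → ACT) and a reconstruction can start at any of them.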

8 Experiment Errors: Hybridization experiments are error-prone. False negative error: a k-tuple appears in the target DNA but does not appear in its measured spectrum (this includes repetition of a k-tuple). False positive error: a k-tuple does not appear in the target DNA but does appear in its measured spectrum.

9 Sequencing by Hybridization: Target DNA …… TTTTACGC …… → Spectrum. Errors: positive (misread) / negative (missing, repetition). Ideal case: TTT TTA TAC ACG CGC. With errors: TTT TTA TAC ACG CGC TGA
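When the target is known, as in a simulation, the error types on this slide can be read off by comparing sets. The following sketch uses the slide's own example; the variable names are illustrative, and the measured spectrum is treated as a plain set because the chip is assumed to report presence only.

```python
from collections import Counter

K = 3
target = "TTTTACGC"  # example target from the slide

# Count every k-tuple occurrence in the target.
occurrences = Counter(target[i:i + K] for i in range(len(target) - K + 1))

ideal = set(occurrences)                                  # what an error-free chip reports
measured = {"TTT", "TTA", "TAC", "ACG", "CGC", "TGA"}     # "with errors" row of the slide

false_positives = measured - ideal          # reported but absent from the target
false_negatives = ideal - measured          # present in the target but not reported
repetitions = {kmer for kmer, n in occurrences.items() if n > 1}  # multiplicity lost

print(false_positives, false_negatives, repetitions)
```

Here TGA is the only false positive, nothing is missing outright, and TTT is a repetition: it occurs twice in the target but can appear only once in the measured spectrum.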

10

11 SBH Reconstruction Problem: In the case of error-free SBH experiments, a desired solution of SBH is simply a feasible solution that includes every k-tuple in the spectrum. In the general case, there is no additional information except the spectrum and the length of the target DNA, so a feasible solution composed of a maximum-cardinality subset of the spectrum is a reasonable desired solution.

12 SBH Reconstruction Problem: The ideal case (without repetitions and errors) is equivalent to finding an Eulerian path in a corresponding graph (Pevzner, 1989), for which there is a linear-time algorithm (Fleischner, 1990). The general case is an NP-hard problem; approaches: branch and bound, heuristics. Extensions: PSBH (Positional SBH), SBH with length error.
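The Eulerian-path formulation for the ideal case can be sketched as follows. Each k-tuple becomes an edge from its (k-1)-long prefix to its (k-1)-long suffix, and an Eulerian path through all edges spells a sequence. The code uses Hierholzer's algorithm as one standard linear-time way to find such a path; it is an illustration, not necessarily the algorithm of the cited references, and the function name `reconstruct` is hypothetical.

```python
from collections import Counter, defaultdict


def reconstruct(kmers):
    """Return one sequence whose k-tuple multiset equals `kmers`, or None.

    Assumes an error-free spectrum given as an iterable of equal-length strings.
    """
    kmers = list(kmers)
    if not kmers:
        return None
    graph = defaultdict(list)   # (k-1)-mer -> list of successor (k-1)-mers
    balance = Counter()         # out-degree minus in-degree per node
    for kmer in kmers:
        prefix, suffix = kmer[:-1], kmer[1:]
        graph[prefix].append(suffix)
        balance[prefix] += 1
        balance[suffix] -= 1
    starts = [v for v, b in balance.items() if b == 1]
    if any(abs(b) > 1 for b in balance.values()) or len(starts) > 1:
        return None             # degree conditions for an Eulerian path fail
    start = starts[0] if starts else kmers[0][:-1]

    # Hierholzer's algorithm with an explicit stack.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    if len(path) != len(kmers) + 1:
        return None             # some edges unreachable: graph not connected
    return path[0] + "".join(node[-1] for node in path[1:])


print(reconstruct(["TGC", "AAA", "GCG", "ATG", "AAT"]))
```

When the graph admits several Eulerian paths, this returns only one of them, which is exactly the ambiguity described under uniqueness of reconstruction.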

13 Motivations: Give criteria that determine the most probable k-tuples at both ends and in the middle of all possible reconstructions of the target DNA; these criteria greatly reduce ambiguities in the reconstruction. Transform the negative errors into positive errors; this enables us to handle both types of errors easily. Separate the repetitions from both types of errors.

14 Methods: Estimate the number of k-tuples that do not occur in a solution. Adjacency matrix (connection matrix). Give a lower bound on the number of k-tuples that do not occur in any solution from k-tuple i to k-tuple j.
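The slide does not define the connection matrix, so the following is only a sketch under a stated assumption: entry (i, j) is taken to be the smallest shift at which k-tuple j can follow k-tuple i by overlap, with shift 1 meaning direct adjacency and a shift of s implying s - 1 k-tuples missing in between. The names `min_shift` and `connection_matrix` are hypothetical, and the authors' actual matrix and bound may differ.

```python
def min_shift(a: str, b: str) -> int:
    """Smallest shift s in 1..k such that the last k-s letters of `a`
    equal the first k-s letters of `b` (s = k means no overlap at all)."""
    k = len(a)
    for shift in range(1, k):
        if a[shift:] == b[:k - shift]:
            return shift
    return k


def connection_matrix(kmers):
    """Assumed form of the adjacency (connection) matrix: pairwise minimal shifts."""
    return [[min_shift(a, b) for b in kmers] for a in kmers]


kmers = ["TTT", "TTA", "TAC", "ACG", "CGC"]
for kmer, row in zip(kmers, connection_matrix(kmers)):
    print(kmer, row)
```

Under this assumption, a path of k-tuples whose consecutive shifts are all 1 uses no missing k-tuples, while larger shifts accumulate into a count of k-tuples that the path must skip.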

15 Methods: Determine the most probable k-tuples at both ends. Reconstruct from the most probable end pairs to get an upper bound for the SBH problem. Purge the end pairs that cannot yield a better solution than the current upper bound.

16 Methods: Transform the negative errors into positive errors. Artificial k-tuple: fill in all the possible gaps due to false negative errors. Negative error level: the maximal number of allowed consecutively missing k-tuples; it reduces the number of artificial k-tuples.
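One possible reading of the artificial k-tuple idea is sketched below: if two observed k-tuples overlap with a shift greater than 1, the k-tuples that would lie between them are fully determined and can be added as artificial candidates, with the negative error level capping how many consecutive k-tuples may be filled in. The function `bridge` and its parameter `max_missing` are hypothetical names, and this is an interpretation of the slide rather than the authors' exact construction.

```python
def bridge(a: str, b: str, max_missing: int):
    """Return the k-tuples strictly between `a` and `b` for the smallest gap
    of at most `max_missing` consecutive missing k-tuples, or None if `b`
    cannot follow `a` within that limit."""
    k = len(a)
    # A shift of s leaves s - 1 k-tuples missing; for s > k the letters in
    # the gap would be unknown, so only shifts up to k are considered.
    for shift in range(2, min(max_missing + 1, k) + 1):
        if a[shift:] == b[:k - shift]:
            merged = a + b[k - shift:]
            return [merged[i:i + k] for i in range(1, shift)]
    return None


print(bridge("TTA", "ACG", max_missing=1))  # TAC would fill the gap
print(bridge("TTA", "CGC", max_missing=1))  # no bridge at this error level
print(bridge("TTA", "CGC", max_missing=2))  # TAC and ACG would fill the gap
```

Raising `max_missing` lets more pairs be bridged, so a low negative error level keeps the set of artificial k-tuples, and hence of introduced false positives, small.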

17 Computational Experiments: 109 DNA sequences from GenBank. Simulate the SBH experiments. Error models: random (probabilistic model); systematic (one-base-mismatch model).
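The two error models can be simulated along the following lines. The slides give no parameters, so everything here is an assumption made for illustration: the function names, the per-k-tuple drop probability `p_false_neg`, the number of false positives `n_false_pos`, and the choice to draw systematic false positives from k-tuples that differ from a true one in exactly one base.

```python
import random

BASES = "ACGT"


def ideal_spectrum(sequence, k):
    """Set of k-tuples present in the sequence (presence only, no multiplicity)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def simulate_random(ideal, k, n_false_pos, p_false_neg, rng):
    """Probabilistic model (assumed): drop each true k-tuple independently,
    then add uniformly random k-tuples that are not in the ideal spectrum."""
    kept = {kmer for kmer in sorted(ideal) if rng.random() >= p_false_neg}
    wanted = min(n_false_pos, 4 ** k - len(ideal))  # cannot exceed what exists
    extras = set()
    while len(extras) < wanted:
        candidate = "".join(rng.choice(BASES) for _ in range(k))
        if candidate not in ideal:
            extras.add(candidate)
    return kept | extras


def one_base_mismatches(kmer):
    """All k-tuples differing from `kmer` in exactly one position."""
    return {kmer[:i] + b + kmer[i + 1:]
            for i in range(len(kmer)) for b in BASES if b != kmer[i]}


def simulate_mismatch(ideal, n_false_pos, rng):
    """Systematic model (assumed): false positives are one-base mismatches."""
    candidates = sorted(set().union(*(one_base_mismatches(x) for x in ideal)) - ideal)
    return set(ideal) | set(rng.sample(candidates, min(n_false_pos, len(candidates))))


rng = random.Random(0)  # fixed seed so a run is repeatable
ideal = ideal_spectrum("TTTTACGC", 3)
print(sorted(simulate_random(ideal, 3, n_false_pos=2, p_false_neg=0.2, rng=rng)))
print(sorted(simulate_mismatch(ideal, n_false_pos=2, rng=rng)))
```

Passing an explicit `random.Random` instance keeps the simulation reproducible and avoids touching the global random state.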

18

19

20 Conclusions: The ideal case (without repetitions and errors) can be solved in polynomial time (Pevzner, 1989). The general case is an NP-hard problem. Design efficient algorithms.
Ji-Hong Zhang, Ling-Yun Wu and Xiang-Sun Zhang. A new approach to the reconstruction of DNA sequencing by hybridization. Bioinformatics, vol 19(1), pages 14-21, 2003.
Xiang-Sun Zhang, Ji-Hong Zhang and Ling-Yun Wu. Combinatorial optimization problems in the positional DNA sequencing by hybridization and its algorithms. System Sciences and Mathematics, vol 3, 2002. (in Chinese)
Ling-Yun Wu, Ji-Hong Zhang and Xiang-Sun Zhang. Application of neural networks in the reconstruction of DNA sequencing by hybridization. In Proceedings of the 4th ISORA, 2002.

