
1 De Novo Assembly of Mitochondrial Genomes from Low Coverage Whole-Genome Sequencing Reads Fahad Alqahtani and Ion Mandoiu University of Connecticut Computer Science and Engineering Department

2 Outline Introduction & prior work Our approach Preliminary results Conclusion and future work

3 Nuclear Genome vs. Mitochondrial Genome Source: https://www.fbi.gov/about-us/lab/forensic-science-communications/fsc/july1999/dnalist.htm/dnaf1.htm

4 mt10k pipeline Read filtering using BLAST against database of mitogenomes 3 De Novo assemblers

5 mt10k Results 20/60 complete circular mitogenomes Source: http://nar.oxfordjournals.org/content/early/2014/10/07/nar.gku917.full

6 MITObim: Mitochondrial Baiting and Iterative Mapping Source: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3711436/

7 ARC Assembly by Reduced Complexity Steps: 1) align sequence reads to reference sequences of related species 2) use alignment results to distribute reads into target specific bins 3) perform assemblies for each bin (target) to produce contigs 4) replace previous reference targets with assembled contigs and iterate. Source: http://biorxiv.org/content/biorxiv/early/2015/01/31/014662.full.pdf

8 Outline Introduction & prior work Our approach Preliminary results Conclusion and future work

9 Our Approach Pipeline: Reads → Read filtering → Geneious → Circular mtDNA sequence. Unlike previous similarity-based filters, we use k-mer coverage-based read classifiers. Less biased than use of a related species' genome. Final assembly done with a circular-genome-aware assembler (Geneious).

10 K-mer coverage histogram
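A k-mer coverage histogram of this kind can be sketched in a few lines. The snippet below is an illustrative stand-in, not the pipeline's code: the slides name Jellyfish as the k-mer counter, while this sketch uses a plain Python `Counter`, ignores reverse complements, and uses made-up toy reads. The function names `count_kmers` and `coverage_histogram` are hypothetical.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def coverage_histogram(kmer_counts):
    """Map each coverage value c to the number of distinct k-mers seen c times."""
    return Counter(kmer_counts.values())

# Toy reads (invented for illustration only).
reads = ["ACGTAC", "CGTACG", "GTACGT"]
counts = count_kmers(reads, k=4)
hist = coverage_histogram(counts)

for coverage in sorted(hist):
    print(coverage, hist[coverage])
```

On real data, high-copy sequences such as mtDNA would be expected to show up as k-mers with much higher counts than the bulk of nuclear k-mers, which is what a coverage-based filter relies on.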

11 Coverage is Uniform across Mitogenome, but Varies between Individuals & Sequencing Centers Source: Li, Mingkun, et al. "Transmission of human mtDNA heteroplasmy in the Genome of the Netherlands families: support for a variable-size bottleneck." Genome Research 26.4 (2016): 417-426. So, we need to learn the coverage from each sample.

12 COI Gene The cytochrome c oxidase subunit 1 mitochondrial region ("COI", ~648 base pairs long) has been selected as a "DNA barcode" for taxonomic classification. The Barcode of Life Data System (BOLD) has 4,954K COI sequences from 168K animal species.

13 Detailed pipeline Diagram components: BOLD/GenBank COI gene; k-mers & counts (Jellyfish); hashtable of k-mers with coverage similar to COI; COI k-mer coverage distribution; k-mer classifier; reads; read filtering; Geneious; circular mtDNA sequence. Read filtering rule: keep a read of length l if ≥ l-3k of its k-mers are in the hashtable.
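The read-filtering rule on this slide can be sketched as follows. This is a minimal illustration under stated assumptions: `l` is taken to be the read length, the "hashtable" is modeled as a Python set, and the names `read_kmers`, `keep_read`, and `filter_reads` are hypothetical. How paired-end mates are handled is not stated on the slide and is not modeled here.

```python
def read_kmers(read, k):
    """Yield the l - k + 1 k-mers of a read of length l."""
    return (read[i:i + k] for i in range(len(read) - k + 1))

def keep_read(read, k, kept_kmers):
    """Keep a read of length l if at least l - 3k of its k-mers are in kept_kmers."""
    l = len(read)
    hits = sum(1 for kmer in read_kmers(read, k) if kmer in kept_kmers)
    return hits >= l - 3 * k

def filter_reads(reads, k, kept_kmers):
    """Return only the reads that pass the l - 3k rule."""
    return [read for read in reads if keep_read(read, k, kept_kmers)]

# Toy example: the k-mer set is built from one "mitochondrial-like" read.
k = 3
mito_like = "ACGTACGTACGT"
kept_kmers = set(read_kmers(mito_like, k))
reads = [mito_like, "TTTTTTTTTTTT"]
filtered = filter_reads(reads, k, kept_kmers)
print(filtered)
```

Since a read has l - k + 1 k-mers, a threshold of l - 3k lets up to 2k + 1 of them be missing from the table. One plausible reading is that this tolerates a couple of isolated sequencing errors, each of which corrupts at most k k-mers. Note that for reads with l ≤ 3k the threshold is non-positive, so every such read would pass.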

14 K-mer Classifiers 1. Likelihood ratio k-mer classifier: keep k-mer x if P(x | µ_COI, σ_COI) > P(x | µ_genome, σ_genome). 2. Coverage k-mer classifier: keep k-mer x if |coverage(x) - µ_COI| ≤ 3σ_COI.
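The two rules above can be sketched in code. The slide gives only means and standard deviations, so this sketch assumes (it is an assumption, not something the slides state) that P(x | µ, σ) is a normal density evaluated at coverage(x). The sample statistics are estimated from made-up k-mer counts, and all function names are hypothetical.

```python
import math
from statistics import mean, stdev

def normal_log_pdf(x, mu, sigma):
    """Log of the normal density; logs avoid underflow for distant values."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def likelihood_ratio_keep(cov, mu_coi, sigma_coi, mu_genome, sigma_genome):
    """Rule 1: keep if coverage is more likely under the COI model (assumed normal)."""
    return (normal_log_pdf(cov, mu_coi, sigma_coi)
            > normal_log_pdf(cov, mu_genome, sigma_genome))

def coverage_keep(cov, mu_coi, sigma_coi):
    """Rule 2: keep if coverage is within 3 standard deviations of the COI mean."""
    return abs(cov - mu_coi) <= 3 * sigma_coi

# Toy per-sample k-mer counts (invented for illustration only).
coi_counts = [95, 100, 105, 98, 102]   # counts of k-mers from the COI gene
genome_counts = [1, 2, 1, 3, 2, 1]     # counts of k-mers genome-wide

mu_coi, sigma_coi = mean(coi_counts), stdev(coi_counts)
mu_gen, sigma_gen = mean(genome_counts), stdev(genome_counts)

for cov in (2, 100, 110):
    print(cov,
          likelihood_ratio_keep(cov, mu_coi, sigma_coi, mu_gen, sigma_gen),
          coverage_keep(cov, mu_coi, sigma_coi))
```

Estimating µ_COI and σ_COI from the sample's own reads reflects the earlier point that coverage varies between individuals and sequencing centers, so fixed thresholds would not transfer across samples.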

15 Outline Introduction & prior work Our approach Preliminary results Conclusion and future work

16 Human Data 4 individuals from the 1000 Genomes Project: 2 male, 2 female; one male and one female are siblings. Illumina paired-end reads; up to 5 million reads; read length: 108bp; insert length: between 100bp and 600bp. Ground truth: generated by the 1000 Genomes Project by mapping reads to the reference genome.

17 Tammar wallaby (Macropus eugenii) Illumina paired-end reads; 10 million reads; read length: 100bp; insert lengths: 108bp and 550bp. Ground truth: Macropus eugenii voucher ABTC18205 mitochondrion, partial genome. Sequence ID: gb|KJ868119.1|gb|KJ868119.1| Length: 16865

18 Results (Human 2M reads)
Sample    Classifier  Classifier output  Hisat # reads  Geneious: is circular?  Length  Edit distance
Male 1    None        2,000,000          2,920          Yes                     16,570  4
Male 1    Ratio       1,994,128          2,920          Yes                     16,570  4
Male 1    Coverage    1,926,126          2,920          Yes                     16,570  4
Female 1  None        2,000,000          3,796          No                      -       -
Female 1  Ratio       1,996,658          3,796          No                      -       -
Female 1  Coverage    1,920,104          3,796          No                      -       -
Male 2    None        2,000,000          3,788          Yes                     16,373  275
Male 2    Ratio       1,995,158          3,788          Yes                     16,373  275
Male 2    Coverage    1,937,146          3,776          Yes                     16,373  275
Female 2  None        2,000,000          2,936          Yes                     16,569  5
Female 2  Ratio       1,997,062          2,936          Yes                     16,569  5
Female 2  Coverage    1,934,266          2,936          Yes                     16,569  5

19 Results (Human 5M reads)
Sample    Classifier  Classifier output  Hisat # reads  Geneious: is circular?  Length  Edit distance
Male 1    None        5,000,000          7,248          Yes                     16,570  4
Male 1    Ratio       661,730            6,846          Yes                     16,570  4
Male 1    Coverage    4,808,080          7,238          Yes                     16,570  4
Female 1  None        5,000,000          9,514          Yes                     16,570  4
Female 1  Ratio       546,586            8,942          Yes                     16,570  4
Female 1  Coverage    4,794,066          9,514          Yes                     16,570  4
Male 2    None        5,000,000          9,864          Yes                     16,568  6
Male 2    Ratio       667,100            9,668          Yes                     16,568  6
Male 2    Coverage    4,829,716          9,864          Yes                     16,568  6
Female 2  None        5,000,000          7,682          Yes                     16,567  7
Female 2  Ratio       646,740            7,308          Yes                     16,567  7
Female 2  Coverage    4,835,622          7,682          Yes                     16,567  7

20 Results (Tammar 10M reads)
Insert length  Classifier  Classifier output # reads  Hisat # reads  Geneious: is circular?  Length  Blast % identity  Blast # gaps
500            None        10M                        7,232          Yes                     16,939  99                15
500            Ratio       642,874                    7,084          Yes                     16,939  99                15
500            Coverage    795,994                    6,578          Yes                     16,939  99                15
180            None        10M                        7,778          Yes                     16,943  99                15
180            Ratio       1,658,232                  7,720          Yes                     16,943  99                15
180            Coverage    9,525,122                  7,032          Yes                     16,943  99                15

21 Outline Introduction & prior work Our approach Preliminary results Conclusion and future work

22 Conclusion & Future Work Preliminary results show a high success rate in assembling complete circular mitogenomes. Future work: improving k-mer classifier accuracy by incorporating GC bias; direct comparison with previous methods (mt10k, MITObim, ARC); application to different sequencing technologies (e.g., Ion Torrent); detection of heteroplasmies; assembly of mitogenomes from metagenomic samples; assembly of chloroplast genomes from low-coverage DNA sequencing of plants.

23 GC-content bias in coverage

24 THANK YOU FOR YOUR ATTENTION ANY QUESTIONS?

25 How Does De Novo Assembly Work? Source: http://www.nature.com/nmeth/journal/v9/n4/full/nmeth.1935.html

26 Human mtDNA Copy Number Source: Miller, Francis J., et al. "Precise determination of mitochondrial DNA copy number in human skeletal and cardiac muscle by a PCR-based assay: lack of change of copy number with age." Nucleic Acids Research 31.11 (2003): e61. The mtDNA copy number also varies from tissue to tissue (6970 +/- 920 in heart muscle compared to 3650 +/- 620 in skeletal muscle). Figure panels: Right Atrium of Heart; Skeletal Muscle.

