FuzzyPath - A Hybrid De novo Assembler using Solexa and 454 Short Reads Zemin Ning The Wellcome Trust Sanger Institute.

1 FuzzyPath - A Hybrid De novo Assembler using Solexa and 454 Short Reads Zemin Ning The Wellcome Trust Sanger Institute

2 Outline of the Talk:
- Assembly strategy
- Read extension using base qualities and read pairs
- Repeat junctions and single base variation
- Fuzzy kmers: how to find mismatches
- Assemblies with mixed Solexa and 454 reads
- Solexa reads guided by a closely related reference
- Long Solexa reads with 70 bp
- Future work

3 Assembly Strategy
[Diagram: Genome/chromosome; forward-reverse paired reads (30-70 bp each, known distance ~500 bp); Solexa reads assembler to extend long reads of 1-2 Kb; capillary reads assembler (Phrap/Phusion)]

4 Kmer Extension & Repeat Junctions

5 Quality Filters on Junctions

6 Repetitive Contig and Read Pairs Depth
For each hit read in the contig, the contig index and offset are stored.
[Diagram labels: contig start, current read position, pair read position, insert length, depth]
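One possible reading of this slide is that read pairs support an extension when a read's mate is already placed in the same contig at roughly one insert length behind the current position. The sketch below illustrates that idea only. The names `Hit`, `pair_support`, `mate_of` and the `tolerance` parameter are hypothetical and do not come from the talk, and FuzzyPath's actual bookkeeping may differ.

```python
from collections import namedtuple

# Hypothetical record: which contig a read was placed in, and at what offset.
Hit = namedtuple("Hit", ["contig", "offset"])

def pair_support(candidate_reads, mate_of, hits, contig, current_pos,
                 insert_len, tolerance=50):
    """Count candidate reads whose mate sits in `contig` about one
    insert length before `current_pos` (assumed consistency rule)."""
    support = 0
    for read in candidate_reads:
        mate = mate_of.get(read)          # None if the read has no mate
        hit = hits.get(mate)              # None if the mate is not placed yet
        if hit is None or hit.contig != contig:
            continue
        distance = current_pos - hit.offset
        if abs(distance - insert_len) <= tolerance:
            support += 1
    return support

# Toy data, invented for illustration.
hits = {"r1/2": Hit(0, 100), "r2/2": Hit(1, 100), "r3/2": Hit(0, 590)}
mate_of = {"r1/1": "r1/2", "r2/1": "r2/2", "r3/1": "r3/2"}
depth = pair_support(["r1/1", "r2/1", "r3/1", "r4/1"], mate_of, hits,
                     contig=0, current_pos=600, insert_len=500)
```

Here only `r1/1` counts: its mate lies in contig 0 exactly 500 bp upstream, while the other mates are in a different contig, too close, or missing.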

7 Handling of Single Base Variations

8 Number of Mismatches between Two Kmers
Kmer 1:  A  C  G  T  A  A  C  T  A  A  C  A  G  T  T
         00 01 10 11 00 00 01 11 00 00 01 00 10 11 11
Kmer 2:  A  C  G  T  A  A  C  T  C  A  C  A  G  T  T
         00 01 10 11 00 00 01 11 01 00 01 00 10 11 11
XOR:     A  C  G  T  A  A  C  T     A  C  A  G  T  T
         00 00 00 00 00 00 00 00 01 00 00 00 00 00 00
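The bit patterns on this slide use two bits per base (A=00, C=01, G=10, T=11), and the third row is the XOR of the first two, so every non-zero 2-bit field marks a mismatch. A minimal sketch of that counting scheme follows. The function names are illustrative, and this is not taken from the FuzzyPath source.

```python
# 2-bit base codes as shown on the slide.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

def encode_kmer(kmer):
    """Pack a kmer into an integer, two bits per base, first base highest."""
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value

def count_mismatches(kmer_a, kmer_b):
    """Count differing bases by XOR-ing the 2-bit encodings."""
    if len(kmer_a) != len(kmer_b):
        raise ValueError("kmers must have the same length")
    k = len(kmer_a)
    diff = encode_kmer(kmer_a) ^ encode_kmer(kmer_b)
    # Mask with the low bit of every 2-bit field set: 0b0101...01.
    low_mask = (4 ** k - 1) // 3
    # Fold the high bit of each field onto its low bit, then count set bits.
    collapsed = (diff | (diff >> 1)) & low_mask
    return bin(collapsed).count("1")

# The two kmers from the slide differ only at the ninth base (A vs C).
mismatches = count_mismatches("ACGTAACTAACAGTT", "ACGTAACTCACAGTT")
```

Folding each field to one bit before counting matters because a single mismatch such as A vs T (00 XOR 11) sets two bits, and a plain bit count would report it twice.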

9 Use of Kmers with Mismatches

10 Mixed Solexa and 454 Reads
[Figure: pileup of 454 reads at a repeat junction. Labels: L = ~250 bp; L-K+1 kmers; L-N-K+1 kmers]

11 Pileup of Solexa and 454 Reads

12 Guided by A Closely Related Reference
[Figure: pileup of shredded reads at a repeat junction. Labels: L = 3000 bp; L-K+1 kmers; L-N-K+1 kmers]

13 Pileup of Solexa and Shredded Reads

14 Long Solexa Reads with 70 bp
[Figure: pileup of long Solexa reads at a repeat junction. Labels: L = 70 bp; L-K+1 kmers]
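The "L-K+1 kmers" label reflects that a read of length L contains L-K+1 overlapping kmers of size K. A short sketch of that count is given below. The value K = 31 is an assumption chosen for illustration, since the slides do not state the kmer size used.

```python
def kmers(read, k):
    """Return every overlapping kmer of size k in the read, left to right."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

L = 70   # long Solexa read length from the slide
K = 31   # assumed kmer size, not given in the talk
read = "ACGT" * 17 + "AC"            # synthetic 70 bp read
count = len(kmers(read, K))          # equals L - K + 1
```

A read shorter than K yields an empty list, which is why longer reads contribute more kmers to the pileup at a junction.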

15 Pileup of Long 70 bp Solexa Reads

16 S. suis P1/7 Solexa/454 Assembly
Solexa reads:
  Number of reads: 3,084,185
  Finished genome size: 2,007,491 bp
  Read length: 39 and 36 bp
  Estimated read coverage: ~55X
  Number of 454 reads: 100,000
  Read coverage of 454: 10X
Assembly features (contig stats):
  Total number of contigs: 73
  Total bases of contigs: 1,999,817 bp
  N50 contig size: 62,508
  Largest contig: 162,190
  Averaged contig size: 27,394
  Contig coverage over the genome: ~99 %
  Contig extension errors: 2
  Mis-assembly errors: 3
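The contig stats above report an N50 contig size. As a reminder of what that figure measures, here is a sketch of the commonly used definition: the contig length at which contigs of that size or larger hold at least half of the assembled bases. The talk does not say which definition was applied, and the contig lengths in the example are invented.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:   # reached half of the assembled bases
            return length
    return 0

toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]   # invented lengths, 32 bases in total
toy_n50 = n50(toy_contigs)
```

In the toy set the two 8-base contigs already cover 16 of the 32 bases, so the N50 is 8.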

17 S. suis P1/7 with Shredded Pair-end Reads
Shredded reads:
  Number of reads: 1,338,161
  Finished genome size: 2,007,491 bp
  Read length: 36 bp
  Estimated read coverage: 24X
  Insert size: 500 bp
Assembly features:
                            Paired_Data   Not_Paired
  Number of contigs:                 35          317
  Total assembled bases:       1.996 Mb     1.956 Mb
  N50 contig size:              243,039       13,929
  Largest contig:               474,070       33,460
  Averaged contig size:          57,043        6,168
  Contig coverage:              >99.0 %      >99.0 %
  Contig extension errors:            0            0
  Mis-assembly errors:                3            2
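The estimated read coverage on this slide can be reproduced from the other figures, assuming the usual formula of number of reads times read length divided by genome size. The function name below is illustrative.

```python
def read_coverage(num_reads, read_len, genome_size):
    """Average depth: total sequenced bases divided by genome size."""
    return num_reads * read_len / genome_size

# Figures from the shredded-read slide.
coverage = read_coverage(1_338_161, 36, 2_007_491)
```

With these inputs the result is just under 24, matching the 24X quoted on the slide.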

18 Salmonella delhi5 Solexa Assembly Guided by A Close Reference
Solexa reads:
  Number of reads: 6,346,317
  Finished genome size: 4.7 Mbp
  Read length: 33 bp
  Estimated read coverage: ~40X
  Shredded reference of SpA: 10X
Assembly features (contig stats):
  Total number of contigs: 66
  Total bases of contigs: 4,615,704 bp
  N50 contig size: 168,793
  Largest contig: 401,700
  Averaged contig size: 69,934
  Contig coverage over the genome: ~98 %
  Contig extension errors: 0
  Mis-assembly errors: 2

19 S. suis P1/7 Shredded Read Assembly
Shredded reads:
  Number of reads: 1,338,161
  Finished genome size: 2,007,491 bp
  Read length: 36 bp
  Estimated read coverage: 24X
  Insert size: 500 bp
Assembly features:
                            Paired_Data   Not_Paired
  Number of contigs:                 35          317
  Total assembled bases:       1.996 Mb     1.956 Mb
  N50 contig size:              243,039       13,929
  Largest contig:               474,070       33,460
  Averaged contig size:          57,043        6,168
  Contig coverage:              >99.0 %      >99.0 %
  Contig extension errors:            0            0
  Mis-assembly errors:                3            2

20 Acknowledgements:
- Yong Gu
- Ben Blackburne
- Hannes Ponstingl
- Harold Swerdlow
- Michael Quail
- Tony Cox
- Richard Durbin

