The Genome Assemblies of Tasmanian Devil
Zemin Ning
The Wellcome Trust Sanger Institute
Outline of the Talk:
The Phusion2 pipeline
Flow-sorting sequencing
Assigning contigs to individual chromosomes
The Devil genome assembly
Future work
Phusion2 Assembly Pipeline
[Flow diagram. Box labels: Solexa Reads (2x75 or 2x100); Data Process; Reads Group; Assembly; Fuzzypath; Velvet; Phrap; Contigs; Long Insert Reads; Supercontig; PRono; GraphRP; Base Correction]
Mis-assembly Errors: Contig Break based on Inconsistent Pairs
Mis-assembly Errors: Contig Break Based on Pair Coverage
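One common way to realise a pair-coverage break can be sketched as follows. This is a minimal illustration, not the Phusion2 implementation: the function names, the `min_pairs` threshold, and the toy coordinates are all assumptions. The idea is to count, for every base of a contig, how many consistently mapped read pairs span it, and to flag stretches where that count falls below a threshold as candidate break points.

```python
def spanning_coverage(contig_length, pairs):
    """Count how many read pairs span each base of a contig.

    pairs: iterable of (start, end) half-open intervals giving the leftmost
    and rightmost coordinates covered by a consistently mapped pair.
    """
    diff = [0] * (contig_length + 1)
    for start, end in pairs:
        diff[start] += 1   # pair begins to span here
        diff[end] -= 1     # pair stops spanning here
    coverage = []
    running = 0
    for pos in range(contig_length):
        running += diff[pos]
        coverage.append(running)
    return coverage


def find_break_regions(coverage, min_pairs=1):
    """Return half-open (start, end) regions where coverage < min_pairs.

    min_pairs is a hypothetical threshold chosen for illustration.
    """
    regions = []
    region_start = None
    for pos, depth in enumerate(coverage):
        low = depth < min_pairs
        if low and region_start is None:
            region_start = pos
        elif not low and region_start is not None:
            regions.append((region_start, pos))
            region_start = None
    if region_start is not None:
        regions.append((region_start, len(coverage)))
    return regions


# Toy contig of 100 bases; no pair spans positions 50..59.
toy_pairs = [(0, 40), (10, 50), (60, 100)]
toy_coverage = spanning_coverage(100, toy_pairs)
print(find_break_regions(toy_coverage))
```

In practice the ends of a contig always have low spanning coverage, so a real filter would likely ignore a margin of roughly one insert size at each end.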
Pipeline of Contig Gap Closure
Sequencing T. Devil on Illumina: Strategy
[Flow diagram. Labels:]
Tumour or normal genomic DNA
Fragments of defined size: 0.5, 2, 5, 7, 8, 10 kb
Sequencing: 2x100bp reads (short insert), 2x50bp mate pairs
Alignment using bwa, smalt
Somatic mutations
Germline variants
De novo Assembly
Sequencing performed at Illumina
Table 1: Run ID, template names, number of reads and chromosome size
4972_1   chr1   IL20_4972:
_1       chr2   IL21_4967:
_1       chr3   IL30_4971:
_1       chr4   IL14_4964:
_1       chr5   IL17_4969:
_2       chr6   IL17_4969:
_3       chrx   IL17_4969:
Read mapping coefficient: e = Size_of_Chr / Num_reads_in_lane
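The read mapping coefficient can be illustrated with a short sketch. Only the formula e = Size_of_Chr / Num_reads_in_lane comes from the slide. How the coefficient is then used is an assumption made here for illustration: each flow-sorted lane's read count on a contig is weighted by that lane's e, and the contig goes to the chromosome with the highest weighted score. The chromosome names, sizes, and read counts below are invented toy values, not data from Table 1.

```python
def mapping_coefficient(chr_size, num_reads_in_lane):
    """e = Size_of_Chr / Num_reads_in_lane (formula from the slide)."""
    if num_reads_in_lane <= 0:
        raise ValueError("lane must contain at least one read")
    return chr_size / num_reads_in_lane


def assign_contig(read_counts, coefficients):
    """Pick the chromosome with the highest e-weighted read count.

    Assumed scoring rule (not confirmed by the source):
        score[chrom] = reads_from_that_lane_on_contig * e[chrom]
    Returns (best_chromosome_or_None, scores).
    """
    scores = {
        chrom: read_counts.get(chrom, 0) * e
        for chrom, e in coefficients.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return None, scores   # no lane has reads on this contig
    return best, scores


# Hypothetical lanes: (chromosome size in bases, reads in lane).
toy_lanes = {"chrA": (600_000_000, 30_000_000),
             "chrB": (100_000_000, 20_000_000)}
toy_e = {c: mapping_coefficient(size, n) for c, (size, n) in toy_lanes.items()}

# A contig hit by 10 reads from the chrA lane and 30 from the chrB lane.
best, scores = assign_contig({"chrA": 10, "chrB": 30}, toy_e)
print(best, scores)
```

The weighting matters because lanes differ in depth: a deeply sequenced lane for a small chromosome contributes many reads everywhere, so raw counts alone would bias assignments towards it.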
New Data Sequenced by Illumina
Chr1   EAS25_101_
Chr1   EAS25_101_
Chr2   EAS25_101_
Chr2   EAS25_101_
Chr3   EAS25_101_
Chr3   EAS25_101_
Chr4   EAS188_173_
Chr4   EAS188_173_
Chr5   EAS188_173_
Chr5   EAS188_173_
Chr6   EAS188_173_
Chr6   EAS188_173_
Chrx   EAS188_173_
Table 2: Chr_ID, Chr_size, Contigs_assigned, Bases_assigned, N_reads
Chr
Chr
Chr
Chr
Chr
Chr
Chrx
Unassigned (1.8%)
Table 2 (New Data Sequenced by Illumina): Chr_ID, Chr_size, Contigs_assigned, Bases_assigned (Mb)
Chr
Chr
Chr
Chr
Chr
Chr
Chrx
Unassigned
Unassigned contigs were placed by supercontigs using mate pairs
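A minimal sketch of this placement step, under stated assumptions: supercontig membership is taken as already derived from mate pairs, and an unassigned contig inherits the chromosome that holds the most assigned bases within its supercontig. The majority-by-bases rule, the function name, and the toy identifiers are illustrative choices, not details confirmed by the talk.

```python
from collections import defaultdict


def place_unassigned(supercontigs, assignment, contig_length):
    """Place unassigned contigs using their supercontig's majority chromosome.

    supercontigs:  {supercontig_id: [contig_id, ...]} (built from mate pairs)
    assignment:    {contig_id: chromosome}, missing or None if unassigned
    contig_length: {contig_id: length in bases}
    Returns {contig_id: chromosome} for newly placed contigs only.
    """
    placed = {}
    for members in supercontigs.values():
        bases_per_chrom = defaultdict(int)
        for contig in members:
            chrom = assignment.get(contig)
            if chrom is not None:
                bases_per_chrom[chrom] += contig_length[contig]
        if not bases_per_chrom:
            continue  # nothing in this supercontig is anchored yet
        majority = max(bases_per_chrom, key=bases_per_chrom.get)
        for contig in members:
            if assignment.get(contig) is None:
                placed[contig] = majority
    return placed


# Toy example with hypothetical identifiers.
toy_supercontigs = {"sc1": ["c1", "c2", "c3"], "sc2": ["c4"]}
toy_assignment = {"c1": "chr1", "c2": "chr1", "c3": None, "c4": None}
toy_lengths = {"c1": 5000, "c2": 3000, "c3": 800, "c4": 1200}
print(place_unassigned(toy_supercontigs, toy_assignment, toy_lengths))
```

Contigs in supercontigs with no anchored member, such as `c4` in the toy data, stay unassigned.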
Genome Assembly Normal – T. Devil
Solexa reads:
Number of read pairs: 650 Million
Finished genome size: 3.3 Gb
Read length: 2x100bp
Estimated read coverage: ~40X
Insert size: 410/ bp
Mate pair data: 2k, 4k, 5k, 6k, 8k, 10k
Number of reads clustered: 591 Million
Assembly features (stats):
                              Contigs      Supercontigs
Total number of contigs:      237,291      35,974
Total bases of contigs:       2.93 Gb      3.17 Gb
N50 contig size:              20,139       1,847,186
Largest contig:               189,866      5,315,556
Averaged contig size:         12,354       88,254
Contig coverage on genome:    ~94%         >99%
Ratio of placed PE reads:     ~92%         ?
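Two of the figures on this slide follow from simple arithmetic, sketched below. The coverage estimate uses the slide's own numbers (650 million read pairs, 2x100bp, 3.3 Gb genome). The N50 function uses the standard definition, the length of the contig at which the running total over contigs sorted longest-first reaches half of the assembled bases, and is run here on made-up toy lengths because the per-contig lengths are not part of the slide.

```python
def estimated_coverage(read_pairs, read_length, genome_size):
    """Sequenced bases divided by genome size; two reads per pair."""
    return read_pairs * 2 * read_length / genome_size


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50 is undefined for an empty list")


# Numbers from the slide: 650 million pairs, 100 bp reads, 3.3 Gb genome.
coverage = estimated_coverage(650_000_000, 100, 3_300_000_000)
print(f"estimated coverage: {coverage:.1f}X")

# Toy contig lengths, for illustration only.
print("toy N50:", n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

The coverage works out to a little over 39X, which matches the ~40X quoted on the slide.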
Acknowledgements:
Elizabeth Murchison
David McBride
Fengtang Yang
Mike Stratton
Ole Schulz-Trieglaff
Dirk Evers
David Bentley