The Genome Assemblies of Tasmanian Devil
Zemin Ning, The Wellcome Trust Sanger Institute

 The Phusion2 pipeline  Flow-sorting sequencing  Assigning contigs to individual chromosomes  The Devil genome assembly  Future work Outline of the Talk:

Phusion2 Assembly Pipeline (diagram labels): Solexa reads (2x75 or 2x100 bp); base correction; reads grouping; data processing; assembly with PRono, Fuzzypath, Velvet and Phrap; contigs; long-insert reads; GraphRP; supercontigs.
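The diagram places a reads-grouping step before the individual assemblers. Below is a minimal sketch of one way to do such grouping. It assumes two reads belong to the same group whenever they share a k-mer that is not too frequent, which acts as a crude repeat filter, and it merges groups with union-find. The function `group_reads` and its parameters `k` and `max_kmer_count` are hypothetical. This is not the Phusion2 code.

```python
from collections import defaultdict


def group_reads(reads, k=4, max_kmer_count=3):
    """Group reads that share at least one low-copy k-mer (hypothetical helper)."""
    parent = list(range(len(reads)))

    def find(i):
        # Follow parent links to the group representative, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Index every k-mer: which reads contain it and how often it occurs overall.
    kmer_reads = defaultdict(set)
    kmer_count = defaultdict(int)
    for idx, read in enumerate(reads):
        for pos in range(len(read) - k + 1):
            kmer = read[pos:pos + k]
            kmer_reads[kmer].add(idx)
            kmer_count[kmer] += 1

    # Join reads sharing a k-mer, skipping high-count k-mers (likely repeats).
    for kmer, ids in kmer_reads.items():
        if kmer_count[kmer] > max_kmer_count:
            continue
        first, *rest = sorted(ids)
        for other in rest:
            union(first, other)

    # Collect read indices by representative; each list is in increasing order.
    groups = defaultdict(list)
    for idx in range(len(reads)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())
```

Each resulting group could then be passed on its own to one of the assemblers named in the diagram. Lowering `max_kmer_count` makes the grouping more conservative at repeats.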

Mis-assembly Errors: Contig Break Based on Inconsistent Pairs
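Below is a hedged sketch of how candidate break regions could be found from read pairs mapped to one contig. A pair is written as the tuple `(left_pos, left_strand, right_pos, right_strand)`, with 0-based read start positions and `left_pos <= right_pos`. A pair counts as consistent when the left read is forward, the right read is reverse, and the implied insert size lies in `[min_ins, max_ins]`. Every other pair is inconsistent. Stretches spanned by at least `min_support` inconsistent pairs are reported. The tuple layout, the thresholds and the function names are all assumptions for illustration.

```python
def is_consistent(pair, read_len, min_ins, max_ins):
    """Forward/reverse orientation and an insert size inside the library range."""
    left_pos, left_strand, right_pos, right_strand = pair
    insert = right_pos + read_len - left_pos
    return left_strand == "+" and right_strand == "-" and min_ins <= insert <= max_ins


def suspicious_regions(pairs, contig_len, read_len, min_ins, max_ins, min_support):
    """Return [start, end) intervals covered by >= min_support inconsistent pairs."""
    diff = [0] * (contig_len + 1)  # difference array of inconsistent-pair depth
    for pair in pairs:
        if is_consistent(pair, read_len, min_ins, max_ins):
            continue
        start = pair[0]
        end = min(pair[2] + read_len, contig_len)
        diff[start] += 1
        diff[end] -= 1

    regions, depth, open_at = [], 0, None
    for pos in range(contig_len):
        depth += diff[pos]
        if depth >= min_support and open_at is None:
            open_at = pos
        elif depth < min_support and open_at is not None:
            regions.append((open_at, pos))
            open_at = None
    if open_at is not None:
        regions.append((open_at, contig_len))
    return regions
```

A break would then be placed inside each reported region, for example where the inconsistent depth peaks.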

Mis-assembly Errors: Contig Break Based on Pair Coverage
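The pair-coverage criterion can be sketched as follows. Count how many consistent pairs span each base of the contig (fragment coverage), then report internal stretches where that count falls below `min_cov`. Bases within `margin` of either contig end are skipped because coverage drops there naturally. Pairs are assumed to be tuples `(left_pos, left_strand, right_pos, right_strand)` with 0-based starts and `left_pos <= right_pos`. A pair is consistent when it is forward/reverse with an insert size in `[min_ins, max_ins]`. All names and thresholds are hypothetical.

```python
def pair_coverage(pairs, contig_len, read_len, min_ins, max_ins):
    """Per-base count of consistent pairs whose fragment spans that base."""
    diff = [0] * (contig_len + 1)
    for left_pos, left_strand, right_pos, right_strand in pairs:
        insert = right_pos + read_len - left_pos
        if left_strand == "+" and right_strand == "-" and min_ins <= insert <= max_ins:
            diff[left_pos] += 1
            diff[min(left_pos + insert, contig_len)] -= 1
    cov, depth = [], 0
    for pos in range(contig_len):
        depth += diff[pos]
        cov.append(depth)
    return cov


def low_coverage_intervals(cov, min_cov, margin):
    """[start, end) stretches with coverage < min_cov, ignoring the contig ends."""
    intervals, start = [], None
    for pos in range(margin, len(cov) - margin):
        if cov[pos] < min_cov:
            if start is None:
                start = pos
        elif start is not None:
            intervals.append((start, pos))
            start = None
    if start is not None:
        intervals.append((start, len(cov) - margin))
    return intervals
```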

Pipeline of Contig Gap Closure
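The slide does not describe the individual gap-closure steps. As a hedged illustration of one common approach, which is not necessarily the one used in this pipeline, a gap between two contig ends can be filled by a k-mer walk. The left flank is extended one base at a time using k-mers seen in reads, continuing only while exactly one extension is possible, until the sequence reaches the first k bases of the right flank. Reads are assumed to be on the same strand as the flanks. `close_gap` and its parameters are hypothetical.

```python
from collections import Counter


def close_gap(left_flank, right_flank, reads, k=5, max_gap=50, min_count=1):
    """Return the gap sequence between the flanks, or None if the walk fails."""
    # Count every k-mer present in the reads (missing keys count as 0).
    kmers = Counter(read[i:i + k] for read in reads for i in range(len(read) - k + 1))
    target = right_flank[:k]
    seq = left_flank
    for _ in range(max_gap + k):
        suffix = seq[-(k - 1):]
        nexts = [b for b in "ACGT" if kmers[suffix + b] >= min_count]
        if len(nexts) != 1:
            return None  # dead end or branch: stop rather than guess
        seq += nexts[0]
        if seq.endswith(target):
            # Bases added before the right flank begins form the gap filler.
            return seq[len(left_flank):len(seq) - k]
    return None
```

Stopping at branches keeps the sketch conservative. A real gap closer would also use read pairs to choose among branches.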

Sequencing T. Devil on Illumina: Strategy
- Tumour or normal genomic DNA
- Fragments of defined size: 0.5, 2, 5, 7, 8, 10 kb
- Sequencing: 2x100bp reads (short insert) and 2x50bp mate pairs; sequencing performed at Illumina
- Alignment using bwa and smalt: somatic mutations, germline variants
- De novo assembly

Table 1. Run ID, Template names, Number of reads and Chromosome size

Run ID    Template name
4972_1    chr1 IL20_4972
_1        chr2 IL21_4967
_1        chr3 IL30_4971
_1        chr4 IL14_4964
_1        chr5 IL17_4969
_2        chr6 IL17_4969
_3        chrx IL17_4969

Read mapping coefficient: e = Size_of_Chr / Num_reads_in_lane
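The coefficient e is the number of chromosome bases per read sequenced in that chromosome's lane. It follows that a contig of length L that truly comes from the chromosome should attract about L / e of that lane's reads, assuming the flow-sorted lane samples its chromosome evenly. The sketch below uses made-up sizes and read counts. The function names are hypothetical.

```python
def mapping_coefficients(chr_sizes, lane_reads):
    """e = Size_of_Chr / Num_reads_in_lane for every chromosome lane."""
    return {chrom: chr_sizes[chrom] / lane_reads[chrom] for chrom in chr_sizes}


def expected_reads(contig_len, e):
    """Reads a contig should receive from its own chromosome's lane (even sampling assumed)."""
    return contig_len / e


# Hypothetical numbers for illustration only, not the values from Table 1.
coeffs = mapping_coefficients({"chr1": 600_000_000, "chrx": 120_000_000},
                              {"chr1": 30_000_000, "chrx": 20_000_000})
print(coeffs)                                    # {'chr1': 20.0, 'chrx': 6.0}
print(expected_reads(10_000, coeffs["chr1"]))    # 500.0
```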

New Data Sequenced by Illumina (run names per chromosome):
Chr1: EAS25_101_, EAS25_101_
Chr2: EAS25_101_, EAS25_101_
Chr3: EAS25_101_, EAS25_101_
Chr4: EAS188_173_, EAS188_173_
Chr5: EAS188_173_, EAS188_173_
Chr6: EAS188_173_, EAS188_173_
Chrx: EAS188_173_

Table 2. Chr_ID, Chr_size, Contigs_assigned, Bases_assigned, N_reads
Rows: Chr1, Chr2, Chr3, Chr4, Chr5, Chr6, Chrx, Unassigned (1.8%)

Table 2 (New Data Sequenced by Illumina). Chr_ID, Chr_size, Contigs_assigned, Bases_assigned (Mb)
Rows: Chr1, Chr2, Chr3, Chr4, Chr5, Chr6, Chrx, Unassigned
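The slides do not spell out how contigs were assigned to chromosomes. Below is a minimal sketch that assumes each contig is scored for every chromosome as n × e / L. Here n is the number of reads from that chromosome's lane mapped to the contig, L is the contig length, and e is the read mapping coefficient from Table 1. The score is near 1.0 for the source chromosome when sorting is clean. The sketch picks the best score and leaves the contig unassigned when that score is low or not clearly ahead of the runner-up. `assign_contig`, `min_score` and `min_ratio` are hypothetical.

```python
def assign_contig(contig_len, read_counts, coeffs, min_score=0.5, min_ratio=2.0):
    """Return the chromosome a contig most likely belongs to, or 'unassigned'."""
    # Normalised depth per chromosome lane: observed reads / expected reads.
    scores = {c: read_counts.get(c, 0) * coeffs[c] / contig_len for c in coeffs}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_chr, best = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else 0.0
    # Reject weak signals and near-ties between two chromosomes.
    if best < min_score or (second > 0 and best / second < min_ratio):
        return "unassigned"
    return best_chr
```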

Unassigned contigs were placed using supercontigs built from mate pairs.
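A minimal sketch of this placement step, assuming a simple majority vote. Each unassigned contig inherits the chromosome that is most common among the assigned contigs in its supercontig. It stays unassigned when the supercontig has no assigned members or the vote ties. The data layout (a dict of contig to chromosome, and supercontigs as lists of contig names) and `place_unassigned` are hypothetical.

```python
from collections import Counter


def place_unassigned(assignments, supercontigs):
    """Propagate chromosome labels to unassigned contigs within each supercontig."""
    placed = dict(assignments)
    for members in supercontigs:
        votes = Counter(assignments[c] for c in members if assignments[c] != "unassigned")
        if not votes:
            continue  # no assigned contig to vote
        ranked = votes.most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            continue  # tie between chromosomes: leave members as they are
        chrom = ranked[0][0]
        for c in members:
            if placed[c] == "unassigned":
                placed[c] = chrom
    return placed
```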

Genome Assembly Normal – T. Devil

Solexa reads:
- Number of read pairs: 650 million
- Finished genome size: 3.3 Gb
- Read length: 2x100 bp
- Estimated read coverage: ~40X
- Insert size: 410/… bp
- Mate pair data: 2k, 4k, 5k, 6k, 8k, 10k
- Number of reads clustered: 591 million

Assembly features:
Stats                            Contigs      Supercontigs
Total number of contigs          237,291      35,974
Total bases of contigs           2.93 Gb      3.17 Gb
N50 contig size (bp)             20,139       1,847,186
Largest contig (bp)              189,866      5,315,556
Average contig size (bp)         12,354       88,254
Contig coverage on genome        ~94%         >99%
Ratio of placed PE reads         ~92%         ?
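Two of these figures can be checked with a little arithmetic. First, 650 million pairs × 2 × 100 bp over a 3.3 Gb genome gives about 39X, which agrees with the ~40X estimate. Second, N50 is the length L such that contigs of length L or longer hold at least half of the assembled bases. A short sketch of both calculations follows. The function names are illustrative, not part of the Phusion2 pipeline.

```python
def n50(lengths):
    """Smallest length L such that contigs >= L contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def read_coverage(read_pairs, read_len, genome_size):
    """Sequencing depth from paired reads: pairs x 2 reads x read length / genome size."""
    return read_pairs * 2 * read_len / genome_size


print(round(read_coverage(650_000_000, 100, 3_300_000_000)))  # 39
print(n50([2, 3, 4, 5, 6]))                                   # 5
```

The average contig size works the same way: 2.93 Gb / 237,291 contigs ≈ 12.3 kb, which agrees with the reported 12,354 bp.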

Acknowledgements:  Elizabeth Murchuson  David McBride  Fengtang Yang  Mike Stratton  Ole Schulz-Trieglaff  Dirk Evers  David Bentley