Cross_genome: Assembly Scaffolding using Cross-species Synteny

Presentation transcript:

Cross_genome: Assembly Scaffolding using Cross-species Synteny
Zemin Ning
High Performance Assembly

Can synteny help? And How?
Scaffolding
Contig gap closure

RACA - Reference-assisted chromosome assembly

Lattice of Target - Reference
Q = scaff(i) * 2^32 + contig_loci(j)
[Figure: Target sequence (Scaffold 1, Scaffold 2, Scaffold 3) plotted against Reference]
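A minimal sketch of the lattice key Q, assuming the "232" on the slide is a flattened 2^32, so that the scaffold index occupies the high bits and the contig locus the low 32 bits of one integer. The function names pack_key and unpack_key are hypothetical and are not taken from the Cross_genome code.

```python
SHIFT = 32                 # assumption: the slide's "232" is 2^32
MASK = (1 << SHIFT) - 1    # low 32 bits hold the contig locus


def pack_key(scaff_index, contig_locus):
    """Combine a scaffold index and a locus into one integer key Q."""
    if not 0 <= contig_locus <= MASK:
        raise ValueError("contig_locus must fit in 32 bits")
    return scaff_index * (1 << SHIFT) + contig_locus


def unpack_key(q):
    """Recover (scaffold index, locus) from a key built by pack_key."""
    return q >> SHIFT, q & MASK


# Sorting the keys orders hits by scaffold first, then by locus.
keys = sorted(pack_key(s, p) for s, p in [(2, 50), (1, 900), (1, 10)])
ordered = [unpack_key(q) for q in keys]
```

One reason for a key of this shape is that a single integer sort groups all hits of a scaffold together and keeps them in positional order.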

After Noise Cleaning
Gap_size = Y - X
[Figure: Target sequence (Scaffold 1, Scaffold 2, Scaffold 3) plotted against Reference, with X marked at Scaffold 2 and Y at Scaffold 3]
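The gap formula can be sketched as follows. This assumes X is the reference coordinate where one scaffold's alignment ends and Y is the reference coordinate where the next scaffold's alignment begins. The slide does not define X and Y in text, so this reading, the tuple layout, and the name neighbour_gaps are all assumptions for illustration.

```python
def neighbour_gaps(placements):
    """Estimate gaps between scaffolds that are adjacent on the reference.

    placements: list of (scaffold_name, ref_start, ref_end) tuples.
    Returns a list of (left_scaffold, right_scaffold, gap_size),
    with gap_size = Y - X as on the slide.
    """
    ordered = sorted(placements, key=lambda p: p[1])
    gaps = []
    for (left, _, x), (right, y, _) in zip(ordered, ordered[1:]):
        gaps.append((left, right, y - x))  # negative means the alignments overlap
    return gaps


example = [
    ("scaffold_2", 5000, 9000),
    ("scaffold_1", 0, 4000),
    ("scaffold_3", 9500, 12000),
]
gaps = neighbour_gaps(example)
```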

Cases Shouldn’t Join
[Figure: two Reference vs Target layouts, each showing Scaffold 1, Scaffold 2 and Gap_size]

GAGE: Human Chr14 and RACA using Orangutan
Columns: Assembler, N_bases, N_scaffs, N50 (Mb)
Original 88.8 418 81.6 ALLPATHS-LG RACA 86.8 Cross_genome 89 221 85.5 78.6 1472 0.37 Bambus2 72.1 1094 13.7 86.5 498 0.4 CABOG 81.4 86.3 46 89.7 0.88 MSR-CA 83.4 89.6 94.7 30975 0.075 SGA 57.4 94.8 29662 77.3 108 38477 0.453 SOAPdenovo 84.4 102.8 12955 78.9 143.8 61455 0.84 Velvet 123 139.4 3278 8.71
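For readers unfamiliar with the N50 column, here is a short sketch of the usual definition: sort the scaffolds from longest to shortest and report the length at which the running total first reaches half of all assembled bases. The function name n50 and the toy lengths are illustrative only and are not data from the slides.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:   # reached at least half of all bases
            return length
    raise ValueError("n50 needs at least one non-empty sequence")


toy_lengths = [2, 3, 4, 5, 6]      # total 20, half is 10
toy_n50 = n50(toy_lengths)         # 6 + 5 = 11 >= 10, so N50 is 5
```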

Scaffold N50 for Other Genome Assemblies
Genome            Original   Cross_g   References
Panda             1.3Mb      25Mb      Dog, Human
Tibetan Antelope  2.6Mb      42Mb      Cattle, Dog, Human
Tasmanian Devil   1.8Mb      6.8Mb     Opossum

Availability: ftp://ftp.sanger.ac.uk/pub/users/zn1/merge/cross_genome/

Improve gorilla assembly using human reference
Contig gap size re-estimation
Workflow boxes: Combined Gorilla-Human Assembly, Read Alignment, Pair-wise/Multiple, Read Clustering, Local Assembly, Final Gorilla Assembly

Re-estimate Contig Gap Sizes from Reference
[Figure labels: Target sequence, Reference sequence, Gap size, New gap size, Ref seq inserted, Local assembly based on clustered reads]

Assemblies using Synteny-guided Method
Columns: Human Chr6 - Simulation | Gorilla Genome - Real Data
Reads: 2x100 with 500bp insert, 60X
Original Assembly Contig N50: 24.3kb | 13.5kb
Average contig length: 6850bp | 6940bp
N of clusters (100000 pairs): 504 | 5807
43.7kb | 24.0kb
Gap closed: 7809 | 10433
N of base errors in gap closed regions: 256 subs and 12 indels (24bps) | N/A

Gorilla - Merge with other De novo Assemblies
                   Original assembly (dev5)   Merge with Fermi*   Merge with Masurca+
Contig N50         13.5kb                     30.2kb              53.1kb
Average length     6850                       12577               18768
Largest contig     215kb                      391.2kb             448.8kb
N of gaps closed                              182661              257167

*Fermi assembler: https://github.com/lh3/fermi/
+Masurca assembler: http://www.genome.umd.edu/masurca.html

Gs = (Kn - Ks)/D = 4.5x10^9
Kn = 125.4x10^9 (total number of kmer words)
Ks = 2.4x10^9 (number of single copy kmer words)
D = 27 (depth of kmer occurrence)
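The arithmetic on this slide can be checked with a few lines, taking the slide's values as multiples of 10^9. The function name estimate_gs is hypothetical; only the formula and the three inputs come from the slide.

```python
def estimate_gs(kn, ks, depth):
    """Apply the slide's formula Gs = (Kn - Ks) / D."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return (kn - ks) / depth


KN = 125.4e9   # total number of kmer words
KS = 2.4e9     # number of single copy kmer words
D = 27         # depth of kmer occurrence

gs = estimate_gs(KN, KS, D)
print(f"Gs = {gs:.3e}")
```

With these inputs the result is about 4.56x10^9, which agrees with the 4.5x10^9 quoted on the slide.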

Original Contig (query) against New Assembly after Contig Break

Alignment Inconsistency

The Gorilla Assemblies
                           Original   New
Total number of contigs:   464,875    285,139
N50 contig size:           11.7kb     23.9kb
Largest contig:            191,556    322,733
Averaged contig size:      6085       9928

Acknowledgements:
Hanness Ponstingl
Frank Liu – Nanjing University of Information Technology (NUIT)
Yan Li – (NUIT)
Gorilla genome sequencing data
BGI – Panda and Tibetan Antelope assemblies