FuzzyPath Assemblies - from Bacterial to Mammalian Genomes and Zebrafish Finishing
Zemin Ning, The Wellcome Trust Sanger Institute

Assembly Strategy
Solexa reads → Solexa reads assembler, which extends them into long reads of 1-2 kb
Extended long reads + forward-reverse paired capillary reads (~500 bp, known distance apart) → capillary reads assembler (Phrap/Phusion) → Genome/Chromosome assembly

Kmer Extension & Repeat Junctions

Handling of Single Base Variations

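Fuzzy Kmers
ACGTAACTAACAGTT
ACGTAACTCACAGTT
ACGTAACT.ACAGTT
The two 15-mers differ only at position 9 (A/C); with that position masked they share one fuzzy kmer.
[Figure: number of mismatches between two kmers]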
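A minimal sketch of fuzzy kmer comparison, assuming two kmers are treated as the same fuzzy kmer when their mismatch count (Hamming distance) is at most a chosen threshold. The function names and the default threshold of one mismatch are hypothetical; this is not FuzzyPath's code.

```python
def mismatches(kmer_a: str, kmer_b: str) -> int:
    """Number of positions at which two equal-length kmers differ."""
    if len(kmer_a) != len(kmer_b):
        raise ValueError("kmers must have the same length")
    return sum(1 for a, b in zip(kmer_a, kmer_b) if a != b)


def is_fuzzy_match(kmer_a: str, kmer_b: str, max_mismatches: int = 1) -> bool:
    """Treat two kmers as one 'fuzzy kmer' if they differ at few enough positions."""
    return mismatches(kmer_a, kmer_b) <= max_mismatches


def masked_kmer(kmer_a: str, kmer_b: str, mask: str = ".") -> str:
    """Shared form of two kmers, with each differing position replaced by mask."""
    if len(kmer_a) != len(kmer_b):
        raise ValueError("kmers must have the same length")
    return "".join(a if a == b else mask for a, b in zip(kmer_a, kmer_b))


a = "ACGTAACTAACAGTT"
b = "ACGTAACTCACAGTT"
print(mismatches(a, b))      # 1
print(is_fuzzy_match(a, b))  # True
print(masked_kmer(a, b))     # ACGTAACT.ACAGTT
```

Allowing one mismatch lets a single base variation (or sequencing error) join two kmers that exact matching would keep apart; raising the threshold merges more aggressively and increases the risk of collapsing true repeat copies.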

Kmer Extension & Repeat Junctions
Means to handle repeats:
- Base quality
- Read pair
- Fuzzy kmers
- Closely related reference or Sanger reads
Pileup of other reads (454, Sanger, etc.) at a repeat junction → consensus
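The base-quality idea above can be illustrated with a minimal sketch of quality-weighted kmer extension. It assumes reads come with per-base Phred qualities, and it treats a position as a repeat junction when the best next base carries less than a chosen share (`min_share`, hypothetical, here 0.8) of the quality-weighted support. All names are hypothetical; this is not FuzzyPath's actual algorithm.

```python
from collections import defaultdict


def build_extension_table(reads, k):
    """Map each kmer to the summed Phred quality of each base that follows it.

    reads: list of (sequence, qualities) pairs, qualities as a list of ints.
    """
    table = defaultdict(lambda: defaultdict(int))
    for seq, quals in reads:
        for i in range(len(seq) - k):
            table[seq[i:i + k]][seq[i + k]] += quals[i + k]
    return table


def extend(seed, table, k, min_share=0.8, max_len=10_000):
    """Greedily extend seed to the right, one base at a time.

    Stops at a dead end (no read supports any next base) or at an assumed
    repeat junction (best next base has < min_share of the weighted votes).
    """
    contig = seed
    while len(contig) < max_len:
        votes = table.get(contig[-k:])
        if not votes:
            return contig, "dead end"
        total = sum(votes.values())
        best_base, best_score = max(votes.items(), key=lambda kv: kv[1])
        if best_score / total < min_share:
            return contig, "repeat junction"
        contig += best_base
    return contig, "max length"


reads = [("AAGGCTTAC", [30] * 9), ("AAGGCTTAG", [30] * 9)]
print(extend("AAGG", build_extension_table(reads, k=4), k=4))
# ('AAGGCTTA', 'repeat junction')

# Same reads, but the conflicting base in the second read has low quality.
reads_lowq = [("AAGGCTTAC", [30] * 9), ("AAGGCTTAG", [30] * 8 + [5])]
print(extend("AAGG", build_extension_table(reads_lowq, k=4), k=4))
# ('AAGGCTTAC', 'dead end')
```

In the second case the conflicting base is low quality, so extension continues through the position instead of stopping. Read pairs, fuzzy kmers and external reads, the other means listed above, are not modelled here.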

Pileup of Solexa and 454 Reads
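A pileup like this can be reduced to a consensus by voting on each column. The sketch below assumes reads have already been placed at known offsets without gaps, and it uses simple majority voting; it ignores base qualities and indels (which matter for 454 homopolymer errors) and is not the consensus method used in the talk. The function name is hypothetical.

```python
from collections import Counter


def pileup_consensus(placed_reads, length):
    """Majority-vote consensus over an ungapped pileup.

    placed_reads: list of (offset, sequence) pairs giving each read's start
    on the region. Columns with no read coverage are reported as 'N'.
    """
    columns = [Counter() for _ in range(length)]
    for offset, seq in placed_reads:
        for i, base in enumerate(seq):
            pos = offset + i
            if 0 <= pos < length:
                columns[pos][base] += 1
    return "".join(col.most_common(1)[0][0] if col else "N" for col in columns)


pileup = [(0, "ACGTA"), (2, "GTTCA"), (2, "GTACA"), (4, "ACAG")]
print(pileup_consensus(pileup, 9))  # ACGTACAGN
```

At column 4 the single T from the second read is outvoted by three A's, which is how reads of a second technology can settle a disputed base.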

S. suis P1/7 Solexa/454 Assembly
Solexa reads:
Number of reads: 3,084,185
Finished genome size: 2,007,491 bp
Read length: 39 and 36 bp
Estimated read coverage: ~55X
454 reads:
Number of 454 reads: 100,000
Read coverage of 454: 10X
Assembly features - contig stats:
Total number of contigs: 73
Total bases of contigs: 1,999,817 bp
N50 contig size: 62,508
Largest contig: 162,190
Averaged contig size: 27,394
Contig coverage over the genome: ~99%
Contig extension errors: 2
Mis-assembly errors: 3
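The contig statistics on these slides follow standard definitions; a short sketch makes them concrete. The function names are hypothetical, and N50 here is the usual definition (the length of the shortest contig among the largest contigs that together hold at least half of all contig bases). The averaged contig size and genome coverage of the S. suis assembly can be checked from the slide's own totals.

```python
def n50(lengths):
    """Shortest contig length L such that contigs of length >= L
    together hold at least half of all contig bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs")


def contig_stats(lengths, genome_size):
    """Summary statistics in the style of the slides."""
    total = sum(lengths)
    return {
        "contigs": len(lengths),
        "total_bases": total,
        "n50": n50(lengths),
        "largest": max(lengths),
        "average": total // len(lengths),
        "genome_coverage": total / genome_size,
    }


print(contig_stats([100, 80, 50, 40, 30], genome_size=400))
# S. suis totals from the slide: 73 contigs, 1,999,817 bp, genome 2,007,491 bp
print(1_999_817 // 73)                     # 27394
print(round(1_999_817 / 2_007_491, 3))     # 0.996
```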

Salmonella Senftenberg Solexa Assembly from Paired-End Reads
Solexa reads:
Number of reads: 6,000,000
Finished genome size: ~4.8 Mbp
Read length: 2x37 bp
Estimated read coverage: ~92.5X
Insert size: 170/ bp
Assembly features - contig stats: Solexa | 454
Total number of contigs: 75 | 390
Total bases of contigs: 4.80 Mbp | 4.77 Mbp
N50 contig size: 139,353 | 25,702
Largest contig: 395,600 | 62,040
Averaged contig size: 63,969 | 12,224
Contig coverage on genome: ~99.8% | 99.4%
Contig extension errors: 0 | -
Mis-assembly errors: 0 | 4
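The estimated read coverages on these slides match the usual fold-coverage arithmetic (bases sequenced divided by genome size) if "Number of reads" counts read pairs: 6,000,000 × 2 × 37 bp / 4.8 Mbp = 92.5X. A minimal sketch, with a hypothetical function name:

```python
def read_coverage(n_pairs, read_length, genome_size, reads_per_fragment=2):
    """Estimated fold coverage = total sequenced bases / genome size.

    Assumes n_pairs counts read pairs (two reads of read_length each).
    """
    return n_pairs * reads_per_fragment * read_length / genome_size


print(read_coverage(6_000_000, 37, 4_800_000))  # 92.5 (Salmonella)
print(round(read_coverage(7_055_348, 36, 5_350_000)))  # 95 (E. coli 042)
```

The same formula gives ~65X for mouse chromosome 17 (86.5 million pairs of 2x36 bp over 95.2 Mbp) and ~180X for the zebrafish/pig clone pool (4.3 million pairs over 1.72 Mbp), in line with those slides.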

Extremely GC Biased Genomes
Columns on the slide: library, organism, read length, Mb sequence generated, genome size (Mb), mean coverage (only library and organism are recoverable)
PCR-free | B. pertussis ST24
PCR-free | E. coli 042
PCR-free | P. falciparum 3D7
PCR-free | B. pertussis ST24
PCR-free | P. falciparum 3D7
PCR-free | E. coli 042
standard-245 | P. falciparum 3D7
standard-368 | P. falciparum 3D7
standard-851 | P. falciparum 3D7
standard-883 | P. falciparum clin
GC: 68.0%, 50.5%, 19.0%, 50.8%, 19.0%, 68.0%, 19.0%
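The GC values on this slide range from 19.0% to 68.0%. A minimal sketch of how GC content is computed, over a whole sequence and over fixed windows (useful for seeing where coverage drops in GC-extreme regions). The function names are hypothetical, and ambiguous bases such as N are excluded from the denominator by assumption.

```python
def gc_content(seq):
    """Percentage of G or C among the unambiguous (A/C/G/T) bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_windows(seq, window, step):
    """GC percentage for each window of the given size, advancing by step."""
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


print(round(gc_content("ACGTAACTAACAGTT"), 1))  # 33.3
print(gc_windows("AAAAGGGG", 4, 4))             # [0.0, 100.0]
```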

Malaria 3D7 Assemblies
Solexa reads: 2x36 bp | 2x76 bp
Number of reads: 14.0m | 9.77m
Finished genome size: 23 Mbp | 23 Mbp
Estimated read coverage: 43x | 64x
Insert size: 170 bp | 170 bp
Assembly features:
Total number of contigs: 26, (truncated)
Total bases of contigs: 19.2 Mbp | 21.1 Mbp
N50 contig size: - | -
Largest contig: - | -
Averaged contig size: - | -
Contig coverage on genome: ~83.5% | 91.7%
Contig extension errors: ? | ?
Mis-assembly errors: ? | ?

E. coli Strain 042 Assembly
Solexa reads:
Number of reads: 7,055,348
Finished genome size: 5.35 Mbp
Read length: 2x36 bp
Estimated read coverage: ~95X
Insert size: 170/ bp
Assembly features - contig stats:
Total number of contigs: 168
Total bases of contigs: 5.19 Mbp
N50 contig size: 85,886
Largest contig: 337,768
Averaged contig size: 30,886
Contig coverage over the genome: ~99%
Contig extension errors: 1
Mis-assembly errors: 2

Mouse Chromosome 17 Assembly
Solexa reads:
Number of reads: 86.5 million
Finished genome size: 95.2 Mbp
Read length: 2x36 bp
Estimated read coverage: ~65X
Insert size: 120/ bp
Assembly features - contig stats:
Total number of contigs: 55,802
Total bases of contigs: 75.8 Mbp
N50 contig size: 2,322
Largest contig: 17,859
Averaged contig size: 1,358
Contig coverage over the genome: ~80%
Contig extension errors: ?
Mis-assembly errors: ?

Pooled Clones: Zfish 9, Pig 3
Clone Name | Finished | Cloning Vector | Species | Capillary Data Pathway
zH117H | Yes | pTARBAC2.1 | D. rerio | /nfs/repository/d0012/zH117H1
zH141B | Yes | pTARBAC2.1 | D. rerio | /nfs/repository/d0012/zH141B18
zH151M | Yes | pTARBAC2.1 | D. rerio | /nfs/repository/d0014/zH151M17
zH117E | Yes | pTARBAC2.1 | D. rerio | /nfs/repository/d0015/zH117E7
zH137D | Yes | pTARBAC2.1 | D. rerio | /nfs/repository/d0023/zH137D22
zH97A | Yes | pTARBAC2.1 | D. rerio | /nfs/repository/d0027/zH97A24
zH146D | Yes | pTARBAC2.1 | D. rerio | /nfs/repository/d0040/zH146D21
zH140N | Yes | pTARBAC2.1 | D. rerio | /nfs/repository/d0013/zH140N19
zH147D | Yes | pTARBAC2.1 | D. rerio | /nfs/repository/d0011/zH147D24
bE2F | Yes | pTARBAC1.3_BamHI | S. scrofa | /nfs/repository/d0027/bE2F11
bE156J | Yes | pTARBAC1.3_BamHI | S. scrofa | /nfs/repository/d0041/bE156J20
bE240L | No* | pTARBAC1.3_BamHI | S. scrofa | /nfs/repository/d0012/bE240L11
* Finished length may be shorter or longer once complete.

Mapping of Solexa Reads On the Reference
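As an illustration of how short reads can be placed on a reference, here is a minimal seed-and-vote sketch: every reference kmer is hashed to its positions, each read kmer votes for a diagonal (reference position minus read offset), and the best-supported diagonal is verified by counting mismatches. This is a generic hash-based approach assumed for illustration (forward strand only, no indels), not the mapping pipeline used in the talk; all names are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(reference, k):
    """Hash every kmer of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read, reference, index, k, max_mismatches=2):
    """Return (start, mismatches) for the best forward-strand placement, or None."""
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            votes[pos - offset] += 1  # vote for the implied read start
    for start, _ in votes.most_common():
        if start < 0 or start + len(read) > len(reference):
            continue
        window = reference[start:start + len(read)]
        mm = sum(1 for a, b in zip(read, window) if a != b)
        if mm <= max_mismatches:
            return start, mm
    return None


ref = "GATTACAGGCTTACCGATAGCTAGGCAT"
idx = build_index(ref, 4)
print(map_read("GCTTAGCGATAG", ref, idx, 4))  # (8, 1)
```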

Solexa reads (insert ~300 bp) → FuzzyPath Solexa assembly → extended long reads of 1-2 kb
WGS reads (5X) → fishing WGS reads (Phusion)
Combined reads → Phusion or Phrap → Genome/Chromosome assembly
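The slide does not say how WGS reads are "fished"; one common way to select reads belonging to a clone is to keep those that share enough kmers with its assembled contigs. The sketch below assumes that approach, with a hypothetical kmer size and sharing threshold (`min_shared`), and checks both strands of each contig. It is an illustration, not Phusion's actual procedure.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def contig_kmers(contigs, k):
    """Set of all kmers present on either strand of the contigs."""
    kmers = set()
    for contig in contigs:
        for strand in (contig, revcomp(contig)):
            for i in range(len(strand) - k + 1):
                kmers.add(strand[i:i + k])
    return kmers


def fish_reads(wgs_reads, contigs, k, min_shared=3):
    """Names of WGS reads sharing at least min_shared kmers with the contigs."""
    kmers = contig_kmers(contigs, k)
    fished = []
    for name, seq in wgs_reads:
        shared = sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in kmers)
        if shared >= min_shared:
            fished.append(name)
    return fished


contigs = ["ACGTTGCAAGGCT"]
wgs = [("r1", "GTTGCAAGG"), ("r2", "TTTTTTTTT"), ("r3", "CCTTGCAAC")]
print(fish_reads(wgs, contigs, k=5))  # ['r1', 'r3']
```

Read r3 is the reverse complement of r1, so it is fished only because both contig strands are indexed.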

Zfish and "Pig" Clone Assemblies
Solexa reads:
Number of reads: 4.3 million
Finished genome size: 1.72 Mbp
Read length: 2x36 bp
Estimated read coverage: ~180X
Insert size: 260/ bp
Zfish DH reads: 12,539
Assembly features - contig stats: Solexa | Hybrid_Ctg | Hybrid_Super
N contigs: - | - | -
Bases: 1.25 Mbp | 1.68 Mbp | 1.69 Mbp
N50 size: 4,975 | 25,817 | 74,598
Largest: 23,906 | 79,730 | 144,808
Averaged: 2,513 | 11,072 | 17,815
Coverage: ~72.6% | ~73% | ~73%
Errors: ? | ? | ?
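Assuming "Hybrid_Super" denotes supercontigs (scaffolds) built by linking contigs with read pairs, the size of each gap can be estimated from the insert size. If read 1 starts at position `start` on the left contig and its mate ends at position `end` on the right contig, the fragment spans (left length − start) + gap + end bases, so gap = insert − (left length − start) − end. A minimal sketch with hypothetical names, taking the median over all linking pairs and padding the gap with Ns (the minimum of one N is an assumption):

```python
from statistics import median


def estimate_gap(insert_size, left_len, links):
    """Median gap between two contigs implied by linking read pairs.

    links: (read1_start_on_left, read2_end_on_right) for each pair whose
    first read lies on the left contig and whose mate lies on the right one.
    """
    gaps = [insert_size - (left_len - start) - end for start, end in links]
    return round(median(gaps))


def join(left, right, gap, min_gap=1):
    """Join two contigs into a supercontig, padding the gap with Ns."""
    return left + "N" * max(gap, min_gap) + right


# Insert size 260 bp as on the slide; contig length and links are made up.
print(estimate_gap(260, 1000, [(900, 120), (880, 150), (910, 100)]))  # 40
print(join("ACGT", "TTGA", 3))  # ACGTNNNTTGA
```

The median makes the estimate robust to a single mis-mapped pair, such as the one implying a negative gap above.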

Acknowledgements: Yong Gu, James Bonfield, Helen Beasley, Siobhan Whitehead, Daniel Turner, Michael Quail, Tony Cox, Richard Durbin