Presentation on theme: "Current Sequencing Technologies and Data Generation"— Presentation transcript:
1 Current Sequencing Technologies and Data Generation
Corbin Jones & Piotr Mieczkowski
Department of Biology, College of Arts and Sciences; Carolina Center for Genome Sciences; Department of Genetics, School of Medicine; University of North Carolina at Chapel Hill
3 Sample Submission (workflow diagram)
Labels recovered from the diagram: Data flow (LIMS); Sample flow; 3 µl of sample for QC (concentration, size); Sample failed y/n; Transfer to new plate/tube or same plate; Dilution; Leftover sample; Sonication; End Repair; Adenylation; Adapter Ligation; Size Selection; PCR; 3 µl of sample for QC; 15 nM dilution; Facility; Pooling for multiplexing
4 Manual and Semiautomation in HTSF library prep workflow
- Sonication
- Automated library size selection
- Magnetic beads DNA size selection
- Sage – Pippin – Automated size selection system
5 Automation in HTSF
Tecan – Freedom Evo system – 8 tip
- 2x48 (96) samples per week – DNA library prep
- Automated sample normalization steps
- PCR and qPCR preparation
- Reagent distribution
- Can be adapted to small and medium scale protocols for Illumina and Ion Torrent
- We have all necessary components for DNA/RNA extraction using Qiagen kits
Caliper – Sciclone system
- 96 tip pipetting head (8-96 samples per run)
- TruSeq DNA library preparation
- TruSeq Exome Enrichment
- SureSelect Agilent DNA capture and library preparation
6 Next-generation Sequencing (Deep Sequencing) Platforms
Short reads
- Genome Analyzer IIx (GAIIx), HiSeq2000, HiSeq2500, MiSeq – Illumina
- SOLiD 5500xl System – Applied Biosystems
- HeliScope™ Single Molecule Sequencer – Helicos
Long reads
- Genome Sequencer FLX System (454) – Roche
- PacBio RS – Pacific Biosciences
- Personal Genome Machine, Ion Proton – Ion Torrent
- GridION – Oxford Nanopore
Mapping sequences to large DNA fragments
- NABsys
- BioNanomatrix
7 UNC – HTSF
- 9 HiSeq 2000/2500
- 1 GA II
- PacBio
- Ion Torrent
- MiSeq (Jeff Dangl)
Piotr; Liz Buda and Donghui Tan
Also on campus:
- 454 (Microbiome)
- 454 Jr. (Viral genomics)
- MiSeq – Kevin Weeks
8 What type of sequencing should I choose for the Illumina sequencing project?
HiSeq 2000/2500 – 100-160 million single end sequencing reads per lane.
- ChIPseq – Single End 50 cycles (2-3 human samples per lane)
- RNAseq – Single End 50 cycles (2-3 human samples per lane). If you are interested in splicing variants and fusion genes, both Single End 100 cycles and Paired End 2x50 cycles are better options.
- Whole Genome Sequencing – Paired End 2x100 cycles (2-3 lanes per genome)
- Exome Capture – Paired End 2x100 cycles (4 samples per lane)
MiSeq – 3-7 million single end sequencing reads per lane. Custom projects, fast turnaround.
- Metagenomics – 16S profile – Paired End 2x150 cycles, up to 24 samples per lane
- Whole Microbial Genome Sequencing – Paired End 2x150 cycles
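The lanes-per-genome recommendation can be checked with simple coverage arithmetic. The following is a minimal sketch, not the facility's own calculation. It assumes a human genome of about 3.1 Gb (a figure not given in the slides) and that a paired-end run yields two reads per cluster, with the per-lane read count on the slide taken as the cluster count. The function name and constant are illustrative.

```python
def expected_coverage(clusters_per_lane, read_length, lanes, genome_size, paired=True):
    """Mean sequencing depth = total sequenced bases / genome size."""
    reads_per_cluster = 2 if paired else 1
    total_bases = clusters_per_lane * reads_per_cluster * read_length * lanes
    return total_bases / genome_size


HUMAN_GENOME_BP = 3.1e9  # assumption: approximate human genome size

# Whole genome, Paired End 2x100 cycles, 2-3 lanes, 100-160 million clusters per lane
low = expected_coverage(100e6, 100, 2, HUMAN_GENOME_BP)
high = expected_coverage(160e6, 100, 3, HUMAN_GENOME_BP)
print(f"{low:.1f}x to {high:.1f}x")
```

Under these assumptions the upper end (3 lanes at 160 million clusters) comes out close to 30x, while 2 lanes at the low end of the yield range gives roughly 13x.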
9 SHORT READ PLATFORMS at UNC
HiSeq 2000
- Initially capable of up to 600 Gb per run in 13 days.
- Cost of resequencing one human genome now (30x coverage): about $6,000 for a UNC PI; about $9,000 for users outside UNC.
HiSeq 2500
- Initially capable of up to 100 Gb per run in 27 hours.
- Cost per genome - ???
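To compare the two instruments on equal terms, the run yields quoted on this slide can be converted to output per day. This is a rough sketch using only the slide's figures; the helper name is hypothetical, and it ignores setup time between runs.

```python
def gb_per_day(gb_per_run, run_hours):
    """Average output per 24 hours for a single run."""
    return gb_per_run / (run_hours / 24.0)


hiseq2000 = gb_per_day(600, 13 * 24)  # 600 Gb in 13 days
hiseq2500 = gb_per_day(100, 27)       # 100 Gb in 27 hours
print(f"HiSeq 2000: {hiseq2000:.1f} Gb/day, HiSeq 2500: {hiseq2500:.1f} Gb/day")
```

By this measure the HiSeq 2500 figure is roughly twice the daily output of the HiSeq 2000 figure, even though the yield per run is much smaller.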
10 MiSeq
Small capacity system.
- PE 2x150 cycles in 27 hours.
- PE 2 x 250bp coming soon – error rate for read 1 less than 1%; read 2 about 1.2%.
- In preparation – PE 2 x 400bp – error rate for read 1 about 2%; read 2 about 4%.
- In preparation – longer insert size possible, 1.5kb
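Per-base error rates like these are often expressed as Phred quality scores, Q = -10 log10(p). The sketch below converts the error rates quoted on this slide. The function names are illustrative, and the conversion is the standard Phred definition rather than anything specific to this presentation.

```python
import math


def phred_from_error_rate(p):
    """Phred quality score for a per-base error probability p (0 < p <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def error_rate_from_phred(q):
    """Per-base error probability for a Phred quality score q."""
    return 10 ** (-q / 10)


# Error rates quoted for MiSeq reads on this slide
for label, p in [("2x250 read 1 (upper bound)", 0.01), ("2x250 read 2", 0.012),
                 ("2x400 read 1", 0.02), ("2x400 read 2", 0.04)]:
    print(f"{label}: error {p:.1%} ~ Q{phred_from_error_rate(p):.1f}")
```

A 1% error rate corresponds to Q20, so the 2 x 400bp figures fall below Q20 on both reads.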
11 PacBio RS
Single molecule resolution in real time
- Short waiting time for result and simple workflow
  - Generate basecalls in <1 day
  - Polymerase speed ≥1 base per second
- No amplification required
  - Bias not introduced
  - More uniform coverage
- Direct observation
  - Distinguish heterogeneous samples
  - Simultaneous kinetic measurements
- Long reads
  - Identify repeats and structural variants
  - Less coverage required
- Information content
  - One assay, multiple applications
  - Genetic variation (SVs to SNPs)
  - Methylation
  - Enzymology
C2 chemistry – installed March 2012
- Long reads 6-10kb
- Median size of molecules 3kb
- Still 15% error rate
- No strobe sequencing
Software focus on:
- De novo assembly
- High quality CCS consensus reads
In preparation
- Load long molecules by magnetic beads
- Modified nucleotides detection
12 PacBio RS – two sequencing modes
LS – long sequencing reads
- Sample preparation: Standard
- Large insert sizes (2kb-10kb)
- Generates one pass on each molecule sequenced
CCS – high quality sequencing reads
- Sample preparation: Circular Consensus
- Small insert sizes 500bp
- Generates multiple passes on each molecule sequenced
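Why multiple passes improve quality can be illustrated with a toy majority-vote model. This is a deliberately simplified sketch and not PacBio's consensus algorithm. It assumes each pass miscalls a given base independently with the 15% raw error rate quoted for the C2 chemistry, and pessimistically that all wrong calls agree with each other. Real errors are often insertions and deletions, which this model ignores.

```python
from math import comb


def majority_error(p, passes):
    """Probability that more than half of `passes` independent calls are wrong.

    Toy model: restricted to an odd number of passes so ties cannot occur.
    """
    if passes < 1 or passes % 2 == 0:
        raise ValueError("use an odd, positive number of passes")
    need = passes // 2 + 1
    return sum(comb(passes, k) * p**k * (1 - p) ** (passes - k)
               for k in range(need, passes + 1))


RAW_ERROR = 0.15  # raw single-pass error rate quoted for C2 chemistry

for n in (1, 3, 5, 7):
    print(f"{n} passes: consensus error ~ {majority_error(RAW_ERROR, n):.4f}")
```

Even in this crude model, the consensus error falls from 15% with one pass to about 6% with three and under 3% with five, which is the intuition behind pairing small inserts with circular consensus.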
13 Example Data: 1 SMRT cell
- Pre-Filter # of Bases: 180,320,136 bp
- Post-Filter # of Bases: 165,424,592 bp
- Pre-Filter # of Reads: [missing]
- Post-Filter # of Reads: 52801
- Pre-Filter Mean Readlength: 2399 bp
- Post-Filter Mean Readlength: 3133 bp
- Pre-Filter Mean Read Quality: [missing]
- Post-Filter Mean Read Quality: 0.827
- % Adapter Dimer (0-10bp): 1.94 %
- % Short Insert (11-100bp): 0.47 %
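The figures on this slide can be cross-checked against each other. A small sketch using only the numbers shown: the post-filter mean read length should equal post-filter bases divided by post-filter reads. The pre-filter read count computed at the end is only an estimate derived from the other values, not a number reported in the presentation.

```python
pre_bases = 180_320_136
post_bases = 165_424_592
post_reads = 52_801
pre_mean_len = 2399  # bp

# Consistency check: bases / reads should match the reported mean of 3133 bp
post_mean_len = post_bases / post_reads

# Share of sequenced bases that survive filtering
bases_retained = post_bases / pre_bases

# Estimate only: implied pre-filter read count (not given in the transcript)
est_pre_reads = pre_bases / pre_mean_len

print(f"post-filter mean length: {post_mean_len:.0f} bp")
print(f"bases retained after filtering: {bases_retained:.1%}")
print(f"implied pre-filter reads (estimate): {est_pre_reads:.0f}")
```

The reported post-filter mean read length is consistent with the base and read counts, and filtering keeps roughly 92% of the bases while raising the mean read length.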
14 Personal Genome Machine – Ion Torrent (Life Technologies)
Three types of semiconductor chips:
- 314 – 20Mb
- [missing] Mb
- 318 – 1Gb
Read length depends on base composition: [missing] bp (200 cycles)
System is enabled for Paired End 2x100 cycles
The fastest sequencing system on the market.
How it works:
- H+ ion is released during base incorporation.
- Individual polymerases attached to beads are positioned in tiny wells that rest on a tiny pH meter.
Recommendation:
- Resequencing applications which require fast turnaround of samples
- Amplicons (PCR products)
- Small and medium size genomes
- Custom DNA capture applications
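The detection principle also explains why read length depends on base composition and why homopolymers are difficult. Nucleotides are flowed one type at a time, and a run of identical bases is incorporated in a single flow, so the pH signal must be read as a run length. The sketch below is an idealized, noise-free illustration. The repeating flow order "TACG" is an assumption made for the example, not the instrument's actual flow order, and the function name is hypothetical.

```python
def flowgram(synthesized, flow_order="TACG"):
    """Idealized flow signals for the strand being synthesized.

    Returns a list of (flowed_base, bases_incorporated) pairs, one per flow.
    """
    unknown = set(synthesized) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")
    signals = []
    pos = 0
    flow = 0
    while pos < len(synthesized):
        base = flow_order[flow % len(flow_order)]
        run = 0
        # A whole homopolymer run is incorporated within one flow
        while pos < len(synthesized) and synthesized[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals


print(flowgram("TTAGGG"))
```

In this idealized picture the number of bases read per flow cycle varies with the sequence, and a run of six identical bases must be told apart from a run of five by signal height alone, which becomes harder as runs get longer.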
15 PGM/Ion Torrent Data
316 chip. Note the homopolymer problems.
- Total Number of Bases [Mbp]: 77.65
- Number of Q17 Bases [Mbp]: 36.11
- Number of Q20 Bases [Mbp]: 27.33
- Total Number of Reads: 368,860
- Mean Length [bp]: 211
- Longest Read [bp]: 380
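The share of bases at each quality threshold follows directly from these totals. A brief sketch using only the values on this slide; the variable names are illustrative.

```python
total_mbp = 77.65
q17_mbp = 36.11
q20_mbp = 27.33
total_reads = 368_860

frac_q17 = q17_mbp / total_mbp  # fraction of bases at Q17 or better
frac_q20 = q20_mbp / total_mbp  # fraction of bases at Q20 or better

# Consistency check against the reported mean length of 211 bp
mean_len = total_mbp * 1e6 / total_reads

print(f"Q17 bases: {frac_q17:.1%}, Q20 bases: {frac_q20:.1%}")
print(f"mean read length: {mean_len:.1f} bp")
```

Fewer than half of the bases in this run reach Q17, and about a third reach Q20.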
16 Library Preparation from Low Quantities of DNA or RNA
Microfluidics stationary and portable systems
- Mondrian SP System – NuGEN Technologies
- Human libraries from 5ng of total DNA. Only 10-15% duplicate reads.
- Ultralow DNA library systems
- Soon: Ultralow RNA library systems. Libraries from total RNA with rRNA depletion.
- Advanced Liquid Logic from RTP
18 Ion Proton System
Human genome in one day
- Cost of reagents $1000 per run
- Error rate around 1.2%
- Human Genome, RNAseq, ChIPseq
- Ion Proton Chip I – 10Gb (Whole Exome capture experiments)
- Ion Proton Chip II – 100Gb (Whole human Genome resequencing)
19 Oxford Nanopore – new view on sequencing
Hemolysin pore: inner diameter of 1 nm, about 100,000 times smaller than the diameter of a human hair.
20 Oxford Nanopore DNA sequencing Error rate 4%, prediction for end of the year 0.1 – 2%.
22 Oxford Nanopore – new concepts
MinION
- 150Mb per run
- Tested 48kb read length
- $900 per instrument
- 500 pores per device
GridION
- XXXMb per run
- Tested 48kb read length
- $XXX per instrument
- 2000 pores per device, soon 8000 pores
Cost per human genome $1500.
23 Oxford Nanopore – applications
- DNA sequencing
- Protein detection
- Protein DNA interaction
- Small molecule detection
- 96 well plates for 96 samples
- Controlled time of sequencing
24 Intelligent BioSystems Mini20 System (manufactured by Azco Biotech)
- Amplification by rolony method
- Sequencing by Synthesis with announced 100 base reads, but expect to compete with Sanger down the road
- Designed for clinical labs
- 20 independent flow cells, no queue for loading, run asynchronously
- 20M reads/flow cell, 4 GB/flow cell
- Potential problems with repeats
- System cost $120K, $150 flow cell (disposable), full costs per sample not clear yet.
- Entering early access now, expect commercial shipping late 2012
25 Genia Technologies
- Very early stage announcement – Backed by Life Technologies (at least 1 year away)
- Describe system as a cross between Ion Torrent and Oxford Nanopore
- Electronic “Active Control” technology enables highly efficient nanopore-membrane assembly and control of DNA movement through the channel
- Initially used α-Hemolysin and claimed 98% raw accuracy with that, but now are using an undisclosed pore for further development.
- Claim sensitivity 1-2 orders of magnitude greater than Oxford Nanopore.
- Ramping up pore density to 100K pores/chip by end of 2012.
- Plan to market a mobile reader for <$1K and per sample costs <$100
- Plan early access in late 2012, commercial shipment 2013