How will new sequencing technologies enable the HMP? Elaine Mardis, Ph.D. Associate Professor of Genetics Co-Director, Genome Sequencing Center Washington.

Presentation transcript:

How will new sequencing technologies enable the HMP? Elaine Mardis, Ph.D. Associate Professor of Genetics Co-Director, Genome Sequencing Center Washington University School of Medicine

Advantages of Next Gen Platforms
- No sub-cloning, no use of E. coli as host
  - cloning bias abolished
  - one FTE can keep several instruments busy
- Each sequence is from a unique DNA molecule
  - quantitation is possible through “counting”
  - enhanced dynamic range
  - detection of rare variants
- Multiple sequence-based assays on one platform

New Sequencing Platforms
- Roche FLX Sequencer
- Illumina 1G Analyzer
- ABI SOLiD™ Sequencer
- Helicos Single-molecule sequencer

Roche FLX: Vital Statistics
Currently:
- >100Mb data/7 hours/$16K
- Read lengths average 250 bp
- Accuracy is hindered by homopolymer run in/dels
- Coverage model is higher than for 3730 data
By year’s end:
- Improved pipeline and read assembly software
- Paired end reads
- 400 bp read lengths
- Bar-code tagging of libraries
© Elaine Mardis, Ph.D.
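The homopolymer limitation on this slide can be illustrated with a short script. Pyrosequencing infers the length of a run of identical bases from signal intensity, so insertion/deletion errors concentrate where runs are long. The sketch below only locates such runs in a read; the function name, the `min_len` threshold of 4 and the example read are illustrative assumptions, not values from the talk.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=4):
    """Return (start, base, length) for every run of identical bases
    that is at least min_len long. min_len=4 is an arbitrary choice."""
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


# Hypothetical read: one run of five T's and one run of four C's.
read = "ACGTTTTTGACCCCA"
for start, base, length in homopolymer_runs(read):
    print(f"{base} x {length} starting at position {start}")
```

Positions flagged this way are where in/del calls from FLX data deserve the most scrutiny.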

Illumina 1G Analyzer: Vitals
Currently:
- 1 Gb/4 days/$ bp read lengths, 8 channel flow cell
- Read accuracy is highest in 1st 25 bp, ~1% overall error rate
- Biased representation of high AT regions
By year’s end:
- Paired end read capability
- 50 bp read lengths
- Improved short read mapping, assembly algorithms (?)

Cross-Platform Comparisons
Criterion          3730       Roche                             Illumina
Platform cost      $350K      $500K                             $395K
Read length        650 bp +   250 bp                            40-50 bp
Cost/run           $55        $16,000                           $3-5,000
Mbp/day
Cost/Mbp           $880       $160                              $5
Accuracy           high       No subs, Indels at homopolymers   high
Paired end reads   Yes        Coming                            Yes*
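The Cost/Mbp row can be checked against the per-run figures quoted elsewhere in the talk. The sketch below divides run cost by run yield: the Roche and Illumina inputs come from the slides (>100 Mb for $16K; 1 Gb for $3-5,000). The 3730 yield is an assumption, not a figure from the talk: 96 reads of 650 bp per run.

```python
def cost_per_mbp(run_cost_usd, yield_mbp):
    """Cost in US dollars per megabase for one instrument run."""
    return run_cost_usd / yield_mbp


# Roche FLX: $16K per run, roughly 100 Mbp per run (from the slides).
roche = cost_per_mbp(16_000, 100)

# Illumina 1G: $3,000-5,000 per run, 1 Gb (1,000 Mbp) per run (from the slides).
illumina_low = cost_per_mbp(3_000, 1_000)
illumina_high = cost_per_mbp(5_000, 1_000)

# 3730: $55 per run; ASSUMED yield of 96 reads x 650 bp (not stated in the talk).
yield_3730_mbp = 96 * 650 / 1_000_000
sanger = cost_per_mbp(55, yield_3730_mbp)

print(f"Roche:    ${roche:.0f}/Mbp")
print(f"Illumina: ${illumina_low:.0f}-{illumina_high:.0f}/Mbp")
print(f"3730:     ${sanger:.0f}/Mbp")
```

Under these inputs the Roche and upper Illumina figures match the table, and the 3730 figure lands close to the quoted $880.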

AB SOLiD™: Vital Statistics
- 500Mb-1Gb/5 days/?$$
- 50 base pair read lengths/ paired end or fragment reads
- Ligation based sequencing with high accuracy due to 2-base encoding
- Analysis software is unknown
- Early access platform due Q3 of ‘07
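The slide does not explain 2-base encoding, so here is a minimal sketch of the idea. Each "color" reports the transition between two adjacent bases, which means a real base substitution changes two adjacent colors while a lone measurement error changes only one. The XOR formulation below (A=0, C=1, G=2, T=3) is a common shorthand that reproduces the four-color scheme. The function names and example sequences are illustrative and do not come from any vendor software.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def encode(seq):
    """Return (first base, list of colors) for a DNA sequence."""
    seq = seq.upper()
    codes = [BASE_CODE[b] for b in seq]
    colors = [a ^ b for a, b in zip(codes, codes[1:])]
    return seq[0], colors


def decode(first_base, colors):
    """Rebuild the base sequence from a known first base and its colors."""
    code = BASE_CODE[first_base]
    bases = [first_base]
    for color in colors:
        code ^= color
        bases.append(CODE_BASE[code])
    return "".join(bases)


reference = "ATGGCA"
variant = "ATGACA"  # hypothetical single-base substitution (G -> A)

_, ref_colors = encode(reference)
_, var_colors = encode(variant)
changed = [i for i, (r, v) in enumerate(zip(ref_colors, var_colors)) if r != v]

print(ref_colors)  # colors for the reference
print(var_colors)  # colors for the variant
print(changed)     # indices of the colors that differ
```

Because the substitution alters two neighbouring colors, a single isolated color mismatch against a reference is more likely a sequencing error than a variant, which is the basis for the accuracy claim.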

HeliScope sequencer
- Single molecule detection obviates PCR amplification step
- >25Mbp/hour initial data rate, 1000Mbp/hour ultimately with <1% error rate
- Short read lengths, single molecule sequencing with high fidelity
- Two 25 channel flow cells
- Read mapping/assembly capability (?)

Comparative metagenomics: Cecal contents of obese mice (ob/ob) and lean littermates
EXPERIMENTAL DESIGN:
1) Remove cecal contents of 2 ob/ob, 2 +/+, and 1 ob/+ C57Bl/6J mice and isolate DNA.
2) 454 pyrosequencing of total DNA - 350,000 reads/mouse (one ob/ob, one +/+ mouse).
3) Compare data from each mouse to all known bacterial sequences.
4) Use data clustering methods to examine similarities and differences between all 5 mice that were sequenced.
5) Perform microbiota transplantation to test for ability to transfer phenotype to gnotobiotic mice.

Next Gen RNA Sequencing
Our laboratory has developed a robust full-length cDNA process for 454-based sequencing of eukaryotic transcriptomes that features low input of total RNA, enzyme-based normalization and the ability to preferentially sequence the 5’ ends of cDNAs. We are currently working to modify this approach for sequencing microbiotal transcriptomes and clinical isolates likely to contain viral RNA genomes (e.g. nasal lavage samples).

Illumina ‘Mockagenomics’ Experiment
We created two mock metagenomic samples by combining known bacterial and human genomic DNAs and sequenced them on the Illumina platform to generate short (30bp) reads. We plan to compare the relative strengths of classification by assembly and alignment with those of “signature” characterization (GC content, kmer analysis) for short read data.
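As a rough illustration of what "signature" characterization means for a short read, the sketch below computes the two signatures named on the slide: GC content and k-mer counts. The choice of k=4, the function names and the example read are assumptions for illustration only. The talk does not say which k or which tools were used.

```python
from collections import Counter


def gc_content(read):
    """Fraction of G and C bases in a read (0.0 for an empty read)."""
    read = read.upper()
    if not read:
        return 0.0
    return (read.count("G") + read.count("C")) / len(read)


def kmer_counts(read, k=4):
    """Count every overlapping k-mer in a read. k=4 is an arbitrary choice."""
    read = read.upper()
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


# Hypothetical short read, not taken from the experiment.
read = "ACGTACGTGGCC"
print(f"GC content: {gc_content(read):.2f}")
print(kmer_counts(read).most_common(3))
```

A 30 bp read yields only 27 overlapping 4-mers, so a per-read signature is noisy. That is one reason to compare this approach against alignment and assembly rather than assume it will classify short reads well.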

Practical Issues
- DNA quality and quantity
- Value of paired end vs. fragment reads
- Normalization vs. quantitation
- Depth of “search space”

Sample prep
Fragment reads:
1) Evaluate DNA
2) Fragment (2-500bp)
3) Repair ends
4) Adapter ligate
5) Enrich
6) Amplify on bead (Roche/AB) or on glass slide (Illumina)
Paired end reads:
1) Evaluate DNA
2) Fragment (2.5kb)
3) Repair ends
4) Adapter ligate
5) Methylate
6) Restrict adapters
7) Circularize
8) 2° restriction with type IIS enzyme
9) Purify tags+adapter
10) Amplify

Paired End Libraries
[Figure: Mate Pair Library. 25 base Tag #1, Internal Adapter, 25 base Tag #2; produced with EcoP15I or fragmentation]

Sequencing: 3-primer PE method
- PESP#1: Read 1 (25 to 40 cycles)
- NaIO4, U.S.E.R.
- PESP#2: Read 2 (25-40 cycles)
- Total cycles
- Graft: P7:P7diol:9TUP5, [P7+P7diol] = [9TUP5]
- P7diol & 9TUP5 linearisable, P7 non-linearisable
- Cluster formation: heterogeneous clusters containing P7/9TUP5 bridges and P7diol/9TUP5 bridges

What are the issues?
- Consented sample availability!!
- Read length and accuracy
- Sample complexity
- Sensitivity to detect
- Coverage and cost
- DNA vs. RNA
- Bioinformatics-based analyses

Bioinformatics Challenges
- Most daunting issue: the ability to analyze enormous data sets intelligently and efficiently
- Metagenomic analysis tools are now emerging for next gen sequence data
- Testing and implementation into analysis pipelines will follow
- Output is only as good as the depth of the search space and the depth of coverage for any given combination of sample & sequencer
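The last point, depth of coverage for a given sample and sequencer, can be estimated on the back of an envelope. The sketch below uses the standard approximation that expected depth equals total sequenced bases attributable to a genome divided by its length, with the Lander-Waterman estimate 1 - exp(-depth) for the fraction of the genome covered at least once. The read count (350,000) and read length (250 bp) are from the earlier slides. The 5 Mbp genome size and 1% relative abundance are hypothetical values chosen for illustration.

```python
import math


def expected_depth(n_reads, read_len_bp, genome_size_bp, abundance=1.0):
    """Mean coverage depth of one genome in a mixed sample.
    abundance is the fraction of reads assumed to come from that genome."""
    return n_reads * read_len_bp * abundance / genome_size_bp


def fraction_covered(depth):
    """Lander-Waterman estimate of the fraction of bases covered at least once."""
    return 1.0 - math.exp(-depth)


# From the slides: 350,000 reads per mouse, ~250 bp average read length.
# HYPOTHETICAL: a 5 Mbp bacterial genome at 1% relative abundance.
depth = expected_depth(350_000, 250, 5_000_000, abundance=0.01)
print(f"expected depth:   {depth:.3f}x")
print(f"fraction covered: {fraction_covered(depth):.3f}")
```

Under these assumptions a low-abundance community member is sampled at well under 1x, which shows how coverage, cost and sensitivity to detect are tied together.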