Molecular Genomic Imaging Center (CEGS) Harvard / Wash U George Church, Rob Mitra Greg Porreca, Jay Shendure Sequencing by Ligation on Polony Beads with.


Molecular Genomic Imaging Center (CEGS), Harvard / Wash U
George Church, Rob Mitra, Greg Porreca, Jay Shendure
Sequencing by Ligation on Polony Beads
with Nick Reppas, Kun Zhang, Shawn Douglas, Mike Wang, Abraham Rosenbaum, Agencourt
Personal Genomics, Stem Cells, ELSI Synthetic Biology

Polymerase colony
2 vs. 1 immobilized primer
in situ polonies vs. emulsion PCR beads
single molecule vs. multi-molecule detection
dNTP extension (SBE) vs. ligation (SBL) (>=3X error 1e-6, 1/10 cost of ABI E. coli)
Shendure, Porreca, Mitra, Church
Single chromosomes: haplotyping (Zhang)
Single cells: full sequence (Zhang & Martiny)
Single RNA molecules: RNA splicing (Zhu, Varma)

Polony Sequencing Overview
1. In vitro construction of a complex mate-paired library
2. Template amplification to one micron beads by emulsion PCR
3. Cyclic Array Sequencing by Ligation (SBL)

In vitro construction of a complex, mate-paired library
[Figure: a ~1 kb genomic fragment is converted into paired genomic tags (17 to 18 bp each) flanked by common sequences. Diagram labels: MmeI, Fisseq-F, Fisseq-R, T30, Tag 1, Tag 2, Left, Right, Mid, Seq1, Seq2. Size labels: 43 bp; Total = bp amplicon.]

Template Amplification
(1) Emulsion PCR to 1 micron beads (Dressman et al., PNAS '03)

Enrichment by Hybridization Selector Bead

One of 750 megapixel frames of gel-immobilized 1.0 micron beads, 0.3 micron pixels, 4-colors

Sequencing by Ligation (SBL) with fluorescent combinatorial 9-mers
ACUCAUC… 5' PO4
(3')…TAGAGT????????????????TGAGTAG…(5')
5'-Cy5-nnnnAnnnn-3'
5'-Cy3-nnnnGnnnn-3'
5'-TR-nnnnCnnnn-3'
5'-Cy3+Cy5-nnnnTnnnn-3'
[Figure: excitation and emission spectra (nm)]
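One way to read the probe set on this slide: each ligation cycle reports one fluorophore label per bead, and that label identifies the base at the fixed position of the 9-mer that ligated. A minimal Python sketch of the lookup follows. The function name and the list-of-labels input are illustrative assumptions, and the sketch reports the probe base, not its template complement.

```python
# Fluorophore label -> base at the fixed (5th) position of the 9-mer,
# as listed on the slide. Function name and input format are hypothetical.
PROBE_BASE = {
    "Cy5": "A",
    "Cy3": "G",
    "TR": "C",
    "Cy3+Cy5": "T",
}


def decode_cycles(labels):
    """Translate one label per ligation cycle into probe bases ('N' if unknown)."""
    return "".join(PROBE_BASE.get(label, "N") for label in labels)


print(decode_cycles(["Cy5", "TR", "Cy3+Cy5", "Cy3"]))
```

Unrecognized labels map to `N` so that a failed cycle does not shift the positions of later calls.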

Why low error rates?
Goal of Resequencing → Discovery of Uncommon Variation

Consensus Accuracy     False Positives (E. coli)   False Positives (Human)
1E-3                   4,000                       3,000,000
1E-4 (Bermuda/ABI)                                 ,000
1E-6 (Polony-SBL)
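The 1E-3 row is consistent with a simple product: expected false positives = consensus error rate × genome size, with roughly 4e6 bp for E. coli and 3e9 bp for human. The sketch below applies that product to the three error rates named on the slide. The genome sizes are back-calculated assumptions, not values stated in the deck, and the printed numbers are expectations under that assumption, not the slide's own figures.

```python
# Expected false-positive calls = consensus error rate x genome size.
# Genome sizes are assumptions inferred from the 1E-3 row (4,000 and 3,000,000).
GENOME_BP = {"E. coli": 4_000_000, "Human": 3_000_000_000}


def expected_false_positives(error_rate, genome_bp):
    """Expected number of wrong consensus calls across a genome."""
    return error_rate * genome_bp


for rate in (1e-3, 1e-4, 1e-6):
    row = {name: round(expected_false_positives(rate, bp))
           for name, bp in GENOME_BP.items()}
    print(f"{rate:.0e}", row)
```

The point of the slide follows directly: at a fixed genome size, every tenfold drop in consensus error removes nine tenths of the spurious variant calls that would otherwise have to be checked by hand.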

Genome engineering: Select for cross-feeding
Δtrp/ΔtyrA pair of genomes shows the best co-growth (syntrophs). Reppas & Lin
[Figure panels: First Passage, Second Passage]

Co-evolution of cross-feeding Trp- & Tyr- genome pair

~1 kb genomic fragment 980 ± 96 bp ~860,000 independent mate-pairing events

Aberrations in mate-pair distance indicative of rearrangements
Confirmed 776 bp deletion via tandem 8 bp repeats
[Figure: region 1,974,001 (MG1655) to 1,978,000 (MG1655)]
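A hedged sketch of how such aberrations could be flagged: compare each mate pair's separation on the reference with the library's distance distribution (980 ± 96 bp, as reported in this deck) and keep the outliers. The 3-sigma cutoff, the tuple layout, and the example coordinates are assumptions for illustration, not the authors' pipeline.

```python
# Flag mate pairs whose mapped separation departs from the library's
# distance distribution (980 +/- 96 bp per the deck). The 3-sigma cutoff
# and the (pos_tag1, pos_tag2) layout are assumptions.
MEAN_BP = 980
SD_BP = 96


def aberrant_pairs(pairs, n_sd=3.0):
    """Return the pairs whose reference distance is outside mean +/- n_sd * sd."""
    lo = MEAN_BP - n_sd * SD_BP
    hi = MEAN_BP + n_sd * SD_BP
    return [(a, b) for a, b in pairs if not lo <= abs(b - a) <= hi]


# Made-up coordinates: the third pair is ~780 bp farther apart than expected.
pairs = [(1_000, 1_990), (50_000, 50_950), (1_975_000, 1_976_760)]
print(aberrant_pairs(pairs))
```

A deletion in the sequenced strain shows up as pairs that map farther apart on the reference than the library allows, because the deleted bases are still present in the reference coordinate system.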

Base-calling Tetrahedron
Fluorescent SBL data quality measured by distance to the 4 vertices (A, G, C, T).
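The deck does not spell out the computation, so the following is only a sketch under stated assumptions: four unmixed color intensities per bead, normalized to sum to 1 so that each bead lands inside a tetrahedron whose vertices are the pure-color signals. The call is the nearest vertex, and the distance to that vertex serves as a quality measure (smaller is cleaner). The channel order and the function name are hypothetical.

```python
import math

BASES = "AGCT"  # vertex order follows the slide; channel order is an assumption


def call_base(intensities):
    """Return (base, distance_to_nearest_vertex) for a 4-channel intensity vector."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("need positive total intensity")
    # Normalizing to unit sum places the point inside the tetrahedron.
    point = [x / total for x in intensities]
    best = None
    for i, base in enumerate(BASES):
        vertex = [1.0 if j == i else 0.0 for j in range(4)]
        dist = math.dist(point, vertex)
        if best is None or dist < best[1]:
            best = (base, dist)
    return best


print(call_base([90, 5, 3, 2]))
print(call_base([60, 20, 10, 10]))
```

Both example beads are called as A, but the second sits farther from the A vertex and would rank lower in quality.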

Raw Error Rate
Mean accuracy = 99.5%
Best 50% of base-calls are 99.9% accurate
[Figure: quality levels marked at Q40, Q30, Q20]
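Assuming the Q labels are Phred-scaled, quality and error probability are related by Q = -10 log10(p). The short sketch below converts the two accuracies quoted on the slide; the function names are illustrative.

```python
import math


def phred(error_rate):
    """Phred-scaled quality: Q = -10 * log10(error probability)."""
    return -10 * math.log10(error_rate)


def error_rate(q):
    """Inverse mapping: error probability for a Phred quality Q."""
    return 10 ** (-q / 10)


print(round(phred(1 - 0.995), 1))  # mean accuracy 99.5%
print(round(phred(1 - 0.999), 1))  # best 50% of base-calls, 99.9%
```

Under that assumption, the mean raw accuracy corresponds to roughly Q23 and the best half of the calls to roughly Q30.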

Consensus error rates

Mutation Discovery in Engineered & Evolved Trp- Strain

Position    Type       Gene     Location     ABI Confirmation   Comments
986,334     T > G      ompF     TATA box                        Only in evolved strain
931,960     8 bp del   lrp      frameshift                      Only in evolved strain
1,976,      bp del     insB_5   IS element                      MG1655 heterogeneity
3,957,960   C > T      ppiC     5' UTR                          MG1655 heterogeneity
4,654,533   T > C      cI       Glu > Glu                       heterogeneity
4,647,960   T > C      ORF61    Lys > Gly                       heterogeneity
985,797     T > G      ompF     Glu > Ala    (in progress)
454,864     T > C      tig      Gly > Gly    (in progress)
4,648,691   G > A      exo      Phe > Phe    (in progress)

Cost comparison & projection
ABI 2004 Jun >2007

# bp/expt            -      2e7    3e7    3e8    60e9
Complexity (bp)      -744e6 3e9 6e9
Avg Fold Cov         83e
Pix per bp
Read-length (SBE)    25 (pair) 35 42
$ / Q20 kb           8e-1   -      8e-2   4e-2   1e-5
$ / 1X 3e9 b         2e6    -      2e5    5e4    1e2 (2e3)
Indel Error          5e-3   0.6%   1e-3   1e-3   1e-3
Subst Error          4e-3   4e-6   1e-3   1e-3   1e-3
3X Cons Err          1e-4   -      1e-6   3e-7   1e-7
Kb / min             e3 1e6
Pix / sec            -      2e5    2e6    6e6    2e7
Enz $/mg

Challenges in $2000 genome

                     >2007
# bp/expt            60e9        20X of 3e9 = 10X diploid
Complexity (bp)      6e9         Automated 96-well libraries
Avg Fold Cov         10          (Currently align .4 pix = .1 micron)
Pix per bp           1           Sensitivity & align CCD & slide?
Read-length          42          Is 34 enough? (next slide)
$ / Q20 kb           1e-5        (20X 3e9)
$ / 1X 3e9 b         1e2 (2e3)   Need haplotyping too? (slide after next)
Indel Error          1e-3
Subst Error          1e-3
3X Cons Err          1e-7
Kb / min             1e6
Pix / sec            2e7         Current camera is 3e7, but stage is 2e6
Enz $/mg             0.4         Realized for many recombinant proteins
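The headline numbers on this slide hang together arithmetically, as the back-of-envelope check below shows. Reading "(2e3)" as the cost of a 20X genome at $1e2 per 1X is an interpretation of the slide, not something it states outright, and the variable names are illustrative.

```python
# Back-of-envelope check of the ">2007" column. Interpreting "(2e3)" as
# 20 x the $1e2 cost of 1X coverage is an assumption.
HAPLOID_BP = 3e9
DIPLOID_BP = 6e9
FOLD = 20
COST_PER_1X_USD = 1e2

bp_per_expt = FOLD * HAPLOID_BP            # bases sequenced per experiment
diploid_fold = bp_per_expt / DIPLOID_BP    # coverage of the diploid genome
cost_usd = FOLD * COST_PER_1X_USD          # cost at 20X of 3e9 bases

print(f"{bp_per_expt:.0e} bp, {diploid_fold:.0f}X diploid, ${cost_usd:.0f}")
```

On this reading, the "$2000 genome" in the slide title is 20X coverage of 3e9 bases, which is the same 60e9 bases as 10X coverage of a 6e9 bp diploid genome.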

Human Resequencing with Mate-Paired 17 bp Tags [simulation]
Assume paired 17-mers (i.e. read full tag length) with bp distance distribution (980 ± σ = 96 bp observed)

Exact Matching (34/34)
Columns: Zero, Unique, Multiple
Rows: Paired, no substitutions; Paired, one substitution; Unpaired, no substitutions

Single Substitution or Exact (33/34 or 34/34)
Columns: Zero, Unique, Multiple
Rows: Paired, no substitutions; Paired, one substitution; Unpaired, no substitutions
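The Zero / Unique / Multiple categories can be illustrated with a toy classifier: place both tags on a reference, keep only placements whose separation falls in the allowed window, and count them. This is a brute-force sketch on a single strand with short made-up tags, not the simulation behind the slide; all names and the example sequences are hypothetical. Setting `max_mismatch=1` corresponds to the "33/34 or 34/34" case, where one substitution is allowed across the tag pair.

```python
def hits(ref, tag, max_mismatch):
    """(position, mismatches) for each placement of tag within the mismatch budget."""
    k = len(tag)
    out = []
    for i in range(len(ref) - k + 1):
        mm = sum(a != b for a, b in zip(ref[i:i + k], tag))
        if mm <= max_mismatch:
            out.append((i, mm))
    return out


def classify_pair(ref, tag1, tag2, min_dist, max_dist, max_mismatch=0):
    """Return 'zero', 'unique', or 'multiple' paired placements on ref."""
    hits1 = hits(ref, tag1, max_mismatch)
    hits2 = hits(ref, tag2, max_mismatch)
    placements = [
        (i, j)
        for i, m1 in hits1
        for j, m2 in hits2
        # The mismatch budget is shared across both tags of the pair.
        if m1 + m2 <= max_mismatch and min_dist <= j - i <= max_dist
    ]
    return ("zero", "unique", "multiple")[min(len(placements), 2)]


# Toy reference with 8 bp tags; real tags are 17 bp and the window ~980 +/- 96 bp.
ref = "AAAACCCC" + "T" * 10 + "GGGGAAAC"
print(classify_pair(ref, "AAAACCCC", "GGGGAAAC", 10, 30))
print(classify_pair(ref, "AAAACCCG", "GGGGAAAC", 10, 30))
print(classify_pair(ref, "AAAACCCG", "GGGGAAAC", 10, 30, max_mismatch=1))
print(classify_pair(ref + ref, "AAAACCCC", "GGGGAAAC", 10, 30))
```

The distance constraint is what makes pairing useful: a tag that matches in several places can still yield a unique placement if only one of those matches has a partner at the right separation.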

Single chromosome molecule haplotypes
[Figure: SNP markers in GM10835 (rs identifiers partly lost; rs39284 legible); allele strings CGCGCCGCGC and TATATTATAT; counts TT=137, CT=2 (TC=1), CC= ; scale in Mb]

Amplifying & sequencing whole genomes from single cells
Φ29 real-time amplification
No template control
Affymetrix quantitation of 2 independent amplifications
Escherichia & Prochlorococcus
Zhang, Martiny, Chisholm, Church, unpub.

Roundtable I
Shared Resources
STTR
Polymerase libraries: NEB, MJR, ABI, Fuller
CCDs: spectra, cost, #pixels, sensitivity, speed
software
Cancer Genome NCAB: clonal? enrichment, MRD, accuracy, read length
Cost estimates: distribute template spreadsheet