Stratton Nature 458: 719, 2009 Evolution of DNA sequencing technologies - 1980 to present day DNA SEQUENCING & ASSEMBLY.


1 DNA SEQUENCING & ASSEMBLY
Evolution of DNA sequencing technologies - 1980 to present day (Stratton Nature 458: 719, 2009)

2 $$$ Motivation to “spur DNA sequencing technologies, boost accuracy and drive down costs”
“The X Prize Foundation of Playa Vista, California, is offering a $10-million prize to the first team to accurately sequence the genomes of 100 people aged 100 or older, for $1,000 or less apiece and within 30 days [beginning September 5, 2013].”
… with an accuracy of <1 error per 1 million bases
see Nature 487:417, July 26, 2012

3 Sanger chain termination method (Fred Sanger, 1977) Fig. 4.2
- enzymatic synthesis of DNA strand complementary to “template” of interest
- … but it stops when dideoxynucleotide is incorporated
- 4 parallel sets of reactions: ddATP + 4 dNTPs, ddCTP + 4 dNTPs, etc.
Nobel Prizes: Sanger 1958 (protein structure), 1980 (DNA sequencing)

4 Fig. 4.2
- set of products each terminating with ddA
- their sizes reflect positions of T in template DNA
- ratio of ddATP:dATP important to get appropriate size range of products
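The ddA reaction can be sketched in a few lines of Python. This is only an illustration, not a model of the chemistry: the template string and the function name `dda_fragment_lengths` are made up for the example, the primer is ignored, and every possible termination is assumed to occur.

```python
# Hypothetical template, written 3'->5' so that the new strand is
# synthesized left to right (5'->3') along it.
TEMPLATE = "TGCATTGACT"

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def dda_fragment_lengths(template):
    """Lengths of the new-strand products that end in ddA.

    A product can terminate with ddA wherever the template has a T,
    so the lengths report the positions of T in the template.
    """
    return [i + 1 for i, base in enumerate(template) if base == "T"]


# Full-length new strand, for comparison with the terminated products.
new_strand = "".join(COMPLEMENT[b] for b in TEMPLATE)
lengths = dda_fragment_lengths(TEMPLATE)

print(new_strand)
print(lengths)
```

Each printed length is the size of one band expected in the ddA lane, and the base at that position in the new strand is an A.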

5 Products (each differing in length by 1 nt) resolved on denaturing polyacrylamide gels … or by capillary electrophoresis
Figure labels: Autoradiograph; Automated sequencing profile (Fig. 4.1, Fig. 4.3)
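Because the products differ by 1 nt, the sequence can be read by taking the bands from shortest to longest across the four lanes. A minimal sketch, assuming made-up band positions and a hypothetical helper `read_gel`:

```python
# Hypothetical band positions (product lengths in nt) in the four lanes,
# one lane per dideoxynucleotide.
lanes = {
    "A": [1, 5, 6, 10],
    "C": [2, 7],
    "G": [3, 9],
    "T": [4, 8],
}


def read_gel(lanes):
    """Read the new-strand sequence from the shortest to the longest product."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


print(read_gel(lanes))
```

The result is the sequence of the newly synthesized strand (5'->3'), which is complementary to the template.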

6 PRIMERS FOR SEQUENCING Fig. 4.5
1. “Universal” - forward & reverse
2. Custom-designed “internal” - use new sequence info to design primer to sequence next stretch
If insert is too long to sequence completely using “universal” primers, can use this strategy to close a “sequencing gap”

7 … or can find another clone in library that has overlap, and sequence it using “universal” primers Fig. 3.35
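The custom “internal” primer strategy from slide 6 can be sketched as follows. This is an illustration only: the read is random made-up sequence, `next_primer` is a hypothetical helper, and the primer length (20 nt) and the distance kept from the end of the read (30 nt) are arbitrary values chosen for the example. Real primer design would also check things such as melting temperature and uniqueness.

```python
import random


def next_primer(read, length=20, back_off=30):
    """Pick a primer from near the 3' end of the sequence read so far.

    back_off skips the last bases of the read; length and back_off are
    illustrative values, not recommendations.
    """
    end = len(read) - back_off
    start = end - length
    if start < 0:
        raise ValueError("read is too short for the requested primer")
    return read[start:end]


random.seed(0)
read = "".join(random.choice("ACGT") for _ in range(600))  # made-up read

primer = next_primer(read)
print(primer, len(primer))
```

Sequencing from this primer gives the next stretch of the insert, and the step is repeated until the sequencing gap is closed.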

8 What if there is a “physical gap”? Fig. 4.17
- if particular region of genome is not represented in clone library (maybe region was unstable in first vector)
- can use a different vector to prepare a second clone library
- then use probes (eg. oligomers) mapping to ends of contigs from first library to screen second library

9 Example of closing a “physical gap” Fig. 4.11
You have 9 contigs & design oligomers mapping close to their ends (#1-18)
Which contigs are adjacent?
- screening by hybridization
- … or by PCR
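The bookkeeping behind “which contigs are adjacent?” can be sketched as below. The numbering scheme (oligomers 1-2 on contig 1, 3-4 on contig 2, and so on) and the positive results are assumptions made up for the example, not data from the figure.

```python
def contig_of(oligo):
    """Assumed numbering: oligos 1-2 map to contig 1, 3-4 to contig 2, ... 17-18 to contig 9."""
    return (oligo + 1) // 2


def adjacent_contigs(positive_pairs):
    """Turn positive oligo pairs into pairs of adjacent contigs."""
    links = set()
    for a, b in positive_pairs:
        ca, cb = contig_of(a), contig_of(b)
        if ca != cb:  # two ends of the same contig say nothing about neighbours
            links.add((min(ca, cb), max(ca, cb)))
    return sorted(links)


# Made-up results: each tuple is a pair of oligos that hybridize to the
# same clone, or that give a PCR product when used together as primers.
positives = [(2, 15), (16, 7), (8, 3)]

print(adjacent_contigs(positives))
```

With these made-up results the contigs would be ordered 1 - 8 - 4 - 2.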

10 What if “physical gap” is very short (< 10 kb or so)?
- could use oligomers mapping to ends of contigs in PCR reactions with uncloned DNA template
- then sequence the PCR product directly
- this slide also illustrates a method for finding overlapping clones

11 ASSEMBLING INFO FROM CLONES INTO CONTIGS
1. CHROMOSOME WALKING by hybridization Fig. 4.12
- sequence from one clone is used as probe to screen library of clones to find overlapping one
- repeat to “walk along” genome
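Chromosome walking by hybridization can be sketched with strings standing in for DNA. Everything here is made up for illustration: the genome is random sequence, the library holds four overlapping clones, “hybridization” is reduced to an exact substring match on one strand, and `walk` is a hypothetical helper.

```python
import random

random.seed(1)
genome = "".join(random.choice("ACGT") for _ in range(400))  # made-up genome

# Made-up library: clone name -> insert sequence (overlapping genome slices).
library = {
    "c1": genome[0:120],
    "c2": genome[90:210],
    "c3": genome[180:300],
    "c4": genome[280:400],
}


def walk(library, start, probe_len=20):
    """Walk from `start`, using the end of each clone as the next probe."""
    order = [start]
    current = start
    while True:
        probe = library[current][-probe_len:]
        # "Screen the library": clones not yet visited that contain the probe.
        hits = [name for name, seq in library.items()
                if name not in order and probe in seq]
        if not hits:
            return order
        current = hits[0]
        order.append(current)


print(walk(library, "c1"))
```

The walk stops when the probe from the last clone finds no new clone, which is what happens at the end of a contig.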

12 But what if probe contains repeated sequences? Fig. 3.34
- then it hybridizes to multiple clones
- problem avoided if use short unique-sequence probe (eg. oligomer) mapping close to end of clone
- … or if pre-hybridize with repeat sequence

13 2. CHROMOSOME WALKING by PCR Fig. 4.13
- design primer pairs based on sequence at end of clone
- use other clones in library for template DNA
- will get PCR amplicon for any new clones with that sequence
- reactions can be carried out as pools for more rapid screening (combinatorial screening) Fig. 4.14
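Pooled (combinatorial) screening can be sketched as below, assuming the library is arrayed in a 96-well plate (8 rows x 12 columns) and that clones are pooled by row and by column. The plate layout and the helper names are assumptions for the example.

```python
ROWS = "ABCDEFGH"
COLS = range(1, 13)


def pcr_positive(pool, positives):
    """A pooled PCR gives a product if any clone in the pool has the sequence."""
    return any(well in positives for well in pool)


def screen_plate(positives):
    """Find candidate wells using 8 row pools + 12 column pools (20 PCRs, not 96)."""
    row_hits = [r for r in ROWS
                if pcr_positive([(r, c) for c in COLS], positives)]
    col_hits = [c for c in COLS
                if pcr_positive([(r, c) for r in ROWS], positives)]
    # A positive clone sits where a positive row meets a positive column.
    return [(r, c) for r in row_hits for c in col_hits]


print(screen_plate({("C", 7)}))
```

With a single positive clone the row and column pools pinpoint the well directly. With several positives the intersections include false candidates, so each candidate well still needs an individual PCR.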

14 3. CLONE FINGERPRINTING
To identify overlapping clones: by finding features that they share
- restriction profile fingerprint (Fig. 4.15A)
- or clones having STS in common (Fig. 4.15D)
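Comparing restriction fingerprints amounts to counting fragments of similar size. A minimal sketch, assuming made-up fragment sizes (in bp) and an arbitrary 3% size tolerance; `shared_bands` is a hypothetical helper:

```python
def shared_bands(fp_a, fp_b, tolerance=0.03):
    """Count fragments of fp_a that match a fragment of fp_b in size.

    Two fragments match if their sizes differ by no more than
    `tolerance` (relative). Each fragment of fp_b is used at most once.
    """
    remaining = sorted(fp_b)
    shared = 0
    for size in sorted(fp_a):
        for other in remaining:
            if abs(size - other) <= tolerance * max(size, other):
                remaining.remove(other)
                shared += 1
                break
    return shared


# Made-up restriction fingerprints (fragment sizes in bp).
fingerprints = {
    "cloneA": [5200, 3100, 2400, 1800, 900],
    "cloneB": [3150, 2400, 1780, 1200, 650],
    "cloneC": [7000, 4300, 1500, 400],
}

print(shared_bands(fingerprints["cloneA"], fingerprints["cloneB"]))
print(shared_bands(fingerprints["cloneA"], fingerprints["cloneC"]))
```

Clones that share several bands are candidates for overlap. How many shared bands count as convincing is a judgment call that this sketch does not address.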

15 Haemophilus genome project 1995 (1.8 Mbp) Fig. 4.10
1. DNA sonicated, fragments (1.6 – 2 kb) cloned in plasmid vectors
2. Shotgun sequencing of insert ends - ~ 20,000 clones analyzed, 11 Mbp of sequence
3. Assembled into 140 contigs - scaffolds with sequencing gaps & physical gaps
4. Screened for overlapping clones – reduced to 42 contigs
5. Assumed gaps represented genome regions unstable in plasmid vector - switched to lambda vector
6. Probed library with oligomers from contig ends or used PCR with primer pairs from contig ends
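The numbers on this slide give the sequencing depth directly. The second half of the sketch is not from the slide: it uses a standard idealized Poisson approximation (reads landing at random, uniform positions) to estimate how much of the genome would be missed by chance alone.

```python
import math

genome_size = 1.8e6      # bp, from the slide
total_sequence = 11e6    # bp of shotgun sequence, from the slide

coverage = total_sequence / genome_size

# Idealized assumption (not from the slide): reads start at random
# positions, so the chance that a given base is missed by every read
# is about exp(-coverage).
fraction_missed = math.exp(-coverage)
expected_missed_bp = fraction_missed * genome_size

print(f"coverage ~ {coverage:.1f}x")
print(f"fraction of genome not sequenced ~ {fraction_missed:.4f}")
print(f"expected unsequenced bases ~ {expected_missed_bp:.0f}")
```

Under this approximation only a small fraction of the genome is missed at random, which fits the slide's point that the remaining gaps needed other explanations, such as regions unstable in the plasmid vector.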

16 “Next generation” sequencing technologies
“Cost per Megabase of DNA Sequence (or Why biologists panic about computing)” - National Human Genome Research Institute
- major challenge to correctly assemble the massive amount of sequence data generated … and to interpret it!

17 1. Pyrosequencing Fig. 4.9
- one dNTP is added at a time + enzyme (apyrase) that degrades dNTP if not incorporated into new strand, then next dNTP added
- incorporation detected by chemiluminescence triggered by released pyrophosphate (PPi)
Genome Res 11:3, 2001
www.youtube.com/watch?v=kYAGFrbGl6E&feature=related
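The one-dNTP-at-a-time cycle can be sketched as a simple simulation. Assumptions for the example: the input is the strand being synthesized (complementary to the template), the dispensation order T, A, C, G is arbitrary, and the light signal is taken to be exactly proportional to the number of bases incorporated, which real instruments only approximate for runs of the same base.

```python
from itertools import cycle


def flowgram(new_strand, flow_order="TACG", max_flows=100):
    """Simulate dispensing one dNTP at a time and record the light signal.

    The signal for a flow is the number of identical bases incorporated
    in a row (0 if the dNTP does not match and is degraded by apyrase).
    """
    flows = []
    pos = 0
    for base in cycle(flow_order):
        if pos >= len(new_strand) or len(flows) >= max_flows:
            break
        run = 0
        while pos < len(new_strand) and new_strand[pos] == base:
            run += 1
            pos += 1
        flows.append((base, run))
    return flows


def call_bases(flows):
    """Rebuild the sequence from (dNTP, signal) pairs."""
    return "".join(base * signal for base, signal in flows)


flows = flowgram("GGATTTC")
print(flows)
print(call_bases(flows))
```

Flows with signal 0 are dNTPs that were not incorporated; a signal of 2 or 3 marks a run of the same base.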

18 “Massively-parallel” pyrosequencing (on beads or chips) - 454 technology
- DNA sheared, adaptors ligated, attached to bead & PCR amplified
- beads captured in wells & pyrosequencing carried out in parallel on each DNA fragment
- average read of ~ 700 (?) bp ... but “up to 1.6 million reactions can be carried out in parallel on a 6.4 cm² slide”
- “expect ~ 500 million nucleotides of sequence data per 10 hour run” (July 2010)
Figure labels: Genomic DNA; Sample preparation; PCR; Pyrosequencing; Enzymes on beads and primer; Polymerase; PPi; Light
Medini Nat Rev Microbiol. 6:419, 2008

19 2. Illumina sequencing (parallel microchip) - SOLEXA technology
- add adaptors to sheared DNA, attach to chip, then PCR “bridge amplification”
- denature clusters of ~ 1000 copies of DNA molecules & sequential sequencing using four fluorophore-labelled nts
- average read of ~ 40-100 bp (short-read)
Figure panels: Sample preparation; Sequencing by synthesis
Medini Nat Rev Microbiol. 6:419, 2008
www.youtube.com/watch?v=HtuUFUnYB9Y&feature=related
                          HiSeq 2000    HiSeq 1000
Output (2 × 100 bp)       600 Gb        300 Gb
Run Time (2 × 100 bp)     ~11 days      ~8.5 days
Paired-end Reads          6 Billion     3 Billion
Single Reads              3 Billion     1.5 Billion
Maximum Read Length**     2 × 100 bp
Bases Above Q30***        > 85% (2 × 50 bp), > 80% (2 × 100 bp)
(Illumina website Sept. 2012)
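“Q30” is a Phred-scaled quality score, Q = -10 log10(P), where P is the probability that a base call is wrong. The sketch below converts in both directions and checks that the HiSeq 2000 read count and output quoted above agree with each other. The function names are made up for the example.

```python
import math


def phred_to_error(q):
    """Error probability for a Phred quality score: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred quality score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


print(phred_to_error(30))    # chance that a Q30 base call is wrong
print(error_to_phred(1e-6))  # score matching 1 error per 1 million bases

# Consistency check on the quoted HiSeq 2000 figures:
# 6 billion paired-end reads x 100 bp each, expressed in Gb.
reads = 6e9
output_gb = reads * 100 / 1e9
print(output_gb, "Gb")
```

A Q30 base has about a 1 in 1,000 chance of being wrong, so an accuracy of <1 error per 1 million bases corresponds to a consensus quality of about Q60.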

20 3. Single molecule real-time sequencing (Helicos, Pacific Biosciences)
- continuous monitoring of nt incorporation (rather than termination as in Sanger method…) and no amplification
- formation of phosphodiester bond releases fluorophore
- nanoscale wells on chip so ~ one DNA polymerase molecule per well
- read length 25 to 55 bases, 21-35 Gigabases per run (Helicos website Sept. 2012)
Metzker Nature Reviews Genetics 11:31, 2010

21 Press release, Dec 9, 2010: “PacBio & Harvard Use Fast Gene Sequencer to Crack DNA Code of Haitian Cholera Strain”
H1 and H2 strains were sequenced in < 24 hr with enough “reads” to cover the genomes 60 and 32 times, respectively.
Chin et al. New Eng J Med 364:33, 2011

