The Past, Present, and Future of DNA Sequencing

1 The Past, Present, and Future of DNA Sequencing
Craig A. Praul, Co-Director, Genomics Core Facility, Huck Institutes of the Life Sciences, Penn State University

2 A very short history of DNA sequencing

3 "I started from the conviction that, if different DNA species exhibited different biological activities, there should also exist chemically demonstrable differences between deoxyribonucleic acids." (Erwin Chargaff)

4 Milestones
First isolation of DNA: 1869 (Friedrich Miescher)
Composition of nucleic acids; tetranucleotide theory (Phoebus Levene)
G = C and A = T, but the G/C and A/T content varies between organisms: 1950 (Erwin Chargaff)
G/C content measured by thermal denaturation (DNA melting): 1968 (Mandel and Marmur)
Maxam-Gilbert and Sanger sequencing: 1977
Next-generation sequencing: 2005

5 Genomes Sequenced
Virus: 3222 (bacteriophage phi X 174, 5386 nt, 1977)
Bacteria: 2289 (Haemophilus influenzae, 1.8 x 10^6 nt, 1995)
Eukarya: 168 (S. cerevisiae, 1.2 x 10^7 nt, 1996; H. sapiens, 3 x 10^9 nt, 2001)
Archaea: 152 (Methanococcus jannaschii, 1.7 x 10^6 nt, 1996)

6 Next-Generation Sequencing
Liu et al. Journal of Biomedicine and Biotechnology Volume 2012 (2012), Article ID , 11 pages doi: /2012/251364

7 Changes in instrument capacity*
ER Mardis. Nature 470, (2011) doi: /nature09796

8 Sequencing Cost
Date      Cost per Mb    Cost per Genome
Sep-01    $5,292.39      $95,263,072
Sep-02    $3,413.80      $61,448,422
Oct-03    $2,230.98      $40,157,554
Oct-04    $1,028.85      $18,519,312
Oct-05    $766.73        $13,801,124
Oct-06    $581.92        $10,474,556
Oct-07    $397.09        $7,147,571
Oct-08    $3.81          $342,502
Oct-09    $0.78          $70,333
Oct-10    $0.32          $29,092
Oct-11    $0.09          $7,743
Oct-12    $0.07          $6,618
Jan-13    $0.06          $5,671
Source: NHGRI
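The two cost columns are linked: dividing cost per genome by cost per Mb gives the amount of sequence (in Mb) the estimate assumes per genome. A minimal Python sketch over a few rows copied from the NHGRI table; our reading (not stated on the slide) is that the ratio jumps from roughly 18,000 Mb to roughly 90,000 Mb at Oct-08, consistent with about 6x versus about 30x coverage of a 3 Gb genome. The small per-Mb values are rounded, so later ratios are approximate.

```python
# Rows copied from the NHGRI cost table: (date, cost per Mb in USD, cost per genome in USD)
COSTS = [
    ("Sep-01", 5292.39, 95_263_072),
    ("Oct-07", 397.09, 7_147_571),
    ("Oct-08", 3.81, 342_502),
    ("Jan-13", 0.06, 5_671),
]


def implied_mb_per_genome(cost_per_mb: float, cost_per_genome: float) -> float:
    """Megabases of sequence implied per genome: (cost per genome) / (cost per Mb)."""
    return cost_per_genome / cost_per_mb


def fold_decrease(old: float, new: float) -> float:
    """How many times cheaper `new` is than `old`."""
    return old / new


if __name__ == "__main__":
    for date, per_mb, per_genome in COSTS:
        print(f"{date}: ~{implied_mb_per_genome(per_mb, per_genome):,.0f} Mb per genome")
    first, last = COSTS[0], COSTS[-1]
    print(f"Cost per genome fell ~{fold_decrease(first[2], last[2]):,.0f}-fold")
```

Between Sep-01 and Jan-13 the cost per genome falls by a factor of roughly 16,800.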

9 Central Dogma of Molecular Biology
James Watson version: DNA → RNA → Protein
So once we have the genomic DNA sequence of a species, we have all of the information there is? Really?

10 No, not really.

11

12 Illumina HiSeq and MiSeq
Massively parallel
  HiSeq: 150 or 180 million reads per lane
  MiSeq: 15 million reads per run
Intermediate read length
  HiSeq: 100 nt or 150 nt
  MiSeq: 250 nt
High total output per run
  HiSeq: 90 Gb or 288 Gb
  MiSeq: 8 Gb
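The per-run totals follow from clusters per lane x lanes x reads per cluster x read length. A hedged Python sketch: the lane counts (8 lanes for a high-output HiSeq flow cell, 2 for rapid mode) and the paired-end (2 x read length) configurations are our assumptions, chosen because they reproduce the slide's totals; "reads per lane" is interpreted as clusters (read pairs).

```python
def run_yield_bases(clusters_per_lane: int, lanes: int, read_length: int, paired: bool) -> int:
    """Total bases per run = clusters x lanes x reads per cluster x read length."""
    reads_per_cluster = 2 if paired else 1
    return clusters_per_lane * lanes * reads_per_cluster * read_length


# Assumed configurations: lane counts and paired-end mode are not stated on the slide.
hiseq_high_output = run_yield_bases(180_000_000, lanes=8, read_length=100, paired=True)
hiseq_rapid = run_yield_bases(150_000_000, lanes=2, read_length=150, paired=True)
miseq = run_yield_bases(15_000_000, lanes=1, read_length=250, paired=True)

for name, bases in [("HiSeq high output", hiseq_high_output),
                    ("HiSeq rapid", hiseq_rapid),
                    ("MiSeq", miseq)]:
    print(f"{name}: {bases / 1e9:.1f} Gb")
```

Under these assumptions the HiSeq figures come out at exactly 288 Gb and 90 Gb, and MiSeq at 7.5 Gb, which the slide rounds to 8 Gb.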

13 Sequencing Types
Single read
Paired-end read
Mate-pair read
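A single read covers one end of each library fragment; a paired-end read also reads the other end, from the opposite strand. A toy Python sketch of that geometry, using a made-up 20 nt fragment and 5 nt reads; mate-pair libraries, which are built by circularizing long fragments, are not modeled here.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A<->T, C<->G)."""
    return seq.translate(COMPLEMENT)[::-1]


def single_read(fragment: str, read_length: int) -> str:
    """Single read: the first `read_length` bases of the forward strand."""
    return fragment[:read_length]


def paired_end_reads(fragment: str, read_length: int) -> tuple[str, str]:
    """Paired-end: read 1 from the forward strand, read 2 from the far end on the reverse strand."""
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment)[:read_length]
    return read1, read2


fragment = "ACGTTGCAAGGCTTACCGAT"  # hypothetical 20 nt insert
print(single_read(fragment, 5))
print(paired_end_reads(fragment, 5))
```

Read 2 ("ATCGG") is the reverse complement of the fragment's last five bases ("CCGAT"), which is why paired reads point toward each other when mapped back to the reference.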

14 Library Types
Many different library preps: DNA, mate-pair, mRNA, miRNA, ChIP
Fragmentation
  DNA: 300–500 nt
  RNA: 150–200 nt
Attachment of appropriate adapters
  Complex: flow-cell binding, forward and reverse (F & R) sequencing, barcodes (BC)
  Custom: avoid if possible
Removal of dimers/small inserts
Amplification (or not)
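The barcodes (BC) in the adapters are what let many libraries share one lane: after sequencing, each read's index is matched back to a sample. A minimal Python sketch of that demultiplexing step, assuming a hypothetical sample sheet of three 6 nt barcodes and a tolerance of one mismatch; reads whose index matches no barcode, or more than one, go to "undetermined".

```python
from collections import defaultdict

# Hypothetical sample sheet: barcode -> sample name (illustrative values only).
SAMPLE_BARCODES = {
    "ACGTAC": "sample_A",
    "TGCATG": "sample_B",
    "GATCGA": "sample_C",
}


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read: str, barcodes: dict[str, str], max_mismatches: int = 1) -> str:
    """Return the sample whose barcode uniquely matches within `max_mismatches`, else 'undetermined'."""
    hits = [sample for bc, sample in barcodes.items()
            if len(bc) == len(index_read) and hamming(bc, index_read) <= max_mismatches]
    return hits[0] if len(hits) == 1 else "undetermined"


def demultiplex(reads: list[tuple[str, str]], barcodes: dict[str, str]) -> dict[str, list[str]]:
    """Group (index_read, insert_read) pairs into per-sample bins."""
    bins: dict[str, list[str]] = defaultdict(list)
    for index_read, insert in reads:
        bins[assign_sample(index_read, barcodes)].append(insert)
    return dict(bins)


reads = [("ACGTAC", "TTGACC"), ("ACGTAA", "GGCATT"), ("TGCATG", "CCATGA"), ("NNNNNN", "ACGTAG")]
print(demultiplex(reads, SAMPLE_BARCODES))
```

The barcodes here differ from each other at every position, so a single sequencing error in the index cannot make a read match two samples.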

15 Applications
De novo sequencing (genomes, transcriptomes)
Resequencing (genomes, exomes, custom sequence capture)
RNA-seq (mRNA, miRNA, degradome)
ChIP-seq
Methyl-seq
RIP-seq
Amplicon

16 De Novo Experimental Design
Estimate of genome size
Coverage (30x–100x)
Sequencing type (paired-end or mate-pair)
Example: 100 Mb genome, 2 x 100 nt paired-end reads
  (100 Mb) x (30x coverage) = 3 Gb
  3 Gb / (200 nt per pair of paired-end reads) = 15 million read pairs
Replicates
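The arithmetic on this slide generalizes to: bases needed = genome size x coverage, and read pairs = bases needed / (2 x read length). A small Python helper reproducing the 100 Mb, 30x, 2 x 100 nt example; it ignores duplicates, trimming, and uneven coverage, so treat the result as a minimum.

```python
def read_pairs_needed(genome_size_bp: float, coverage: float, read_length_nt: int) -> float:
    """Read pairs = (genome size x coverage) / (2 x read length), for paired-end reads."""
    total_bases = genome_size_bp * coverage
    bases_per_pair = 2 * read_length_nt
    return total_bases / bases_per_pair


pairs = read_pairs_needed(genome_size_bp=100e6, coverage=30, read_length_nt=100)
print(f"{pairs / 1e6:.0f} million read pairs")  # 3 Gb / 200 nt per pair
```

At the top of the suggested range (100x) the same genome needs 50 million read pairs.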

17 Resequencing : Sequence Capture

18 RNA-seq Experimental Design
Estimate of transcriptome size (1–5% of genome?)
Coverage (30x?)
mRNA or rRNA-depleted RNA
Relative abundance of the transcripts you are interested in
Sequencing type (single read or paired-end)
  Simple vs. complex transcriptome
  Splice variants
Example: 3 Gb genome, 100 nt single reads
  (3 Gb genome) x (5% transcriptome) = 150 Mb transcriptome
  (150 Mb transcriptome) x (30x coverage) = 4.5 Gb total sequence
  4.5 Gb / (100 nt per read) = 45 million reads
Replicates: Yes! Biological, not technical.
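The read budget for this example works out as 3 Gb x 5% = 150 Mb of transcriptome, x 30 = 4.5 Gb, / 100 nt = 45 million single reads. A second step shows why relative abundance matters: under the idealized assumption that reads sample RNA bases uniformly, a transcript gets reads in proportion to its share of the sequenced RNA. A Python sketch; the "one millionth of RNA bases" transcript is a hypothetical value for illustration.

```python
def reads_needed(genome_size_bp: float, transcriptome_fraction: float,
                 coverage: float, read_length_nt: int) -> float:
    """Single reads = (genome size x transcriptome fraction x coverage) / read length."""
    transcriptome_bp = genome_size_bp * transcriptome_fraction
    return transcriptome_bp * coverage / read_length_nt


def expected_reads_on_transcript(total_reads: float, share_of_rna_bases: float) -> float:
    """Reads expected on one transcript, assuming reads sample RNA bases uniformly."""
    return total_reads * share_of_rna_bases


total = reads_needed(3e9, 0.05, 30, 100)
print(f"{total / 1e6:.0f} million single reads")
# Hypothetical rare transcript making up one millionth of the sequenced RNA bases.
print(f"~{expected_reads_on_transcript(total, 1e-6):.0f} reads on a rare transcript")
```

Even with 45 million reads, such a rare transcript collects only about 45 reads, which is why the abundance of the transcripts of interest drives how deep to sequence.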

19 ChIP-Seq

20

21 RIP-seq Source :

22 Methyl-seq
20 different types of base modification are known in DNA, and there are perhaps 200 modifications of RNA.

23 Experimental Space: Next-Gen Platform
PacBio: x 10^6 reads/sample, 1000–3000 nt
  Whole transcript
Roche 454 FLX+: x 10^6 reads/sample, nt
  Small–medium genome de novo sequencing
  Long amplicon
  Transcriptome
PGM: 1–2 x 10^6 reads per sample, 400 nt
  Small genome de novo
  Medium amplicon
MiSeq: 1–2 x 10^6 reads per sample, 50–250 nt
  Small genome de novo
  Small amplicon
HiSeq: x 10^6 reads per sample, 50–150 nt
  Counting applications: RNA-seq, ChIP-seq, RIP-seq, Methyl-seq
  Large genome de novo and resequencing

24 Experimental Space: The Relevancy of “Classic” Techniques
Differential gene expression
  Northern blotting (1977): 1 probe – 20 samples
  Dot blots (1987): 100s of probes – 1 sample
  RT-PCR (1992): 100s of probes – samples
  Microarrays (1995): 100,000s of probes – 1 sample
  Next-gen sequencing (2005): x 10^6 reads – 1 sample

25 The Future
More reads
Longer reads
Faster sequencing
Cheaper sequencing
New applications