CS 6293 Advanced Topics: Current Bioinformatics


1 CS 6293 Advanced Topics: Current Bioinformatics
Next-generation sequencing - technology

2 Outline
First generation sequencing
Next generation sequencing (current)
  AKA: second generation sequencing, massively parallel sequencing, ultra high-throughput sequencing
Future generation sequencing
Analysis challenges

3 Sanger sequencing (1st generation)
DNA is fragmented
Cloned to a plasmid vector
Cyclic sequencing reaction
Separation by electrophoresis
Readout with fluorescent tags
Jay Shendure & Hanlee Ji, Nature Biotechnology 26 (2008)

4 Cyclic-array methods (next-generation)
DNA is fragmented
Adaptors ligated to fragments
Several possible protocols yield an array of PCR colonies
Enzymatic extension with fluorescently tagged nucleotides
Cyclic readout by imaging the array
Jay Shendure & Hanlee Ji, Nature Biotechnology 26 (2008)

5 Available next-generation sequencing platforms
Illumina/Solexa
ABI SOLiD
Roche 454
Polonator
HeliScope

6 Emulsion PCR
Fragments, with adaptors, are PCR amplified within a water drop in oil.
One primer is attached to the surface of a bead.
Used by 454, Polonator and SOLiD.
Rothberg and Leamon, Nat Biotechnol. 2008
Shendure and Ji, Nat Biotechnol. 2008

7 454 Sequencing
Stats:
read lengths 200-300 bp
accuracy problem with homopolymers
400,000 reads per run
costs $60 per megabase
Rothberg and Leamon, Nat Biotechnol. 2008
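
As a rough check on what these stats imply, the per-run yield follows from multiplying reads per run by read length. The sketch below only uses the figures quoted on this slide (400,000 reads, 200-300 bp, $60 per megabase); the function names are illustrative, and the estimate ignores quality filtering and failed reads.

```python
def run_yield_megabases(reads_per_run, read_length_bp):
    """Raw bases produced by one run, in megabases (no filtering assumed)."""
    return reads_per_run * read_length_bp / 1_000_000


def run_cost_usd(yield_megabases, cost_per_megabase_usd):
    """Cost implied by a per-megabase price and a given yield."""
    return yield_megabases * cost_per_megabase_usd


# Figures quoted on the 454 slide.
READS_PER_RUN = 400_000
COST_PER_MB_USD = 60.0

low = run_yield_megabases(READS_PER_RUN, 200)
high = run_yield_megabases(READS_PER_RUN, 300)
print(f"Yield per run: {low:.0f}-{high:.0f} Mb")
print(f"Implied cost per run: ${run_cost_usd(low, COST_PER_MB_USD):,.0f}-"
      f"${run_cost_usd(high, COST_PER_MB_USD):,.0f}")
```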

8 Bridge PCR
DNA fragments are flanked with adaptors.
A flat surface is coated with two types of primers, corresponding to the adaptors.
Amplification proceeds in cycles, with one end of each bridge tethered to the surface.
Used by Illumina/Solexa.

9

10 First Round
All 4 labeled nucleotides
Primers
Polymerase

11
1. Take image of first cycle
2. Remove fluorophore
3. Remove block on 3’ terminus
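
The cycle described on slides 10 and 11 (add all four labeled, blocked nucleotides; image; remove the fluorophore; remove the 3’ block) can be sketched as a loop that reads one base per cycle. This is a toy model under simplifying assumptions: a single error-free template, no strand orientation handling, and a hypothetical function name.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(template, n_cycles):
    """Idealized cyclic readout: exactly one base is incorporated per cycle."""
    read = []
    position = 0
    for _ in range(min(n_cycles, len(template))):
        # All 4 labeled nucleotides are offered; only the complementary one
        # is incorporated, and its 3' block stops further extension.
        incorporated = COMPLEMENT[template[position]]
        # 1. Take image: the fluorophore colour identifies the base.
        read.append(incorporated)
        # 2. Remove fluorophore and 3. remove the 3' block, so the next
        #    cycle can extend the strand by one more base.
        position += 1
    return "".join(read)


print(sequence_by_synthesis("ACGTTG", 4))
```

Because each cycle yields one base, the read length equals the number of cycles, which is why the cycle count bounds the read length in this model.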

12

13 Stats:
read lengths up to 36 bp
error rates 1-1.5%
several million “spots” per lane (8 lanes)
cost $2 per megabase

14 Conventional sequencing
Can sequence up to 1,000 bp, with per-base 'raw' accuracies as high as 99.999%.
In the context of high-throughput shotgun genomic sequencing, Sanger sequencing costs on the order of $0.50 per kilobase.
Jay Shendure & Hanlee Ji, Nature Biotechnology 26 (2008)
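
To compare the quoted prices on a common scale, the sketch below converts the Sanger figure ($0.50 per kilobase) to dollars per megabase and sets it beside the 454 ($60 per megabase) and Illumina/Solexa ($2 per megabase) figures from the earlier slides. The 1 Gb target size is an arbitrary illustration, not a number from the slides, and the names are hypothetical.

```python
# Prices as quoted on the slides, normalized to USD per megabase.
COST_PER_MEGABASE_USD = {
    "Sanger": 0.50 * 1000,      # $0.50 per kilobase -> $500 per megabase
    "Roche 454": 60.0,
    "Illumina/Solexa": 2.0,
}


def sequencing_cost_usd(platform, n_bases):
    """Raw sequencing cost for n_bases at the quoted per-megabase price."""
    return COST_PER_MEGABASE_USD[platform] * n_bases / 1_000_000


TARGET_BASES = 1_000_000_000  # assumed 1 Gb of raw sequence, for illustration
for platform in COST_PER_MEGABASE_USD:
    cost = sequencing_cost_usd(platform, TARGET_BASES)
    print(f"{platform:16s} ${cost:>10,.0f}")
```

On these numbers Sanger sequencing is 250 times more expensive per base than Illumina/Solexa, before accounting for the differences in read length and accuracy.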

15 Sequence qualities
In most cases, the quality is poorest toward the ends, with a region of high quality in the middle.
Uses of sequence qualities:
  ‘Trimming’ of reads (removal of low-quality ends)
  Consensus calling in sequence assembly
  Confidence metric for variant discovery
In general, newer approaches produce larger amounts of sequence that is shorter and of lower per-base quality.
Next-generation sequencing has an error rate of around 1% or higher.
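
Trimming of low-quality ends can be sketched as follows. This is a minimal illustration, not the method of any particular tool: it assumes per-base quality scores are already available as integers, uses an arbitrary threshold of 20, and strips bases from each end until it meets one at or above the threshold.

```python
def trim_low_quality_ends(seq, quals, min_q=20):
    """Strip bases below min_q from both ends; interior bases are kept."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality list must have equal length")
    start, end = 0, len(seq)
    # Advance from the left while the leading base is low quality.
    while start < end and quals[start] < min_q:
        start += 1
    # Retreat from the right while the trailing base is low quality.
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


seq = "ACGTACGT"
quals = [5, 12, 30, 35, 18, 33, 10, 4]
print(trim_low_quality_ends(seq, quals))
```

Note that the interior base with quality 18 survives: only the ends are trimmed, which matches the observation that quality is poorest toward the ends of a read.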

16 Phred Quality Score
q = -10 log10(p), where p = error probability for the base
if p = 0.01 (1% chance of error), then q = 20
if p = 0.00001 (99.999% accuracy), then q = 50
Phred quality values are rounded to the nearest integer
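
The two worked values on this slide can be reproduced with a short sketch, assuming the standard Phred definition q = -10 log10(p). The function names are illustrative.

```python
import math


def phred_from_error(p):
    """Phred quality for error probability p, rounded to the nearest integer."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return round(-10 * math.log10(p))


def error_from_phred(q):
    """Error probability implied by Phred quality q."""
    return 10 ** (-q / 10)


for p in (0.01, 0.00001):
    print(f"p = {p:g} -> q = {phred_from_error(p)}")
```

Each additional 10 quality points corresponds to a tenfold drop in error probability, so q = 30 means one expected error in 1,000 bases.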

17 Main Illumina noise factors
Schematic representation of main Illumina noise factors. (a–d) A DNA cluster comprises identical DNA templates (colored boxes) that are attached to the flow cell. Nascent strands (black boxes) and DNA polymerase (black ovals) are depicted. (a) In the ideal situation, after several cycles the signal (green arrows) is strong, coherent and corresponds to the interrogated position. (b) Phasing noise introduces lagging (blue arrows) and leading (red arrow) nascent strands, which transmit a mixture of signals. (c) Fading is attributed to loss of material that reduces the signal intensity. (d) Changes in the fluorophore cross-talk cause misinterpretation of the received signal (blue arrows). For simplicity, the noise factors are presented separately from each other.
Erlich et al., Nature Methods 5 (2008)
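
Phasing noise can be illustrated with a small model of a cluster. The sketch assumes that in every cycle each nascent strand independently fails to extend with probability `p_lag`, extends by two bases with probability `p_lead`, and otherwise extends by one. These parameters and the function name are assumptions for illustration, not values or methods from Erlich et al. The code tracks the fraction of strands still reporting the interrogated position.

```python
def in_phase_fraction(n_cycles, p_lag=0.0, p_lead=0.0):
    """Fraction of strands in a cluster that are in phase after each cycle."""
    if p_lag < 0 or p_lead < 0 or p_lag + p_lead > 1:
        raise ValueError("p_lag and p_lead must be probabilities summing to <= 1")
    steps = ((0, p_lag), (1, 1.0 - p_lag - p_lead), (2, p_lead))
    # dist maps "bases extended so far" -> fraction of strands at that length.
    dist = {0: 1.0}
    fractions = []
    for cycle in range(1, n_cycles + 1):
        new_dist = {}
        for pos, weight in dist.items():
            for step, prob in steps:
                if prob > 0:
                    new_dist[pos + step] = new_dist.get(pos + step, 0.0) + weight * prob
        dist = new_dist
        # In-phase strands have extended exactly one base per cycle.
        fractions.append(dist.get(cycle, 0.0))
    return fractions


fractions = in_phase_fraction(36, p_lag=0.01)
print(f"In phase after cycle 1:  {fractions[0]:.3f}")
print(f"In phase after cycle 36: {fractions[-1]:.3f}")
```

Under this model even a small per-cycle lag rate compounds over the run, which is one way to see why signal coherence degrades toward the end of a read.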

18 Comparison of existing methods
Jay Shendure & Hanlee Ji, Nature Biotechnology 26 (2008)

19 Read length and pairing
ACTTAAGGCTGACTAGC TCGTACCGATATGCTG
Short reads are problematic, because short sequences do not map uniquely to the genome.
Solution #1: Get longer reads.
Solution #2: Get paired reads.
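
Both solutions can be demonstrated on a toy genome containing a repeat. The sketch uses exact string matching and treats both reads of a pair as coming from the same strand, which is a simplification of real paired-end data. The genome, reads, and gap bounds are made up for illustration.

```python
def find_all(genome, read):
    """Start positions of every exact occurrence of read in genome."""
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits


def paired_hits(genome, read1, read2, min_gap, max_gap):
    """Placements where read2 starts min_gap..max_gap bases after read1."""
    return [
        (a, b)
        for a in find_all(genome, read1)
        for b in find_all(genome, read2)
        if min_gap <= b - a <= max_gap
    ]


# Toy genome with the repeat GATTACA at positions 0 and 11.
genome = "GATTACACCCCGATTACATTGG"

print(find_all(genome, "GATTACA"))               # short read: ambiguous
print(find_all(genome, "GATTACACC"))             # longer read: unique
print(paired_hits(genome, "GATTACA", "TTGG", 5, 10))  # pair: unique
```

The short read lands in both copies of the repeat. Extending it into the flanking sequence, or requiring a mate at a plausible distance, leaves a single placement.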

20 Third generation
Single-molecule sequencing: no DNA amplification is involved
  Helicos HeliScope
  Pacific Biosciences SMRT
Longer reads
  Roche/454 > 400 bp
  Illumina/Solexa > 100 bp
  Pacific Biosciences > 1000 bp and single molecule

21 Applications of next-generation sequencing
Jay Shendure & Hanlee Ji, Nature Biotechnology 26 (2008)

22 Analysis tasks
Base calling
Mapping to a reference genome
De novo or assisted genome assembly

23 References
Next-generation DNA sequencing, Jay Shendure and Hanlee Ji, Nat Biotechnol. (2008)
Next-Generation DNA Sequencing Methods, Elaine R. Mardis, Annu. Rev. Genomics Hum. Genet. (2008) 9:387–402

