High-Throughput Sequencing Technologies

1 High-Throughput Sequencing Technologies
Biological Sequence Analysis BNFO 691/602 Spring 2014 Mark Reimers

2 High-Throughput Sequencing Technologies
Genomic Data Analysis Course Moscow July 2013 Mark Reimers, Ph.D

3 Outline
What can we do with next-generation sequencing?
  De novo sequencing of simple genomes
  Re-sequence individual variations
  Generate genome-wide quantitative data for a variety of assays
What technologies are now available and which are up-and-coming?
  Roche, Illumina, SOLiD, Ion Torrent, etc.

4 What is High-Throughput Sequencing?
Generating many thousands or millions of short (30 to 1,000 base) sequences by sequencing parts of longer (200+ base) DNA fragments.
Most research uses reads from one end of a fragment (single-end), but most technologies can be adapted to make paired-end reads from opposite strands.
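The difference between single-end and paired-end reads can be sketched in a few lines of Python. This is only an illustration: the function names, the toy fragment, and the read length are assumptions made for the example, not part of the slides or of any sequencing vendor's software.

```python
# Illustrative sketch: all names and values here are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def single_end_read(fragment, read_length):
    """Read `read_length` bases from one end of the fragment."""
    return fragment[:read_length]


def paired_end_reads(fragment, read_length):
    """Read inward from both ends; the second read comes from the opposite strand."""
    forward = fragment[:read_length]
    reverse = reverse_complement(fragment)[:read_length]
    return forward, reverse


fragment = "ACGTTGCAAGGCTTACCGTA"  # toy 20-base fragment
read1, read2 = paired_end_reads(fragment, 5)
print(read1, read2)
```

The second read is reported as the reverse complement of the fragment's far end, which is why paired-end mates map to opposite strands of the reference.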

5 Full Genome Re-sequencing has been done for many cancers and rare clinical disorders
Mutations have often been found in regulatory regions

6 Exome sequencing is a cost-effective way to identify de novo protein-coding mutations
Exome libraries are prepared by capture on an array of synthetic oligos

7 Targeted re-sequencing of a few relevant genes can identify diverse critical mutations across a large number of cases

8 RNA-seq

9 ChIP-seq

10 DNA methylation profiling
Bisulfite conversion: mC → C, C → U
After PCR and sequencing: C → C, U → T
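The conversion scheme on this slide can be sketched as a small simulation. It assumes the chemistry is standard bisulfite treatment (unmethylated C becomes U, methylated C is protected) and that methylated positions are given as a set of indices; the function names and the toy sequence are hypothetical.

```python
# Illustrative sketch: a toy model of bisulfite conversion followed by PCR.
def bisulfite_convert(seq, methylated_positions):
    """Unmethylated C becomes U; C at a methylated position stays C."""
    return "".join(
        "U" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def pcr_readout(converted):
    """PCR copies U as T, so unmethylated C is read as T after sequencing."""
    return converted.replace("U", "T")


def call_methylation(reference, read):
    """Positions where the reference has C and the read still shows C."""
    return [
        i for i, (ref_base, read_base) in enumerate(zip(reference, read))
        if ref_base == "C" and read_base == "C"
    ]


reference = "ACGCCGTC"
methylated = {1, 4}  # assumed methylated cytosines (0-based positions)
converted = bisulfite_convert(reference, methylated)
read = pcr_readout(converted)
print(converted, read, call_methylation(reference, read))
```

Comparing the read with the reference recovers the methylated positions, because only protected cytosines still appear as C.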

11 DNase Hypersensitivity
DNase I enzyme cuts DNA
Much more likely to cut at open chromatin
Two approaches:
  Cut slowly, then fragment and sequence ends
  Cut rapidly, then sequence short fragments

12 Mapping of chromatin interactions (5C)
(courtesy Elemento lab)

13 HTS Technologies
Roche-454 (will close 2016)
Illumina
SOLiD
Ion Torrent
Newer Technologies
Outlook

14 Founded by Jonathan Rothberg as a secret project (code-named ‘454’) within CuraGen

15 Roche 454 Sequencing Metzker, NG 2010

16 Roche 454 Sequencing

17 Roche 454 Peak Heights Data

18 Advantages & Drawbacks
PRO
  Long reads are uniquely identifiable
  Relatively quick: ~20 hours total
CON
  Cost is relatively high
  Frequent errors in runs of bases
  Frequent G-A transitions

19 Best Uses of Roche 454
De novo small genome (prokaryote or small eukaryote) sequencing
Metagenomics by 16S profiling
Used to be best for metagenomics by random sequencing; new long reads from Illumina are now competitive
Targeted re-sequencing of small samples

20

21 Illumina (Solexa) Genome Analyzer and Flow Cell

22 Illumina On-Chip Amplification

23 Illumina (Solexa) Sequencing

24 Paired-End Illumina Method
Paired-end reads are easy on Illumina because the clusters are generated by ligated linkers. Different linkers and primers are attached to each end of the fragment.

25 Advantages & Drawbacks
PRO
  Very high throughput
  Most widespread technology, so comparisons seem easier
CON
  Sequencing representation biases, especially at beginning
  Slow: up to a week for a run

26 Best Uses of Illumina
Expression analysis (RNA-Seq)
Chromatin Immunoprecipitation (ChIP-Seq)
Metagenomics by random sequencing

27 SOLiD Sequencing by Oligonucleotide Ligation and Detection

28 SOLiD History
George Church licensed his ‘polony’ technique to Agencourt Personal Genomics.
ABI acquired the SOLiD technology from Agencourt in 2006.

29 SOLiD Preparation Steps
Prepare either a single or ‘mate-pair’ library from DNA fragments
Attach library molecules to beads; amplify library by emulsion PCR
Modify 3’ ends of clones; attach beads to surface

30 Emulsion PCR Emulsion PCR isolates individual DNA molecules along with primer-coated beads in aqueous droplets within an oil phase. A polymerase chain reaction (PCR) then coats each bead with clonal copies of the DNA molecule. The bead is immobilized for sequencing.

31

32 ABI SOLiD Sequencing Cycle

33 SOLiD Reads Each Base Twice
Most bases are matched by two primers in different ligation cycles

34 SOLiD Color Coding Scheme
Blue is the color of homopolymer runs.
Mapping into color space: if you translate color reads directly into base reads, a single error in the color calls miscalls every base downstream of it. It is therefore best to convert the reference sequence into color space and map there. There is one unambiguous conversion of a base reference sequence into color space, but there are four possible conversions of a color string into base strings (one for each choice of first base).
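A minimal sketch of di-base color coding follows. It uses the common 2-bit representation (A=0, C=1, G=2, T=3) in which the color of a base pair is the XOR of the two codes, so color 0 (blue on the slide) corresponds to a repeated base. The function names and the toy sequence are assumptions for illustration, not vendor software.

```python
# Illustrative sketch of SOLiD-style color-space encoding and decoding.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_colors(bases):
    """One color per adjacent base pair; color 0 means the two bases are equal."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(bases, bases[1:])]


def decode_colors(first_base, colors):
    """Decoding needs a known first base; each color then fixes the next base."""
    out = [first_base]
    for color in colors:
        out.append(BITS_TO_BASE[BASE_TO_BITS[out[-1]] ^ color])
    return "".join(out)


reference = "ATGGC"
colors = encode_colors(reference)  # unique conversion: bases -> colors

# Four possible base strings for one color string, one per starting base.
candidates = [decode_colors(b, colors) for b in "ACGT"]

# A single wrong color call changes every base downstream of the error.
corrupted = list(colors)
corrupted[1] = 2
miscalled = decode_colors("A", corrupted)
print(colors, candidates, miscalled)
```

In this toy case one wrong color changes the last three bases of the decoded read, which is why mapping is done by converting the reference into color space instead of converting reads into bases.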

35 Advantages & Drawbacks
PRO
  Very high throughput
  Di-base ligation ensures built-in accuracy check
  Low error rate for low-coverage
  Can handle repetitive regions easily
CON
  Strong cycle-dependent biases (can be modeled and partly overcome; see Wu et al, Nature Methods, 2011)
  Low quality color calls (Phred < 20) are common
  Reported problems with paired ends: most mapped tags don’t map to the same chromosome
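For reference, a Phred score Q corresponds to an error probability p through Q = -10 log10(p), so the "Phred < 20" threshold above means more than a 1% chance that a call is wrong. The helper names below are hypothetical; only the formula is standard.

```python
import math


def phred_to_error_prob(q):
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred quality score implied by an error probability."""
    return -10 * math.log10(p)


for q in (10, 20, 30):
    print(q, phred_to_error_prob(q))
```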

36

37 Ion Torrent Sample Prep
Emulsion PCR loads copies of unique sequences onto beads.
One bead is deposited in each well of a micro-machined plate.

38 An Ion Torrent Chip From Ion Torrent promotional material

39 When a nucleotide is incorporated into a strand of DNA by a polymerase, a hydrogen ion is released
From Ion Torrent promotional material

40 Ion Torrent Sequencing Process
As in 454, nucleotides are washed over the nascent strand in a prescribed sequence. Each time a nucleotide is incorporated, hydrogen ions are released and detected.
From Ion Torrent promotional material

41 Newest Machine – Ion Proton
$150K per machine
Ion Proton I chip has 165 million sensors; intended for exomes
Ion Proton II chip has 660 million sensors, 50X more than the 318 chip
Claim: $1K genome this year

42 Ion Torrent Signals
Like 454, a series of pH signals over time as different nucleotides are added
From promotional literature

43 Ion Torrent Signals
Like 454, the signals are not always integer multiples of the single-base signal, so some guessing is needed to call the length of each homopolymer run
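The guessing step can be sketched as rounding each flow signal to the nearest whole number of incorporated bases. This is a deliberately naive model: the flow order "TACG", the signal values, and the function name are assumptions for illustration, and real base callers use more elaborate signal models.

```python
# Illustrative sketch of flow-space base calling by simple rounding.
def call_bases_from_flows(flow_order, signals):
    """Round each normalized flow signal to a homopolymer length and emit bases."""
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]  # nucleotide washed in this flow
        count = max(0, round(signal))                 # estimated run length
        bases.append(nucleotide * count)
    return "".join(bases)


# Hypothetical normalized signals, where 1.0 is roughly one incorporated base.
signals = [1.04, 0.02, 2.10, 0.97, 0.05, 1.12, 0.08, 3.46]
read = call_bases_from_flows("TACG", signals)
print(read)
```

The last signal (3.46) sits almost halfway between three and four bases, which illustrates why long homopolymer runs are the main source of errors in flow-based sequencing.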

44 Ion Torrent Advantages & Drawbacks
PRO
  Very high throughput potential
  Very fast (an afternoon)
CON
  Homopolymer run errors are still a problem, but less so recently
  Very uneven loading of sequences wastes a lot of real estate on the chips
  No prospect of paired-end reads
[Figure panels: Homopolymer error rates; Loading density]

45 Newer Technologies
Complete Genomics
Pacific Biosciences
Oxford Nanopore

46 Complete Genomics
Service company only; no equipment sales
~$4,000 per human genome (2011 price)
DNA Nanoball technology generates paired-end sequences plated at high density
Sequenced by ligation

47 Pacific Biosciences
Single-molecule real-time (SMRT) sequencing by circular strand technology using semiconductor technology
Long reads promised at under $200 per genome
High random error rates reported early; seems better now

48 Signals from Pac Bio Can Detect mC
From Agarwal et al, Nature Methods

49 Oxford Nanopore
Single-molecule sequencing by threading DNA through a protein nanopore
GridION is a general technology for sequencing polymers by measuring current; it can do polypeptides also
Problem: the signal could not be read fast enough; just this year they seem to have solved it

