High Throughput Sequencing Technologies

High Throughput Sequencing Technologies
Shrikant Mane, Director, YCGA; Co-Director, Keck Laboratory

Outline
First-generation sequencing technology: Sanger sequencing
Current massively parallel sequencing strategies
"Second Generation": 454, Illumina, Ion Torrent and Ion Proton
"Third Generation": Pacific Biosciences, Oxford Nanopore
YCGA

Evolution of genomic technologies
Genetic mapping studies: discovery of genes for well-characterized Mendelian diseases.
Dense SNP genotyping using microarray technology: GWAS for discovery of common variants in common disease.
High-throughput sequencing: discovery of rare variants in not previously recognized Mendelian diseases.
Genomic technologies continue to change at a rapid pace, especially over the past 20 years. These technological advances can be grouped into three eras. The development of complete genetic maps of the human genome in the 1980s fueled the mapping of Mendelian loci, in extended kindreds for dominant traits and predominantly in consanguineous kindreds for recessive traits. Further accelerated by the acquisition of the sequence of the human genome in 2001, this first Mendelian era identified over 2,800 disease loci and profoundly changed our understanding of the biology and pathophysiology of every organ system. It was a labor-intensive and slow process.
A second era was defined by the development of microarray technology and the identification of more than 10 million common variants in the human genome. Microarrays were developed to genotype 500K to 5 million SNPs in order to identify common variants associated with human disorders. This era led to the identification of more than 1,000 loci that show robust association with human disease and that have changed the understanding of disease biology.
We have recently entered a third era of discovery, this one driven by spectacular reductions in the cost of DNA sequencing, from ~$100,000 per million bases in 1998 to ~$0.10 today on the HiSeq instrument. Coupled with our development of robust methods for selectively sequencing the complete coding regions of the genome, which harbor the overwhelming majority of Mendelian loci, and analytic methods that rapidly identify variations from the reference sequence with high sensitivity and specificity, one can now sequence ostensibly all the genes in the human genome (the exome) to high levels of completion for ~$1000 (direct cost). This has provided fundamental new opportunities for identifying Mendelian loci that were previously elusive.
Earlier approaches: restriction fragment length polymorphism, breeding experiments, linkage studies using satellite markers, etc.

Why High-Throughput DNA Sequencing
DNA sequencing can provide a deeper understanding of DNA and RNA than any other technology.
Microarray technology revolutionized biomedical research, but it has several limitations that DNA sequencing may overcome.
As the cost of sequencing rapidly decreases, sequencing at the genome level is becoming affordable.
Figure: Number of PubMed articles. In recent years there has been an explosion of research articles using next-generation sequencing technologies.
Why do we need a high-throughput DNA sequencing center at Yale? There is no doubt that microarray technology has revolutionized biomedical research, but it has several limitations. It relies on indirect observation of hybridization signals, which can be nonspecific because of cross-hybridization, and it is not sensitive enough to identify low levels of change. Microarrays can also provide information only about what is represented on the chip. DNA sequencing may be able to overcome some of these limitations.

Applications of High-throughput DNA Sequence Analyses
DNA Sequencing Applications
Re-sequencing: mutation/SNP discovery and profiling
Interactome: DNA-protein interactions (ChIP-Seq)
Transcriptome analysis: alternative splicing and allele-specific expression
microRNA expression and discovery
Clinical diagnosis
Epigenomics: DNA methylation
De novo sequencing
Population Metagenomics
Copy number variation

First Generation: Sanger sequencing
1980 Nobel Prize in Chemistry
phi X 174, ~5,300 bp
Gels read by hand
Radiolabeled dideoxyNTPs
One lane per nucleotide
800 bp reads
Low throughput (several kb/gel)

Massively parallel sequencing of millions of templates
Second-generation sequencing: massively parallel sequencing of millions of templates. Platforms: Illumina; Ion Torrent and Ion Proton.

Second Generation: Massively Parallel Sequencing.
Throughput (24 hours): … Mb (Sanger); 60,000-1,200,000 Mb (HiSeq X Ten)
Cost: $1,500/Mb (Sanger); $0.04/Mb (HiSeq 2500 and X)
Read lengths: ~800 bp (Sanger); ~100-600 bp (HiSeq)
Error rates: <0.5% (Sanger); ~0.8% (HiSeq)
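The per-megabase costs above are easier to appreciate when applied to a concrete amount of data. A minimal sketch, assuming only the two cost figures quoted on this slide; the 1,000 Mb project size and the function names are hypothetical and chosen purely for illustration:

```python
# Per-megabase costs quoted on the slide, in US dollars per Mb.
COST_PER_MB = {"Sanger": 1500.0, "HiSeq": 0.04}


def project_cost(platform: str, megabases: float) -> float:
    """Raw sequencing cost in dollars for a given amount of data."""
    return COST_PER_MB[platform] * megabases


def fold_reduction(old: str = "Sanger", new: str = "HiSeq") -> float:
    """How many times cheaper the new platform is per Mb."""
    return COST_PER_MB[old] / COST_PER_MB[new]


if __name__ == "__main__":
    data_mb = 1000  # hypothetical project size: 1 Gb of raw sequence
    for platform in COST_PER_MB:
        print(f"{platform}: ${project_cost(platform, data_mb):,.2f} for {data_mb} Mb")
    print(f"Fold reduction per Mb: {fold_reduction():,.0f}")
```

These are raw data costs only; they say nothing about library preparation, labor, or analysis.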

Illumina next-generation sequencing platforms
MiSeq, HiSeq 2000, HiSeq 2500, NextSeq 500, HiSeq X Ten

Comparison of MiSeq, NextSeq, HiSeq 2500 and HiSeq X Ten Sequencing
Platform | MiSeq | NextSeq, Mid Output | NextSeq, High Output | HiSeq 2500, Rapid Run | HiSeq 2500, High Output | HiSeq X Ten
Positioning | Focused power | Flexible power | Flexible power | Production power | Production power | Population-scale whole human genome sequencing at $1000/genome
Output/run (Gb) | 3 to 15 | 20-40 | 30-120 | 20-360 | 100-2,000 | 3,200-3,600
Reads/run | 25 M | 130 M | 400 M | 600 M | 4,000 M | 6,000 M
Run time | 5-65 hrs | 15-26 hrs | 12-30 hrs | 7-40 hrs | 1-6 days | 3 days
Gb/day | 6 | 37 | 96 | 215 | 330 | 1,200
Flow cells: 1; 1 or 2
CapEx: MiSeq $100,000; NextSeq $250,000; HiSeq 2500 $740,000; HiSeq X Ten $10 million (sold only in a pack of 10)
HiSeq X Ten: 10 instruments; most cost-effective when operated at full capacity of 18,000 WGS/year
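The Gb/day figures appear to follow from the other rows: dividing the largest output per run by the longest run time reproduces the listed values to within rounding. A minimal sketch, assuming the six data columns correspond to MiSeq, NextSeq (Mid and High Output), HiSeq 2500 (Rapid Run and High Output) and HiSeq X Ten, in that order; this column assignment is an interpretation of the slide, not something it states:

```python
# (platform and mode, maximum output per run in Gb, maximum run time in hours)
# The mode labels are an assumed reading of the flattened table.
RUNS = [
    ("MiSeq", 15, 65),
    ("NextSeq, Mid Output", 40, 26),
    ("NextSeq, High Output", 120, 30),
    ("HiSeq 2500, Rapid Run", 360, 40),
    ("HiSeq 2500, High Output", 2000, 6 * 24),
    ("HiSeq X Ten", 3600, 3 * 24),
]


def gb_per_day(output_gb: float, run_hours: float) -> float:
    """Average daily throughput for a run of the given size and duration."""
    return output_gb / run_hours * 24


if __name__ == "__main__":
    for name, output_gb, run_hours in RUNS:
        print(f"{name}: {gb_per_day(output_gb, run_hours):.0f} Gb/day")
```

The HiSeq 2500 values come out a little above the slide's 215 and 330, which suggests those entries were rounded down.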

Overall Illumina Sequencing Workflow
Sample Preparation: sequencing library preparation. Library construct: Adapter1, Insert, Adapter2, Sequencing Primer.
Cluster Generation: hybridizing the library to the flow cell; creating clusters from individual molecules.
Sequencing by Synthesis: add all 4 bases with reversible terminators; image 4 colors; remove terminator; repeat.
This is an introductory workflow; it is good to start with the basics and go from here. Explain that these 3 steps are 3 separate kits that one purchases. Customers can work with their salesperson to determine which kits, and in what amounts, they want to purchase. Emphasize that for any of our products (genomic, expression, ChIP, etc.) you follow these 3 basic steps: sample prep (library prep), cluster generation on a flow cell, and sequencing on the Genome Analyzer.
Sample prep: the processes used in the kits end up with the construct illustrated for all sequencing types: 2 different adapters, a sequencing primer, and an insert. If the group will do paired-end sequencing, mention that it uses slightly different adapters, and different sequencing primers on both ends of the insert (this will be confusing for a new group; come back to this slide later if someone asks).
Cluster generation: show them the flow cell picture, with 8 lanes for 8 different samples. The library hybridizes to the flow cell, with individual molecules forming clusters that will be sequenced. The different molecules of the library are physically separated from one another so the sequence of each one can be determined.
Sequencing by synthesis: describe the general process with the reversible terminators. You can introduce the concept that the GA has a "chemistry cycle", in which the last block is removed and the next base is added, followed by an imaging cycle.

Genomic Sample Prep Workflow
Purified genomic DNA
1. Genomic DNA fragmentation: fragments of less than 800 bp
2. End repair: blunt-ended fragments with 5'-phosphorylated ends
3. Klenow exo- with dATP: 3'-dA overhang
4. Adapter ligation: adapter-modified ends
5. Gel purification/bead: removal of unligated adapter
6. PCR: genomic DNA library
Library construct: Adapter1, Insert, Adapter2, Sequencing Primer.
We are using the Genomic Sample Prep Workflow as an example of the basic sample prep protocol; each one is different. All sample prep methods come with their own protocol, which follows standard molecular biology cloning techniques.

What is a Flow Cell?
A flow cell is a thick glass slide with 8 channels, or lanes. Each lane is randomly coated with a lawn of oligos (P5 oligo and P7 oligo) that are complementary to the library adapters. Library construct: Adapter1, Insert, Adapter2, Sequencing Primer. Ordered flow cells: highest number of clusters.

Cluster Generation: Template hybridization and Initial Extension
Steps: template hybridization; initial extension (3' extension); denaturation; the original template is washed away.
Single molecules are bound to the P5-grafted flow cell in a random pattern. More than 200 million single molecules hybridize to the lawn of primers.
The initial steps for the PE chemistry are the same as for the Single Read chemistry.

Cluster Generation: Amplification
The single strand flips over to hybridize to adjacent oligos and form a bridge. The hybridized primer is extended by polymerases. The double-stranded bridge is denatured. Result: two copies of covalently bound single-stranded templates.
Cycle steps: 1st cycle annealing, 1st cycle extension, 1st cycle denaturation; 2nd cycle annealing, 2nd cycle extension, 2nd cycle denaturation; n = 35 cycles in total.
Amplification steps are also the same, except that 28 cycles is recommended for any samples where the insert is greater than 200 bp. More cycles for samples with inserts greater than 200 bp will cause the clusters to get too large after P5 resynthesis (see slide 6).

Cluster Generation: Linearization, Blocking and Sequencing Primer Hybridization
Steps: cluster amplification; P5 linearization (enzymatic cleavage, uracil incorporation enzyme); block with ddNTPs; denaturation and sequencing primer hybridization.
dsDNA bridges are denatured; complement strands are cleaved and washed away. Free 3' ends are blocked to prevent unwanted DNA priming.
The first linearization step uses the Linearization 1 Enzyme instead of periodate. The blocking step still uses ddNTPs, but the PE protocol uses Blocking Enzyme 1 and 2 instead of the terminal transferase used in the Single Read protocol. Read 1 primer hybridization uses the Read 1 PE Sequencing Primer.

Sequencing: First Read and Resynthesis of P5 Strand (15 Cycles)
Steps: sequencing first read; denaturation and de-protection; resynthesis of P5 strand (15 cycles); P7 linearization; block with ddNTPs; denaturation and hybridization; sequencing second read (5' to 3').
The steps up to and including the first read sequencing are much the same as for a single read. The first read sequencing is where the single read protocol would stop. The PE protocol continues by deprotecting the P5 primer using the deprotection enzyme. Resynthesis of the P5 strand occurs over 15 cycles. P7 linearization uses the Linearization 2 Enzyme. Blocking again occurs with ddNTPs and Blocking Enzyme 1 and 2. Sequencing read 2 uses the Read 2 PE Sequencing Primer.
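What the two reads of a paired-end run look like relative to the insert can be shown with a small sketch. It assumes the usual convention that read 1 starts at one end of the insert and read 2 is read from the opposite strand, so it is the reverse complement of the other end; the insert sequence and function names are made up for illustration:

```python
# Translation table for complementing A/C/G/T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(insert: str, read_length: int) -> tuple[str, str]:
    """Read 1 comes from one end of the insert, read 2 from the other strand."""
    read1 = insert[:read_length]
    read2 = reverse_complement(insert)[:read_length]
    return read1, read2


if __name__ == "__main__":
    insert = "ACGTTGCAAT"  # hypothetical 10-base insert
    print(paired_end_reads(insert, 4))
```

When the read length is shorter than half the insert, the two reads do not overlap and the unsequenced middle is inferred from the known insert size.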

Reversible Terminator Seq Chemistry
All 4 labeled nucleotides in 1 reaction (green, orange, red and blue).
Advantages of reversible terminators: only one base is added at a time. The fluor can be cleaved off after imaging, so it does not emit color at the next cycle, and only the newly added base (with its attached fluor) emits light.
Cycle: incorporation; detection; deblock and fluor removal; next cycle.
Figure: nucleotide structure showing the 3' block, the fluor and its cleavage site, and the free 3' end of the growing DNA strand.

Sequencing By Synthesis (SBS)
Cycle 1: add sequencing reagents; first base incorporated; remove unincorporated bases; detect signal (imaging); cleave off fluor and deblock.
Cycle 2-n: add sequencing reagents and repeat.
All four labeled nucleotides in one reaction. High accuracy. Base-by-base sequencing. No problems with homopolymer repeats.
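The cycle described above can be mimicked in a few lines. This is a toy model, not Illumina's implementation: it assumes an error-free polymerase, and the base-to-color mapping is hypothetical, since the slides name four colors without saying which base carries which.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Hypothetical dye assignment, for illustration only.
DYE = {"A": "green", "C": "orange", "G": "red", "T": "blue"}


def sequence_by_synthesis(template: str, cycles: int) -> tuple[str, list[str]]:
    """Simulate SBS on one template, listed in the order the polymerase meets it.

    Each cycle incorporates exactly one blocked, labeled base, images it,
    then removes the fluor and the 3' block so the next cycle can proceed.
    """
    read = []
    colors = []
    for cycle in range(min(cycles, len(template))):
        base = COMPLEMENT[template[cycle]]  # incorporation: terminator stops after 1 base
        colors.append(DYE[base])            # detection: one color per cycle
        read.append(base)                   # deblock and fluor removal, then next cycle
    return "".join(read), colors


if __name__ == "__main__":
    print(sequence_by_synthesis("AAAT", 4))
```

Because the terminator limits each cycle to a single base, a run such as AAA is read as three separate cycles, which is the basis for the "no problems with homopolymer repeats" point.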

Representation of Base Calling From Raw Data
The identity of each base of a cluster is read off from sequential images.
Figure: clusters imaged over cycles 1 to 9, with example reads T G C T A C G A T … and T T T T T T T G T …
This is a colorized marketing slide that represents what is going on. Point out that each photo here represents the 4 images that you would usually get, one for each base. The software determines the "winner" for intensity and calls the base.
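Picking the "winner" for intensity can be sketched as taking the brightest of the four channels in each cycle. This is a simplified illustration with made-up intensity values; real base callers also correct for effects such as channel cross-talk and phasing, which are not modeled here.

```python
CHANNELS = "ACGT"  # order of the four per-base images in each cycle


def call_bases(intensities: list[tuple[float, float, float, float]]) -> str:
    """Call one base per cycle for a single cluster.

    `intensities` holds one (A, C, G, T) tuple per cycle; the channel with
    the highest intensity wins.
    """
    read = []
    for cycle in intensities:
        winner = max(range(len(CHANNELS)), key=lambda i: cycle[i])
        read.append(CHANNELS[winner])
    return "".join(read)


if __name__ == "__main__":
    # Hypothetical intensities for three cycles of one cluster.
    cluster = [
        (0.1, 0.2, 0.1, 0.9),
        (0.1, 0.1, 0.8, 0.2),
        (0.2, 0.9, 0.1, 0.1),
    ]
    print(call_bases(cluster))
```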

Primary and Secondary Analysis Overview
Analysis type | Software | Outputs
Sequencing | ICS/RTA | Images/TIFF files
Primary analysis | ICS/RTA | Base calling, intensities
Secondary analysis | Consensus Assessment of Sequence And Variation (CASAVA): Bulldog N | Alignments and variant detection

Ion Torrent PGM and Proton
Ion PGM™ Sequencer. 4 Ion Protons: coming soon.
First PostLight sequencing technology: instead of using light as an intermediary, the PGM creates a direct connection between the chemical and the digital worlds.

The Chip is the Machine
Uses semiconductor chips for sequencing.
Chip | Wells | Output, 200-base reads | Output, 400-base reads
Ion 314 Chip v2 | 1.3 million | 30-50 Mb | 60-100 Mb
Ion 316 Chip v2 | 6.3 million | … Mb | 600 Mb-1 Gb
Ion 318 Chip v2 | 11 million | 600 Mb-1 Gb | 1.2-2 Gb
Ion PI chip: >165 million wells per chip; 8 to 10 Gb of data per run
Ion PII chips: ~100 Gb of data in ~4 hours

Base Calling
When a nucleotide is incorporated into a strand of DNA, a hydrogen ion is released as a by-product. The hydrogen ion carries a charge, which the PGM's ion sensor detects and records as a base.
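How the ion signal becomes a sequence can be sketched in a few lines. The sketch assumes details the slide does not state: that nucleotides are flowed over the chip one type at a time in a fixed cyclic order (here a hypothetical `TACG`), and that the signal from one flow is proportional to the number of bases incorporated. The function names are illustrative only.

```python
def flow_signals(read: str, flow_order: str = "TACG", max_flows: int = 400):
    """Idealized signal per flow for the strand being synthesized.

    Each flow offers one nucleotide type; every consecutive matching base
    is incorporated at once, so the signal equals the homopolymer length.
    """
    signals = []
    pos = 0
    flow = 0
    while pos < len(read) and flow < max_flows:
        nuc = flow_order[flow % len(flow_order)]
        count = 0
        while pos < len(read) and read[pos] == nuc:
            count += 1
            pos += 1
        signals.append((nuc, count))
        flow += 1
    return signals


def decode(signals) -> str:
    """Turn (nucleotide, signal) pairs back into bases by rounding the signal."""
    return "".join(nuc * round(signal) for nuc, signal in signals)


if __name__ == "__main__":
    print(flow_signals("TTTACG"))
    # A noisy signal from a long homopolymer is easily rounded to the wrong length.
    print(decode([("A", 6.4)]))
```

Under these assumptions a run of identical bases yields one larger signal instead of several separate ones, so a small measurement error on a long run changes the called length. That is one way to read the homopolymer detection item on the limitations slide.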

Advantages and Current Limitations
Advantages: low equipment cost; rapid run times (3 to 4 hours); simple chemistry.
Limitations: homopolymer detection; error rates; slow introduction of newer chips (overpromise); PGM and Proton are two separate pieces of sequencing equipment; tedious library prep protocols.

Third generation sequencing
PacBio RS

The Third Generation Sequencing Platform: PacBio RS
Pacific Biosciences has developed Single Molecule Real Time (SMRT™) DNA sequencing technology: the PacBio RS. This technology enables, for the first time, the observation of natural DNA synthesis by a DNA polymerase as it occurs. It delivers long reads at the single-molecule level and a fast time to result, enabling a new paradigm in genomic analysis.
Most people here are familiar with Sanger sequencing, the so-called first-generation sequencing, and with second-generation sequencing technology such as the Illumina HiSeq system. Second-generation sequencing starts with library prep, PCR amplification and cluster building; after sequencing, it generates tens of millions of short reads. Today I am going to introduce the third-generation sequencing platform, the PacBio RS, developed by Pacific Biosciences, which performs single-molecule, real-time sequencing. The technology is called SMRT, for single molecule real time. The major advantages of this new sequencing technology are that it delivers long reads at the single-molecule level and a fast time to results.

Pacific Biosciences SMRT® Technology

Key Applications for PacBio RS
Targeted sequencing: SNP and structural variant detection; repetitive regions; full-length transcript profiling
De novo assembly and genome finishing: bacterial genomes; fungal genomes
Gap-captured sequencing; targeted captured sequencing
Base modification detection: methylation; DNA damage
First I want to show you a paper published in Nature last week that used PacBio sequencing to identify mutations in the kinase FLT3, which is associated with AML. …% of AML patients have this ITD mutation. It is an activating mutation, and there are drugs that can effectively inhibit the kinase. The problem is that resistance to the drug develops after a certain time, and the resistance is likely caused by a few mutations in the kinase domain. To find out whether the ITD mutation and the drug-resistance mutations are really the disease-causing mutations, the authors had to determine whether any resistance mutations found were on the same strand as the FLT3-ITD.
Projects at YCGA: YCGA PacBio RS

Comparisons Between PacBio RS and Illumina HiSeq
Feature | PacBio RS (third generation) | Illumina HiSeq (second generation)
Sequencing chemistry | Single Molecule Real Time (SMRT) | Sequencing by synthesis (SBS)
Sequencing substrate | SMRT Cell made up of 150,000 ZMWs | Flow cell made of 8 separate lanes
Data output per day | 1 to 2 billion bases/day | 60-1,200 billion bases/day
Cost/Mb | $1.5/Mb | $0.04/Mb
Read length | Average up to 5 kb | 50 bp to 150 bp
Error rates | Raw: …%. With 30x coverage: Q50 (< 0.01) | 0.5 to 1%
Sample library | SMRTbell template (single-stranded circular DNA), 250 bp to 10 kb insert | dsDNA with adaptors (175 bp to 1 kb)
As shown in this table, both technologies use sequencing-by-synthesis chemistry. The difference is that PacBio performs the sequencing at the single-molecule level and in real time. The sequencing consumables are also different. On the HiSeq, sequencing is carried out on a flow cell that has 8 lanes, each with millions of DNA clusters. On the PacBio, sequencing is carried out on a SMRT Cell, which comprises 150,000 microscopic holes called ZMWs (zero-mode waveguides). Each ZMW can hold one DNA molecule with primer and polymerase. For base calling, Illumina uses the images taken during the sequencing run, whereas PacBio uses movies collected in real time while DNA synthesis is happening.
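The error-rate row mixes percentages with a Phred-style quality value (Q50), so a conversion helps when comparing the two columns. The sketch below uses the standard Phred definition, Q = -10 log10(p), where p is the probability that a base is wrong; the function names are illustrative.

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability corresponding to a Phred quality score."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality score corresponding to an error probability."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    print(f"Q50 -> error probability {phred_to_error(50):.0e}")
    for percent in (0.5, 1.0):  # the Illumina HiSeq range quoted in the table
        print(f"{percent}% error -> Q{error_to_phred(percent / 100):.0f}")
```

On this scale the 0.5 to 1% range quoted for the HiSeq corresponds to roughly Q23 to Q20 per base.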

Upcoming Technologies

Oxford Nanopore Technology
DNA, RNA and protein analysis.
Figure labels: exonuclease; cyclodextrin; electrically resistant lipid bilayer.

Performance/Limitations…..?
Advantages: nanopores offer a label-free, electrical, single-molecule DNA sequencing method. No costly fluorescent labeling reagents. No need for expensive optical hardware and sophisticated instrumentation to detect DNA, RNA and protein.
Performance/Limitations: first data was released in Feb; no updates since then. No data available for evaluation. High error rates: >5%. Will start an early access program in the next few months.

The YCGA Laboratory at West Campus

YCGA was established in January 2009 through generous funding support and strong commitment from Yale University and the School of Medicine.
Located in a newly renovated building. Approximately 7,000 sq ft of laboratory space and ~4,000 sq ft of office space. 20 staff.
Photo: portion of the laboratory, showing sequencing systems through the glass wall partition that separates the laboratory from the office and administrative area.

Sequencing Platforms at YCGA
10 Illumina HiSeqs
One MiSeq
One PacBio RS
Ion PGM™ Sequencer
Will acquire new Illumina sequencers introduced just a few weeks ago
YCGA has kept pace with cutting-edge sequencing technologies.
YCGA is well equipped with cutting-edge technologies. Since the technology keeps improving at a very fast pace, it has been a challenge to keep up with it. New technologies are expensive, and sometimes we have to change platforms before we have recovered the investment. Despite numerous challenges, YCGA has been very successful in keeping up with the change while maintaining data production and balancing the operating budget.

Computer Infrastructure
BulldogN provides ~1,300 cores and 2.2 PB of high-performance storage.
Dave Frioni and their team from Yale ITS. Robert Bjornson, Ph.D. IT director for YCGA. Nicholas Carriero, Ph.D.

Increasing Demand for Sequencing at YCGA
Increase in the number of Principal Investigators using YCGA over the past 4 years.
Trends of sequence data output at YCGA (average of 6 months).

Types of samples processed and runs of sequence read lengths carried out at YCGA in a typical month

Whole-Genome VS. Whole-Exome Sequencing
Protein-coding genes (the exome) constitute 1% of the human genome but harbor 85% of disease-causing mutations.
Significantly cheaper than sequencing the entire genome.
Data storage challenges.
Validation challenges.
Maq, >25,000 exomes analyzed for several disorders, including cardiovascular, abnormal brain development, autism, liver, kidney, hypertension, skin and various tumors.
2.1M probes cover ~300,000 exons of 19,000 genes. Total covered bases: 44.1 Mb.
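The capture design numbers can be put in proportion with a little arithmetic. A minimal sketch using the figures on this slide plus one assumption that is not on the slide: a human genome size of about 3,100 Mb.

```python
GENOME_MB = 3100.0    # assumption: approximate human genome size, not from the slide
COVERED_MB = 44.1     # total covered bases, from the slide
PROBES = 2_100_000    # from the slide
EXONS = 300_000       # from the slide (approximate)
GENES = 19_000        # from the slide


def capture_summary() -> dict[str, float]:
    """Derived ratios for the exome capture design."""
    return {
        "fraction_of_genome": COVERED_MB / GENOME_MB,
        "probes_per_exon": PROBES / EXONS,
        "exons_per_gene": EXONS / GENES,
        "covered_bases_per_exon": COVERED_MB * 1_000_000 / EXONS,
    }


if __name__ == "__main__":
    for name, value in capture_summary().items():
        print(f"{name}: {value:.3f}")
```

Under that genome-size assumption the capture target is about 1.4% of the genome, in line with the roughly 1% quoted for the exome.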

Need for strong R&D efforts for Next-Generation sequencing operation
Optimization of sample preparation protocols for exome capture that have decreased the cost of a single human exome from $8,000 in 2009 to the current price of ~$500, while improving the quality of the data.
Development of a highly efficient protocol to extract and repair DNA from formalin-fixed paraffin-embedded blocks for genetic analysis.
Improved protocols for gDNA-seq, RNA-seq, and ChIP-seq that show higher data complexity than traditional protocols, allow users to start with less material, and cost less.
The first point is extremely significant because >90% of our sequencing is human exomes. The improvements we have made have increased our data quality, decreased our costs, and allowed us to dramatically increase our throughput.
There are likely billions of formalin-fixed paraffin-embedded (FFPE) samples around the world. The fixation and storage process destroys the DNA, and it was thought that these samples would be unusable for genetic analysis. Our protocol allows us to use these samples for exome analysis and makes possible many new and interesting experiments that would otherwise be impossible to perform.
By spending the time and money to improve all of our protocols, not just human exomes, we are able to offer the Yale community a variety of sample preparation options that produce the most complex data possible at some of the lowest costs in the country.

Scientific and economic impact of high throughput sequencing at Yale

List of select publications resulting from next-generation sequencing usage at YCGA
Whole-exome sequencing identifies recessive WDR62 mutations in severe brain malformations. Bilguvar and Gunel. Nature, v467, 2010
A Novel miRNA Processing Pathway Independent of Dicer Requires Argonaute2 Catalytic Activity. Cifuentes and Giraldez. Science, v328, 2010
Mitotic recombination in patients with ichthyosis causes reversion of dominant mutations in KRT10. Choate and Lifton. Science, v330, 2010
Transcriptomic analysis of avian digits reveals conserved and derived digit identities in birds. Wang and Wagner. Nature, v477, 2011
Transposon-mediated rewiring of gene regulatory networks contributed to the evolution of pregnancy in mammals. Lynch and Wagner. Nat Genet., v43, 2011
K+ channel mutations in adrenal aldosterone-producing adenomas and hereditary hypertension. Choi and Lifton. Science, v331, 2011
Recessive LAMC3 mutations cause malformations of occipital cortical development. Barak and Gunel. Nat Genet., v43, 2011
Spatio-temporal transcriptome of the human brain. Kang and Sestan. Nature, v478, 2011
Langerhans cells facilitate epithelial DNA damage and squamous cell carcinoma. Modi and Girardi. Science, v335, 2012
Mutations in kelch-like 3 and cullin 3 cause hypertension and electrolyte abnormalities. Boyden and Lifton. Nature, v482, 2012
De novo point mutations, revealed by whole-exome sequencing, are strongly associated with Autism Spectrum Disorders. Sanders and State. Nature, v485, 2012
Exome sequencing identifies recurrent somatic RAC1 mutations in melanoma. Krauthammer and Halaban. Nat Genet., v44, 2012
Genomic Analysis of Non-NF2 Meningiomas Reveals Mutations in TRAF7, KLF4, AKT1, and SMO. Clark and Gunel. Science, v339, 2013
De novo mutations in histone-modifying genes in congenital heart disease. Zaidi and Lifton. Nature, v498, 2013
Recessive mutations in DGKE cause atypical hemolytic-uremic syndrome. Lemaire and Lifton. Nat Genet., v45, 2013
Somatic and germline CACNA1D calcium channel mutations in aldosterone-producing adenomas and primary aldosteronism. Scholl and Lifton.
The evolution of lineage-specific regulatory activities in the human embryonic limb. Cotney and Noonan. Cell, v154, 2013
Mutations in DSTYK and dominant urinary tract malformations. Sanna-Cherchi and Gharavi. N Engl J Med., v369, 2013
Nanog, Pou5f1 and SoxB1 activate zygotic gene expression during the maternal-to-zygotic transition. Lee and Giraldez. Nature, 2013
Co-expression networks implicate human mid-fetal deep cortical projection neurons in the pathogenesis of autism. Willsey and State. Cell, 2013 (In press)

Impact of High Throughput Sequencing: Partial Grant Funding
Mendelian center grant, NIH: $12M (3y)
Gilead cancer grant: $40M (4y)
Brain tumor gift: $12M (4y)
ARRA brain development (NIH): $3M (2y)
ARRA kidney disease (NIH): $2M (2y)
Simons autism sequencing: $4M (3y)
Brain transcriptome (NIH): $10M (2y)
Congenital heart disease (NIH): $3M (4y)
Melanoma SPORE: $12M (5y)
Fidelity (computer storage): $0.55M
VA, Schizophrenia/Bipolar disorder: $12.3M
Yale Comprehensive Cancer Center: $14.0M
Total: $… M

The Centers for Mendelian Genomics
Supported by NHGRI and NHLBI in Dec 2011

CMGs: Goals
Discover the genes and variants responsible for as many Mendelian phenotypes as possible.
Develop and disseminate improved methods for disease gene discovery and analysis.
Create public resources to enhance research and discovery activities.
Educate colleagues and the public regarding Mendelian disease.
Whole-exome/whole-genome analysis is carried out at no cost and on a collaborative basis. Investigators with interesting patients or cohorts can contact us.

Opportunities: DNA Sequencing and Personalized Medicine
Use of genomics, the science of looking at all of the information in the human genome, to tailor medical care to individuals based on their genetic makeup.
Earlier interventions. Improved diagnosis. More effective drug development. Better medical outcomes.
DNA sequencing has a very bright future and will change the current practice of medicine.

#1 invention of the year 2008 by Time magazine

CLIA: The New Paradigm in Molecular Diagnostics
Conventional molecular testing: gene by gene.
Genomic testing: exome analysis.
YCGA is carrying out clinical diagnostic work in collaboration with Dr. Allen Bale. Over 500 exomes have been analyzed for various disorders.

Major Challenges: Costs Associated with Being Cutting-Edge
A) Equipment: rapid introduction of new sequencing technologies (investment challenge); challenges associated with new technologies (upgrades and breakdowns).
B) Reagents: constant change and introduction of new, more reliable reagents (v3).
C) Software upgrades, computer infrastructure challenges, and analysis.
E) Constant upgrade of protocols: sequence capture (6 versions, and it continues to be updated); RNA-Seq (TruSeq, cheaper than the older version).

Sequencing a genome is simple; finding the cause of a disease is not

Despite challenges, tremendous progress has been made at a rapid pace.
NGS will continue to make a huge impact in biology, bio-medicine and human health.

Thank You! Jim Noonan; Yale University and Medical School and West Campus administration; ITS, HPC and Bioinformatics staff; YCGA staff; Collaborating Yale Investigators

Questions?

Nanopore: Protein Analysis
