High Throughput Sequencing Technologies at YCGA


High Throughput Sequencing Technologies at YCGA Shrikant Mane Director, YCGA Director, Keck Laboratory

Outline First-generation sequencing technology: Sanger sequencing. Current massively parallel sequencing strategies. “Second Generation”: 454, Illumina, Ion Torrent & Ion Proton. “Third Generation”: Pacific Biosciences, Oxford Nanopore. YCGA

Goals of biomedical investigation Understand normal, healthy and disease biology Enable prevention and early diagnosis of disease Enable new effective treatments Utility of Next Generation Sequencing/genetics in medicine Unbiased approach to identify new pathways underlying basic physiology, health and disease

Evolution of genomic technologies Genetic mapping studies: Discovery of genes for well characterized Mendelian diseases. Dense SNP genotyping using microarray technology: GWAS for discovery of common variants in common disease. High throughput sequencing: Discovery of rare variants in not previously recognized Mendelian diseases and common diseases. Constant and rapid changes in genomic technologies have driven successive eras of discovery of loci underlying human traits. The development of complete genetic maps of the human genome in the 1980s fueled the mapping of Mendelian loci in extended kindreds for dominant traits and predominantly in consanguineous kindreds for recessive traits. Further accelerated by the acquisition of the sequence of the human genome in 2001, this first Mendelian era identified over 2800 disease loci and profoundly changed our understanding of the biology and pathophysiology of every organ system, although it was a labor-intensive and slow process. A second era was defined by the development of microarray technology and the identification of more than 10 million common variants in the human genome. Microarrays were developed to genotype 500K to 5 million SNPs in order to identify common variants associated with human disorders. This era led to the identification of more than 1000 loci that show robust association with human disease and that have changed the understanding of disease biology. We have recently entered a third era of discovery, this one driven by spectacular reductions in the cost of DNA sequencing from ~$100,000 per million bases in 1998 to ~$0.10 today on the HiSeq instrument. 
Coupled with our development of robust methods for selectively sequencing complete coding regions of the genome, which harbor the overwhelming majority of Mendelian loci, and analytic methods to rapidly and with high sensitivity and specificity identify variations from the reference sequence, one can now sequence ostensibly all the genes in the human genome (the exome) to high levels of completion for ~$1000 (direct cost). This has provided fundamental new opportunities for identifying Mendelian loci that were previously elusive.

Why High-Throughput DNA Sequencing DNA sequencing can provide a deeper understanding about DNA/RNA than any other technology. Microarray technology revolutionized biomedical research, but has several limitations, which DNA sequencing may overcome. As the cost of sequencing is rapidly decreasing, it is becoming affordable to perform sequencing at a genome level. Why do we need a high-throughput DNA sequencing center at Yale? There is no doubt that microarray technology has revolutionized biomedical research, but it has several limitations: it observes indirectly, through hybridization signals that can be non-specific because of cross-hybridization, and it is not sensitive enough to identify low levels of change. Microarrays can also provide information only about what is represented on the chip. DNA sequencing may be able to overcome some of these limitations. Number of PubMed Articles: In recent years there has been an explosion of research articles using next generation sequencing technologies

Applications of Next-gen Sequencing DNA Sequencing Applications Re-Sequencing Exome sequencing Mutation/SNP discovery and profiling Interactome DNA Protein Interactions ChIP Seq Transcriptome Analysis Alternative splicing and allele specific expression microRNA Expression and Discovery Diagnostic Services CLIA certification Epigenomics DNA Methylation De Novo Sequencing Population Metagenomics Copy Number Variation

First Generation: Sanger sequencing (1975-1977) 1980 Nobel Prize in chemistry phi X 174 ~5300 bp gels read by hand radiolabeled dideoxyNTPs one lane per nucleotide 800 bp reads low throughput (several kb/gel)

Second-generation sequencing Massively parallel sequencing of millions of templates: 454/Roche, Illumina, Ion Torrent-Proton

Second Generation: Massively Parallel Sequencing.
Throughput (24 hours): 2.8 Mb (Sanger); 60,000 Mb (HiSeq)
Cost: $1500/Mb (Sanger); $0.06/Mb (HiSeq)
Read lengths: ~800 bp (Sanger); ~100–600 bp (HiSeq–454)
Error rates: < 0.5% (Sanger); ~0.8–2% (HiSeq)
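To put the Sanger and HiSeq figures on this slide side by side, the fold changes can be worked out directly. The following is a minimal sketch that uses only the throughput and cost numbers quoted above; the dictionary keys and the helper name `fold_change` are illustrative choices, not part of any sequencing software.

```python
# Figures quoted on the slide: throughput per 24 hours and cost per megabase.
sanger = {"throughput_mb_per_day": 2.8, "cost_usd_per_mb": 1500.0}
hiseq = {"throughput_mb_per_day": 60_000.0, "cost_usd_per_mb": 0.06}


def fold_change(larger: float, smaller: float) -> float:
    """Return how many times `larger` exceeds `smaller`."""
    return larger / smaller


# HiSeq produces far more sequence per day than Sanger...
throughput_gain = fold_change(
    hiseq["throughput_mb_per_day"], sanger["throughput_mb_per_day"]
)
# ...and each megabase costs far less.
cost_reduction = fold_change(
    sanger["cost_usd_per_mb"], hiseq["cost_usd_per_mb"]
)

print(f"Throughput gain: ~{throughput_gain:,.0f}-fold")
print(f"Cost reduction:  ~{cost_reduction:,.0f}-fold")
```

On these numbers, both the daily throughput and the cost per megabase differ by a factor in the low tens of thousands.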

Illumina next generation sequencing platform

HiSeq 2500 Sequencing System Fast turnaround and highest output in a single instrument 1 Instrument – 2 Run Modes High Output Mode 600 Gb in ~10.5 days Current v3 flow cell Current v3 reagents cBot required Rapid Run Mode 120Gb in ~1 day New 2-lane flow cell New reagents No cBot required User configurable 6 human genomes in 10.5 days 1 human genome in a day Highest Output Fastest turnaround
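The "6 human genomes in 10.5 days" and "1 human genome in a day" figures imply a mean sequencing depth per genome. The sketch below works that out under one labeled assumption that is not on the slide: a haploid human genome size of roughly 3.2 Gb. The function name `mean_coverage` is illustrative.

```python
# Assumption (not from the slide): approximate haploid human genome size.
HUMAN_GENOME_GB = 3.2


def mean_coverage(run_output_gb: float, genomes_per_run: int,
                  genome_size_gb: float = HUMAN_GENOME_GB) -> float:
    """Mean depth per genome if run output is split evenly across genomes."""
    return run_output_gb / genomes_per_run / genome_size_gb


# High Output Mode: 600 Gb shared by 6 genomes.
high_output_depth = mean_coverage(600, 6)
# Rapid Run Mode: 120 Gb for a single genome.
rapid_run_depth = mean_coverage(120, 1)

print(f"High Output Mode: ~{high_output_depth:.1f}x per genome")
print(f"Rapid Run Mode:   ~{rapid_run_depth:.1f}x per genome")
```

Both modes come out above 30x per genome under this assumption. The calculation ignores duplicates, low-quality reads, and uneven coverage, so usable depth would be lower.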

New sequencing platforms by Illumina HiSeq X Ten and HiSeq X Five: Production-scale human whole genome sequencing: 18,000 genomes/year at $ 1,500 cost/genome HiSeq 3000/HiSeq 4000: Up to 1.5 Tb/run. Whole genome as well as other applications including exome sequencing

Overall Illumina Sequencing Workflow Sample Preparation (Sequencing Library Preparation): Adapter1, Adapter2, Sequencing Primer, Insert. Cluster Generation: hybridizing the library to the flow cell; creating clusters from individual molecules. Sequencing by Synthesis: add all 4 bases with reversible terminators; image 4 colors; remove terminator, repeat. Introductory workflow--- good to start with the basics and go from here. Explain that these 3 steps are 3 separate kits that one purchases. They can work with their salesperson to determine which kits and in what amounts they want to purchase. Emphasize that for any of our products (genomic, expression, ChIP, etc.) you follow these 3 basic steps: Sample Prep (library prep), Cluster Generation on a flow cell, and Sequencing on the Genome Analyzer. For Sample Prep--- the processes used in the kits end up with the construct illustrated for all sequencing types--- 2 different adapters, a sequencing primer, and an insert. If the group will do paired end, can mention it’ll be slightly different adapters, and different sequencing primers on both ends of the insert (will be confusing for a new group--- can come back to this slide later if someone asks). Cluster generation--- show them the flow cell picture--- 8 lanes for 8 different samples. The library hybridizes to the flow cell, with individual molecules forming clusters that will be sequenced. The different molecules of the library are physically separated from one another so the sequence of each one can be determined. Sequencing by Synthesis--- describe the general process with the reversible terminators. Can introduce the concept that the GA has a “chemistry cycle”, where you remove the last block and then add the next particular base, followed by an imaging cycle.

Genomic Sample Prep Workflow Purified genomic DNA 1. Genomic DNA fragmentation Fragments of less than 800 bp 2. End-repair Blunt ended fragments with 5’-Phosphorylated ends 3. Klenow exo- with dATP 3’-dA overhang 4. Adapter ligation Adapter modified ends 5. Gel purification/bead Removal of unligated adapter 6. PCR Genomic DNA Library We’re using Genomic Sample Prep Workflow as an example of the basic sample prep protocol, each being different. All sample prep methods come with their own protocol which follow standard molecular biology cloning techniques. Adapter1 Adapter2 Sequencing Primer Insert

What is a Flow Cell? A flow cell is a thick glass slide with 8 channels or lanes Each lane is randomly coated with a lawn of oligos that are complementary to library adapters P5 oligo P7 Oligo Adapter1 Adapter2 Insert Sequencing Primer Index

Reversible Terminator Seq Chemistry All 4 labeled nucleotides in 1 reaction (green, orange, red and blue). Advantages of reversible terminators: Only one base is added at a time. The fluor can be cleaved off after imaging, so it does not emit color at the next cycle, allowing only the newly added base (with attached fluor) to emit light. Cycle: Incorporation; Detection; Deblock and fluor removal; Next cycle. [Figure: labeled nucleotide showing the fluor, its cleavage site, and the 3’ block, next to a DNA strand with a free 3’ end]

Illumina sequencing

Sequencing By Synthesis (SBS) Cycle 1: Add sequencing reagents. First base incorporated. Remove unincorporated bases. Detect signal/Imaging. Cleave off fluor and deblock. Cycle 2-n: Add sequencing reagents and repeat. All four labeled nucleotides in one reaction. High accuracy. Base-by-base sequencing. No problems with homopolymer repeats.
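The cycle described on this slide (incorporate one blocked base, image it, cleave the fluor and deblock, repeat) can be illustrated with a toy model. This is a sketch of the idea only, not Illumina's software: the function name `sequence_by_synthesis` is hypothetical, strand orientation is ignored, and chemistry errors such as phasing are not modeled.

```python
# Watson-Crick pairing used to pick the nucleotide that gets incorporated.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(template: str, cycles: int) -> str:
    """Toy reversible-terminator model: exactly one base is read per cycle."""
    read = []
    for position in range(min(cycles, len(template))):
        # Incorporation: all four labeled nucleotides are offered, but only
        # the complement of the current template base is added, and its
        # 3' block prevents a second base from being added in this cycle.
        incorporated = COMPLEMENT[template[position]]
        # Detection: imaging the fluor identifies the incorporated base.
        read.append(incorporated)
        # Deblock and fluor removal: the loop advancing to the next position
        # stands in for cleaving the fluor and removing the 3' block.
    return "".join(read)


# A run of four identical template bases still takes four cycles,
# which is why homopolymers are read base by base.
print(sequence_by_synthesis("GATTTTC", cycles=7))
```

Because the 3' block limits each cycle to a single incorporation, the homopolymer run in the example is resolved one base per cycle, which is the point the slide makes about homopolymer repeats.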

Ion Torrent PGM and Proton Ion PGM™ Sequencer. 4 Ion Protons: coming soon. First PostLight sequencing technology: Instead of using light as an intermediary, PGM creates a direct connection between the chemical and the digital worlds.

The Chip is the Machine Uses semiconductor chips for sequencing. Ion PI chip: >165 million wells per chip: 8 to 10 Gb data per run Ion PII chips: ~100 Gb of data in ~4 hours

Base Calling When a nucleotide is incorporated into a strand of DNA, a hydrogen ion is released as a by-product. The hydrogen ion carries a charge, which the PGM’s ion sensor detects and reports as a base. Ion Torrent technology video.
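Because the sensor measures released hydrogen ions rather than a per-base label, a run of identical bases incorporated in a single nucleotide flow shows up as one larger signal. The sketch below is a simplified illustration of that idea, not the Torrent Suite base caller: the repeating `TACG` flow order, the function names, and the noisy signal values are all assumptions chosen for the example.

```python
# Assumption: nucleotides are flowed one at a time in a repeating T, A, C, G
# order. Real instruments may use a different flow order.
FLOW_ORDER = "TACG"


def ideal_flowgram(synthesized: str, n_flows: int):
    """Ideal signal per flow: the number of bases incorporated in that flow."""
    signals = []
    position = 0
    for flow in range(n_flows):
        base = FLOW_ORDER[flow % len(FLOW_ORDER)]
        run_length = 0
        # Every consecutive matching base is incorporated in the same flow,
        # releasing one hydrogen ion each.
        while position < len(synthesized) and synthesized[position] == base:
            run_length += 1
            position += 1
        signals.append(run_length)
    return signals


def call_bases(signals) -> str:
    """Round each flow's signal to the nearest whole number of bases."""
    return "".join(
        FLOW_ORDER[flow % len(FLOW_ORDER)] * round(signal)
        for flow, signal in enumerate(signals)
    )


ideal = ideal_flowgram("TTAGGGC", n_flows=8)
print(ideal, call_bases(ideal))

# Hypothetical noisy measurement: the 3-base G run reads as 2.4 and is
# under-called, which is the homopolymer problem in miniature.
noisy = [2.1, 0.9, 0.1, 2.4, 0.0, 0.0, 1.0, 0.0]
print(call_bases(noisy))
```

The longer the homopolymer, the smaller the relative difference between neighboring run lengths, so a fixed amount of signal noise is more likely to shift the call by one base.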

Advantages and Current Limitations Advantages: Low equipment cost. Rapid run times: 3 to 4 hours. Simple chemistry. Limitations: Homopolymer detection. Error rates. Slow on introducing newer chips: overpromise. PGM and Proton: two separate sequencing instruments. Library prep: emulsion PCR/new protocols.

Third generation sequencing PacBio RS

The Third Generation Sequencing Platform: PacBio RS Pacific Biosciences has developed Single Molecule Real Time (SMRT™) DNA sequencing technology: PacBio RS. This technology enables, for the first time, the observation of natural DNA synthesis by a DNA polymerase as it occurs. This technology delivers long reads at single molecule level and fast time to result, enabling a new paradigm in genomic analysis. Most people here are familiar with Sanger sequencing, the so-called first generation sequencing, and with second generation sequencing technology such as the Illumina HiSeq system. Second generation sequencing starts with library prep, with PCR amplification and cluster building. After sequencing, it generates tens of millions of short reads. Today I am going to introduce the third generation sequencing platform PacBio RS, developed by Pacific Biosciences, which performs single molecule real time (SMRT) sequencing. The major advantages of this new sequencing technology are that it delivers long reads at the single molecule level and a fast time to results.

Pacific Biosciences SMRT® Technology Technology Video

Key Applications for PacBio RS Targeted sequencing SNP and structural variant detection Repetitive regions Full length transcript profiling De novo assembly and genome finishing Bacterial genomes Fungal genomes Gap-captured sequencing Targeted captured sequencing Base modification detection Methylation DNA damage First I want to show you a paper published in Nature last week that used PacBio sequencing to identify mutations in the kinase FLT3, which is associated with AML. 20-30% of AML patients have this ITD mutation. It is an activating mutation, and there are drugs that can effectively inhibit the kinase. The problem is that resistance to the drug develops after a certain time, and the resistance is likely caused by a few mutations in the kinase domain. So, to find out whether the ITD mutation and the drug resistance mutations are really the disease-causing mutations, they had to determine whether any resistance mutations found were on the same strand as the FLT3-ITD. Projects at YCGA YCGA PacBio RS

Comparisons Between PacBio RS and Illumina HiSeq
Generation: PacBio RS: third generation; Illumina HiSeq: second generation
Sequencing chemistry: PacBio RS: Single Molecule Real Time (SMRT); Illumina HiSeq: sequencing by synthesis (SBS)
Sequencing substrate: PacBio RS: SMRT Cell made up of 150,000 ZMWs; Illumina HiSeq: flow cell made up of 8 separate lanes
Data output per day: PacBio RS: 1 to 2 billion/day at $1.5/Mb; Illumina HiSeq: 60 billion/day at a cost of $0.06 per Mb
Read length: PacBio RS: average up to 5 Kb; Illumina HiSeq: 50 bp to 150 bp
Error rates: PacBio RS: raw 10-15%, with 30x coverage Q50 (< 0.01); Illumina HiSeq: 0.5 to 1%
Sample library: PacBio RS: SMRT Bell template (single-strand circular DNA), 250 bp to 10 Kb insert; Illumina HiSeq: dsDNA with adaptors (175 bp to 1 Kb)
As shown in this table, both technologies use sequencing-by-synthesis chemistry. The difference for PacBio is that it performs the sequencing at the single molecule level and in real time. The sequencing consumables are also different. On the HiSeq, sequencing is carried out on the flow cell, which has 8 lanes, each with millions of DNA clusters. For PacBio, sequencing is carried out on a SMRT Cell, which comprises 150k microscopic holes called ZMWs, which stands for zero-mode waveguides. Each ZMW can hold one DNA molecule with primer and polymerase. For base calling, Illumina uses the images taken during the sequencing run, whereas PacBio uses movies collected in real time while DNA synthesis is happening.
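The error-rate row mixes two notations: percentages for raw reads and a Q value for the consensus. Assuming the Q value is a standard Phred-scaled quality, Q = -10 log10(p), the two can be converted as sketched below. The function names are illustrative; the conversion itself is the standard Phred definition and is not specific to either platform.

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability for a Phred-scaled quality score Q."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred-scaled quality score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


# Consensus accuracy quoted for PacBio at 30x coverage.
print(f"Q50 -> error probability {phred_to_error(50):.0e}")

# Raw single-pass error rates quoted for PacBio and HiSeq, as Phred scores.
for label, rate in [("PacBio raw, 15%", 0.15), ("PacBio raw, 10%", 0.10),
                    ("HiSeq, 1%", 0.01), ("HiSeq, 0.5%", 0.005)]:
    print(f"{label}: Q{error_to_phred(rate):.1f}")
```

On this scale each 10 points of Q is a tenfold drop in error probability, which is how a high raw error rate can still give a high-quality consensus once many independent reads cover the same position.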

Upcoming Technologies

Oxford Nanopore Technology Exonuclease Protein nanopore (Alpha Hemolysin) Cyclodextrin Electrically resistant Lipid bilayer Silicon nitride or graphene. This diagram shows a protein nanopore set in an electrically resistant membrane bilayer. An ionic current is passed through the nanopore by setting a voltage across this membrane.   If an analyte passes through the pore or near its aperture, this event creates a characteristic disruption in current. Measurement of that current makes it possible to identify the molecule in question. For example, this system can be used to distinguish between the four standard DNA bases G, A, T and C, and also modified bases. It can be used to identify target proteins and small molecules, or to gain rich molecular information, for example to distinguish between the enantiomers of ibuprofen or study molecular binding dynamics. http://www.nanoporetech.com/news/movies#movie-24-nanopore-dna-sequencing

PromethION

Recent advances in nanopore sequencing Two types of nanopores: protein and synthetic (silicon nitride). Protein nanopores appear to be better at recognizing nucleotides. The rapid speed at which DNA strands pass through the tiny hole makes distinguishing bases more difficult. Currently an enzyme is used to control the rate. By shining a low-power green laser on a synthetic nanopore immersed in salt water, it is possible to manipulate DNA speed at will. As the current increases, positive ions drag water molecules in the opposite direction of incoming DNA, acting as a brake and slowing its passage through the pore. As a result, nanoscale sensors in the pore would be able to read each nucleotide going into the pore more accurately (Meller A. et al, Nat Biotech 2013). Using nanopores, long stretches of DNA can be zipped back and forth through the pore and read several times. Protein nanopores can also identify epigenetic changes. Dr. Meller's group showed that shining a certain wavelength of light could slow the flow of DNA through synthetic nanopores, potentially making it easier to read the four bases that make up each molecule. Reporting in the November 2013 issue of Nature Nanotechnology, the group found that by shining a low-power green laser on a synthetic nanopore made of a thin layer of silicon nitride, it was possible to increase the electric charge near the walls of the pore, which is immersed in salt water.

Performance/Limitations? Advantages: Nanopores offer a label-free, electrical, single-molecule DNA sequencing method. No costly fluorescent labeling reagents. No need for expensive optical hardware and sophisticated instrumentation to detect DNA bases. Limitations: First data was released in Feb 2012; since then, new data has been slow to appear. Very little data is available for evaluation. High error rates: >5%.

The YCGA Laboratory at West Campus

YCGA was established in January 2009 through generous funding support and the strong commitment of Yale University and the School of Medicine. Located in a newly renovated building. Approximately 7,000 Sq Ft laboratory and ~4,000 Sq Ft office space. 23 staff. Portion of the laboratory showing sequencing systems through the glass wall partition that separates the laboratory from the rest of the office and administrative area.

Sequencing Platforms at YCGA 11 Illumina HiSeqs (2000 and 2500), one MiSeq, Ion PGM™ Sequencer, one PacBio RS. YCGA has kept pace with cutting-edge sequencing technologies. YCGA is well equipped with cutting-edge technologies. Since the technology keeps improving at a very fast pace, it has been a challenge to keep up with it. New technologies are expensive, and sometimes we have to change platforms before we have recovered the investment. Despite numerous challenges, YCGA has been very successful in keeping up with the change while maintaining data production and balancing the operating budget.

Computer Infrastructure BulldogN: Dell Cluster with 200 Nodes/2,500 Cores Hitachi/BlueArc Scalable Storage: ~2.5 Petabytes

Types of samples processed and runs of sequence read lengths carried out at YCGA in a typical month

Need for strong R&D efforts for Next-Generation sequencing operation Optimization of sample preparation protocols for exome capture that have decreased the cost of a single human exome from $8,000 in 2009 to the current price of ~$500, while improving the quality of the data. Development of a highly efficient protocol to extract and repair DNA from formalin-fixed paraffin embedded blocks for exome analysis. Improved protocols for gDNA-seq, RNA-seq, and ChIP-seq that show higher data complexity than traditional protocols, allow users to start with less material, and cost less. Continuous improvements of various analysis pipelines This point is extremely significant because >90% of our sequencing is human exomes. The improvements we have made have increased our data quality, decreased our costs, and allowed us to dramatically increase our throughput. There are likely billions of formalin-fixed paraffin embedded (FFPE) samples around the world. The fixation/storage process destroys the DNA and it was thought these samples would be unusable for genetic analysis. Our protocol allows us to use these samples for exome analysis and makes many new and interesting experiments possible that would otherwise be impossible to perform. By spending the time and money to improve all of our protocols – not just human exomes – we are able to offer the Yale community a variety of sample preparation options that produce the most complex data possible at some of the lowest costs in the country.

Whole-Genome vs. Whole-Exome Sequencing Protein coding genes (exome) constitute 1% of the human genome but harbor 85% of disease-causing mutations. Significantly cheaper than sequencing the entire genome. Maq, 2.1M probes cover ~300,000 exons of 19,000 genes. Total covered bases: 44.1 Mb
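The cost argument on this slide follows from how little sequence an exome capture targets. The sketch below compares the raw data needed for the 44.1 Mb capture target with a whole genome at the same depth. The genome size of about 3,200 Mb and the 30x depth are assumptions made for the example, and the calculation ignores off-target reads and uneven capture, so real exome runs need more data than this lower bound.

```python
# From the slide: total bases covered by the capture probes.
EXOME_TARGET_MB = 44.1
# Assumption (not from the slide): approximate haploid human genome size.
HUMAN_GENOME_MB = 3200.0


def raw_data_needed_mb(target_mb: float, depth: float) -> float:
    """Idealized sequencing output needed to cover a target at a mean depth."""
    return target_mb * depth


target_fraction = EXOME_TARGET_MB / HUMAN_GENOME_MB
exome_mb = raw_data_needed_mb(EXOME_TARGET_MB, depth=30)
genome_mb = raw_data_needed_mb(HUMAN_GENOME_MB, depth=30)

print(f"Capture target is ~{target_fraction:.1%} of the genome")
print(f"30x exome:  ~{exome_mb:,.0f} Mb of sequence")
print(f"30x genome: ~{genome_mb:,.0f} Mb of sequence")
```

Under these assumptions the capture target is a little over 1% of the genome, in line with the slide's "1%" figure for the exome, and the data requirement shrinks by the same proportion.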

Scientific and economic impact of high throughput sequencing at Yale

List of select publications resulting from next-generation sequencing at YCGA
Whole-exome sequencing identifies recessive WDR62 mutations in severe brain malformations. Bilguvar. Nature, v467, 2010
A Novel miRNA Processing Pathway Independent of Dicer Requires Argonaute2 Activity. Cifuentes. Science, v328, 2010
Mitotic recombination in ichthyosis causes reversion of dominant mutations in KRT10. Choate K. Science, v330, 2010
Transcriptomic analysis of avian digits reveals conserved and derived digit identities in birds. Wang S. Nature, v477, 2011
Transposon-mediated rewiring of gene regulatory networks contributed to the evolution of pregnancy in mammals. Lynch and Wagner. Nat Genet., v43, 2011
K+ channel mutations in adrenal aldosterone-producing adenomas and hereditary hypertension. Choi M. Science, v331, 2011
Recessive LAMC3 mutations cause malformations of occipital cortical development. Barak and Gunel. Nat Genet., v43, 2011
Spatio-temporal transcriptome of the human brain. Kang and Sestan. Nature, v478, 2011
Langerhans cells facilitate epithelial DNA damage and squamous cell carcinoma. Modi and Girardi. Science, v335, 2012
Mutations in kelch-like 3 and cullin 3 cause hypertension and electrolyte abnormalities. Boyden et al. Nature, v482, 2012
De novo point mutations, revealed by whole-exome sequencing, are strongly associated with Autism Spectrum Disorders. Sanders and State. Nature, v485, 2012
Exome sequencing identifies recurrent somatic RAC1 mutations in melanoma. Krauthammer. Nat Genet., v44, 2012
Genomic Analysis of Non-NF2 Meningiomas Reveals Mutations in TRAF7, KLF4, AKT1, & SMO. Clark V et al. Science, v339, 2013
De novo mutations in histone-modifying genes in congenital heart disease. Zaidi and Lifton. Nature, v498, 2013
Recessive mutations in DGKE cause atypical hemolytic-uremic syndrome. Lemaire and Lifton. Nat Genet., v45, 2013
Somatic and germline CACNA1D calcium channel mutations in aldosterone-producing adenomas and primary aldosteronism. Scholl and Lifton
The evolution of lineage-specific regulatory activities in the human embryonic limb. Cotney and Noonan. Cell, v154, 2013
Mutations in DSTYK and dominant urinary tract malformations. Sanna-Cherchi and Gharavi. N Engl J Med., 2013
Nanog, and SoxB1 activate zygotic gene expression during the maternal-to-zygotic transition. Lee et al. Nature, 2013
Co-expression networks implicate human mid-fetal deep cortical projection neurons in the pathogenesis of autism. Willsey and State. Cell, 2013
CLP1 Founder Mutation Links tRNA Splicing and Maturation to Cerebellar Development and Neurodegeneration. Schaffer AE and Gleeson JG. Cell, v157, 2014
Exome sequencing links corticospinal motor neuron disease to common neurodegenerative disorders. Novarino G and Gleeson JG. Science, v343, 2014

Impact of High Throughput Sequencing: Grant Funding (partial list)
Mendelian center grant, NIH: $12M (3y)
Gilead cancer grant: $40M (4y)
Brain tumor gift: $12M (4y)
ARRA brain development (NIH): $3M (2y)
ARRA kidney disease (NIH): $2M (2y)
Simons autism sequencing: $4M (3y)
Brain transcriptome (NIH): $10M (2y)
Congenital heart disease (NIH): $5M (4y)
Pediatric Cardiac Genomic Consortium: $2M (2y)
Melanoma SPORE (NIH): $12M (5y)
Biogen Inc. (PPMS): $2M
VA Schizophrenia/Bipolar disorder: $12M
Yale Comprehensive Cancer Center: $14M
Total: $128M

NGS and Personalized Medicine Use of genomics to tailor medical care to individuals based on their genetic makeup. Which treatment? What are my chances? Which class of cancer? Is it benign? Therapeutic Choice Prognosis Diagnosis Classification How and why Discovery Elucidation of mechanism of cause Identification of cancer biomarkers Therapeutic targets The use of microarrays can tell us a lot about specific disease such as cancer. They can help us to (1) Diagnose the specific cancer in a quick and accurate way (2) Classify the specific subtype of cancer to allow for the best treatment (3) Allow the most accurate prognosis of recovery based on the genetics of the tumor (4) Identify the specific treatment for each type of specific genetic profile. The arrays can be used to study the genetics behind how a certain type of cancer reacts to a specific treatment

CLIA: The New Paradigm in Molecular Diagnostics Conventional molecular testing- gene by gene Genomic testing using Exome analysis YCGA is carrying out clinical diagnostic work in collaboration with Dr. Allen Bale Over 1,000 exomes are analyzed for various disorders

Challenges Equipment, reagents, protocols, analysis What is valid and what is significant? Individual judgment versus consensus guidelines

Sequencing a genome is simple; finding the cause of a disease is not. First clinical use of whole genome sequencing shows just how challenging it can be. Study of fraternal twins with a monogenic disorder: the genomes of fraternal twins diagnosed with a movement disorder were sequenced. Whole-genome sequencing for optimized patient management. Bainbridge MN, Wiszniewski W, Murdock DR, Friedman J, Gonzaga-Jauregui C, Newsham I, Reid JG, Fink JK, Morgan MB, Gingras MC, Muzny DM, Hoang LD, Yousaf S, Lupski JR, Gibbs RA. Sci Transl Med. 2011 Jun 15;3(87):87re3. doi: 10.1126/scitranslmed.3002243. Abstract: Whole-genome sequencing of patient DNA can facilitate diagnosis of a disease, but its potential for guiding treatment has been under-realized. We interrogated the complete genome sequences of a 14-year-old fraternal twin pair diagnosed with dopa (3,4-dihydroxyphenylalanine)-responsive dystonia (DRD; Mendelian Inheritance in Man #128230). DRD is a genetically heterogeneous and clinically complex movement disorder that is usually treated with l-dopa, a precursor of the neurotransmitter dopamine. Whole-genome sequencing identified compound heterozygous mutations in the SPR gene encoding sepiapterin reductase. Disruption of SPR causes a decrease in tetrahydrobiopterin, a cofactor required for the hydroxylase enzymes that synthesize the neurotransmitters dopamine and serotonin. Supplementation of l-dopa therapy with 5-hydroxytryptophan, a serotonin precursor, resulted in clinical improvements in both twins. References: Genomes on prescription, Nature 2011; Bainbridge M, Sci Transl Med 2011

Acknowledgement Jim Noonan; Yale University, School of Medicine and West Campus; NHGRI: CMG; YCGA staff

Questions?

Data Analysis Overview Primary Analysis Secondary Analysis Data Visualization

Primary and Secondary Analysis Overview
Analysis type: Sequencing; Software: ICS/RTA; Outputs: Images/TIFF files
Analysis type: Primary Analysis; Software: ICS/RTA; Outputs: Base calling, intensities
Analysis type: Secondary Analysis; Outputs: Alignments and variant detection

Cluster Generation: Template Hybridization and Initial Extension Steps: template hybridization, initial extension, denaturation. The original template is washed away. Single molecules are bound to the flow cell in a random pattern; >250-300 million single molecules hybridize to the lawn of primers. [Figure: grafted flow cell with P5 and P7 oligos and 3' extension from free OH ends] Initial steps for the PE chemistry are the same as for the Single Read chemistry.

Cluster Generation: Amplification The single strand flips over to hybridize to adjacent oligos to form a bridge. The hybridized primer is extended by polymerases. The double-stranded bridge is denatured. Result: two copies of covalently bound single-stranded templates. Cycles of denaturation, annealing, and extension repeat (1st cycle, 2nd cycle, and so on; n=35 total). Amplification steps are also the same, except that 28 cycles are recommended for any samples where the insert is greater than 200 bp. More cycles for samples with inserts greater than 200 bp will cause the clusters to get too large after P5 resynthesis (see slide 6)

Cluster Generation: Linearization, Blocking and sequencing primer hybridization dsDNA bridges are denatured. Complement strands are cleaved and washed away. Enzymatic cleavage, uracil incorporation enzyme. Free 3’ ends are blocked to prevent unwanted DNA priming. Steps: Cluster Amplification; P5 Linearization; Block with ddNTPs; Denaturation and Sequencing Primer Hybridization. The first linearization step uses the Linearization 1 Enzyme instead of Periodate. The blocking step still uses ddNTPs but uses Blocking Enzyme 1 and 2 in the PE protocol instead of terminal transferase for the Single Read protocol. Read 1 primer hybridization uses Read 1 PE Sequencing Primer.

Sequencing Steps: Sequencing First Read; Denaturation and De-Protection; Resynthesis of P5 Strand (15 cycles); P7 Linearization; Block with ddNTPs; Denaturation and Hybridization; Sequencing Second Read. The steps up to and including the first read sequencing are pretty much the same as for a single read. The first read sequencing is where the single read protocol would stop. For the PE protocol, it continues with deprotecting the P5 primer using deprotection enzyme. Resynthesis of the P5 strand occurs over 15 cycles. P7 linearization uses Linearization 2 Enzyme. Blocking again occurs with ddNTPs and Blocking Enzyme 1 and 2. Sequencing read 2 uses Read 2 PE Sequencing Primer.