Evolution of genomic technologies
Genetic mapping studies: discovery of genes for well-characterized Mendelian diseases.
Dense SNP genotyping using microarray technology: GWAS for discovery of common variants in common disease.
High-throughput sequencing: discovery of rare variants in previously unrecognized Mendelian diseases.
DNA sequencing can provide a deeper understanding of DNA/RNA than any other technology.
Microarray technology revolutionized biomedical research but has several limitations, which DNA sequencing may overcome.
As the cost of sequencing rapidly decreases, it is becoming affordable to perform sequencing at the genome level.
Number of PubMed articles: in recent years there has been an explosion of research articles using next-generation sequencing technologies.
First Generation: Sanger sequencing (1975-1977)
1980 Nobel Prize in Chemistry
phi X 174, ~5300 bp
Gels read by hand
Radiolabeled dideoxyNTPs
One lane per nucleotide
800 bp reads
Low throughput (several kb/gel)
Second-generation sequencing: massively parallel sequencing of millions of templates
Platforms: Illumina, Ion Torrent-Proton
Illumina next-generation sequencing platforms: MiSeq, HiSeq 2000, HiSeq 2500, NextSeq 500, HiSeq X Ten
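Comparison of MiSeq, NextSeq, HiSeq 2500 and HiSeq X Ten Sequencing
Positioning: MiSeq: Focused Power; NextSeq: Flexible Power; HiSeq 2500: Production Power; HiSeq X Ten: population-scale whole human genome sequencing at $1000/genome
Output/run (Gb): MiSeq 3 to 15; NextSeq Mid Output 20-40; NextSeq High Output 30-120; HiSeq 2500 Rapid Run 20-360; HiSeq 2500 High Output 100-2,000; HiSeq X Ten 3,200-3,600
Reads/run: MiSeq 25 M; NextSeq Mid Output 130 M; NextSeq High Output 400 M; HiSeq 2500 Rapid Run 600 M; HiSeq 2500 High Output 4,000 M; HiSeq X Ten 6,000 M
Run times: MiSeq 5-65 hrs; NextSeq Mid Output 15-26 hrs; NextSeq High Output 12-30 hrs; HiSeq 2500 Rapid Run 7-40 hrs; HiSeq 2500 High Output 1-6 days; HiSeq X Ten 3 days
Gb/day: MiSeq 6; NextSeq Mid Output 37; NextSeq High Output 96; HiSeq 2500 Rapid Run 215; HiSeq 2500 High Output 330; HiSeq X Ten 1,200
Flow cells: MiSeq 1; NextSeq 1; HiSeq 2500 1; HiSeq X Ten 1 or 2
CapEx: MiSeq $100,000; NextSeq $250,000; HiSeq 2500 $740,000; HiSeq X Ten $10 million (sold only in a pack of 10)
HiSeq X Ten: 10 instruments, most cost-effective when operated at full capacity of 18,000 WGS/year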
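The Gb/day figures quoted above appear to follow from dividing the largest output per run by the longest run time. A minimal Python sketch of that arithmetic follows; pairing the maximum output with the maximum run time is an assumption, and the names RUNS and gb_per_day are illustrative only.

```python
# Upper-bound output (Gb) and longest run time (hours) from the comparison slide.
# Assumption: the maximum output is reached in the longest run.
RUNS = {
    "MiSeq": (15, 65),
    "NextSeq Mid Output": (40, 26),
    "NextSeq High Output": (120, 30),
    "HiSeq 2500 Rapid Run": (360, 40),
    "HiSeq 2500 High Output": (2000, 6 * 24),
    "HiSeq X Ten": (3600, 3 * 24),
}


def gb_per_day(output_gb, run_hours):
    """Average daily throughput for one run, in Gb per 24 hours."""
    return output_gb / run_hours * 24


for name, (gb, hours) in RUNS.items():
    print(f"{name}: {gb_per_day(gb, hours):.0f} Gb/day")
```

The computed values land close to the slide's rounded figures, which suggests the slide used the same best-case pairing.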
Overall Illumina Sequencing Workflow
Sample Preparation: sequencing library preparation (library structure: Adapter 1, Sequencing Primer, Insert, Adapter 2)
Cluster Generation: hybridizing library to flow cell; creating clusters from individual molecules
Sequencing by Synthesis: add all 4 bases with reversible terminators; image 4 colors; remove terminator, repeat
Genomic Sample Prep Workflow
Starting material: purified genomic DNA
1. Genomic DNA fragmentation: fragments of less than 800 bp
2. End repair: blunt-ended fragments with 5'-phosphorylated ends
3. Klenow exo- with dATP: 3'-dA overhang
4. Adapter ligation: adapter-modified ends
5. Gel purification/bead: removal of unligated adapter
6. PCR: genomic DNA library
Library structure: Adapter 1, Sequencing Primer, Insert, Adapter 2
What is a Flow Cell?
A flow cell is a thick glass slide with 8 channels, or lanes.
Each lane is randomly coated with a lawn of oligos (P5 oligo, P7 oligo) that are complementary to the library adapters.
Library structure: Adapter 1, Sequencing Primer, Insert, Adapter 2
Ordered flow cells: highest number of clusters
Cluster Generation: Template Hybridization and Initial Extension
Grafted flow cell with P5 and P7 oligos
Template hybridization: >200 million single molecules hybridize to the lawn of primers
Initial extension: 3' extension; single molecules are bound to the flow cell in a random pattern
Denaturation: original template is washed away
Cluster Generation: Amplification (n=35 total)
Each cycle (1st and 2nd cycles shown): annealing, extension, denaturation
Annealing: the single strand flips over to hybridize to adjacent oligos, forming a bridge
Extension: the hybridized primer is extended by polymerases
Denaturation: the double-stranded bridge is denatured
Result: two copies of covalently bound single-stranded templates
Cluster Generation: Linearization, Blocking and Sequencing Primer Hybridization
Steps: cluster amplification; P5 linearization; block with ddNTPs; denaturation and sequencing primer hybridization
dsDNA bridges are denatured
Complement strands are cleaved and washed away
Free 3' ends are blocked to prevent unwanted DNA priming
Sequencing primer is hybridized
Sequencing
First read: denaturation and hybridization; sequencing first read
Turnaround: denaturation and de-protection; resynthesis of P5 strand (15 cycles); P7 linearization; block with ddNTPs
Second read: denaturation and hybridization; sequencing second read
Reversible Terminator Seq Chemistry
[Diagram: labeled nucleotide with a cleavable fluor and a 3' block; cycle of incorporation, detection, then deblock and fluor removal, leaving a free 3' end for the next cycle]
All 4 labeled nucleotides in 1 reaction (green, orange, red and blue)
Advantages of reversible terminators:
Only one base is added at a time.
The fluor can be cleaved off after imaging, so it does not emit color at the next cycle; only the newly added base (with its attached fluor) emits light.
Sequencing By Synthesis (SBS)
[Diagram: template strand and growing complementary strand, first base incorporated]
Cycle 1: add sequencing reagents; remove unincorporated bases; detect signal/imaging; cleave off fluor and deblock
Cycle 2-n: add sequencing reagents and repeat
All four labeled nucleotides in one reaction
High accuracy
Base-by-base sequencing
No problems with homopolymer repeats
Representation of Base Calling From Raw Data
[Figure: sequential images (cycles 1-9) of the same clusters; example reads T T T T T T T G T … and T G C T A C G A T …]
The identity of each base of a cluster is read off from sequential images.
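Reading a base off sequential images can be sketched in a few lines of Python. This is an illustration only, not the instrument's algorithm: it assumes one intensity value per color channel per cycle and simply picks the brightest channel. The intensity numbers, the channel order and the name call_bases are invented for the example, and production base callers also correct for effects such as channel crosstalk and phasing, which this sketch ignores.

```python
# Hypothetical channel order; each cycle yields one intensity per channel.
CHANNELS = "ACGT"


def call_bases(intensities):
    """Return the read by taking the brightest channel at each cycle."""
    read = []
    for cycle in intensities:
        brightest = max(range(len(CHANNELS)), key=lambda i: cycle[i])
        read.append(CHANNELS[brightest])
    return "".join(read)


# Invented intensities for one cluster over five cycles (columns: A, C, G, T).
cluster = [
    [0.1, 0.0, 0.2, 0.9],
    [0.1, 0.1, 0.8, 0.2],
    [0.0, 0.9, 0.1, 0.1],
    [0.2, 0.1, 0.0, 0.7],
    [0.9, 0.1, 0.1, 0.0],
]
print(call_bases(cluster))  # prints TGCTA
```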
Primary and Secondary Analysis Overview
Primary analysis: software: sequencing ICS/RTA; outputs: images/TIFF files, intensities, base calling
Secondary analysis: software: Bulldog N; outputs: alignments and variant detection
Ion Torrent PGM and Proton
Ion PGM™ Sequencer
4 Ion Protons: coming soon
First PostLight sequencing technology: instead of using light as an intermediary, PGM creates a direct connection between the chemical and the digital worlds.
The Chip is the Machine
Uses semiconductor chips for sequencing.
Ion PI chip: >165 million wells per chip; 8 to 10 Gb of data per run
Ion PII chips: ~100 Gb of data in ~4 hours
Wells: Ion 314 Chip v.2: 1.3 million; Ion 316 Chip v.2: 6.3 million; Ion 318 Chip v.2: 11 million
Output (200-base and 400-base reads, as listed): 30-50 Mb, 60-100 Mb, 300-600 Mb, 600 Mb-1 Gb, 1.2-2 Gb
Base Calling
When a nucleotide is incorporated into a strand of DNA, a hydrogen ion is released as a by-product. The hydrogen ion carries a charge, which the PGM's ion sensor detects to call the base.
http://www.lifetechnologies.com/us/en/home/life-science/sequencing/next-generation-sequencing/ion-torrent-next-generation-sequencing-technology.html#
Advantages and Current Limitations
Advantages:
Low equipment cost
Rapid run times: 3 to 4 hours
Simple chemistry
Limitations:
Homopolymer detection
Error rates
Slow to introduce newer chips (overpromise)
PGM and Proton: two separate pieces of sequencing equipment
Tedious library prep protocols
PacBio RS Third generation sequencing
The Third Generation Sequencing Platform: PacBio RS
Pacific Biosciences has developed Single Molecule Real Time (SMRT™) DNA sequencing technology: PacBio RS. This technology enables, for the first time, the observation of natural DNA synthesis by a DNA polymerase as it occurs. It delivers long reads at the single-molecule level and a fast time to result, enabling a new paradigm in genomic analysis.
Key Applications for PacBio RS
Targeted sequencing: SNP and structural variant detection; repetitive regions; full-length transcript profiling
De novo assembly and genome finishing: bacterial genomes; fungal genomes
Gap-captured sequencing
Targeted captured sequencing
Base modification detection: methylation; DNA damage
**Projects at YCGA
Comparisons Between PacBio RS and Illumina HiSeq
Generation: PacBio RS: third generation; Illumina HiSeq: second generation
Sequencing chemistry: PacBio RS: sequencing by synthesis (SBS), Single Molecule Real Time (SMRT); Illumina HiSeq: sequencing by synthesis (SBS)
Sequencing substrate: PacBio RS: SMRT Cell made up of 150,000 ZMWs; Illumina HiSeq: flow cell made up of 8 separate lanes
Data output per day: PacBio RS: 1 to 2 billion bases/day; Illumina HiSeq: 60-1,200 billion bases/day
Cost/Mb: PacBio RS: $1.5/Mb; Illumina HiSeq: $0.04/Mb
Read length: PacBio RS: average up to 5 kb; Illumina HiSeq: 50 bp to 150 bp
Error rates: PacBio RS: raw 10-15%, with 30x coverage Q50 (< 0.01); Illumina HiSeq: 0.5 to 1%
Sample library: PacBio RS: SMRTbell template (single-strand circular DNA), 250 bp to 10 kb insert; Illumina HiSeq: dsDNA with adaptors (175 bp to 1 kb)
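The comparison mixes percentage error rates with a Phred-style quality value (Q50). The two are related by Q = -10 log10(p), where p is the per-base error probability. The short Python sketch below converts between them so the figures can be put on the same scale; the function names are illustrative only.

```python
import math


def phred_to_error_prob(q):
    """Per-base error probability for a Phred-scaled quality value."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred-scaled quality value for a per-base error probability."""
    return -10 * math.log10(p)


print(phred_to_error_prob(50))    # consensus quality Q50
print(error_prob_to_phred(0.10))  # raw error rate of 10 %
print(error_prob_to_phred(0.15))  # raw error rate of 15 %
print(error_prob_to_phred(0.01))  # error rate of 1 %
```

On this scale Q50 corresponds to an error probability of 1 in 100,000 (0.001%), while a raw error rate of 10% is Q10 and a 1% error rate is Q20.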
Nanopore: DNA, RNA and Protein analysis
[Diagram labels: electrically resistant lipid bilayer, exonuclease, cyclodextrin]
https://www.nanoporetech.com/technology/the-gridion-system/movie-an-introduction-to-the-gridion-system
Performance/Limitations?
First data was released in Feb 2012; no updates since then.
No data available for evaluation. High error rates: >5%
Will start an early access program in the next few months.
Advantages:
Nanopores offer a label-free, electrical, single-molecule DNA sequencing method.
No costly fluorescent labeling reagents.
No need for expensive optical hardware and sophisticated instrumentation to detect DNA, RNA and protein.
The YCGA Laboratory at West Campus
Portion of the laboratory showing sequencing systems through the glass wall partition that separates the laboratory from the office and administrative area.
YCGA was established in January 2009 through generous funding support and the strong commitment of Yale University and the School of Medicine.
Located in a newly renovated building.
Approximately 7,000 sq ft of laboratory space and ~4,000 sq ft of office space.
20 staff
Sequencing Platforms at YCGA
YCGA has kept pace with cutting-edge sequencing technologies.
10 Illumina HiSeqs
One PacBio RS
One MiSeq
Ion PGM™ Sequencer
Will acquire new Illumina sequencers introduced just a few weeks ago.
Computer Infrastructure
BulldogN: provides ~1300 cores and 2.2 PB of high-performance storage.
Dave Frioni and team from Yale ITS.
Robert Bjornson, Ph.D. IT director for YCGA Nicholas Carriero, Ph.D.
Increasing Demand for Sequencing at YCGA
Trends in sequence data output at YCGA (average of 6 months)
Increase in the number of Principal Investigators using YCGA over the past 4 years
Types of samples processed and runs of sequence read lengths carried out at YCGA in a typical month.
Whole-Genome vs. Whole-Exome Sequencing
Whole-exome:
2.1M probes cover ~300,000 exons of 19,000 genes
Total covered bases: 44.1 Mb
Protein-coding genes (exome) constitute 1% of the human genome but harbor 85% of disease-causing mutations
Significantly cheaper than sequencing the entire genome
Whole-genome:
Data storage challenges
Validation challenges
>25,000 exomes analyzed for several disorders, including cardiovascular, abnormal brain development, autism, liver, kidney, hypertension, skin and various tumors.
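The cost argument can be made concrete with a little arithmetic on the figures above. The Python sketch below is an illustration under stated assumptions: the genome size of about 3.1 Gb and the mean depths (100x for the exome, 30x for the genome) are not from the slide and are used only as examples, and off-target reads and duplicates are ignored.

```python
EXOME_TARGET_BP = 44.1e6  # total covered bases, from the slide
GENOME_BP = 3.1e9         # approximate human genome size (assumption)


def fraction_of_genome(target_bp, genome_bp=GENOME_BP):
    """Share of the genome represented by the capture target."""
    return target_bp / genome_bp


def bases_needed(target_bp, mean_depth):
    """Raw bases required for a given mean depth (no off-target or duplicates)."""
    return target_bp * mean_depth


exome_bases = bases_needed(EXOME_TARGET_BP, 100)  # illustrative 100x exome
genome_bases = bases_needed(GENOME_BP, 30)        # illustrative 30x genome

print(f"Capture target: {fraction_of_genome(EXOME_TARGET_BP):.1%} of the genome")
print(f"Exome at 100x:  {exome_bases / 1e9:.2f} Gb")
print(f"Genome at 30x:  {genome_bases / 1e9:.0f} Gb")
print(f"Ratio:          {genome_bases / exome_bases:.0f}x more sequence for the genome")
```

Under these assumptions the capture target is about 1.4% of the genome, in line with the roughly 1% quoted for protein-coding sequence, and a whole genome needs on the order of twenty times more raw sequence than an exome.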
Need for strong R&D efforts for next-generation sequencing operation
Optimization of sample preparation protocols for exome capture that have decreased the cost of a single human exome from $8,000 in 2009 to the current price of ~$500, while improving the quality of the data.
Development of a highly efficient protocol to extract and repair DNA from formalin-fixed, paraffin-embedded blocks for genetic analysis.
Improved protocols for gDNA-seq, RNA-seq, and ChIP-seq that show higher data complexity than traditional protocols, allow users to start with less material, and cost less.
Scientific and economic impact of high throughput sequencing at Yale
List of select publications resulting from next-generation sequencing usage at YCGA
Whole-exome sequencing identifies recessive WDR62 mutations in severe brain malformations. Bilguvar and Gunel. Nature, v467, 2010
A Novel miRNA Processing Pathway Independent of Dicer Requires Argonaute2 Catalytic Activity. Cifuentes and Giraldez. Science, v328, 2010
Mitotic recombination in patients with ichthyosis causes reversion of dominant mutations in KRT10. Choate and Lifton. Science, v330, 2010
Transcriptomic analysis of avian digits reveals conserved and derived digit identities in birds. Wang and Wagner. Nature, v477, 2011
Transposon-mediated rewiring of gene regulatory networks contributed to the evolution of pregnancy in mammals. Lynch and Wagner. Nat Genet., v43, 2011
K+ channel mutations in adrenal aldosterone-producing adenomas and hereditary hypertension. Choi and Lifton. Science, v331, 2011
Recessive LAMC3 mutations cause malformations of occipital cortical development. Barak and Gunel. Nat Genet., v43, 2011
Spatio-temporal transcriptome of the human brain. Kang and Sestan. Nature, v478, 2011
Langerhans cells facilitate epithelial DNA damage and squamous cell carcinoma. Modi and Girardi. Science, v335, 2012
Mutations in kelch-like 3 and cullin 3 cause hypertension and electrolyte abnormalities. Boyden and Lifton. Nature, v482, 2012
De novo point mutations, revealed by whole-exome sequencing, are strongly associated with Autism Spectrum Disorders. Sanders and State. Nature, v485, 2012
Exome sequencing identifies recurrent somatic RAC1 mutations in melanoma. Krauthammer and Halaban. Nat Genet., v44, 2012
Genomic Analysis of Non-NF2 Meningiomas Reveals Mutations in TRAF7, KLF4, AKT1, and SMO. Clark and Gunel. Science, v339, 2013
De novo mutations in histone-modifying genes in congenital heart disease. Zaidi and Lifton. Nature, v498, 2013
Recessive mutations in DGKE cause atypical hemolytic-uremic syndrome. Lemaire and Lifton. Nat Genet., v45, 2013
Somatic and germline CACNA1D calcium channel mutations in aldosterone-producing adenomas and primary aldosteronism. Scholl and Lifton. Nat Genet., v45, 2013
The evolution of lineage-specific regulatory activities in the human embryonic limb. Cotney and Noonan. Cell, v154, 2013
Mutations in DSTYK and dominant urinary tract malformations. Sanna-Cherchi and Gharavi. N Engl J Med., v369, 2013
Nanog, Pou5f1 and SoxB1 activate zygotic gene expression during the maternal-to-zygotic transition. Lee and Giraldez. Nature, 2013
Co-expression networks implicate human mid-fetal deep cortical projection neurons in the pathogenesis of autism. Willsey and State. Cell, 2013 (In press)
Impact of High Throughput Sequencing: Partial Grant Funding
Mendelian center grant (NIH): $12M (3y)
Gilead cancer grant: $40M (4y)
Brain tumor gift: $12M (4y)
ARRA brain development (NIH): $3M (2y)
ARRA kidney disease (NIH): $2M (2y)
Simons autism sequencing: $4M (3y)
Brain transcriptome (NIH): $10M (2y)
Congenital heart disease (NIH): $3M (4y)
Melanoma SPORE: $12M (5y)
Fidelity (computer storage): $0.55M
VA Schizophrenia/Bipolar disorder: $12.3M
Yale Comprehensive Cancer Center: $14.0M
Total: $124.85M
The Centers for Mendelian Genomics email@example.com Supported by NHGRI and NHLBI in Dec 2011
CMGs: Goals
Discover the genes and variants responsible for as many Mendelian phenotypes as possible
Develop and disseminate improved methods for disease gene discovery and analysis
Create public resources to enhance research and discovery activities
Educate colleagues and the public regarding Mendelian disease
Whole-exome/whole-genome analysis is carried out at no cost and on a collaborative basis.
Investigators with interesting patients or cohorts can contact us: firstname.lastname@example.org email@example.com Shrikant.firstname.lastname@example.org
Opportunities: DNA Sequencing and Personalized Medicine
Use of genomics, the science of looking at all of the information in the human genome, to tailor medical care to individuals based on their genetic makeup.
Earlier interventions
Improved diagnosis
More effective drug development
Better medical outcomes
CLIA: The New Paradigm in Molecular Diagnostics
YCGA is carrying out clinical diagnostic work in collaboration with Dr. Allen Bale.
Over 500 exomes have been analyzed for various disorders.
Conventional molecular testing: gene by gene
Genomic testing: exome analysis
A) Equipment: Rapid introduction of new sequencing technologies: investment challenge. Challenges associated with new technologies: upgrades and breakdowns
B) Reagents: Constant change/introduction of new, more reliable reagents (v3)
C) Software upgrades/computer infrastructure challenges/analysis
E) Constant upgrade of protocols: Sequence capture: 6 versions, and it continues to be updated. RNA-Seq: TruSeq, cheaper than the older version
Sequencing a genome is simple; finding the cause of a disease is not.
Despite challenges, tremendous progress has been made at a rapid pace. NGS will continue to make a huge impact in biology, bio-medicine and human health.
Thank You! Jim Noonan Yale University and Medical School and West Campus administration ITS, HPC and Bioinformatics staff YCGA staff Collaborating Yale Investigators
Nanopore: Protein Analysis
https://www.nanoporetech.com/technology/analytes-and-applications-dna-rna-proteins/protein-analysis-