CS 6293 Advanced Topics: Current Bioinformatics

Next-generation sequencing - technology

Outline
First generation sequencing
Next generation sequencing (current)
    AKA: second generation sequencing, massively parallel sequencing, ultra high-throughput sequencing
Future generation sequencing
Analysis challenges

Sanger sequencing (1st generation)
DNA is fragmented.
Fragments are cloned into a plasmid vector.
Cyclic sequencing reaction.
Separation by electrophoresis.
Readout with fluorescent tags.
Jay Shendure & Hanlee Ji, Nature Biotechnology 26, 1135-1145 (2008)

Cyclic-array methods (next-generation)
DNA is fragmented.
Adaptors are ligated to the fragments.
Several possible protocols yield an array of PCR colonies.
Enzymatic extension with fluorescently tagged nucleotides.
Cyclic readout by imaging the array.
Jay Shendure & Hanlee Ji, Nature Biotechnology 26, 1135-1145 (2008)

Available next-generation sequencing platforms
Illumina/Solexa
ABI SOLiD
Roche 454
Polonator
HeliScope
…

Emulsion PCR
Fragments, with adaptors, are PCR amplified within a water droplet in oil.
One primer is attached to the surface of a bead.
Used by 454, Polonator and SOLiD.
Rothberg and Leamon, Nat Biotechnol. 2008; Shendure and Ji, Nat Biotechnol. 2008

454 Sequencing
Stats:
read lengths of 200-300 bp
accuracy problems with homopolymers
400,000 reads per run
cost of $60 per megabase
Rothberg and Leamon, Nat Biotechnol. 2008

Bridge PCR
DNA fragments are flanked with adaptors.
A flat surface is coated with two types of primers, corresponding to the adaptors.
Amplification proceeds in cycles, with one end of each bridge tethered to the surface.
Used by Illumina/Solexa.

http://www.illumina.com/pages.ilmn?ID=203

[Figure: first round of sequencing by synthesis. Labels: all 4 labeled nucleotides, primers, polymerase.]

1. Take image of first cycle
2. Remove fluorophore
3. Remove block on 3' terminus

Illumina/Solexa stats:
read lengths up to 36 bp
error rates of 1-1.5%
several million "spots" per lane (8 lanes)
cost of $2 per megabase
http://seq.molbiol.ru/

Conventional sequencing
Sanger sequencing can read up to 1,000 bp, with per-base 'raw' accuracies as high as 99.999%.
In the context of high-throughput shotgun genomic sequencing, Sanger sequencing costs on the order of $0.50 per kilobase.
Jay Shendure & Hanlee Ji, Nature Biotechnology 26, 1135-1145 (2008)
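To put the quoted prices on a common scale, the short sketch below converts the Sanger figure to dollars per megabase and compares it with the 454 and Illumina/Solexa figures from the earlier slides. The genome size and coverage are hypothetical values chosen only for illustration, and the calculation ignores differences in read length and accuracy.

```python
# Cost figures as quoted on the slides (USD).
SANGER_USD_PER_KB = 0.50  # "on the order of $0.50 per kilobase"

COST_USD_PER_MB = {
    "Sanger": SANGER_USD_PER_KB * 1000,  # 1 megabase = 1,000 kilobases
    "454": 60.0,
    "Illumina/Solexa": 2.0,
}


def cost_for_coverage(usd_per_mb, genome_size_mb, coverage):
    """Raw sequencing cost for a genome at a given fold coverage."""
    return usd_per_mb * genome_size_mb * coverage


# Hypothetical example: a 5 Mb bacterial genome sequenced at 10x coverage.
GENOME_MB = 5.0
COVERAGE = 10

for platform, usd_per_mb in COST_USD_PER_MB.items():
    total = cost_for_coverage(usd_per_mb, GENOME_MB, COVERAGE)
    print(f"{platform}: ${usd_per_mb:,.2f}/Mb, ${total:,.2f} for the example genome")
```

On these numbers alone, Sanger sequencing comes out at $500 per megabase, roughly 8 times the 454 price and 250 times the Illumina/Solexa price.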

Sequence qualities
In most cases, the quality is poorest toward the ends, with a region of high quality in the middle.
Uses of sequence qualities:
    'Trimming' of reads (removal of low-quality ends)
    Consensus calling in sequence assembly
    Confidence metric for variant discovery
In general, newer approaches produce larger amounts of sequence that is shorter and of lower per-base quality.
Next-generation sequencing has an error rate of around 1% or higher.
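Trimming of low-quality ends can be sketched as follows. This is a minimal illustration, not the procedure of any particular tool: the function name, the per-base list of integer quality scores, and the cutoff of 20 are all assumptions made for the example.

```python
def trim_low_quality_ends(seq, quals, min_q=20):
    """Remove bases from both ends of a read until a base with
    quality >= min_q is reached. Interior bases are left untouched.

    seq   -- read sequence (string)
    quals -- per-base quality scores (list of ints, same length as seq)
    min_q -- hypothetical quality cutoff
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists must have equal length")

    start, end = 0, len(seq)
    # Walk inward from the 5' end past low-quality bases.
    while start < end and quals[start] < min_q:
        start += 1
    # Walk inward from the 3' end past low-quality bases.
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


read = "ACGTACGTAC"
scores = [5, 12, 30, 35, 38, 36, 31, 25, 10, 4]
print(trim_low_quality_ends(read, scores))
# → ('GTACGT', [30, 35, 38, 36, 31, 25])
```

A read whose bases all fall below the cutoff is trimmed to an empty sequence, which a downstream step would normally discard.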

Phred Quality Score
q = -10 log10(p), where p is the error probability for the base.
If p = 0.01 (1% chance of error), then q = 20.
If p = 0.00001 (99.999% accuracy), then q = 50.
Phred quality values are rounded to the nearest integer.
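The two worked values on this slide follow from the standard Phred definition, q = -10 log10(p). A small sketch of the conversion in both directions is given below; the function names are chosen here for illustration.

```python
import math


def phred_from_error(p):
    """Phred quality for an error probability p, rounded to the nearest integer."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return round(-10 * math.log10(p))


def error_from_phred(q):
    """Error probability implied by a Phred quality score q."""
    return 10 ** (-q / 10)


print(phred_from_error(0.01))     # → 20
print(phred_from_error(0.00001))  # → 50
```

Each step of 10 in q corresponds to a tenfold drop in error probability, so q = 30 means one expected error in 1,000 base calls.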

Main Illumina noise factors
Schematic representation of main Illumina noise factors. (a–d) A DNA cluster comprises identical DNA templates (colored boxes) that are attached to the flow cell. Nascent strands (black boxes) and DNA polymerase (black ovals) are depicted.
(a) In the ideal situation, after several cycles the signal (green arrows) is strong, coherent and corresponds to the interrogated position.
(b) Phasing noise introduces lagging (blue arrows) and leading (red arrow) nascent strands, which transmit a mixture of signals.
(c) Fading is attributed to loss of material that reduces the signal intensity.
(d) Changes in the fluorophore cross-talk cause misinterpretation of the received signal (blue arrows).
For simplicity, the noise factors are presented separately from each other.
Erlich et al. Nature Methods 5: 679-682 (2008)
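The effect of phasing can be illustrated with a toy model. It is not the model used by Erlich et al.; it simply assumes that in every cycle each strand in a cluster independently fails to extend with probability p_lag or extends by two bases with probability p_lead, and it tracks the share of strands that still report the interrogated position. The probabilities below are hypothetical.

```python
def in_phase_fraction(n_cycles, p_lag=0.005, p_lead=0.005):
    """Fraction of strands in a cluster that are in phase after each cycle.

    Toy model: per cycle a strand lags (offset -1) with probability p_lag,
    leads (offset +1) with probability p_lead, and otherwise stays in step.
    Returns a list with one fraction per cycle.
    """
    p_ok = 1.0 - p_lag - p_lead
    if p_ok < 0:
        raise ValueError("p_lag + p_lead must not exceed 1")

    dist = {0: 1.0}  # offset from the correct position -> fraction of strands
    fractions = []
    for _ in range(n_cycles):
        new_dist = {}
        for offset, mass in dist.items():
            for step, prob in ((-1, p_lag), (0, p_ok), (1, p_lead)):
                if prob > 0:
                    key = offset + step
                    new_dist[key] = new_dist.get(key, 0.0) + mass * prob
        dist = new_dist
        fractions.append(dist.get(0, 0.0))
    return fractions


# 36 cycles, matching the read length quoted for Illumina/Solexa.
fractions = in_phase_fraction(36)
for cycle in (1, 12, 24, 36):
    print(f"cycle {cycle:2d}: {fractions[cycle - 1]:.3f} of strands in phase")
```

Under these assumptions the in-phase share shrinks with every cycle, which is one way to see why signal purity, and with it base quality, degrades toward the end of a read.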

Comparison of existing methods
Jay Shendure & Hanlee Ji, Nature Biotechnology 26, 1135-1145 (2008)

Read length and pairing
ACTTAAGGCTGACTAGC TCGTACCGATATGCTG
Short reads are problematic, because short sequences do not map uniquely to the genome.
Solution #1: Get longer reads.
Solution #2: Get paired reads.
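The uniqueness problem can be made concrete by counting how many substrings of length k occur exactly once in a sequence. The sketch below covers only Solution #1 (longer reads); the toy genome, with an 11-base repeat planted at two positions, is invented for illustration and is far smaller than any real genome.

```python
from collections import Counter


def unique_fraction(genome, k):
    """Fraction of the k-length substrings of genome that occur exactly once,
    i.e. error-free reads of length k that would map to a single position."""
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    return sum(1 for kmer in kmers if counts[kmer] == 1) / len(kmers)


# Hypothetical toy genome: the same 11-base repeat inserted twice
# between unrelated flanking sequences.
REPEAT = "GGATCCTTAGC"
TOY_GENOME = "ACGTACCTGA" + REPEAT + "TTGACGGTCA" + REPEAT + "CATGCAAGTC"

for k in (4, 8, 12, 16):
    print(f"k = {k:2d}: {unique_fraction(TOY_GENOME, k):.2f} of reads map uniquely")
```

Reads that fit entirely inside the repeat cannot be placed, while reads long enough to extend into the flanking sequence can, so the unique fraction should rise as k grows past the repeat length.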

Third generation
Single-molecule sequencing: no DNA amplification is involved
    Helicos HeliScope
    Pacific Biosciences SMRT
    …
Longer reads
    Roche/454: > 400 bp
    Illumina/Solexa: > 100 bp
    Pacific Biosciences: > 1000 bp, and single molecule

Applications of next-generation sequencing
Jay Shendure & Hanlee Ji, Nature Biotechnology 26, 1135-1145 (2008)

Analysis tasks
Base calling
Mapping to a reference genome
De novo or assisted genome assembly

References
Shendure J, Ji H. Next-generation DNA sequencing. Nat Biotechnol. 26, 1135-1145 (2008).
Mardis ER. Next-generation DNA sequencing methods. Annu. Rev. Genomics Hum. Genet. 9:387-402 (2008).