DNA Sequencing Second generation techniques

DNA Sequencing: Second generation techniques. Hardison, Genomics 3_2, 1/20/14

Second generation sequencing: Michael Metzker review (2010), Nature Reviews Genetics 11: 31-46

Two generations of sequencing technology

| Feature | First generation | Second generation |
|---|---|---|
| Isolating DNA fragments to sequence | Cloning in bacteria to generate many copies of the same DNA sequence, usually as a recombinant plasmid | Physical cloning (in vitro clonal amplification, e.g. emulsion PCR or bridge amplification) to generate thousands of copies of a DNA molecule, separated on beads or as positions on a flow cell |
| Purification of clones? | Prepare the plasmids from each bacterial clone | No need for plasmid preparation |
| DNA sequencing approach | Sequencing by synthesis (Sanger dideoxy method) or by base-specific chemical degradation (Maxam-Gilbert) | Sequencing by synthesis, pyrosequencing, or ligation (SOLiD) |
| Method of detection | Electrophoresis to separate by size; fluorescent dyes | Light detection at each cycle of synthesis |
| Number of clones sequenced in parallel | Scores to hundreds | Hundreds of millions |

Templates: physical clones or single molecules. No need for molecular clones (e.g., plasmids in bacteria).

Four-color cyclic reversible termination in Illumina sequencing

Pacific Biosciences: long reads from single molecules, but low raw accuracy (~85%)

Sequence files with quality scores: FASTQ format
Each record has four lines:
- @SequenceName
- Sequence in 1-letter code
- + (optionally followed by the sequence name again)
- Phred quality scores in 1-letter ASCII code, one character per base
Example records:
@M00539:11:000000000-A1VFM:1:1101:13898:1904 1:N:0:1
GTGAGACCACTCTACACATCTCAACGAAATGTCCTATCCCTGTGTGCAGG
+
?????BB?DDDDDBDBFFFFFFCFHHHHBHGHFHHHHHHHGFHHGHHFHH
@M00539:11:000000000-A1VFM:1:1101:14998:1904 1:N:0:1
TATTCTCTGTACTTTGACTCATTGTGAGTCCCTGTATCAACCACCTTCC
+
?AAAABBBEDDDDEDDGGGGGGHIHIFHIIIIIFHHGHFGHIEFHIIIH
Encoding quality scores (figure from the Wikipedia article on FASTQ)
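
A minimal sketch of reading the records above and decoding their quality characters. It assumes unwrapped 4-line records and the Phred+33 offset used by Illumina 1.8 and later instruments; the names `read_fastq`, `phred33` and `FastqRecord` are hypothetical, not from any particular library.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str  # header line without the leading "@"
    seq: str   # bases in 1-letter code
    qual: str  # one ASCII quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming 4 lines per record (no wrapping)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def phred33(qual: str) -> list[int]:
    """Decode quality characters into integer Q scores (assumed Phred+33 offset)."""
    return [ord(ch) - 33 for ch in qual]


# The two example records from the slide.
EXAMPLE = "\n".join([
    "@M00539:11:000000000-A1VFM:1:1101:13898:1904 1:N:0:1",
    "GTGAGACCACTCTACACATCTCAACGAAATGTCCTATCCCTGTGTGCAGG",
    "+",
    "?????BB?DDDDDBDBFFFFFFCFHHHHBHGHFHHHHHHHGFHHGHHFHH",
    "@M00539:11:000000000-A1VFM:1:1101:14998:1904 1:N:0:1",
    "TATTCTCTGTACTTTGACTCATTGTGAGTCCCTGTATCAACCACCTTCC",
    "+",
    "?AAAABBBEDDDDEDDGGGGGGHIHIFHIIIIIFHHGHFGHIEFHIIIH",
]) + "\n"

for record in read_fastq(io.StringIO(EXAMPLE)):
    scores = phred33(record.qual)
    print(record.name.split()[0], len(record.seq), min(scores), max(scores))
```

With Phred+33, "?" decodes to Q30 and "I" to Q40, so every base in these two reads is at least Q30.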

Second generation sequencing on the Illumina platform
Includes material from Illumina and from Cheryl A. Keller, PhD, Project Manager and Research Associate, Center for Comparative Genomics and Bioinformatics, Department of Biochemistry and Molecular Biology, Penn State University

The HiSeq2000 output (as of about 2012)
- Output: up to 600 Gb per run
- Number of reads: 3 billion single-end reads or 6 billion paired-end reads
Sequencing data can be used for a variety of functional genomics assays: transcription factor binding, DNA methylation, histone modifications, nucleosome mapping, genome resequencing, transcriptome analysis, and microRNA profiling.
Dan Gheba
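
The output figures are consistent with each other if, as an assumption, runs use the 2 x 100 bp paired-end reads mentioned later: 3 billion clusters x 2 reads x 100 bp = 600 Gb. The sketch below does that arithmetic and converts yield into an idealized fold coverage; the ~3.1 Gb human genome size is an illustrative parameter, and the function names are hypothetical.

```python
# Assumptions: 2 x 100 bp paired-end reads; ~3.1 Gb haploid human genome.
HUMAN_GENOME_BP = 3.1e9


def run_yield_gb(clusters: float, reads_per_cluster: int, read_length_bp: int) -> float:
    """Total bases sequenced, in gigabases (Gb)."""
    return clusters * reads_per_cluster * read_length_bp / 1e9


def fold_coverage(yield_gb: float, genome_bp: float = HUMAN_GENOME_BP) -> float:
    """Average depth if every base were usable and mapped (an upper bound)."""
    return yield_gb * 1e9 / genome_bp


clusters = 3e9  # 3 billion clusters: 3 billion single reads or 6 billion paired-end reads
total = run_yield_gb(clusters, reads_per_cluster=2, read_length_bp=100)
print(f"{total:.0f} Gb per run, ~{fold_coverage(total):.0f}x of a human genome")
```

Real usable coverage is lower, since reads that fail filters, duplicates, and unmappable reads are not counted out here.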

Illumina workflow: Sample Preparation → Cluster Generation → Sequencing By Synthesis (SBS) → Data Analysis

Single Read (SR) vs. Paired End (PE) sequencing
SR sequencing (figure: Read 1 from its sequencing primer, SP)
- 50 bp reads
- Only the forward strand is read in each cluster
PE sequencing (figure: Read 1 and Read 2, each from its own sequencing primer, SP)
- 2 x 100 bp reads
- Forward and reverse strands are read in each cluster
- Allows highly precise alignment of reads
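
A toy sketch of what SR and PE sequencing read from one library fragment, assuming no adapters, no sequencing errors, and reads shorter than the fragment. Read 2 comes from the reverse strand, so its reverse complement lands a known distance (fragment length minus read length) downstream of Read 1; that coupling is what makes PE placement more precise. All function names are hypothetical.

```python
# Toy model (assumption): a fragment read from both ends, no adapters or errors.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def single_read(fragment: str, read_len: int) -> str:
    """SR: only the forward strand is read, from one end."""
    return fragment[:read_len]


def paired_reads(fragment: str, read_len: int) -> tuple[str, str]:
    """PE: Read 1 from the forward strand, Read 2 from the reverse strand."""
    return fragment[:read_len], reverse_complement(fragment)[:read_len]


def read2_start_on_forward(read1_start: int, fragment_len: int, read_len: int) -> int:
    """Expected forward-strand position of Read 2's reverse complement."""
    return read1_start + fragment_len - read_len


fragment = "AACCGGTTAC"
r1, r2 = paired_reads(fragment, 3)
print(r1, r2, reverse_complement(r2))
```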

Quality control checks: Bioanalyzer, qPCR quantification
Why check the size of your library?
- Only fragments in a certain size range can form clusters; 250-300 bp is ideal.
- Size data are needed for accurate mapping and peak calling.
The Bioanalyzer is a chip-based capillary electrophoresis instrument for analyzing RNA, DNA, and protein. Its output is a plot of fluorescence intensity versus migration time.
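
The mean fragment size from the Bioanalyzer is also what converts a library's mass concentration into molarity: for the same mass, longer fragments mean fewer molecules. A minimal sketch, assuming the common approximation of ~660 g/mol per base pair of double-stranded DNA; the function name is hypothetical.

```python
# Assumption: ~660 g/mol per base pair of double-stranded DNA.
MASS_PER_BP_G_PER_MOL = 660.0


def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/uL) to nanomolar.

    ng/uL equals mg/L, i.e. 1e-3 g/L; dividing by the molar mass of one fragment
    (660 g/mol per bp times its length) gives mol/L, and times 1e9 gives nM.
    """
    grams_per_litre = conc_ng_per_ul * 1e-3
    molar_mass = MASS_PER_BP_G_PER_MOL * mean_fragment_bp
    return grams_per_litre / molar_mass * 1e9


# Hypothetical library: 10 ng/uL with a mean fragment size of 300 bp.
print(f"{library_molarity_nm(10, 300):.1f} nM")
```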

Quality control checks: Bioanalyzer, qPCR quantification
Quantitative PCR (real-time PCR) is used to simultaneously amplify and quantify DNA.
Why use qPCR to quantify a library?
- It amplifies only molecules capable of forming clusters on the flow cell, because the qPCR primers target the adapter sequences.
- This enables more precise control over cluster density, which is crucial for obtaining high-quality sequence reads.
Illumina qPCR_Quantification_Guide_11322363_B
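
A generic sketch of absolute qPCR quantification with a standard curve: fit Ct against log10(concentration) for standards of known concentration, then invert the fit for the library. It does not reproduce the specific protocol in the Illumina guide; the standards, Ct values, and function names are hypothetical.

```python
import math


def fit_standard_curve(concs: list[float], cts: list[float]) -> tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(cts) / len(cts)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def efficiency(slope: float) -> float:
    """Amplification efficiency; a slope near -3.32 means ~100% (doubling per cycle)."""
    return 10 ** (-1 / slope) - 1


def concentration_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate an unknown's concentration."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical standards (pM) and their measured Ct values.
standards = [20.0, 2.0, 0.2, 0.02]
standard_cts = [10.1, 13.4, 16.8, 20.1]
slope, intercept = fit_standard_curve(standards, standard_cts)
print(f"slope={slope:.2f}, efficiency={efficiency(slope):.0%}")
print(f"unknown at Ct 15.0: ~{concentration_from_ct(15.0, slope, intercept):.2f} pM before dilution correction")
```

The estimate refers to the diluted sample that was loaded into the qPCR, so it still has to be multiplied by the dilution factor.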

Cluster Generation (Illumina workflow: Sample Preparation → Cluster Generation → Sequencing → Data Analysis)
- Automated cluster generation systems perform clonal amplification of the template.
- Flow cell: a proprietary solid surface carrying covalently bound adapter oligonucleotides to which templates attach.
- Cluster generation: the process by which attached DNA fragments are extended and bridge-amplified to create hundreds of millions of clusters, each containing ~1,000 identical copies of a single template molecule.

1/20/14 Joe Alessi

Sequencing of a paired-end indexed sample requires 3 reads: Read 1, the index read, and Read 2.

Second round of cluster amplification of a paired-end library occurs directly on the HiSeq2000

Sequencing (Illumina workflow: Sample Preparation → Cluster Generation → Sequencing → Data Analysis)
- Starting a run: HiSeq Control Software (HCS)
- Sequencing By Synthesis (SBS)
- Real Time Analysis (RTA)
- Monitoring the run: data metrics

Sequencing By Synthesis (SBS): chemistry/incorporation, imaging, cleavage
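
A toy simulation of one cluster going through the three SBS steps, assuming perfect chemistry (no phasing), a template written in the order it is copied, and hypothetical dye labels `dye_1` to `dye_4`; it shows why one base is read per cycle, not how the instrument software is written.

```python
# Toy model of four-color cyclic reversible termination (assumptions: perfect
# chemistry, no phasing, hypothetical dye labels).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
DYE = {"A": "dye_1", "C": "dye_2", "G": "dye_3", "T": "dye_4"}  # hypothetical labels
BASE_FOR_DYE = {dye: base for base, dye in DYE.items()}


def sbs_images(template: str, n_cycles: int) -> list[str]:
    """Return the dye detected at each cycle while copying `template`."""
    images = []
    for cycle in range(min(n_cycles, len(template))):
        # 1. Chemistry/incorporation: all four labeled, 3'-blocked nucleotides are
        #    present; the polymerase adds the one complementary to the next
        #    template base, and the block prevents further extension this cycle.
        incorporated = COMPLEMENT[template[cycle]]
        # 2. Imaging: the cluster fluoresces in the color of the added base's dye.
        images.append(DYE[incorporated])
        # 3. Cleavage: dye and blocking group are removed so the next cycle can
        #    add exactly one more base (no state to update in this toy).
    return images


def call_read(images: list[str]) -> str:
    """Translate the per-cycle colors back into the synthesized sequence."""
    return "".join(BASE_FOR_DYE[dye] for dye in images)


print(call_read(sbs_images("TTGCA", 5)))
```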

Real Time Analysis (RTA): clusters and basecalling
Bjoern Hihn

Real Time Analysis (RTA): clusters and basecalling
RTA will be ready to call a base if:
- The color matrix has been generated for that tile (it corrects for cross-talk between channels)
- Phasing has been calculated
- The cluster intensity file (CIF) for that cycle exists
Since all four reversible-terminator-bound dNTPs are present during each sequencing cycle, natural competition minimizes incorporation bias. Base calls are made from signal intensity measurements for each cycle.
Phasing: RTA assumes that a fixed fraction of molecules in each cluster become "phased" at each cycle, in the sense that those molecules fall one base behind in sequencing.
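
A sketch of the phasing model stated above, under the simplifying assumption that each molecule independently falls one base behind with probability p at every cycle (pre-phasing ignored). The lag is then binomially distributed, and the in-phase fraction decays as (1 - p)^n, which helps explain why signal for the correct base weakens along a read. The value p = 0.01 and the function names are illustrative.

```python
from math import comb


def lag_distribution(cycle: int, p: float) -> list[float]:
    """Fraction of a cluster's molecules lagging by k bases after `cycle` cycles.

    Assumes an independent chance p of falling one base behind at each cycle,
    so the number of lags is Binomial(cycle, p).
    """
    return [comb(cycle, k) * p**k * (1 - p) ** (cycle - k) for k in range(cycle + 1)]


def in_phase_fraction(cycle: int, p: float) -> float:
    """Fraction still in sync, i.e. contributing signal for the correct base."""
    return (1 - p) ** cycle


for n in (1, 50, 100):
    print(n, round(in_phase_fraction(n, 0.01), 3))
```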

Real Time Analysis (RTA): clusters and basecalling
- The fluorescence intensity of each base during the first 4 cycles is used to set up the base-calling algorithm.
- Illumina cluster detection algorithms are optimized around a balanced representation of A, T, G, and C. If samples are not balanced, select a balanced sample as a control lane.
- The algorithm must account for "cross-talk" (overlap) between channels, because the excitation and emission spectra of the four dyes overlap.
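
A simplified linear sketch of cross-talk correction: treat the observed four-channel intensities as the color matrix times the true per-base signal, solve for the true signal, and call the strongest base. The matrix values are invented for illustration, and this is not RTA's actual implementation; `observe`, `solve` and `call_base` are hypothetical names.

```python
BASES = "ACGT"

# Hypothetical color matrix: entry [i][j] is the intensity seen in channel i
# when only base BASES[j] is present. Off-diagonal entries model cross-talk
# (in this toy, the A/C and G/T channel pairs overlap; values are illustrative).
COLOR_MATRIX = [
    [1.00, 0.60, 0.00, 0.00],
    [0.05, 1.00, 0.00, 0.00],
    [0.00, 0.00, 1.00, 0.40],
    [0.00, 0.00, 0.10, 1.00],
]


def observe(true_signal: list[float]) -> list[float]:
    """Channel intensities produced by a cluster's true per-base signal."""
    return [sum(COLOR_MATRIX[i][j] * true_signal[j] for j in range(4)) for i in range(4)]


def solve(matrix: list[list[float]], rhs: list[float]) -> list[float]:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def call_base(intensities: list[float]) -> tuple[str, list[float]]:
    """Undo cross-talk, then call the base with the strongest corrected signal."""
    corrected = solve(COLOR_MATRIX, intensities)
    best = max(range(4), key=lambda j: corrected[j])
    return BASES[best], corrected


raw = observe([0.0, 1.0, 0.0, 0.0])  # a pure "C" cluster also lights the A channel
print(raw, call_base(raw))
```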

Data Metrics
What kind of information can be monitored?
- Cluster density: individual clusters must be resolvable.
- Intensity values: the strength of the fluorescent signal for the bases read.
- Flow cell chart: intensity values and cluster density for each lane, shown on a heat-map scale.
- % base: the percentage of each base read, an indication of G-C balance.
- Q scores: the quality of a given base call. Quality scoring is the process of assigning a quality score to each base call. Q30 means a probability of an incorrect base call of 1 in 1,000 (0.1%), i.e., an inferred base-call accuracy of 99.9%; Q40 means an error rate of 1 in 10,000 (0.01%).
- FWHM: focus quality, given as the full width at half maximum of clusters (in pixels).
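
The Q score arithmetic above follows the Phred relationship P = 10^(-Q/10). A short sketch of the conversion in both directions, plus the expected number of errors in a read obtained by summing per-base error probabilities; the function names are hypothetical.

```python
import math


def error_probability(q: float) -> float:
    """Phred relationship: P(incorrect base call) = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def phred_q(p_error: float) -> float:
    """Inverse relationship: Q = -10 * log10(P)."""
    return -10 * math.log10(p_error)


def expected_errors(q_scores: list[float]) -> float:
    """Expected number of wrong calls in a read (sum of per-base error probabilities)."""
    return sum(error_probability(q) for q in q_scores)


for q in (20, 30, 40):
    p = error_probability(q)
    print(f"Q{q}: error 1 in {round(1 / p):,}, accuracy {1 - p:.2%}")
```

For comparison, the ~85% raw accuracy quoted for Pacific Biosciences reads corresponds to phred_q(0.15), about Q8.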

Data Analysis (Illumina workflow: Sample Preparation → Cluster Generation → Sequencing By Synthesis (SBS) → Data Analysis)
- Bcl to FASTQ conversion
- Demultiplexing (if necessary)
- Bioinformatic analysis
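
A minimal sketch of demultiplexing: each read's index read is matched against a sample sheet, tolerating at most one mismatch, and reads matching no sample (or more than one) go to an "undetermined" bin. The sample sheet, the one-mismatch tolerance, and the function names are assumptions for illustration; this does not reproduce Illumina's conversion software.

```python
from collections import defaultdict

# Hypothetical sample sheet: index (barcode) sequence -> sample name.
SAMPLE_SHEET = {"ATCACG": "sample_1", "CGATGT": "sample_2", "TTAGGC": "sample_3"}


def mismatches(a: str, b: str) -> int:
    """Hamming distance; any length difference also counts as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def assign_sample(index_read: str, max_mismatches: int = 1) -> str:
    """Return the unique sample whose index is within `max_mismatches`, else 'undetermined'."""
    hits = [name for index, name in SAMPLE_SHEET.items()
            if mismatches(index_read, index) <= max_mismatches]
    return hits[0] if len(hits) == 1 else "undetermined"


def demultiplex(reads):
    """Group (read_name, sequence, index_read) tuples into per-sample bins."""
    bins = defaultdict(list)
    for name, seq, index_read in reads:
        bins[assign_sample(index_read)].append((name, seq))
    return dict(bins)


reads = [("r1", "GATTACA", "ATCACG"), ("r2", "CCGGAAT", "CGATGA"), ("r3", "TTTTTTT", "NNNNNN")]
print(demultiplex(reads))
```

Allowing a mismatch rescues index reads with a single sequencing error; it is only safe when the indexes in the sample sheet differ from each other at several positions.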

A few take-home points (Illumina workflow: Sample Preparation → Cluster Generation → Sequencing → Data Analysis)
- Illumina sample prep involves the ligation of a forked adapter to size-selected fragments of interest.
- Accurate sample quantification is crucial to the success of a run.
- Illumina uses reversible-terminator SBS chemistry.
- Samples should be GC-balanced for accurate basecalling.
- Run data metrics can be monitored to determine the success of a run.