Short Read Sequencing Analysis Workshop

Presentation transcript:

Short Read Sequencing Analysis Workshop, Day 1: Considerations for Sequencing

Different types of sequencing libraries
Whole genome sequencing
RNA Sequencing/GRO-Seq
ChIP-seq
DNase I, ATAC-seq
Exome sequencing
Methyl-Seq
Metagenomic/Amplicon (low diversity)

Platform Comparison
                  MiniSeq          MiSeq          NextSeq        HiSeq 2500     HiSeq 3000/4000   HiSeq X
Output per run    1.65Gb – 7.5Gb   0.5Gb – 15Gb   16Gb – 120Gb   9Gb – 500Gb    105Gb – 750Gb     800Gb – 900Gb
Reads per run     7M – 25M         12M – 25M      130M – 400M    300M – 4B      2.1B – 2.5B       2.6B – 3B
Time per run      7h – 24h         5h – 56h       11h – 30h      7h – 6d        1d – 3.5d         <3d
Max read length (values on slide): 2 x 150; 2 x 300; 2 x 250
2 color/4 color (values on slide): 2 color; 4 color
Flowcell (values on slide): PE; SR / PE; Pattern
Samples/FC (values on slide): 1; 2 or 8; 8

How does Illumina sequencing work? Library generation and affixing library to flow cell http://bitesizebio.com/13546/sequencing-by-synthesis-explaining-the-illumina-sequencing-technology/

How does Illumina sequencing work? Cluster Generation

How does Illumina sequencing work? Sequencing by synthesis with reversible terminators

How does Illumina sequencing work?

Output: Millions of short read sequences
Read 1: ATCGACGGTTAACTGATCG… ATGCGTGCTGCAGTGCCAC… CGTGGACCAAATGGCACAT… CTGTGAAACAATTGGGGAT…
Index Read 1 (i7): TCAGTGCT ACGTTCTA TCAGTGGG CTCGGCGA ACGTTCTC
Index Read 2 (i5): ACGTTCAT CAACGTTC ATTCAGTG GCCTCGGC
Read 2: CTGGTGACAACTGATGCTT… TGACCATTGGGTACAACCC… CCAGTGAACGTGAGCAAGT… GGTTGACCATTGGGGTGAC…

Demultiplexing
Current Illumina kits allow up to 384 unique indexes to be pooled
Read 1: ATCGACGGTTAACTGATCG… ATGCGTGCTGCAGTGCCAC… CGTGGACCAAATGGCACAT… CTGTGAAACAATTGGGGAT…
Index Read 1 (i7): TCAGTGCT ACGTTCTA TCAGTGGG CTCGGCGA ACGTTCTC
Index Read 2 (i5): ACGTTCAT CAACGTTC ATTCAGTG GCCTCGGC
Read 2: CTGGTGACAACTGATGCTT… TGACCATTGGGTACAACCC… CCAGTGAACGTGAGCAAGT… GGTTGACCATTGGGGTGAC…

Demultiplexing
Read 1: ATCGACGGTTAACTGATCG… ATGCGTGCTGCAGTGCCAC… CGTGGACCAAATGGCACAT… CTGTGAAACAATTGGGGAT…
Index Read 1 (i7): TCAGTGCT ACGTTCTA TCAGTGGG CTCGGCGA
Index Read 2 (i5): ACGTTCAT CAACGTTC ATTCAGTG GCCTCGGC
Read 2: CTGGTGACAACTGATGCTT… TGACCATTGGGTACAACCC… CCAGTGAACGTGAGCAAGT… GGTTGACCATTGGGGTGAC…
Sample 1
  Read 1: ATCGACGGTTAACTGATCG… CGTGGACCAAATGGCACAT…
  Read 2: CTGGTGACAACTGATGCTT… CCAGTGAACGTGAGCAAGT…
Sample 2
  Read 1: ATGCGTGCTGCAGTGCCAC…
  Read 2: TGACCATTGGGTACAACCC…
Sample 3
  Read 1: CTGTGAAACAATTGGGGAT…
  Read 2: GGTTGACCATTGGGGTGAC…
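
The demultiplexing step can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm of any Illumina tool: it assumes exact matching on the (i7, i5) index pair, and the sample sheet, the shortened read sequences, and the names `SAMPLE_SHEET` and `demultiplex` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample sheet: (i7, i5) index pair -> sample name.
SAMPLE_SHEET = {
    ("TCAGTGCT", "ACGTTCAT"): "Sample 1",
    ("ACGTTCTA", "CAACGTTC"): "Sample 2",
    ("CTCGGCGA", "GCCTCGGC"): "Sample 3",
}


def demultiplex(records, sample_sheet):
    """Group (read1, i7, i5, read2) records by sample.

    Records whose index pair is not in the sample sheet are
    collected under "Undetermined".
    """
    bins = defaultdict(list)
    for read1, i7, i5, read2 in records:
        sample = sample_sheet.get((i7, i5), "Undetermined")
        bins[sample].append((read1, read2))
    return dict(bins)


# Toy records (sequences shortened for illustration).
records = [
    ("ATCGACGG", "TCAGTGCT", "ACGTTCAT", "CTGGTGAC"),
    ("ATGCGTGC", "ACGTTCTA", "CAACGTTC", "TGACCATT"),
    ("CGTGGACC", "TCAGTGCT", "ACGTTCAT", "CCAGTGAA"),
    ("CTGTGAAA", "CTCGGCGA", "GCCTCGGC", "GGTTGACC"),
    ("GGGGGGGG", "NNNNNNNN", "ACGTTCAT", "GGGGGGGG"),  # unreadable i7
]
demuxed = demultiplex(records, SAMPLE_SHEET)
for sample, pairs in sorted(demuxed.items()):
    print(sample, len(pairs))
```

Production demultiplexers usually tolerate a small number of index mismatches; exact matching is used here only to keep the sketch short.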

What to do with the data?
Short Read Sequencing
Quality Metrics & Trimming
Assembly
Align to reference genome
Variant Calling
Expression/Read Depth
Alternative splicing
Peak/Region identification
Metagenomics

Quality Assessment & Trimming
Pinpoint problems with library prep/sequencing
Identify possible biases
Improve mapping through trimming
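
As a rough illustration of trimming, the sketch below decodes a FASTQ quality string and removes low-quality bases from the 3' end. It assumes Phred+33 encoding and a simple "drop trailing bases below a cutoff" rule; the cutoff of 20 and the function names are assumptions, and dedicated trimming tools use more elaborate rules.

```python
def phred33_scores(quality_string):
    """Convert a Phred+33 encoded quality string to integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def trim_3prime(sequence, quality_string, min_quality=20):
    """Remove trailing bases whose quality is below min_quality.

    Trimming stops at the first base (scanning from the 3' end)
    that meets the cutoff; bases before it are kept unchanged.
    """
    scores = phred33_scores(quality_string)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]


# 'I' encodes Q40, '#' encodes Q2, '!' encodes Q0.
seq, qual = trim_3prime("ATCGACGGTT", "IIIIIIII#!")
print(seq, qual)
```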

Align to reference genome
Aligners: Bowtie2, TopHat2, BWA
[Figure: Sample 1, Sample 2, and Sample 3 reads aligned to reference Chr1 1000-2500]

Variant Calling
[Figure: reads aligned to reference Chr1 1000-2500; the reference base is A and the aligned reads show C at that position]
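
The idea behind variant calling can be sketched as counting the bases that aligned reads show at one reference position. This toy example assumes gapless alignments given as (0-based start, sequence) and uses arbitrary depth and allele-fraction cutoffs; real variant callers model base and mapping quality and are considerably more involved. All names and thresholds are hypothetical.

```python
from collections import Counter


def base_counts(alignments, position):
    """Count read bases covering a 0-based reference position.

    alignments: list of (start, sequence) gapless alignments.
    """
    counts = Counter()
    for start, seq in alignments:
        offset = position - start
        if 0 <= offset < len(seq):
            counts[seq[offset]] += 1
    return counts


def call_site(reference_base, counts, min_depth=4, min_fraction=0.8):
    """Return the alternate base if it dominates the pileup, else None."""
    depth = sum(counts.values())
    if depth < min_depth:
        return None
    base, n = counts.most_common(1)[0]
    if base != reference_base and n / depth >= min_fraction:
        return base
    return None


reference = "TTGATTCA"
alignments = [
    (0, "TTGCTT"), (1, "TGCTTC"), (2, "GCTTCA"),
    (3, "CTTCA"), (0, "TTGC"), (2, "GCTT"),
]
counts = base_counts(alignments, 3)
variant = call_site(reference[3], counts)
print(dict(counts), variant)
```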

Differential Expression
[Figure: reads aligned to reference Chr1 1000-2500]

Alternative Splicing

Peak/Region identification
[Figure: reads aligned to reference Chr1 1000-2500, with a peak marked]

Experimental Design considerations
Genome Size
Read Length
Sequencing Depth
# of Replicates
Single-end vs. Paired-end
Insert Size

Coverage & Read-depth
Coverage = estimate of the average number of reads covering a single base
Avg Coverage = (# reads) x (read length) / (size of genome)
[Figure: reads stacked over a reference, with depth shown on the vertical axis]
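
A small sketch can show how the average-coverage formula relates to per-base depth. It assumes reads are given as (0-based start, length) intervals on a linear genome; the function names and the toy numbers are illustrative only.

```python
def average_coverage(num_reads, read_length, genome_size):
    """Avg coverage = (# reads) x (read length) / (size of genome)."""
    return num_reads * read_length / genome_size


def per_base_depth(genome_size, alignments):
    """Count how many reads cover each base.

    alignments: list of (start, length) intervals, 0-based start.
    """
    depth = [0] * genome_size
    for start, length in alignments:
        for pos in range(start, min(start + length, genome_size)):
            depth[pos] += 1
    return depth


# Four 5 bp reads on a 20 bp toy genome.
alignments = [(0, 5), (3, 5), (10, 5), (15, 5)]
depth = per_base_depth(20, alignments)
print(depth)
print(sum(depth) / len(depth), average_coverage(4, 5, 20))
```

The mean of the per-base depths equals the formula's result, even though individual bases range from uncovered to covered twice; average coverage says nothing about how evenly the reads are spread.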

Typical Coverage Requirements
DNA-Resequencing (SNPs, small indels): 30X with paired-end reads
De novo DNA-Seq: 100X minimum, longest paired-end, multiple insert size runs
Exome: 100-200X of the exome

What that means in reads...
30X coverage with 2 x 150 bp reads (each read pair contributes 300 bp)
For E. coli, ~4.6 Mb: 138 Mbp, 0.46 million read pairs, ~3% of a MiSeq run
For Human, ~3.2 Gb: 96 Gbp, 320 million read pairs, 80% of a NextSeq High Output run or 1.3 lanes of a HiSeq 2500 run
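
The arithmetic on this slide can be reproduced with a short calculation. This is a sketch that assumes each paired-end fragment contributes twice the read length; the run capacities used for the fractions (15 M for a MiSeq run, 400 M for a NextSeq High Output run, 250 M per HiSeq 2500 lane) are taken from the sequencer figures elsewhere in these slides, and the function name is hypothetical.

```python
def read_pairs_needed(genome_size_bp, coverage, read_length_bp, paired_end=True):
    """Return (total bases needed, number of fragments needed)."""
    bases_needed = genome_size_bp * coverage
    bases_per_fragment = read_length_bp * (2 if paired_end else 1)
    # Ceiling division so a partial fragment counts as a whole one.
    fragments = -(-bases_needed // bases_per_fragment)
    return bases_needed, fragments


ecoli_bases, ecoli_pairs = read_pairs_needed(4_600_000, 30, 150)
human_bases, human_pairs = read_pairs_needed(3_200_000_000, 30, 150)

print(ecoli_bases, ecoli_pairs)           # bases and read pairs for E. coli
print(human_bases, human_pairs)           # bases and read pairs for human
print(round(ecoli_pairs / 15_000_000, 2))   # fraction of a MiSeq run
print(human_pairs / 400_000_000)            # fraction of a NextSeq High Output run
print(human_pairs / 250_000_000)            # HiSeq 2500 lanes
```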

RNA-Seq Requirements
Can't use coverage as a measure
Differential Expression (highly expressed)
  Small genomes: 5 Million reads
  Large genomes: 10-30 Million reads
De novo Assembly/DE (lowly expressed)
  Small genomes: 30-65 Million reads
  Large genomes: 100-200 Million reads
For RNA-Seq, adding replicates is typically more powerful than increasing read depth or read length

Which Sequencer should I use?
MiSeq: 15-25 M reads/run; 8h – 4 days/run; 1x50 to 2x300; $$$/bp
NextSeq: 130-400 M reads/run; 12 – 30 h/run; 1x75 to 2x150; $$/bp
HiSeq 2500: 250 M reads/lane, 8 lanes/run; 7h – 3 d/run; 1x36 to 2x125; $$/bp
HiSeq 4000: 312 M reads/lane, 8 lanes/run; 1 – 3.5 d/run; 1x50 to 2x150; $/bp
HiSeq X Ten: 350 M reads/lane, 8 lanes/run; 3 d/run; 2x150; $/bp, BUT minimums on orders

Other considerations
Base diversity (at each position)
Custom versus kitted libraries – kit biases
PCR/PCR-free libraries
How unique is the run-type you want
Queue times/Data delivery times
Many more...
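
Base diversity at each position can be checked with a simple per-cycle tally. The sketch below is only an illustration: it computes the fraction of A, C, G, and T at each cycle across a set of reads. The function name is hypothetical, and the second input mimics a low-diversity amplicon library in which every read starts with the same bases.

```python
from collections import Counter


def base_fractions_by_cycle(reads):
    """Return, for each cycle, the fraction of reads showing A, C, G, T."""
    n_cycles = min(len(read) for read in reads)
    fractions = []
    for cycle in range(n_cycles):
        counts = Counter(read[cycle] for read in reads)
        fractions.append({base: counts[base] / len(reads) for base in "ACGT"})
    return fractions


# First four cycles of four reads with mixed bases.
diverse = base_fractions_by_cycle(["ATCG", "ATGC", "CGTG", "CTGT"])
# Low-diversity case: every read is identical.
low = base_fractions_by_cycle(["ACGT"] * 4)

print(diverse[0])
print(low[0])
```

A cycle where one base approaches a fraction of 1.0 is the low-diversity situation the slide warns about.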

Questions?