SOLiD Sequencing & Data

Overview
- Uses for the SOLiD system
- Starting material -> final library material
- Bead preparation & deposition (slide overview)
- Sequencing process (‘Colorspace’ vs. base calls)
- Data formats & derivative data overview
- Future topics

Uses for SOLiD: anything where a reference is available
- Resequencing
  - SNP or indel studies
  - Exome or other capture
  - Whole genome
- Abundance studies
  - Transcriptome (RNAseq)
  - Ribosomal profiling
  - Microbiome
  - Small RNAs (miR or other)
  - ChIP-Seq / RIP-Seq
- NOT suitable for de novo sequencing (assembly or unknowns). Technically it is possible, but other platforms would likely give FAR better results.

Regardless of starting material, we sequence DNA fragments
- Whatever the starting material, we (or you) prepare a short DNA fragment library derived from it.
- Longer polynucleotides are generally sheared to a smaller size.
  - Covaris (acoustic shearing) or enzymatic digestion.
  - May depend on application! Getting specific ends is important to some applications (ChIP, protections, etc.).
  - Mate libraries may also be prepared where we want to sequence the ends of very large fragments.
- RNA gets reverse transcribed to DNA.
- Adapter sequences are added in the process.
  - As extendable, ligated, stranded RT primers for RNA, or by ligation after shearing and cleanup for DNA fragments.
- CRITICAL: adapter cleanup post ligation! Leftover adapter is a very common major contaminant in poorer library preparations.

The Generic Derived Library
- Libraries have two end sequences used for both PCR and sequencing priming.
- “P1” is the universal forward primer sequence.
- The secondary “P2” may have an embedded barcode sequence where applicable.
- Between the two adapter ends is the DNA insert, which can be sequenced in any combination of forward, reverse, and/or barcode reads (green arrows).
- Note: adapter sequences DIFFER from Illumina's, which matters if preparations designed for other platforms are to be adapted to this one.

Bead Preparation from Libraries
- A library or pool of libraries is subjected to emulsion PCR to populate beads.
- Template is titrated into the oil micro-reactors such that each bead is populated by a single template.
- Unpopulated beads are removed in subsequent cleanup.

Slide Deposition of enriched beads
- Beads are prepared, flowed into the flowcell lanes, and adhered there.
- Low loading: little data.
- Overloading: unable to resolve single beads.

Instrument Run
- The instrument identifies single spots in each lane to track for signal.
- The camera images 708 “panels” on each lane.

Colorspace
- “Colorspace” refers to the two-nucleotide encoding used by SOLiD.
- Reads are built from tiled 5-bp steps with resets.

Colorspace
- 5-bp steps with resets.
- Di-nucleotide reads result in redundancy in calls: each base is interrogated twice.
  - In practice this translates to a slightly higher accuracy in mutation calls.
- Resets in extensions mean that a mis-incorporation, a non-incorporation, or a bad cycle does not kill a read.
  - They also allow selected cycles to be repeated without rerunning everything.
- Drawback: the resulting sequence is encoded in “colorspace” dinucleotide calls.
  - Must use colorspace aligners for the data as-is (Lifescope).
  - Possible to use an additional 3-bp tiled reading cycle set to disambiguate and produce base calls (ECC).
  - Possible to use the known first base to walk a base sequence out, but a single poor call anywhere then corrupts every subsequent base; it is better to use colorspace algorithms.
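The "walk a base sequence out" idea, and why one bad call cascades, can be sketched in a few lines of Python. This is a minimal illustration, not Lifescope's algorithm. It assumes the usual SOLiD color table (0 = same base, 1 = A<->C or G<->T, 2 = A<->G or C<->T, 3 = A<->T or C<->G), which is equivalent to an XOR of 2-bit base codes when A=0, C=1, G=2, T=3. The function names are hypothetical.

```python
# With A=0, C=1, G=2, T=3, a color call equals the XOR of two adjacent base codes.
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode_colorspace(bases, primer_base="T"):
    """Encode bases as the known primer base followed by one color per dinucleotide."""
    prev = BASE_TO_CODE[primer_base]
    colors = []
    for base in bases:
        code = BASE_TO_CODE[base]
        colors.append(str(prev ^ code))
        prev = code
    return primer_base + "".join(colors)


def decode_colorspace(read):
    """Naive decoding: start from the known first base and apply each color in turn."""
    prev = BASE_TO_CODE[read[0]]
    bases = []
    for color in read[1:]:
        prev ^= int(color)
        bases.append(CODE_TO_BASE[prev])
    return "".join(bases)


if __name__ == "__main__":
    read = "T2222002113300322132112231"  # first CSFASTA read shown in this deck
    good = decode_colorspace(read)
    # Simulate a single miscalled color at position 6 of the read string.
    bad = decode_colorspace(read[:6] + "1" + read[7:])
    print(good)
    print(bad)
```

In this sketch the two decoded sequences agree up to the miscalled color and disagree at every base after it, which is the cascade described above. A colorspace aligner avoids this by comparing colors to a color-encoded reference instead of decoding first.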

Data we get
- Data is by default in “XSQ” format.
  - A binary file, not human readable.
  - Lifescope is the only existing aligner for XSQ data.
- Possible to export to ‘CSFASTA’ & ‘CSQUAL’ files, which in combination are similar to FASTQ from Illumina.
  - Some additional meta information is lost when doing so.

CSFASTA (read ID, then color calls [0-3 for the 4 dyes]; CSQUAL similarly has quality scores for each read):

    >600_50_31_F3
    T2222002113300322132112231
    >600_50_63_F3
    T2330133212130133221033110
    >600_50_100_F3
    T0130001131012310201000101

FASTQ (read ID, then sequence, then a "+" separator line that may repeat the sequence ID, then quality scores for the read):

    @SEQ_ID
    GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
    +
    !''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
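As a rough illustration of the CSFASTA layout described above, the following Python sketch splits CSFASTA text into records. It assumes two lines per read (a ">" ID line, then the primer base followed by color calls) and that lines starting with "#" are header comments to skip. The function name is hypothetical, and colors are kept as a string so that any non-digit placeholder for a missing call passes through unchanged.

```python
def parse_csfasta(text):
    """Yield (read_id, primer_base, colors) for each record in CSFASTA text."""
    read_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) '#' header comments
        if line.startswith(">"):
            read_id = line[1:]
        elif read_id is not None:
            # First character is the known primer base; the rest are color calls.
            yield read_id, line[0], line[1:]
            read_id = None


EXAMPLE = """>600_50_31_F3
T2222002113300322132112231
>600_50_63_F3
T2330133212130133221033110
>600_50_100_F3
T0130001131012310201000101
"""

if __name__ == "__main__":
    for read_id, primer, colors in parse_csfasta(EXAMPLE):
        print(read_id, primer, len(colors))
```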

Call Quality scores
- Call qualities are stored as ASCII characters and represent Phred-scale scores (basically, a log-scale error probability).
- The ASCII encoding has historically varied by platform.
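To make the Phred scale concrete: a score Q corresponds to an error probability P through Q = -10 * log10(P), so Q20 is a 1-in-100 chance of a wrong call. The Python sketch below converts in both directions and decodes an ASCII quality string. The default offset of 33 is the common Sanger-style FASTQ convention and older Illumina files used 64; which one applies to a given file is an assumption to check. The function names are hypothetical.

```python
import math


def phred_to_error_prob(q):
    """Error probability implied by a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def decode_qualities(qual_string, offset=33):
    """Turn an ASCII quality string into integer Phred scores.

    offset=33 is assumed here; some older files use offset=64.
    """
    return [ord(ch) - offset for ch in qual_string]


if __name__ == "__main__":
    quals = decode_qualities("!''*((((***+")  # start of the FASTQ example in this deck
    for q in quals[:4]:
        print(q, phred_to_error_prob(q))
```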

Aligned data format
- BAM is the most common format for aligned sequencing data.
- BAM is a binary version of a SAM file. SAM files are text and human readable; BAM files are not.
- BAM files are compressed and indexable, optimized for rapid access to reads anywhere within the file.
  - You don’t have to read the whole file if you want to look for reads at a gene in the middle of chromosome 7, for example.
- BAM files are supported by most genomic viewers. I suggest using IGV to visualize your BAM files.
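Since SAM is the text form of the same records, a short Python sketch can show what one alignment line holds. It relies only on the 11 mandatory tab-separated SAM columns (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL). The example line is made up for illustration, the helper names are hypothetical, and the region check is deliberately simplistic: it looks only at the leftmost mapped position and scans linearly, whereas an indexed BAM lets tools jump straight to the region.

```python
from collections import namedtuple

SAM_FIELDS = ["qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual"]
SamRecord = namedtuple("SamRecord", SAM_FIELDS + ["tags"])


def parse_sam_line(line):
    """Split one SAM alignment line into its mandatory fields plus optional tags."""
    cols = line.rstrip("\n").split("\t")
    core = cols[:11]
    for i in (1, 3, 4, 7, 8):  # FLAG, POS, MAPQ, PNEXT, TLEN are integers
        core[i] = int(core[i])
    return SamRecord(*core, tags=cols[11:])


def starts_in_region(rec, chrom, start, end):
    """True if the read's leftmost mapped position (1-based) lies in [start, end]."""
    return rec.rname == chrom and start <= rec.pos <= end


# Hypothetical alignment line, for illustration only.
EXAMPLE_LINE = "\t".join([
    "read1", "16", "chr7", "55000000", "60", "25M", "*", "0", "0",
    "ACGTA" * 5, "I" * 25, "NM:i:0",
])

if __name__ == "__main__":
    rec = parse_sam_line(EXAMPLE_LINE)
    print(rec.rname, rec.pos, "reverse strand" if rec.flag & 0x10 else "forward strand")
    print(starts_in_region(rec, "chr7", 54_000_000, 56_000_000))
```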

IGV screenshot

Variant Call Format (VCF)
- Mutations are typically reported in VCF format.
- This is a tab-delimited text format (human readable).
- Many programs interpret this format. VarSifter will load the data for you in a filterable view.
- One line per mutation location: position (chromosome, nt position), reference base identity, observed mutation identity, and quality data regarding that call for each sample in the VCF file.
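The "one line per mutation location" layout can be illustrated with a small Python sketch that splits a single VCF data line. It relies on the standard fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), followed by an optional FORMAT column and one column per sample. The example line, the sample names, and the function name are made up for illustration. Real files also carry "##" header lines that this sketch does not handle.

```python
def parse_vcf_line(line, sample_names):
    """Parse one tab-delimited VCF data line into a dictionary."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt, qual, filt, info = cols[:8]
    record = {
        "chrom": chrom,
        "pos": int(pos),                      # 1-based nucleotide position
        "id": var_id,
        "ref": ref,                           # reference base(s)
        "alt": alt.split(","),                # observed alternate allele(s)
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info,
        "samples": {},
    }
    if len(cols) > 9:
        keys = cols[8].split(":")             # FORMAT column names the per-sample fields
        for name, field in zip(sample_names, cols[9:]):
            record["samples"][name] = dict(zip(keys, field.split(":")))
    return record


# Hypothetical data line with two samples, for illustration only.
EXAMPLE_LINE = "chr7\t140453136\t.\tA\tT\t50\tPASS\tDP=100\tGT:DP\t0/1:48\t0/0:52"

if __name__ == "__main__":
    rec = parse_vcf_line(EXAMPLE_LINE, ["tumor", "normal"])
    print(rec["chrom"], rec["pos"], rec["ref"], rec["alt"], rec["samples"]["tumor"]["GT"])
```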