Dr. Christoph W. Sensen and Dr. Jung Soh, Trieste Course 2017

RNA-Seq

What is RNA-Seq?
An experimental protocol that uses next-generation sequencing technologies to sequence the messenger RNA molecules within a biological sample in an effort to determine the primary sequence and relative abundance of each mRNA. (Martin JA, Wang Z (2011) Next-generation transcriptome assembly. Nat Rev Genet. 12(10):671-682.)
Also known as “Whole Transcriptome Shotgun Sequencing” (WTSS).

Sequencing strategy
 - Combination of ½-plate of 454 and 1 lane of 108PE Illumina sequencing: excellent depth and coverage, high-quality assemblies
 - Submission of total RNA samples: improves quality control, takes better advantage of sequencing facilities, similar overall cost
 - 76SE Illumina sequencing on selected species for comparative transcriptomics
 - Repeat sequencing in rare cases of low-quality initial output
Workflow diagram labels: Plant material; Metabolite profiling; Biochemistry PIs; Total RNA extraction; Bioanalyzer (RNA quality); mRNA isolation; cDNA libraries; Genome Québec Innovation Centre; 454 (1/2-plate); Illumina 1 lane 108PE; Reference transcriptomes (75); Bioinformatics

RNA-Seq workflow
Figure from: Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 10(1):57-63.

RNA-Seq vs. microarray
 - Which transcripts? RNA-Seq: all in a sample. Microarray: only those for which probes are designed.
 - Transcript sequence generation. RNA-Seq: yes. Microarray: no.
 - Low-abundance transcript detection. Microarray: limited.
 - Abundance info source. RNA-Seq: count (of the reads aligned to gene). Microarray: fluorescence level (of the probe spot for gene).
 - Resolution. RNA-Seq: base. Microarray: probe sequence.
 - Background noise. RNA-Seq: low. Microarray: high.
 - Additional info. RNA-Seq: alternative splicing, transcriptome-level variation.

RNA-Seq data analysis
 - Input: lots of short reads and a reference genome
 - Map reads: produces a table of mapped loci per read
 - Bin reads to features, using the feature annotation (exons, genes, transcripts): produces a table of counts per feature
 - Normalize counts: produces a table of normalized quantification values per feature (binning and normalization are usually combined in a tool)
 - Detect differentially expressed (DE) features: produces the DE features

Mapping reads
 - Need a reference genome
 - Issues: huge amounts of data; reads spanning an exon junction; alternative splicing; reads mapping to multiple locations in the genome
 - Most common mapping results format: SAM (Sequence Alignment/Map); BAM is the binary format of SAM
 - Many tools: Bowtie, SOAP, BWA, SHRiMP, mrFAST, mrsFAST, ZOOM, SSAHA2, Mosaik
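To make the SAM format more concrete, here is a minimal Python sketch that reads one alignment line. It relies only on the SAM convention that an alignment line has at least 11 tab-separated fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, ...) and that FLAG bits 0x4 and 0x10 mark an unmapped read and a reverse-strand alignment. The names SamRecord and parse_sam_line and the example read are hypothetical and are not taken from the slides; a real analysis would normally use an established SAM/BAM library.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """First six mandatory SAM fields (illustrative subset)."""
    qname: str   # read name
    flag: int    # bitwise FLAG
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string


def parse_sam_line(line: str) -> SamRecord:
    """Parse one SAM alignment line (not a header line starting with '@')."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 tab-separated fields")
    return SamRecord(fields[0], int(fields[1]), fields[2],
                     int(fields[3]), int(fields[4]), fields[5])


def is_mapped(rec: SamRecord) -> bool:
    # FLAG bit 0x4 is set when the read is unmapped.
    return not (rec.flag & 0x4)


def is_reverse(rec: SamRecord) -> bool:
    # FLAG bit 0x10 is set when the read aligns to the reverse strand.
    return bool(rec.flag & 0x10)


# Made-up example line: a 36-base read aligned to the reverse strand of chr1.
example = "\t".join(["read1", "16", "chr1", "100", "42", "36M",
                     "*", "0", "0", "A" * 36, "I" * 36])
rec = parse_sam_line(example)
print(rec.rname, rec.pos, is_mapped(rec), is_reverse(rec))
```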

Bowtie

Binning reads
 - Need annotated features: exons, genes, transcripts
 - For each feature, the total number of reads mapped is produced
 - Not directly comparable across features/samples yet
 - Usually followed by normalization
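The binning step can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not in the slides: each read is represented by its chromosome and leftmost mapped position, a read is assigned to every feature whose interval contains that position, and the feature names and coordinates are made up. Real counting tools also handle strand, spliced reads, and reads that overlap several features.

```python
def bin_reads(read_positions, features):
    """Count reads per feature.

    read_positions: iterable of (chromosome, position) tuples
    features: iterable of (name, chromosome, start, end) tuples, inclusive coordinates
    """
    counts = {name: 0 for name, _chrom, _start, _end in features}
    for chrom, pos in read_positions:
        for name, f_chrom, start, end in features:
            # Assumption: a read belongs to a feature if its leftmost
            # mapped position falls inside the feature interval.
            if chrom == f_chrom and start <= pos <= end:
                counts[name] += 1
    return counts


# Hypothetical annotation and mapped reads.
features = [("geneA", "chr1", 100, 500), ("geneB", "chr1", 1000, 1800)]
reads = [("chr1", 120), ("chr1", 450), ("chr1", 1200), ("chr2", 130), ("chr1", 700)]

counts = bin_reads(reads, features)
print(counts)
```

The nested loop is quadratic and only suitable for toy input; with real data an interval index or sorted sweep would be the usual choice.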

Normalizing counts
 - Why normalize? Longer features have more reads mapped, and deeper sequencing produces more reads
 - RPKM is most frequently used: Reads Per Kilobase per Million reads
 - Defined as RPKM = C / (L × N), where
   C = number of reads mapped to a feature
   L = length of the feature (in kilobases)
   N = total number of reads from the sample (in millions)
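The RPKM definition above translates directly into code. The following Python sketch is a plain implementation of C / (L × N); the function name and the example numbers are illustrative and do not come from the slides.

```python
def rpkm(read_count, feature_length_bp, total_reads):
    """Reads Per Kilobase per Million reads: C / (L * N)."""
    length_kb = feature_length_bp / 1_000        # L, in kilobases
    total_millions = total_reads / 1_000_000     # N, in millions of reads
    return read_count / (length_kb * total_millions)


# Two made-up features in a sample of 10 million reads:
# a 2 kb feature with 1000 reads and a 1 kb feature with 500 reads.
long_feature = rpkm(1000, 2000, 10_000_000)
short_feature = rpkm(500, 1000, 10_000_000)
print(long_feature, short_feature)
```

Both features come out at the same RPKM even though the longer one collected twice as many reads, which is the length effect that normalization is meant to remove.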

RPKM examples http://jura.wi.mit.edu/bio/education/hot_topics/RNAseq/RNA_Seq.pdf

Gene model predicted for fungus Trametes versicolor using Augustus and RNA-seq hints
Above is a screenshot of a GBrowse instance for the fungal species Trametes versicolor from the Genozymes project. The project is sequencing both DNA and the transcriptome (RNA-seq), and COE is responsible for annotation. The example shows a gene predicted with the ab initio predictor Augustus (Confident models), using hints from RNA-seq to check the accuracy of the prediction.
 - Hints are built from short-read alignment of Illumina RNA-seq spliced reads onto the genome (Mapped Reads)
 - Spliced reads show direct evidence of introns (next slide)
 - Hints are used with ab initio predictors (Augustus) during the training and prediction stages

Splice Variants

“non-coding” RNA molecules LincRNA-p21 Tran et al., In press

Example
 - MIRA Assembly: contig T_rep_c1201; read members: 96; length: 2429 bp
 - Combined Assembly: T_rep_c1201 is part of a 6 member contig; 2 are partial transcripts assembled by PTA

Detecting Differential Expression
 - Compare quantification values across samples or across features
 - Most tools summarize/normalize counts and suggest DE features: Cufflinks/Cuffdiff, R packages (DESeq, edgeR, baySeq, TSPM), SAMtools
 - DE features go through similar analysis to microarray data analysis (e.g. validation)
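As a rough illustration of comparing quantification values across two samples, the Python sketch below computes a log2 fold change per feature and keeps the features that exceed a threshold. This is not the method used by Cuffdiff, DESeq, edgeR, or the other tools listed, which model the variability of counts statistically; it is only a simple fold-change filter with no significance test. The pseudocount, the threshold, and the example values are assumptions chosen for the example.

```python
import math


def log2_fold_change(value_a, value_b, pseudocount=1.0):
    """log2 of (sample B / sample A); the pseudocount avoids division by zero."""
    return math.log2((value_b + pseudocount) / (value_a + pseudocount))


def candidate_de_features(sample_a, sample_b, threshold=1.0):
    """Return features whose absolute log2 fold change is at least the threshold."""
    candidates = {}
    for feature in sample_a.keys() & sample_b.keys():
        lfc = log2_fold_change(sample_a[feature], sample_b[feature])
        if abs(lfc) >= threshold:
            candidates[feature] = lfc
    return candidates


# Hypothetical normalized quantification values for two samples.
sample_a = {"geneA": 7.0, "geneB": 50.0, "geneC": 3.0}
sample_b = {"geneA": 31.0, "geneB": 52.0, "geneC": 0.0}

candidates = candidate_de_features(sample_a, sample_b)
print(sorted(candidates.items()))
```

Features flagged this way are only candidates; as the slide notes, they would still go through further analysis such as validation.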

Cufflinks

Cufflinks Tutorial https://docs.google.com/document/d/1t1gi2Djxd0ykMVe2bF8BVOBsOsPngjFh2999u3rZq-A/edit?hl=en&authkey=CKL1i8sD#

Anaerobic biocorrosion in reactors filled with WP-LS medium

SSV1 Replication Cycle (UV Induced)