RNA-Seq Data Analysis UND Genomics Core.

RNA-Seq Analysis Pipeline
Sequence Quality Assessment -> Adapter Trimming -> Alignment -> Quantification -> Differential Expression

Software For RNA-Seq Analysis
Step | Software Options
Sequence Quality Assessment | FastQC
Adapter Trimming | Trim_galore, FastX, Cutadapt, Trimmomatic, Scythe
Alignment | Hisat2, TopHat, STAR
Quantification | FeatureCounts, Stringtie, HTSeq-Count, Cufflinks
Differential Expression | DESeq2, Ballgown, edgeR, CuffDiff, DEXSeq, NOISeq

Fastq files
Contain sequence and quality information. Fastq files are the sequencing files the core gets back from Novogene.
Fields of the read header:
HWI-D00635: Machine id
65: Run id
C7U1DANXX: Flow cell id
7: Lane number
1101: Tile number
1448: X coord
1950: Y coord
1: 1st in pair
N: Not filtered
Control bit
GCAAT: Index
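The header fields listed above can be pulled apart programmatically. The following is a minimal sketch, assuming the usual Illumina (Casava 1.8+) layout in which the location fields are colon-separated and a space precedes the pair/filter/control/index part. The full header string is reassembled here from the slide's values for illustration. The slide does not show the control bit value, so the `0` used below is an assumption, and `parse_illumina_header` is a hypothetical helper name.

```python
def parse_illumina_header(header):
    """Split an Illumina-style FASTQ header line into named fields."""
    if header.startswith("@"):
        header = header[1:]
    # The first part locates the read on the flow cell; the second part
    # describes the read itself (pair member, filter flag, control bit, index).
    location, description = header.split(" ", 1)
    machine, run, flowcell, lane, tile, x, y = location.split(":")
    pair, filtered, control, index = description.split(":")
    return {
        "machine_id": machine,
        "run_id": int(run),
        "flowcell_id": flowcell,
        "lane": int(lane),
        "tile": int(tile),
        "x": int(x),
        "y": int(y),
        "read_in_pair": int(pair),
        "is_filtered": filtered == "Y",  # "N" means the read was not filtered
        "control_bit": int(control),
        "index": index,
    }


# Header rebuilt from the slide's field values; the control bit (0) is assumed.
example_header = "@HWI-D00635:65:C7U1DANXX:7:1101:1448:1950 1:N:0:GCAAT"
fields = parse_illumina_header(example_header)
for name, value in fields.items():
    print(f"{name}: {value}")
```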

Q-score
Q-score is a metric to assess the accuracy of sequencing. It relates to the probability P of a wrong base call via a logarithmic function: Q = -10 log10(P)
Q-score | Error rate | Accuracy
40 | 1/10,000 | 99.99%
30 | 1/1,000 | 99.9%
20 | 1/100 | 99%
10 | 1/10 | 90%
How do you know your sequencing worked?
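The relationship between Q-score and error probability can be checked with a few lines of Python. This is a small sketch of the formula Q = -10 log10(P) and its inverse; the function names are illustrative, not taken from any particular package.

```python
import math


def phred_to_error_prob(q):
    """Probability of a wrong base call for a given Q-score."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Q-score for a given probability of a wrong base call."""
    return -10 * math.log10(p)


# Recompute the rows of the Q-score table.
for q in (40, 30, 20, 10):
    p = phred_to_error_prob(q)
    print(f"Q{q}: error rate 1/{round(1 / p)}, accuracy {100 * (1 - p):.2f}%")
```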

Sequence Quality Assessment
fastqc -o FastQC *fastq.gz
[Example plots: good data, bad data]
How can you summarize millions of
FastQC will give you a series of plots to assess the quality of your sequencing data. FastQC was developed for whole genome sequencing data, and not all of the plots and warnings are applicable to RNA-Seq. Illumina drop
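To give a feel for how many reads can be condensed into one quality plot, here is a simplified sketch that averages Q-scores at each read position. It is not FastQC's implementation. It assumes four-line FASTQ records and Phred+33 quality encoding, and the two reads in `example_fastq` are made up for illustration.

```python
def mean_quality_per_position(fastq_text, offset=33):
    """Average Q-score at each read position across all reads."""
    lines = fastq_text.strip().splitlines()
    # In a four-line FASTQ record the quality string is the 4th line.
    # Slicing by position is safer than searching for '@', because a
    # quality string may itself start with '@'.
    quality_lines = lines[3::4]
    totals, counts = [], []
    for quality in quality_lines:
        for i, char in enumerate(quality):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(char) - offset  # decode character to Q-score
            counts[i] += 1
    return [total / count for total, count in zip(totals, counts)]


# Two hypothetical reads: 'I' encodes Q40, '5' encodes Q20, '+' encodes Q10.
example_fastq = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\nII5+\n"
means = mean_quality_per_position(example_fastq)
print(means)
```

A drop in the averages toward the end of the read, as in the second made-up read, is the kind of pattern the per-base quality plot makes visible.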

Sequence Quality Assessment
Q-scores with respect to the location on the flow cell. Ideally the plot should be all blue.

Sequence Quality Assessment
Common for RNA-Seq. Not all FastQC warnings apply.

Adapter contamination
[Figure labels: cDNA, Read1]
Mention pair-aware trimming. Example of output; be aware of which programs take gzipped files.

Adapter Trimming
Trim galore tries to automatically detect the adapter. For Illumina adapters, it matches the first 13 bases of the Illumina indexed adapters.
[Figure labels: what trim galore matches, cDNA]
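The idea of matching a short adapter prefix at the 3' end of a read can be sketched as follows. This is an exact-match illustration only, not how Trim galore or Cutadapt work internally (they tolerate mismatches and apply quality trimming). The 13-base sequence `AGATCGGAAGAGC` is an assumption, taken to be the start of the standard Illumina adapter since the slide does not spell it out, and `min_overlap=3` is an arbitrary choice for the sketch.

```python
# Assumed first 13 bases of the standard Illumina adapter (not given on the slide).
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def trim_adapter(read, adapter=ADAPTER_PREFIX, min_overlap=3):
    """Remove the adapter and everything after it from the 3' end of a read."""
    # Case 1: the whole adapter prefix occurs inside the read.
    position = read.find(adapter)
    if position != -1:
        return read[:position]
    # Case 2: the read ends partway into the adapter, so only the start of
    # the adapter is present. Try the longest possible overlap first.
    longest = min(len(adapter), len(read)) - 1
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[: len(read) - overlap]
    return read  # no adapter found


full = trim_adapter("ACGTACGT" + ADAPTER_PREFIX + "TTTT")
partial = trim_adapter("ACGTACGTAGATC")
clean = trim_adapter("ACGTACGT")
print(full, partial, clean)
```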

RNA-Seq Analysis Pipeline
Sequence Quality Assessment -> Adapter Trimming -> Alignment -> Quantification -> Differential Expression

Alignment for RNA-Seq
For eukaryotic genomes, splice-aware aligners can align reads across exon-intron boundaries.
[Figure: spliced read]

Splice-Aware Alignment Programs
Tophat: https://ccb.jhu.edu/software/tophat/index.shtml
Hisat2: https://ccb.jhu.edu/software/hisat2/manual.shtml
STAR: https://github.com/alexdobin/STAR

Non-Splice-Aware Aligners
Bowtie: http://bowtie-bio.sourceforge.net/index.shtml
Bowtie2: http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
BWA: http://bio-bwa.sourceforge.net/

Reference Files for Alignment
fasta
gtf (gene transfer format)
Where to find reference files?
UCSC: http://hgdownload.cse.ucsc.edu/downloads.html
Ensembl: https://uswest.ensembl.org/downloads.html
iGenomes: https://support.illumina.com/sequencing/sequencing_software/igenome.html
Also located on bart and buddy

More File Formats
Fasta
Indexes (*.ht2): binary version of the genome, which allows for quick reading of the genome

GTF file
Column | Description
seqname | chromosome
source | source of annotation
feature | type of feature, e.g. exon
start | start of feature
end | end of feature
score | confidence of assembled transcript
strand | strand of feature
frame | frame of feature relative to start of coding sequence
attributes | names of feature
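The nine GTF columns map naturally onto a small parser. The sketch below assumes tab-separated columns and an attributes column made of `key "value";` pairs. The example line, including `example_source`, `GENE_A`, and `GENE_A.1`, is made up for illustration.

```python
GTF_COLUMNS = [
    "seqname", "source", "feature", "start", "end",
    "score", "strand", "frame", "attributes",
]


def parse_gtf_line(line):
    """Turn one GTF line into a dictionary keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(GTF_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # The last column holds semicolon-separated 'key "value"' pairs.
    attributes = {}
    for item in record["attributes"].split(";"):
        item = item.strip()
        if item:
            key, value = item.split(" ", 1)
            attributes[key] = value.strip('"')
    record["attributes"] = attributes
    return record


# Hypothetical exon record, written with explicit tab separators.
example_line = "\t".join([
    "chr1", "example_source", "exon", "1000", "1200", ".", "+", ".",
    'gene_id "GENE_A"; transcript_id "GENE_A.1";',
])
record = parse_gtf_line(example_line)
print(record["feature"], record["start"], record["end"], record["attributes"])
```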

Alignment Output
Sam (Sequence Alignment/Map format)
Bam (binary sam file)
Good alignment %
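An alignment percentage can be estimated directly from a SAM file. This is a rough sketch, assuming tab-separated SAM records in which the second field is the bitwise flag (bit 0x4 set means unmapped) and the sixth field is the CIGAR string, where an `N` operation marks a skipped region such as an intron. It counts alignment records rather than unique reads, so secondary alignments would inflate the numbers. The three records are made up for illustration.

```python
def summarize_sam(sam_text):
    """Count total, mapped, and spliced alignment records in SAM text."""
    total = mapped = spliced = 0
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        flag = int(fields[1])
        cigar = fields[5]
        total += 1
        if flag & 0x4:
            continue  # unmapped read
        mapped += 1
        if "N" in cigar:
            spliced += 1  # alignment spans a skipped region (e.g. an intron)
    rate = 100 * mapped / total if total else 0.0
    return {"total": total, "mapped": mapped, "spliced": spliced,
            "alignment_rate": rate}


# Hypothetical records: one unspliced, one spliced, one unmapped.
records = [
    ["read1", "0", "chr1", "100", "60", "8M", "*", "0", "0", "ACGTACGT", "IIIIIIII"],
    ["read2", "0", "chr1", "200", "60", "4M500N4M", "*", "0", "0", "ACGTACGT", "IIIIIIII"],
    ["read3", "4", "*", "0", "0", "*", "*", "0", "0", "ACGTACGT", "IIIIIIII"],
]
example_sam = "@HD\tVN:1.6\n" + "\n".join("\t".join(r) for r in records) + "\n"
summary = summarize_sam(example_sam)
print(summary)
```

In practice the aligner's own summary is the usual source for this number; the sketch only shows where it comes from.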

RNA-Seq Analysis Pipeline
Sequence Quality Assessment -> Adapter Trimming -> Alignment -> Quantification -> Differential Expression

Counting Reads
Count reads that unambiguously align to a gene.
Programs that will count reads: FeatureCounts, HTSeq
[Figure: two panels, each showing a read relative to Gene A and Gene B]
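The rule of counting only reads that unambiguously align to one gene can be sketched with simple interval overlaps. This is a toy illustration, not the algorithm used by FeatureCounts or HTSeq. It assumes all genes and reads lie on the same chromosome and strand, treats each gene as a single interval, and uses made-up coordinates for Gene A and Gene B.

```python
def count_reads(genes, reads):
    """Count reads per gene, setting aside ambiguous and unassigned reads.

    genes: dict mapping gene name to (start, end), inclusive coordinates.
    reads: list of (start, end) alignment intervals.
    """
    counts = {name: 0 for name in genes}
    ambiguous = 0
    no_feature = 0
    for read_start, read_end in reads:
        hits = [
            name
            for name, (gene_start, gene_end) in genes.items()
            if read_start <= gene_end and read_end >= gene_start
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1  # unambiguous: exactly one gene overlaps
        elif hits:
            ambiguous += 1        # overlaps more than one gene: not counted
        else:
            no_feature += 1       # overlaps no annotated gene
    return counts, ambiguous, no_feature


# Hypothetical overlapping genes and four reads.
genes = {"Gene A": (100, 500), "Gene B": (400, 900)}
reads = [(150, 200), (450, 480), (600, 650), (1000, 1050)]
counts, ambiguous, no_feature = count_reads(genes, reads)
print(counts, ambiguous, no_feature)
```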

Transcriptome assembly
Assemble a map of the genes in the transcriptome using the reads in the present sample.
Can be guided with a gtf, if using a well-annotated genome.
Assembles the transcriptome and estimates the abundance of transcripts at the same time.
Programs that will assemble transcriptomes: Cufflinks, Stringtie

Your Turn!
Work through an RNA-Seq analysis described in the file Day2_HandsOn.docx in the Day2 workshop material folder.