The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop RNA-Seq using the Discovery Environment And COGE.

Presentation transcript:

What is RNA-Seq?

Gene-Expression studies by sequencing Reverse-Transcribed RNA

Getting Started…

First: what if you have a question?

Hint: what if you have a question about anything?

Starting an RNA-Seq Project

Sequencing: Illumina, Ion Torrent, 454, PacBio

So your reads are ready… You've uploaded the sequencing files to the iPlant Data Store. What's next? What are the steps for RNA-Seq?

RNA-Seq Conceptual Overview Image source:

The entire RNA-Seq analysis method…

1. Read analysis and cleanup!
2. Map the reads to the genome (if you have a genome sequence)
3. Assemble the reads into transcripts
4. Map the transcripts to the genome (if you have a genome sequence)
5. Annotate the transcripts (or wait until later)
6. Map the reads to the genome or directly to the transcripts
7. Count the number of hits per transcript or gene for each condition
8. Analyze counts for different conditions to determine differential expression

Then you start thinking more about the Gene Ontology and what types of genes or transcripts are differentially expressed: biology!!

Examining Data Quality with FastQC

RNA-Seq

HWUSI-EAS455:3:1:1:1096 length=41
CAAGGCCCGGGAACGAATTCACCGCCGTATGGCTGACCGGC
HWUSI-EAS455:3:1:2:1592 length=41
GAGGCGTTGACGGGAAAAGGGATATTAGCTCAGCTGAATCT
+
@SRR HWUSI-EAS455:3:1:2:869 length=41
TGCCAGTAGTCATATGCTTGTCTCAAAGATTAAGCCATGCA
+
HWUSI-EAS455:3:1:4:1075 length=41
CAGTAGTTGAGCTCCATGCGAAATAGACTAGTTGGTACCAC
HWUSI-EAS455:3:1:5:238 length=41
AAAAGGGTAAAAGCTCGTTTGATTCTTATTTTCAGTACGAA
+
@SRR HWUSI-EAS455:3:1:5:1871 length=41
GTCATATGCTTGTCTCAAAGATTAAGCCATGCATGTGTAAG
HWUSI-EAS455:3:1:5:1981 length=41
GAACAACAAAACCTATCCTTAACGGGATGGTACTCACTTTC
+

…Now What?
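Before opening a report in FastQC, a quick sanity check of a FASTQ file can be sketched in shell. This is only an illustration: the function name `fastq_summary` is hypothetical, and the sketch assumes strict four-line records (header, sequence, `+` line, quality string) with no line wrapping.

```shell
# Hypothetical helper: summarize a FASTQ file with 4 lines per record.
# Assumes unwrapped records: @header, sequence, + line, quality string.
fastq_summary() {
    awk '
        NR % 4 == 2 {                  # second line of each record is the sequence
            n++
            len = length($0)
            total += len
            if (n == 1 || len < min) min = len
            if (len > max) max = len
        }
        END {
            if (NR % 4 != 0) {
                print "warning: line count is not a multiple of 4" > "/dev/stderr"
            }
            printf "reads=%d bases=%d min_len=%d max_len=%d\n", n, total, min, max
        }
    ' "$1"
}
```

It could be called as `fastq_summary C1_R1_1.fq`. A read count that differs between the two mate files of a paired-end sample is worth investigating before alignment.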

$ tophat -p 8 -G genes.gtf -o C1_R1_thout genome C1_R1_1.fq C1_R1_2.fq
$ tophat -p 8 -G genes.gtf -o C1_R2_thout genome C1_R2_1.fq C1_R2_2.fq
$ tophat -p 8 -G genes.gtf -o C1_R3_thout genome C1_R3_1.fq C1_R3_2.fq
$ tophat -p 8 -G genes.gtf -o C2_R1_thout genome C2_R1_1.fq C2_R1_2.fq
$ tophat -p 8 -G genes.gtf -o C2_R2_thout genome C2_R2_1.fq C2_R2_2.fq
$ tophat -p 8 -G genes.gtf -o C2_R3_thout genome C2_R3_1.fq C2_R3_2.fq
$ cufflinks -p 8 -o C1_R1_clout C1_R1_thout/accepted_hits.bam
$ cufflinks -p 8 -o C1_R2_clout C1_R2_thout/accepted_hits.bam
$ cufflinks -p 8 -o C1_R3_clout C1_R3_thout/accepted_hits.bam
$ cufflinks -p 8 -o C2_R1_clout C2_R1_thout/accepted_hits.bam
$ cufflinks -p 8 -o C2_R2_clout C2_R2_thout/accepted_hits.bam
$ cufflinks -p 8 -o C2_R3_clout C2_R3_thout/accepted_hits.bam
$ cuffmerge -g genes.gtf -s genome.fa -p 8 assemblies.txt
$ cuffdiff -o diff_out -b genome.fa -p 8 -L C1,C2 -u merged_asm/merged.gtf \
    ./C1_R1_thout/accepted_hits.bam,./C1_R2_thout/accepted_hits.bam,./C1_R3_thout/accepted_hits.bam \
    ./C2_R1_thout/accepted_hits.bam,./C2_R3_thout/accepted_hits.bam,./C2_R2_thout/accepted_hits.bam

Your RNA-Seq Data
Your transformed RNA-Seq Data
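Typing one command per sample by hand makes it easy to mix up file names. As a minimal sketch, the per-sample TopHat and Cufflinks commands can be generated in a loop instead. The function names `print_alignment_commands` and `print_assemblies_list` are hypothetical. The sketch assumes that each sample's `_1.fq` file is paired with its own `_2.fq` file, and that `assemblies.txt` lists the `transcripts.gtf` file Cufflinks writes into each `-o` directory. The commands are printed rather than executed.

```shell
# Hypothetical helper: print (not run) the TopHat and Cufflinks commands
# for conditions C1 and C2 with replicates R1..R3, following the file
# naming used in the commands above. Pipe the output to `sh` to run it.
print_alignment_commands() {
    for cond in C1 C2; do
        for rep in R1 R2 R3; do
            s="${cond}_${rep}"
            echo "tophat -p 8 -G genes.gtf -o ${s}_thout genome ${s}_1.fq ${s}_2.fq"
            echo "cufflinks -p 8 -o ${s}_clout ${s}_thout/accepted_hits.bam"
        done
    done
}

# Hypothetical helper: list the per-sample assemblies that cuffmerge
# reads from assemblies.txt (one transcripts.gtf path per line).
print_assemblies_list() {
    for cond in C1 C2; do
        for rep in R1 R2 R3; do
            echo "./${cond}_${rep}_clout/transcripts.gtf"
        done
    done
}
```

For example, `print_assemblies_list > assemblies.txt` would produce the input file that the `cuffmerge` step expects, and `print_alignment_commands` can be reviewed by eye before being piped to `sh`.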

RNA-Seq Analysis Workflow
Tools: TopHat (Bowtie), Cufflinks, Cuffmerge, Cuffdiff, CummeRbund
Platforms: iPlant Data Store (your FASTQ data), Discovery Environment, Atmosphere

RNA-Seq Workflow Overview

TopHat

TopHat is one of many applications for aligning short sequence reads to a reference genome. It uses the Bowtie aligner internally. Alternatives include GSNAP, BWA, and Stampy.

RNA-seq Sample Read Statistics

Genome alignments from TopHat were saved as BAM files, the binary version of SAM (samtools.sourceforge.net/). Reads mapped by TopHat are shown below.

Sequence run    WT-1          WT-2          hy5-1         hy5-2
Reads           10,866,702    10,276,268    13,410,011    12,471,462
Seq. (Mbase)
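Read statistics like these can be recomputed from the alignments themselves. The following is a minimal sketch, not the method used for the slide: the function name `count_mapped` is hypothetical, and it works on SAM text from standard input, using bit 0x4 of the FLAG column to tell unmapped records from mapped ones.

```shell
# Hypothetical helper: count mapped vs. unmapped alignment records in SAM text.
# Reads SAM from stdin. Header lines start with "@". Bit 0x4 of the FLAG
# column (field 2) marks an unmapped read.
count_mapped() {
    awk -F '\t' '
        /^@/ { next }                          # skip header lines
        { if (int($2 / 4) % 2 == 1) unmapped++; else mapped++ }
        END { printf "mapped=%d unmapped=%d\n", mapped, unmapped }
    '
}
```

With samtools installed, a BAM file can be streamed in as text, for example `samtools view -h accepted_hits.bam | count_mapped`. Note that this counts alignment records, so a read that aligns to several locations is counted more than once.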

RNA-Seq Workflow Overview

Examining Differential Gene Expression

Input Read Files for TopHat

BAM Alignment Files for Cufflinks

GTF: Reference-Based Assembly File

Inputs for Cuffdiff

Cuffdiff Output

Output directories: cuffdiff_out, sorted_data

cuffdiff_out directory

basic_plots.R
bias_params.info
cds.count_tracking
cds.diff
cds_exp.diff
cds.fpkm_tracking
cds.read_group_tracking
cuffData.db
gene_exp.diff
genes.count_tracking
genes.fpkm_tracking
genes.read_group_tracking
isoform_exp.diff
isoforms.count_tracking
isoforms.fpkm_tracking
isoforms.read_group_tracking
promoters.diff
read_groups.info
run.info
splicing.diff
tss_group_exp.diff
tss_groups.count_tracking
tss_groups.fpkm_tracking
tss_groups.read_group_tracking
var_model.info

cds.diff file

Columns: test_id, gene_id, gene, locus, sample_1, sample_2, status, value_1, value_2, log2(fold_change), test_stat, p_value, q_value, significant

AT1G01010  ANAC001  1:  nuclear  cytoplasmic  OK  no
AT1G01020  ARV1  1:  nuclear  cytoplasmic  OK  E  yes
AT1G01030  NGA3  1:  nuclear  cytoplasmic  OK  yes
AT1G01040  DCL1  1:  nuclear  cytoplasmic  OK  no
AT1G01046  MIR838A  1:  nuclear  cytoplasmic  NOTEST  inf011  no
AT1G01060  AT1G01060, CUFFLHY  1:  nuclear  cytoplasmic  OK  no
AT1G  :  nuclear  cytoplasmic  OK  E  yes
AT1G  :  nuclear  cytoplasmic  NOTEST  000011  no
AT1G  :  nuclear  cytoplasmic  OK  yes
AT1G01100  AT1G01100, CUFFAT1G  :  nuclear  cytoplasmic  OK  E  yes
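The `significant` column is usually the first thing to filter on in a table like this. As a rough sketch, assuming a tab-delimited Cuffdiff `.diff` file with a header row, the rows flagged `yes` can be extracted with awk. The function name `significant_rows` is hypothetical, and the column is located by its header name rather than by a fixed position.

```shell
# Hypothetical helper: print the header plus the rows flagged significant
# in a tab-delimited Cuffdiff *.diff file. The "significant" column is
# found by name in the header instead of being hard-coded.
significant_rows() {
    awk -F '\t' '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == "significant") col = i
            if (!col) { print "no significant column found" > "/dev/stderr"; exit 1 }
            print
            next
        }
        $col == "yes"                  # default action prints the matching row
    ' "$1"
}
```

For example, `significant_rows cuffdiff_out/gene_exp.diff > gene_exp.sig.txt` would keep only the tested genes that passed the significance threshold. The output file name here is illustrative only.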

sorted_data directory

genes.sorted_by_expression.sig.txt
genes.sorted_by_expression.txt
genes.sorted_by_fold.sig.txt
genes.sorted_by_fold.txt
transcripts.sorted_by_expression.sig.txt
transcripts.sorted_by_expression.txt
transcripts.sorted_by_fold.sig.txt
transcripts.sorted_by_fold.txt

Sorted Differentially Expressed Genes

Columns: gene_id, gene_name, sample1, sample2, fold_change, direction, total_fpkm, q-value, gene_description

ATCG00010  TRNH  nuclear  cytoplasmic  3.8600  DOWN  A chloroplast gene encoding a histidine-accepting tRNA. [Source:TAIR;Acc:ATCG00010]
ATCG00220  PSBM  nuclear  cytoplasmic  4.7900  DOWN  photosystem II reaction center protein M. [Source:TAIR;Acc:ATCG00220]
AT3G16640  TCTP  nuclear  cytoplasmic  9.3800  UP  translationally-controlled tumor protein-like protein [Source:EMBL;Acc:AEE ]
AT5G  nuclear  cytoplasmic  4.6700  UP  uncharacterized protein [Source:EMBL;Acc:AED ]
ATCG00090  TRNS.1  nuclear  cytoplasmic  5.9300  DOWN  tRNA-Ser. [Source:TAIR;Acc:ATCG00090]
AT3G  ATARFA1E  nuclear  cytoplasmic  7.6400  UP  ADP-ribosylation factor A1E [Source:EMBL;Acc:AEE ]
AT2G  nuclear  cytoplasmic  6.0500  UP  S ribosomal protein L41 [Source:EMBL;Acc:AEC ]
AT4G  nuclear  cytoplasmic  4.8300  DOWN  snoRNA. [Source:TAIR;Acc:AT4G39366]
AT5G  nuclear  cytoplasmic  2.0200  DOWN  uncharacterized protein [Source:EMBL;Acc:AED ]
AT4G  nuclear  cytoplasmic  8.6900  DOWN  snoRNA. [Source:TAIR;Acc:AT4G39364]
AT3G  nuclear  cytoplasmic  4.5600  DOWN  snoRNA. [Source:TAIR;Acc:AT3G47347]
AT1G28330  DRM1  nuclear  cytoplasmic  3.0000  UP  dormancy-associated protein-like [Source:EMBL;Acc:AEE ]
AT5G03240  UBQ3  nuclear  cytoplasmic  UP  polyubiquitin [Source:EMBL;Acc:AED ]
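A ranking of this kind can be approximated directly from a Cuffdiff `gene_exp.diff` file. The sketch below is not how the sorted_data files were produced, since that procedure is not described here. The function name `rank_by_fold` is hypothetical, and it assumes that `log2(fold_change)` is positive when expression is higher in sample_2 than in sample_1, labelling those genes UP. Rows with infinite or undefined fold change are skipped.

```shell
# Hypothetical helper: rank significant genes by absolute log2 fold change.
# Input: tab-delimited Cuffdiff gene_exp.diff (columns located by header name).
# Output: gene, |log2 fold change|, direction (assumed UP when the value is > 0).
rank_by_fold() {
    awk -F '\t' '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "gene") g = i
                if ($i == "log2(fold_change)") f = i
                if ($i == "significant") s = i
            }
            if (!g || !f || !s) exit 1         # required columns are missing
            next
        }
        $s == "yes" && $f !~ /inf|nan/ {
            lfc = $f + 0
            dir = (lfc > 0) ? "UP" : "DOWN"
            if (lfc < 0) lfc = -lfc
            printf "%s\t%.4f\t%s\n", $g, lfc, dir
        }
    ' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k2,2nr
}
```

When reading the result, keep in mind that the values are on a log2 scale: a 9-fold change corresponds to a log2 fold change of about 3.17.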

The iPlant Collaborative is funded by a grant from the National Science Foundation Plant Cyberinfrastructure Program (#DBI ).

ATG44120 (12S seed storage protein) is significantly down-regulated in the hy5 mutant background (> 9-fold, p = 0). Compare with the gene on the right, which lacks differential expression.