The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop RNA-Seq using the Discovery Environment And COGE.


1 The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop RNA-Seq using the Discovery Environment And COGE

2 What is RNA-Seq?

3 Gene-Expression studies by sequencing Reverse-Transcribed RNA

4 What is RNA-Seq? Gene-expression studies by sequencing reverse-transcribed RNA. Getting started…

5 First --- What if you have a question?

6 Hint: What if you have a question about anything??

7 Starting an RNA-Seq Project

8 Sequencing: Illumina, Ion Torrent, 454, PacBio

9 So your reads are ready… You’ve uploaded the sequencing files to the iPlant Data Store. What’s next? What are the steps for RNA-Seq?

10 RNA-Seq Conceptual Overview (image source: http://www.bgisequence.com)

11 The entire RNA-Seq analysis method…
   - Read analysis and cleanup!
   - Map the reads to the genome (if you have a genome sequence)
   - Assemble the reads into transcripts
   - Map the transcripts to the genome (if you have a genome sequence)
   - Annotate the transcripts (or wait until later)
   - Map the reads to the genome or directly to the transcripts
   - Count the number of hits per transcript or gene for each condition
   - Analyze counts for different conditions to determine differential expression
   Then you start thinking more about the Gene Ontology and what types of genes or transcripts are differentially expressed – biology!!

12 Examining Data Quality with FastQC

13 RNA-Seq Data
@SRR070570.4 HWUSI-EAS455:3:1:1:1096 length=41
CAAGGCCCGGGAACGAATTCACCGCCGTATGGCTGACCGGC
+
BA?39AAA933BA05>A@A=?4,9#################
@SRR070570.12 HWUSI-EAS455:3:1:2:1592 length=41
GAGGCGTTGACGGGAAAAGGGATATTAGCTCAGCTGAATCT
+
@=:9>5+.5=?@ A?@6+2?:,%1/=0/7/>48##
@SRR070570.13 HWUSI-EAS455:3:1:2:869 length=41
TGCCAGTAGTCATATGCTTGTCTCAAAGATTAAGCCATGCA
+
A;BAA6=A3=ABBBA84B AB2@>B@/9?
@SRR070570.32 HWUSI-EAS455:3:1:4:1075 length=41
CAGTAGTTGAGCTCCATGCGAAATAGACTAGTTGGTACCAC
+
BB9?A@>AABBBB@BCA?A8BBBAB4B@BC71=?9;B:3B?
@SRR070570.40 HWUSI-EAS455:3:1:5:238 length=41
AAAAGGGTAAAAGCTCGTTTGATTCTTATTTTCAGTACGAA
+
BBB?06-8BB@B17>9)=A91?>>8>*@ >@1:B>(B@
@SRR070570.44 HWUSI-EAS455:3:1:5:1871 length=41
GTCATATGCTTGTCTCAAAGATTAAGCCATGCATGTGTAAG
+
BBBCBCCBBBBBA@BBCCB+ABBCB@B@BB@:BAA@B@BB>
@SRR070570.46 HWUSI-EAS455:3:1:5:1981 length=41
GAACAACAAAACCTATCCTTAACGGGATGGTACTCACTTTC
+
?A>-?B;BCBBB@BC@/>A :
…Now What?
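Each read in the FASTQ data on this slide occupies four lines: an `@` header, the sequence, a `+` separator, and a per-base quality string. The following is a minimal sketch of reading such records and averaging their quality scores. It assumes Phred+33 quality encoding, which the slide does not state, and the function names `parse_fastq` and `mean_phred` are hypothetical helpers, not part of any iPlant tool.

```python
from statistics import mean

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger) encoding; not stated on the slide


def parse_fastq(lines):
    """Yield (read_id, sequence, quality) tuples from an iterable of FASTQ lines."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, None)
        plus = next(it, None)
        qual = next(it, None)
        # A record is read four lines at a time, so a quality string that
        # happens to start with '@' is never mistaken for a header.
        if qual is None or not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near: {header!r}")
        yield header[1:].split()[0], seq, qual


def mean_phred(quality, offset=PHRED_OFFSET):
    """Average Phred score of one quality string."""
    return mean(ord(ch) - offset for ch in quality)


# First record from the slide, used as a small in-memory example.
example = """\
@SRR070570.4 HWUSI-EAS455:3:1:1:1096 length=41
CAAGGCCCGGGAACGAATTCACCGCCGTATGGCTGACCGGC
+
BA?39AAA933BA05>A@A=?4,9#################
""".splitlines()

for read_id, seq, qual in parse_fastq(example):
    print(read_id, len(seq), round(mean_phred(qual), 1))
```

Under the Phred+33 assumption, the run of `#` characters at the end of this read corresponds to a score of 2, which is the kind of low-quality tail that a tool such as FastQC would flag.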

14

15 Your RNA-Seq Data
$ tophat -p 8 -G genes.gtf -o C1_R1_thout genome C1_R1_1.fq C1_R1_2.fq
$ tophat -p 8 -G genes.gtf -o C1_R2_thout genome C1_R2_1.fq C1_R2_2.fq
$ tophat -p 8 -G genes.gtf -o C1_R3_thout genome C1_R3_1.fq C1_R3_2.fq
$ tophat -p 8 -G genes.gtf -o C2_R1_thout genome C2_R1_1.fq C2_R1_2.fq
$ tophat -p 8 -G genes.gtf -o C2_R2_thout genome C2_R2_1.fq C2_R2_2.fq
$ tophat -p 8 -G genes.gtf -o C2_R3_thout genome C2_R3_1.fq C2_R3_2.fq
$ cufflinks -p 8 -o C1_R1_clout C1_R1_thout/accepted_hits.bam
$ cufflinks -p 8 -o C1_R2_clout C1_R2_thout/accepted_hits.bam
$ cufflinks -p 8 -o C1_R3_clout C1_R3_thout/accepted_hits.bam
$ cufflinks -p 8 -o C2_R1_clout C2_R1_thout/accepted_hits.bam
$ cufflinks -p 8 -o C2_R2_clout C2_R2_thout/accepted_hits.bam
$ cufflinks -p 8 -o C2_R3_clout C2_R3_thout/accepted_hits.bam
$ cuffmerge -g genes.gtf -s genome.fa -p 8 assemblies.txt
$ cuffdiff -o diff_out -b genome.fa -p 8 -L C1,C2 -u merged_asm/merged.gtf \
    ./C1_R1_thout/accepted_hits.bam,./C1_R2_thout/accepted_hits.bam,./C1_R3_thout/accepted_hits.bam \
    ./C2_R1_thout/accepted_hits.bam,./C2_R3_thout/accepted_hits.bam,./C2_R2_thout/accepted_hits.bam
Your transformed RNA-Seq Data
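The per-sample commands on this slide follow one pattern for two conditions (C1, C2) with three replicates each, so they can be generated rather than typed by hand. The sketch below builds the same TopHat, Cufflinks, and Cuffdiff argument lists and the contents of `assemblies.txt`, pairing each `_1.fq` file with the `_2.fq` file from the same sample. The helper names are hypothetical, and the manifest assumes Cufflinks wrote its default `transcripts.gtf` into each `*_clout` directory. Nothing is executed here; the commands are only printed.

```python
import shlex

CONDITIONS = ["C1", "C2"]
REPLICATES = ["R1", "R2", "R3"]
THREADS = 8


def sample_names():
    """All condition/replicate combinations, e.g. 'C1_R1'."""
    return [f"{c}_{r}" for c in CONDITIONS for r in REPLICATES]


def tophat_cmd(sample):
    # Both mate files must come from the same sample.
    return ["tophat", "-p", str(THREADS), "-G", "genes.gtf",
            "-o", f"{sample}_thout", "genome",
            f"{sample}_1.fq", f"{sample}_2.fq"]


def cufflinks_cmd(sample):
    return ["cufflinks", "-p", str(THREADS),
            "-o", f"{sample}_clout", f"{sample}_thout/accepted_hits.bam"]


def assemblies_manifest():
    """Text for assemblies.txt: one Cufflinks transcripts.gtf path per line."""
    return "".join(f"./{s}_clout/transcripts.gtf\n" for s in sample_names())


def cuffdiff_cmd():
    # One comma-separated list of BAM files per condition, in label order.
    groups = [",".join(f"./{c}_{r}_thout/accepted_hits.bam" for r in REPLICATES)
              for c in CONDITIONS]
    return ["cuffdiff", "-o", "diff_out", "-b", "genome.fa", "-p", str(THREADS),
            "-L", ",".join(CONDITIONS), "-u", "merged_asm/merged.gtf", *groups]


if __name__ == "__main__":
    for s in sample_names():
        print(shlex.join(tophat_cmd(s)))
    for s in sample_names():
        print(shlex.join(cufflinks_cmd(s)))
    print(assemblies_manifest(), end="")
    print(shlex.join(cuffdiff_cmd()))
```

Generating the argument lists from the sample names removes the chance of mixing mate files between samples, which is easy to do when the commands are copied and edited by hand.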

16 RNA-Seq Analysis Workflow. Tools: TopHat (Bowtie), Cufflinks, Cuffmerge, Cuffdiff, CummeRbund. Diagram labels: Your Data, iPlant Data Store, FASTQ, Discovery Environment, Atmosphere.

17 RNA-Seq Workflow Overview

18

19 TopHat: TopHat is one of many applications for aligning short sequence reads to a reference genome. It uses the Bowtie aligner internally. Alternatives include GSNAP, BWA, Stampy, etc.

20 RNA-seq Sample Read Statistics
Genome alignments from TopHat were saved as BAM files, the binary version of SAM (samtools.sourceforge.net/). Reads mapped by TopHat are shown below.
Sequence run   WT-1         WT-2         hy5-1        hy5-2
Reads          10,866,702   10,276,268   13,410,011   12,471,462
Seq. (Mbase)   445.5        421.3        549.8        511.3
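The "Seq. (Mbase)" row can be checked against the read counts with simple arithmetic: total bases equal reads times read length. The sketch below assumes a read length of 41 bases, borrowed from the `length=41` FASTQ records shown on slide 13; the slide itself does not state the read length for these runs.

```python
READ_LENGTH = 41  # assumption: taken from the length=41 reads on slide 13

# Read counts per sequence run, copied from the slide.
reads = {
    "WT-1": 10_866_702,
    "WT-2": 10_276_268,
    "hy5-1": 13_410_011,
    "hy5-2": 12_471_462,
}


def megabases(n_reads, read_length=READ_LENGTH):
    """Total sequence in megabases for n_reads reads of a fixed length."""
    return n_reads * read_length / 1e6


for run, n in reads.items():
    print(f"{run}: {megabases(n):.1f} Mbase")
```

With this assumed read length, the computed values round to the figures reported on the slide for all four runs.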

21 RNA-Seq Workflow Overview

22 Examining Differential Gene Expression

23 Input Read Files for Tophat

24 BAM Alignment files – for CuffLinks

25 GTF – Reference Based Assembly File

26 Inputs for CuffDiff

27 CuffDiff Output. Output directories: cuffdiff_out, sorted_data

28 cuffdiff_out directory
   basic_plots.R
   bias_params.info
   cds.count_tracking
   cds.diff
   cds_exp.diff
   cds.fpkm_tracking
   cds.read_group_tracking
   cuffData.db
   gene_exp.diff
   genes.count_tracking
   genes.fpkm_tracking
   genes.read_group_tracking
   isoform_exp.diff
   isoforms.count_tracking
   isoforms.fpkm_tracking
   isoforms.read_group_tracking
   promoters.diff
   read_groups.info
   run.info
   splicing.diff
   tss_group_exp.diff
   tss_groups.count_tracking
   tss_groups.fpkm_tracking
   tss_groups.read_group_tracking
   var_model.info

29 cds.diff file
test_id    gene_id / gene            locus          sample_1  sample_2     status  value_1  value_2    log2(fold_change)  test_stat  p_value   q_value      significant
AT1G01010  ANAC001                   1:3630-5899    nuclear   cytoplasmic  OK      15.1924  18.4464    0.279985           0.685268   0.3246    0.416508     no
AT1G01020  ARV1                      1:5927-8737    nuclear   cytoplasmic  OK      103.528  13.2388    -2.96718           -5.27806   5.00E-05  0.000256067  yes
AT1G01030  NGA3                      1:11648-13714  nuclear   cytoplasmic  OK      9.64672  0.710269   -3.7636            -5.86083   0.0001    0.000482565  yes
AT1G01040  DCL1                      1:23145-33153  nuclear   cytoplasmic  OK      2.62958  0.0754974  -5.12226           -0.991195  0.1956    0.288561     no
AT1G01046  MIR838A                   1:23145-33153  nuclear   cytoplasmic  NOTEST  0        3.47623    inf                0          1         1            no
AT1G01060  AT1G01060, CUFFLHY        1:33378-37871  nuclear   cytoplasmic  OK      1.59671  1.20252    -0.409035          -0.4106    0.5022    0.587558     no
AT1G01070                            1:38751-40944  nuclear   cytoplasmic  OK      6.22321  20.183     1.69741            3.19228    5.00E-05  0.000256067  yes
AT1G01073                            1:44676-44787  nuclear   cytoplasmic  NOTEST  0        0          0                  0          1         1            no
AT1G01080                            1:45295-47019  nuclear   cytoplasmic  OK      10.8409  3.28162    -1.724             -2.71695   0.0002    0.00089512   yes
AT1G01100  AT1G01100, CUFFAT1G01100  1:50074-51199  nuclear   cytoplasmic  OK      244.398  1195.1     2.28983            4.64357    5.00E-05  0.000256067  yes
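Cuffdiff writes its `.diff` files as tab-delimited text with the column names shown on this slide, so the rows called significant can be pulled out with a few lines of code. The sketch below reads such a table, keeps rows with status `OK` and `significant` equal to `yes`, and recomputes log2(fold_change) from value_1 and value_2. The three example rows reuse numbers from the slide; how the identifier columns are filled in is an assumption made for illustration, and the function names are hypothetical.

```python
import csv
import io
import math

COLUMNS = ["test_id", "gene_id", "gene", "locus", "sample_1", "sample_2",
           "status", "value_1", "value_2", "log2(fold_change)",
           "test_stat", "p_value", "q_value", "significant"]

# Illustrative rows: numeric values are from the slide, identifier
# columns are filled in by assumption.
ROWS = [
    ["AT1G01010", "AT1G01010", "ANAC001", "1:3630-5899", "nuclear", "cytoplasmic",
     "OK", "15.1924", "18.4464", "0.279985", "0.685268", "0.3246", "0.416508", "no"],
    ["AT1G01020", "AT1G01020", "ARV1", "1:5927-8737", "nuclear", "cytoplasmic",
     "OK", "103.528", "13.2388", "-2.96718", "-5.27806", "5.00E-05", "0.000256067", "yes"],
    ["AT1G01046", "AT1G01046", "MIR838A", "1:23145-33153", "nuclear", "cytoplasmic",
     "NOTEST", "0", "3.47623", "inf", "0", "1", "1", "no"],
]
EXAMPLE = "\n".join("\t".join(r) for r in [COLUMNS] + ROWS) + "\n"


def significant_rows(handle):
    """Yield rows that were tested (status OK) and called significant."""
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["status"] == "OK" and row["significant"] == "yes":
            yield row


def log2_fold_change(value_1, value_2):
    """log2(value_2 / value_1), with zero handling that matches the slide's rows."""
    if value_1 == 0 and value_2 == 0:
        return 0.0
    if value_1 == 0:
        return math.inf
    if value_2 == 0:
        return -math.inf
    return math.log2(value_2 / value_1)


for row in significant_rows(io.StringIO(EXAMPLE)):
    lfc = log2_fold_change(float(row["value_1"]), float(row["value_2"]))
    print(row["gene"], row["locus"], f"{lfc:.3f}", row["q_value"])
```

To run this on real output, replace `io.StringIO(EXAMPLE)` with an open file handle for a file such as `cds.diff` or `gene_exp.diff`.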

30 sorted_data directory
   genes.sorted_by_expression.sig.txt
   genes.sorted_by_expression.txt
   genes.sorted_by_fold.sig.txt
   genes.sorted_by_fold.txt
   transcripts.sorted_by_expression.sig.txt
   transcripts.sorted_by_expression.txt
   transcripts.sorted_by_fold.sig.txt
   transcripts.sorted_by_fold.txt

31 Sorted Differentially Expressed Genes
gene_id    gene_name  sample1  sample2      fold_change  direction  total_fpkm  q-value  gene_description
ATCG00010  TRNH       nuclear  cytoplasmic  3.8600       DOWN       93727.7000  0.0003   A chloroplast gene encoding a histidine-accepting tRNA. [Source:TAIR;Acc:ATCG00010]
ATCG00220  PSBM       nuclear  cytoplasmic  4.7900       DOWN       23751.1300  0.0003   photosystem II reaction center protein M. [Source:TAIR;Acc:ATCG00220]
AT3G16640  TCTP       nuclear  cytoplasmic  9.3800       UP         13957.9500  0.0060   translationally-controlled tumor protein-like protein [Source:EMBL;Acc:AEE75847.1]
AT5G66053  .          nuclear  cytoplasmic  4.6700       UP         12783.2100  0.0295   uncharacterized protein [Source:EMBL;Acc:AED98150.1]
ATCG00090  TRNS.1     nuclear  cytoplasmic  5.9300       DOWN       11070.1000  0.0003   tRNA-Ser. [Source:TAIR;Acc:ATCG00090]
AT3G62290  ATARFA1E   nuclear  cytoplasmic  7.6400       UP         6942.7800   0.0003   ADP-ribosylation factor A1E [Source:EMBL;Acc:AEE80334.1]
AT2G40205  .          nuclear  cytoplasmic  6.0500       UP         6915.9300   0.0003   60S ribosomal protein L41 [Source:EMBL;Acc:AEC09796.1]
AT4G39366  .          nuclear  cytoplasmic  4.8300       DOWN       6875.3600   0.0003   snoRNA. [Source:TAIR;Acc:AT4G39366]
AT5G65207  .          nuclear  cytoplasmic  2.0200       DOWN       6744.9000   0.0350   uncharacterized protein [Source:EMBL;Acc:AED98018.1]
AT4G39364  .          nuclear  cytoplasmic  8.6900       DOWN       6401.5100   0.0003   snoRNA. [Source:TAIR;Acc:AT4G39364]
AT3G47347  .          nuclear  cytoplasmic  4.5600       DOWN       6386.9400   0.0082   snoRNA. [Source:TAIR;Acc:AT3G47347]
AT1G28330  DRM1       nuclear  cytoplasmic  3.0000       UP         5849.4600   0.0158   dormancy-associated protein-like 1 [Source:EMBL;Acc:AEE30956.1]
AT5G03240  UBQ3       nuclear  cytoplasmic  25.3900      UP         5105.6000   0.0272   polyubiquitin 3 [Source:EMBL;Acc:AED90576.1]

32 The iPlant Collaborative is funded by a grant from the National Science Foundation Plant Cyberinfrastructure Program (#DBI-0735191).

33 ATG44120 (12S seed storage protein) is significantly down-regulated in the hy5 mutant background (> 9-fold, p=0). Compare to the gene on the right, which lacks differential expression.

