
RNA-Seq Transcriptome Profiling

Before we start: Align sequence reads to the reference genome. The most time-consuming part of the analysis is aligning the reads (in Sanger FASTQ format) from all replicates to the reference genome.

Overview: This training module provides hands-on experience in using RNA-Seq for transcriptome profiling.
Question: How can we use RNA-Seq data to compare gene expression levels in Arabidopsis WT and hy5 genetic backgrounds?
RNA-seq in the Discovery Environment

Scientific Objective: LONG HYPOCOTYL 5 (HY5) is a basic leucine zipper (bZIP) transcription factor (TF). Mutations in the HY5 gene cause aberrant phenotypes in Arabidopsis morphology, pigmentation and hormonal response. We will use RNA-seq to compare the transcriptomes of seedlings from WT and hy5 genetic backgrounds to identify HY5-regulated genes.

Samples: Experimental data were downloaded from the NCBI Short Read Archive (GEO:GSM613465 and GEO:GSM613466): two replicate RNA-seq runs each for wild-type and hy5 mutant seedlings.

Specific Objectives: By the end of this module, you should
1) Be more familiar with the Discovery Environment (DE) user interface
2) Understand the starting data for RNA-seq analysis
3) Be able to align short sequence reads with a reference genome in the DE
4) Be able to analyze differential gene expression in the DE
5) Be able to visualize RNA-Seq data in Atmosphere

RNA-Seq Conceptual Overview (figure; image source: http://www.bgisequence.com)

RNA-Seq Data

@SRR070570.4 HWUSI-EAS455:3:1:1:1096 length=41
CAAGGCCCGGGAACGAATTCACCGCCGTATGGCTGACCGGC
+
BA?39AAA933BA05>A@A=?4,9#################
@SRR070570.12 HWUSI-EAS455:3:1:2:1592 length=41
GAGGCGTTGACGGGAAAAGGGATATTAGCTCAGCTGAATCT
+
@=:9>5+.5=?@ A?@6+2?:,%1/=0/7/>48##
@SRR070570.13 HWUSI-EAS455:3:1:2:869 length=41
TGCCAGTAGTCATATGCTTGTCTCAAAGATTAAGCCATGCA
+
A;BAA6=A3=ABBBA84B AB2@>B@/9?
@SRR070570.32 HWUSI-EAS455:3:1:4:1075 length=41
CAGTAGTTGAGCTCCATGCGAAATAGACTAGTTGGTACCAC
+
BB9?A@>AABBBB@BCA?A8BBBAB4B@BC71=?9;B:3B?
@SRR070570.40 HWUSI-EAS455:3:1:5:238 length=41
AAAAGGGTAAAAGCTCGTTTGATTCTTATTTTCAGTACGAA
+
BBB?06-8BB@B17>9)=A91?>>8>*@ >@1:B>(B@
@SRR070570.44 HWUSI-EAS455:3:1:5:1871 length=41
GTCATATGCTTGTCTCAAAGATTAAGCCATGCATGTGTAAG
+
BBBCBCCBBBBBA@BBCCB+ABBCB@B@BB@:BAA@B@BB>
@SRR070570.46 HWUSI-EAS455:3:1:5:1981 length=41
GAACAACAAAACCTATCCTTAACGGGATGGTACTCACTTTC
+
?A>-?B;BCBBB@BC@/>A :

…Now What?
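Each FASTQ record spans four lines: an @ header carrying the read ID, the base calls, a + separator, and a quality string with one character per base. A minimal sketch, assuming uncompressed four-line records (no wrapped sequence lines), of a hypothetical shell helper check_fastq that verifies this structure before the reads are handed to an aligner:

```shell
#!/usr/bin/env bash
# check_fastq: hypothetical helper, a quick structural check (not a FastQC replacement).
# Assumes plain four-line records: @header, sequence, +separator, quality.
# Reads FASTQ on stdin; prints "<n> records OK" or describes the first problem
# and returns a non-zero exit status.
check_fastq() {
  awk '
    NR % 4 == 1 { if (substr($0, 1, 1) != "@") { print "bad header at line " NR; bad = 1; exit 1 } }
    NR % 4 == 2 { seq = $0 }                      # remember the sequence line
    NR % 4 == 3 { if (substr($0, 1, 1) != "+") { print "bad separator at line " NR; bad = 1; exit 1 } }
    NR % 4 == 0 {                                 # quality must match sequence length
      if (length($0) != length(seq)) { print "length mismatch at line " NR; bad = 1; exit 1 }
    }
    END {
      if (bad) exit 1
      if (NR % 4 != 0) { print "truncated record"; exit 1 }
      print NR / 4 " records OK"
    }
  '
}
```

Example: check_fastq < C1_R1_1.fq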


# Align each replicate's paired-end reads (mate 1 and mate 2) with TopHat
$ tophat -p 8 -G genes.gtf -o C1_R1_thout genome C1_R1_1.fq C1_R1_2.fq
$ tophat -p 8 -G genes.gtf -o C1_R2_thout genome C1_R2_1.fq C1_R2_2.fq
$ tophat -p 8 -G genes.gtf -o C1_R3_thout genome C1_R3_1.fq C1_R3_2.fq
$ tophat -p 8 -G genes.gtf -o C2_R1_thout genome C2_R1_1.fq C2_R1_2.fq
$ tophat -p 8 -G genes.gtf -o C2_R2_thout genome C2_R2_1.fq C2_R2_2.fq
$ tophat -p 8 -G genes.gtf -o C2_R3_thout genome C2_R3_1.fq C2_R3_2.fq
# Assemble transcripts for each replicate with Cufflinks
$ cufflinks -p 8 -o C1_R1_clout C1_R1_thout/accepted_hits.bam
$ cufflinks -p 8 -o C1_R2_clout C1_R2_thout/accepted_hits.bam
$ cufflinks -p 8 -o C1_R3_clout C1_R3_thout/accepted_hits.bam
$ cufflinks -p 8 -o C2_R1_clout C2_R1_thout/accepted_hits.bam
$ cufflinks -p 8 -o C2_R2_clout C2_R2_thout/accepted_hits.bam
$ cufflinks -p 8 -o C2_R3_clout C2_R3_thout/accepted_hits.bam
# Merge the assemblies; assemblies.txt lists the six */transcripts.gtf files, one per line
$ cuffmerge -g genes.gtf -s genome.fa -p 8 assemblies.txt
# Test for differential expression between conditions C1 and C2 (replicates comma-separated)
$ cuffdiff -o diff_out -b genome.fa -p 8 -L C1,C2 -u merged_asm/merged.gtf \
    ./C1_R1_thout/accepted_hits.bam,./C1_R2_thout/accepted_hits.bam,./C1_R3_thout/accepted_hits.bam \
    ./C2_R1_thout/accepted_hits.bam,./C2_R2_thout/accepted_hits.bam,./C2_R3_thout/accepted_hits.bam

Your RNA-Seq Data → Your transformed RNA-Seq Data
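Typing twelve near-identical TopHat and Cufflinks commands by hand makes slips easy, for example pairing a C2 mate-1 file with a C1 mate-2 file. As a sketch, the hypothetical helpers below only print the commands (a dry run) for the C1/C2, R1 to R3 naming used in this listing, plus the assemblies.txt list that cuffmerge reads; nothing is executed.

```shell
#!/usr/bin/env bash
# print_pipeline: hypothetical dry-run helper. Prints the per-replicate TopHat
# and Cufflinks commands so that mate-1/mate-2 files always come from the same
# sample. Assumes the <condition>_<replicate>_{1,2}.fq naming shown above.
print_pipeline() {
  local cond rep s
  for cond in C1 C2; do
    for rep in R1 R2 R3; do
      s="${cond}_${rep}"
      echo "tophat -p 8 -G genes.gtf -o ${s}_thout genome ${s}_1.fq ${s}_2.fq"
    done
  done
  for cond in C1 C2; do
    for rep in R1 R2 R3; do
      s="${cond}_${rep}"
      echo "cufflinks -p 8 -o ${s}_clout ${s}_thout/accepted_hits.bam"
    done
  done
}

# print_assemblies: hypothetical helper. One Cufflinks transcripts.gtf path per
# line, the list format cuffmerge takes as its input file.
print_assemblies() {
  local cond rep
  for cond in C1 C2; do
    for rep in R1 R2 R3; do
      echo "./${cond}_${rep}_clout/transcripts.gtf"
    done
  done
}
```

Usage: print_pipeline > run_all.sh, inspect the file, then run it with bash run_all.sh; print_assemblies > assemblies.txt produces the cuffmerge input list.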

RNA-Seq Analysis Workflow (figure): TopHat (Bowtie) → Cufflinks → Cuffmerge → Cuffdiff → CummeRbund. Components shown: Your Data (FASTQ), iPlant Data Store, Discovery Environment, Atmosphere.

Quick Summary: Find Differentially Expressed Genes
1) Download Reads from SRA
2) Export Reads to FASTQ
3) Align to Genome: TopHat
4) View Alignments: IGV
5) Differential Expression: CuffDiff

Getting the RNA-seq Data (pre-configured):
1) Import SRA data from NCBI SRA
2) Extract FASTQ files from the downloaded SRA archives

Examining Data Quality with FastQC
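FastQC summarizes base-call quality across a run; one of its basic views is the distribution of mean quality per read. A minimal sketch of that calculation, assuming Phred+33 encoding (quality = ASCII code minus 33, the offset SRA toolkit's fastq-dump writes by default); files from older Illumina pipelines may use Phred+64 instead. mean_quality is a hypothetical helper, not part of FastQC.

```shell
#!/usr/bin/env bash
# mean_quality: hypothetical helper. For each read in a four-line FASTQ on
# stdin, prints "<read id><TAB><mean Phred quality>".
# Assumes Phred+33 encoding: quality = ASCII code - 33.
mean_quality() {
  awk '
    BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }  # char -> ASCII code
    NR % 4 == 1 { id = substr($1, 2) }                               # drop the leading "@"
    NR % 4 == 0 {
      sum = 0
      for (j = 1; j <= length($0); j++) sum += ord[substr($0, j, 1)] - 33
      if (length($0) > 0) printf "%s\t%.1f\n", id, sum / length($0)
    }
  '
}
```

Example: mean_quality < C1_R1_1.fq | sort -k2,2n | head lists the lowest-quality reads first.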

RNA-Seq Workflow Overview

Align Reads to the Genome: Align the four FASTQ files to the Arabidopsis genome using TopHat.

TopHat: TopHat is one of many applications for aligning short sequence reads to a reference genome. It uses the Bowtie aligner internally and is splice-aware: it can map reads that span exon-exon junctions. Other alignment programs include BWA, MAQ, OLego, Stampy and Novoalign.
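A spliced alignment shows up in SAM output as an N operation (skipped reference region, i.e. an intron) in the CIGAR string, column 6, for example 20M500N21M. A minimal sketch, with count_spliced as a hypothetical helper, that counts such alignments in SAM text on stdin, for example the output of samtools view accepted_hits.bam:

```shell
#!/usr/bin/env bash
# count_spliced: hypothetical helper. Reads SAM text on stdin and reports how
# many mapped alignments contain an "N" (intron-skipping) CIGAR operation.
count_spliced() {
  awk -F '\t' '
    /^@/ { next }                                  # skip SAM header lines
    $6 != "*" { total++; if ($6 ~ /N/) spliced++ } # column 6 is the CIGAR string
    END { printf "%d of %d alignments spliced\n", spliced, total }
  '
}
```

Example: samtools view C1_R1_thout/accepted_hits.bam | count_spliced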

RNA-seq Sample Read Statistics: Genome alignments from TopHat were saved as BAM files, the binary version of SAM (samtools.sourceforge.net/). Reads retained by TopHat are shown below:

Sequence run    WT-1         WT-2         hy5-1        hy5-2
Reads           10,866,702   10,276,268   13,410,011   12,471,462
Seq. (Mbase)    445.5        421.3        549.8        511.3
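The read headers report length=41, and the Mbase row matches read count × 41 / 10^6 for every run (for WT-1, 10,866,702 × 41 = 445,534,782 bases ≈ 445.5 Mbase). A sketch of both calculations with two hypothetical helpers: read_stats counts reads and bases directly from a FASTQ stream, and mbase converts a read count and a fixed read length.

```shell
#!/usr/bin/env bash
# read_stats: hypothetical helper. Counts reads and sequenced bases (in Mbase)
# in a four-line FASTQ on stdin; line 2 of each record is the sequence.
read_stats() {
  awk 'NR % 4 == 2 { reads++; bases += length($0) }
       END { printf "%d reads, %.1f Mbase\n", reads, bases / 1e6 }'
}

# mbase: hypothetical helper. Sequence yield in Mbase for a read count ($1)
# and a fixed read length in bases ($2).
mbase() {
  awk -v n="$1" -v len="$2" 'BEGIN { printf "%.1f\n", n * len / 1e6 }'
}
```

Example: mbase 10866702 41 reproduces the WT-1 entry in the table.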

ATG44120 (12S seed storage protein) is significantly down-regulated in the hy5 mutant background (> 9-fold, p = 0). Compare with the gene on the right, which shows no differential expression.
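Cuffdiff expresses the change as log2(fold_change) = log2(value_2 / value_1), computed from the two samples' FPKM values, so a more than 9-fold drop in hy5 (listed as the second sample) corresponds to a log2 value below about -3.17. The hypothetical helper fold_change sketches that calculation; the FPKM values in the usage line are illustrative, not the measured values for this gene.

```shell
#!/usr/bin/env bash
# fold_change: hypothetical helper. Given FPKM in sample 1 ($1) and sample 2 ($2),
# prints the ratio value_2/value_1 and its log2. Negative log2 values mean lower
# expression in sample 2. Zero FPKM makes the ratio undefined here
# (this sketch simply refuses such input).
fold_change() {
  awk -v v1="$1" -v v2="$2" 'BEGIN {
    if (v1 <= 0 || v2 <= 0) { print "undefined (zero FPKM)"; exit 1 }
    printf "fold=%.2f log2=%.2f\n", v2 / v1, log(v2 / v1) / log(2)
  }'
}
```

Example (illustrative values): fold_change 100 10 prints fold=0.10 log2=-3.32, a 10-fold decrease in sample 2.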

RNA-Seq Workflow Overview

CuffDiff: Cufflinks is a program that assembles aligned RNA-Seq reads into transcripts, estimates their abundances, and tests for differential expression and regulation transcriptome-wide. Cuffdiff, part of the Cufflinks suite, compares transcript abundance between samples.
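Cufflinks reports abundances as FPKM (fragments per kilobase of transcript per million mapped fragments): FPKM = fragments × 10^9 / (transcript length in bp × total mapped fragments). The hypothetical helper below applies that simplified formula; Cufflinks itself estimates abundances with a likelihood model and effective transcript lengths, so its values will not match this exactly.

```shell
#!/usr/bin/env bash
# fpkm: hypothetical helper using the simplified FPKM formula.
# $1 = fragments assigned to the transcript
# $2 = transcript length in bp
# $3 = total mapped fragments in the library
fpkm() {
  awk -v n="$1" -v len="$2" -v total="$3" \
    'BEGIN { printf "%.2f\n", n * 1e9 / (len * total) }'
}
```

Example: fpkm 100 2000 10000000 prints 5.00 (100 fragments on a 2-kb transcript in a 10-million-fragment library).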

Examining Differential Gene Expression

Examining the Gene Expression Data

Differentially expressed genes: Filter CuffDiff results for up- or down-regulated gene expression in hy5 seedlings.

Example filtered CuffDiff results generated with the Filter_CuffDiff_Results tool, which was used to:
1) Select genes with at least a two-fold expression difference
2) Select genes with significant differential expression (q <= 0.05)
3) Add gene descriptions
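The first two criteria can be sketched with awk on Cuffdiff's gene_exp.diff table. This is not the Filter_CuffDiff_Results tool and does not add gene descriptions; filter_de is a hypothetical helper, and it assumes the tab-separated gene_exp.diff layout in which column 7 is status, column 10 is log2(fold_change) and column 13 is q_value. A two-fold change corresponds to |log2 fold change| >= 1. Rows where one sample has zero FPKM may carry an infinite fold change, which this sketch does not treat specially.

```shell
#!/usr/bin/env bash
# filter_de: hypothetical helper. Reads gene_exp.diff on stdin (assumed columns:
# 7 = status, 10 = log2(fold_change), 13 = q_value). Keeps the header plus genes
# with status OK, q <= 0.05 and |log2 FC| >= 1, and appends a "direction" column:
# "up" or "down" in sample_2 relative to sample_1.
filter_de() {
  awk -F '\t' -v OFS='\t' '
    NR == 1 { print $0, "direction"; next }   # keep the header row
    $7 == "OK" && $13 + 0 <= 0.05 {
      lfc = $10 + 0
      if (lfc >= 1)       print $0, "up"
      else if (lfc <= -1) print $0, "down"
    }
  '
}
```

Example: filter_de < diff_out/gene_exp.diff > de_genes.tsv. With WT as sample_1 and hy5 as sample_2, "down" marks genes with lower expression in hy5.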
