Before we start: Align sequence reads to the reference genome

Presentation transcript:

Introductory RNA-seq Transcriptome Profiling of the hy5 mutation in Arabidopsis thaliana

Before we start: Align sequence reads to the reference genome. The most time-consuming part of the analysis is aligning the reads (in Sanger FASTQ format) for all replicates against the reference genome. At the beginning of the lecture, make sure everyone has loaded the four replicates into the new TopHat implementation (TopHat-1.4.1), which accepts multiple FASTQ files and runs them serially. This step takes the most time, but it will finish for most people while you give the lecture.

RNA-seq in the Discovery Environment. Overview: This training module is designed to provide hands-on experience in using RNA-seq for transcriptome profiling. Questions: How well is the annotated transcriptome represented in RNA-seq data in Arabidopsis WT and hy5 genetic backgrounds? How can we compare gene expression levels in the two samples?

Scientific Objective LONG HYPOCOTYL 5 (HY5) is a basic leucine zipper transcription factor (TF). Mutations in the HY5 gene cause aberrant phenotypes in Arabidopsis morphology, pigmentation and hormonal response. We will use RNA-seq to compare the transcriptomes of seedlings from WT and hy5 genetic backgrounds to identify HY5-regulated genes.

Samples: Experimental data downloaded from the NCBI Short Read Archive (GEO:GSM613465 and GEO:GSM613466). Two replicates each of RNA-seq runs for wild-type and hy5 mutant seedlings.

Specific Objectives: By the end of this module, you should be more familiar with the DE user interface; understand the starting data for RNA-seq analysis; be able to align short sequence reads with a reference genome in the DE; be able to analyze differential gene expression in the DE; and be able to use DE text manipulation tools to explore the gene expression data.

Quick Summary: Download reads from SRA -> Export reads to FASTQ -> Align to genome: TopHat -> View alignments: IGV -> Differential expression: CuffDiff -> Find differentially expressed genes.

Pre-Configured: Getting the RNA-seq Data. Import SRA data from the NCBI SRA, then extract FASTQ files from the downloaded SRA archives. These steps are done in advance so that the workflow fits into the time allotted for the module. Spend a moment explaining the provenance of the data (i.e., downloaded from NCBI in SRA-lite format). Explain that the FASTQ dumper rescales the quality scores to the Sanger convention for FASTQ. Let them know we did this for them in advance.
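To make the "Sanger convention" concrete for the audience: in Sanger-style FASTQ, each quality character encodes a Phred score as its ASCII code minus 33. The sketch below decodes a quality line under that convention; the function names and the example quality string are illustrative, not taken from the workshop data.

```python
def sanger_quality_scores(quality_line: str) -> list[int]:
    """Decode a FASTQ quality line using the Sanger (Phred+33) convention."""
    return [ord(ch) - 33 for ch in quality_line]


def error_probability(phred: int) -> float:
    """Convert a Phred score Q to an error probability, P = 10^(-Q/10)."""
    return 10 ** (-phred / 10)


# Illustrative quality string (not from the hy5 data set).
scores = sanger_quality_scores("II5+!")
print(scores)                    # [40, 40, 20, 10, 0]
print(error_probability(20))     # a Phred score of 20 means a 1-in-100 error rate
```

A score of 40 corresponds to an error probability of 1 in 10,000, which is why high-quality bases show up as characters such as `I` in Sanger FASTQ files.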

RNA-Seq Conceptual Overview This is a quick visual overview of transcriptome profiling via RNA-seq. It does not go into comparisons but we cover that with CuffDiff later. Image source: http://www.bgisequence.com

RNA-Seq Workflow Overview. Explain reference-sequence-based NGS read alignments. Explain that we are skipping the Cufflinks step because the Arabidopsis transcriptome is so well annotated that we can use the TAIR gene models as our reference transcripts for CuffDiff.

Step 1: Align Reads to the Genome. Align the four FASTQ files to the Arabidopsis genome using TopHat. They will have done this part by now.
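For participants who later want to reproduce this step outside the DE, the following is a minimal sketch of how one TopHat command per replicate could be assembled, mirroring the "run them serially" behaviour described earlier. It only builds and prints the commands. The `-o` (output directory), `-p` (threads) and `-G` (GTF annotation) options are standard TopHat options; the index prefix, file names and directory layout are hypothetical placeholders, and the exact command the DE app runs is not shown in these slides.

```python
import shlex
from pathlib import Path


def tophat_commands(index_prefix, fastq_files, output_root, threads=4, gtf=None):
    """Build one TopHat command line per FASTQ file (one run per replicate)."""
    commands = []
    for fastq in fastq_files:
        # Each replicate gets its own output directory named after the file.
        out_dir = str(Path(output_root) / Path(fastq).stem)
        cmd = ["tophat", "-o", out_dir, "-p", str(threads)]
        if gtf is not None:
            cmd += ["-G", gtf]  # optional gene-model annotation
        cmd += [index_prefix, fastq]
        commands.append(cmd)
    return commands


# Hypothetical file names, for illustration only.
replicates = ["WT_rep1.fastq", "WT_rep2.fastq", "hy5_rep1.fastq", "hy5_rep2.fastq"]
for cmd in tophat_commands("arabidopsis_index", replicates, "tophat_out"):
    print(shlex.join(cmd))
```

Running the printed commands one after another is the serial equivalent of what the DE app does for the four replicates.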

TopHat. TopHat is one of many applications for aligning short sequence reads to a reference genome. It uses the Bowtie aligner internally. Alternatives include BWA, MAQ, Stampy, Novoalign, etc. Emphasize that the TopHat aligner is one of many choices. Let them know that others are available in the DE and that they can also integrate their own if they want to.

RNA-seq Sample Read Statistics. Genome alignments from TopHat were saved as BAM files, the binary version of SAM. Reads retained by TopHat are shown below.

Sequence run    WT-1         WT-2         hy5-1        hy5-2
Reads           10,866,702   10,276,268   13,410,011   12,471,462
Seq. (Mbase)    445.5        421.3        549.8        511.3

These are the read counts generated by TopHat as part of its alignment analysis. This is a modestly sized data set by NGS standards; this is a good time to mention scalability, the Data Store, etc.
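A quick arithmetic check on these statistics: dividing the sequence total by the read count for each run gives the mean read length. The sketch below uses only the numbers on this slide; the variable names are illustrative.

```python
# (reads, sequence in Mbase) for each run, copied from the slide.
read_stats = {
    "WT-1": (10_866_702, 445.5),
    "WT-2": (10_276_268, 421.3),
    "hy5-1": (13_410_011, 549.8),
    "hy5-2": (12_471_462, 511.3),
}


def mean_read_length(reads: int, mbase: float) -> float:
    """Mean bases per read, given a read count and total sequence in Mbase."""
    return mbase * 1e6 / reads


for run, (reads, mbase) in read_stats.items():
    print(f"{run}: {mean_read_length(reads, mbase):.1f} bases per read")

total_reads = sum(reads for reads, _ in read_stats.values())
print(f"Total reads across the four runs: {total_reads:,}")
```

Each run works out to about 41 bases per read, which is consistent with all four replicates being short single-length reads.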

Prepare BAM files for viewing. Index the BAM files using SAMtools. This is done for them.
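For reference, indexing a coordinate-sorted BAM file with SAMtools is a single `samtools index` call, which by default writes the index next to the BAM file with a `.bai` suffix. The sketch below only assembles the command and the expected index path. The BAM file name is a hypothetical placeholder, and the helper name is invented for illustration.

```python
from pathlib import Path


def samtools_index_command(bam_path):
    """Return the `samtools index` command and the index file it should create."""
    bam = Path(bam_path)
    command = ["samtools", "index", str(bam)]
    expected_index = Path(str(bam) + ".bai")  # default index name: <file>.bam.bai
    return command, expected_index


# Hypothetical TopHat output path, for illustration only.
command, index_path = samtools_index_command("tophat_out/WT_rep1/accepted_hits.bam")
print(" ".join(command))
print("Expected index:", index_path)
```

IGV expects the index to sit alongside the BAM file, so both files need to be available together when loading a track.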

Using IGV in Atmosphere. We have already launched an instance of NGS Viewers in Atmosphere. Use a VNC client to connect to your remote desktop. We will just show them the slides. Launching an Atmosphere instance is out of scope for this module. Explain that we will cover Atmosphere later in the day/workshop.

Pre-configured VM for NGS Viewers

Integrative Genomics Viewer (IGV). The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations. http://www.broadinstitute.org/igv/ IGV: Make sure you know how to run IGV yourself. Work the example. Play with configuring tracks. You don't NEED to run IGV in Atmosphere. If Atmosphere is misbehaving, show users how to do the same thing on their OWN desktop! Use IGV to inspect outputs from TopHat.

Explain this figure: The gene on the left, ATG44120 (12S seed storage protein), is differentially expressed: it is significantly down-regulated in the hy5 mutant background (> 9-fold, p=0). Compare it to the gene on the right, which is not differentially expressed between the two samples.

Other Ways to View Alignment Data: WIG->Ensembl. Explain that we can also export to popular browsers like Ensembl and UCSC by using the Bam->Wig converter.

RNA-Seq Workflow Overview. Explain that we are skipping the Cufflinks step because the Arabidopsis transcriptome is so well annotated that we can use the TAIR gene models as our reference transcripts for CuffDiff.

CuffDiff. CuffLinks is a program that assembles aligned RNA-Seq reads into transcripts, estimates their abundances, and tests for differential expression and regulation transcriptome-wide. CuffDiff is a program within CuffLinks that compares transcript abundance between samples. Explain that we are skipping the Cufflinks step because the Arabidopsis transcriptome is so well annotated that we can use the TAIR gene models as our reference transcripts for CuffDiff.

Examining Differential Gene Expression Introducing CuffDiff-1.3.0 with replicates
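As an illustration of how replicates are passed to CuffDiff on the command line: the annotation (GTF) comes first, then one argument per condition, with that condition's replicate BAM files joined by commas. The sketch below builds such a command without running it. The `-o`, `-p` and `-L` (condition labels) options are standard Cuffdiff options; the file names are hypothetical, and the exact invocation used by the DE app is an assumption.

```python
import shlex


def cuffdiff_command(gtf, conditions, output_dir, threads=4):
    """Build a Cuffdiff command.

    `conditions` maps a condition label to its list of replicate BAM files;
    dictionary order determines the order of the samples.
    """
    labels = ",".join(conditions)
    cmd = ["cuffdiff", "-o", output_dir, "-p", str(threads), "-L", labels, gtf]
    # One positional argument per condition: replicates joined by commas.
    cmd += [",".join(bams) for bams in conditions.values()]
    return cmd


# Hypothetical file names, for illustration only.
conditions = {
    "WT": ["WT_rep1.bam", "WT_rep2.bam"],
    "hy5": ["hy5_rep1.bam", "hy5_rep2.bam"],
}
print(shlex.join(cuffdiff_command("TAIR_genes.gtf", conditions, "cuffdiff_out")))
```

Listing WT first makes it the first sample in the output tables, so fold changes read as hy5 relative to WT.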

Examining the Gene Expression Data. Explain that various text manipulation tools (grep, cut, awk, etc.) are integrated into the DE for highly configurable, modular analysis of the tabular output data from CuffDiff. Then segue into the Filter_CuffDiff_Results App, which consolidates some of these steps.

Differentially expressed genes Filter CuffDiff results for up or down-regulated gene expression in hy5 seedlings

Differentially expressed genes. Example filtered CuffDiff results generated with the Filter_CuffDiff_Results App, which is used to: select genes with a minimum two-fold expression difference; select genes with significant differential expression (q <= 0.05); and add gene descriptions.
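The two filtering criteria can be sketched in a few lines of Python. This is not the Filter_CuffDiff_Results App itself, only a minimal illustration of the same idea. It assumes a tab-delimited table with columns named `gene`, `log2(fold_change)` and `q_value`, and that the fold change is sample 2 (hy5) relative to sample 1 (WT); check the header of your own CuffDiff output, since column names can differ between versions. The rows below are invented toy values, not results from the hy5 experiment.

```python
import csv
import io


def filter_cuffdiff(rows, min_abs_log2_fc=1.0, max_q=0.05):
    """Keep rows with at least a two-fold change (|log2 FC| >= 1) and q <= max_q."""
    kept = []
    for row in rows:
        try:
            log2_fc = float(row["log2(fold_change)"])
            q_value = float(row["q_value"])
        except (KeyError, ValueError):
            continue  # skip rows with missing or non-numeric values
        if abs(log2_fc) >= min_abs_log2_fc and q_value <= max_q:
            direction = "up" if log2_fc > 0 else "down"
            kept.append((row["gene"], direction, log2_fc, q_value))
    return kept


# Invented toy table (assumed column names), for illustration only.
header = ["gene", "sample_1", "sample_2", "log2(fold_change)", "q_value"]
toy_rows = [
    ["geneA", "WT", "hy5", "-3.32", "0.002"],
    ["geneB", "WT", "hy5", "-0.06", "0.9"],
    ["geneC", "WT", "hy5", "2.32", "0.01"],
    ["geneD", "WT", "hy5", "2.0", "0.2"],
]
table = "\n".join("\t".join(fields) for fields in [header] + toy_rows)

reader = csv.DictReader(io.StringIO(table), delimiter="\t")
for gene, direction, log2_fc, q_value in filter_cuffdiff(reader):
    print(f"{gene}\t{direction} in hy5\tlog2FC={log2_fc}\tq={q_value}")
```

A two-fold threshold corresponds to an absolute log2 fold change of 1, which is why the filter compares against 1.0 rather than 2.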