iPlant Collaborative Discovery Environment: RNA-seq Basic Analysis

Log in with your iPlant ID. Three orange icons appear on the left side: Data, Apps, Analyses.
1. Select Apps (Applications).
2. When the Apps window opens, search for the application of interest, in this case FastQC, since the first task is to determine whether the fastq data files are of high quality.
3. Click on the "i" icon for more information.

The information window shows the full name: FastQC (multi-file). Open the User Manual link; once the manual is open, the Quick Start and Test Data sections may be useful.

Click on the App name; the App window opens. Modify the Analysis Name so it is easy to recall what was done. Add Comments; this will greatly help when you try to figure out what was done at a later date. Scroll down using the bar (red arrow) to Select output folder and keep the default: /iplant/home/youraccountname/analyses. Files can be moved around later. Click on Select input data.

Click on + Add. A list of folders opens; click on the folder with your fastq files, check those that need QC, then click OK. Check that all files are present, then click Launch Analysis.

A notification will briefly appear to let you know the App was launched successfully. Click on the orange Analyses icon to open the Analyses window and track progress. You can also look at the notifications on the top right side. Be sure to refresh the Analyses window when you check the status. Be prepared to wait; this will take hours. The status will change to Running and then Completed. You may also see Idle (no problem) or Failed (the run did not work).

An email will arrive to let you know when the run is Completed or Failed. Go to the Analyses window for the output; the status will read Completed (!). Click on the analysis name to go to the output folder. The Data window now opens (one of the three orange icons on the DE desktop). Under analyses, the folder for this particular analysis will be selected, and its subfolders can be seen to the right. Click on a folder that has the name of one of your fastq input files. You can share this output easily by using the share icon and selecting a collaborator's name.

Once in the selected fastqc folder, click on images. Slide the name bar to the right so that the entire name can be read. Select per_base_quality.png.

The Y-axis shows the Phred score (sequence quality): 40 = 1/10,000 chance of error, 30 = 1/1,000 chance of error, 20 = 1/100 chance of error. Scores of 28 and above are coded green (high quality). The central red line is the median value. The yellow box represents the interquartile range (25-75%). The upper and lower whiskers represent the 10% and 90% points. The blue line represents the mean quality. The mean quality is quite high for this fastq file. Check all files. If quality drops sharply at the ends of the reads, trimming may be needed.
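
The error rates quoted above follow from the standard Phred definition, Q = -10 log10(P), where P is the probability that a base call is wrong. A minimal sketch of the conversion (the function name is chosen here for illustration and is not part of FastQC):

```python
def phred_to_error_probability(q: float) -> float:
    """Return the probability that a base call is wrong for Phred score q."""
    # Standard Phred definition: Q = -10 * log10(P)  =>  P = 10 ** (-Q / 10)
    return 10 ** (-q / 10)


# Print the error probability for the scores mentioned on the slide.
for q in (20, 28, 30, 40):
    p = phred_to_error_probability(q)
    print(f"Phred {q}: about 1 error in {round(1 / p):,} bases")
```

A score of 28, the lower edge of the green zone, corresponds to roughly 1 error in 631 bases.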

This per_base_quality graph shows a dip in quality at some nucleotide positions; however, the mean quality score (blue line) stays above 28. This fastq file is of acceptable quality.

This per_base_quality graph shows a steady decline in quality. These reads must be filtered to keep only the higher-quality ones.

If you are concerned about any of your fastq files, screen them using a quality filter. Search in Apps and select the appropriate App. Add information to the output name and be sure to use Comments! Under Select input data, browse for the appropriate fastq file. Under Options, the default requires a minimum quality score of 20 for 50% of the bases in a read; increase this to 75%. The correct type of Illumina encoding can be seen at the top of the per_base_quality graph. Launch Analysis. Do this for all fastq files that have dips in quality.
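
The filter rule described above (keep a read only if at least 75% of its bases score 20 or higher) can be sketched as follows. This is an illustration of the rule, not the App's actual code; the function names and the use of "at least" at both thresholds are assumptions. The offset parameter reflects the Illumina encoding: 33 for Phred+33 files and 64 for the older Phred+64 files.

```python
def decode_qualities(quality_string: str, offset: int = 33) -> list[int]:
    """Convert a fastq quality string to Phred scores (offset 33 or 64)."""
    return [ord(ch) - offset for ch in quality_string]


def passes_quality_filter(quality_string: str, min_quality: int = 20,
                          min_percent: float = 75, offset: int = 33) -> bool:
    """Return True if at least min_percent of bases have score >= min_quality."""
    scores = decode_qualities(quality_string, offset)
    if not scores:
        return False  # an empty read cannot pass
    good = sum(1 for s in scores if s >= min_quality)
    return good * 100 >= min_percent * len(scores)


# 'I' encodes Phred 40 and '#' encodes Phred 2 in Phred+33.
print(passes_quality_filter("IIII####"))                  # 50% good: False at 75%
print(passes_quality_filter("IIII####", min_percent=50))  # passes the 50% default
print(passes_quality_filter("IIIIII##"))                  # 75% good: True
```

Raising the percentage from 50 to 75 discards reads such as the first example, in which only half of the bases are reliable.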

Let them run! When complete, go to the Analyses folder. The output is fastq_quality_filter_out.fastq. You need to change the name to reflect the sample sequenced, in this case 29d-3. You can see why it is good to put the sample name in the analysis name!

Check the box next to the file name. Under Edit, click on Rename… Change the file name, but retain .fastq at the end. Create a new folder for the quality-filtered fastq files: under Edit, click on New Folder, choose a location, and give an appropriate folder name. Rename all quality-filtered fastq files and move them into the new folder.

The remaining FASTX_quality_filtered folders in the Analyses folder contain only logs and can be deleted. Open the Analyses folder and check the items to be deleted. Under Files, click on Move to Trash. Now all quality-filtered fastq files are in one folder and ready for FastQC, to see whether the data are improved. Use the App to redo the FastQC analysis as before.

Figure panels: after quality filter (Phred above 20 for 75% of sequence) and before quality filter. The mean value (blue line) is somewhat higher after the quality filter. This demonstrates that the dip in the middle is not a major concern.

The next step is to align the quality-filtered sequence reads to the genome. These reads are from spliced RNA, so the alignment program must take the presence of introns into account. Tophat is commonly used for alignment with eukaryotes. Open Apps and search for Tophat. These datasets are single-end (SE) reads, so choose Tophat2-SE; PE is for paired-end reads. Add the input fastq files, using the quality-filtered files. Select the Reference Genome from the list. If your genome is not on the list, obtain a FASTA file from the appropriate genome project.

Under Analysis Options, use the default Anchor length, but you may want to change the minimum and maximum intron length depending on your organism; Arabidopsis, for example, has smaller introns. After these values are set, Launch Analysis. Refresh the Analyses window and check that your Tophat job is running.

When Tophat2-SE is Completed (you will receive an email), click on the analysis name and you will be taken to the output folder. Open the bam folder. This folder can be shared with collaborators. Widen the name column to read the names of the bam and bai (bam index) files. Note that bam files are much larger than bai files (GB vs. KB). The bam file is a compressed binary file containing the sequence alignment data. A bam and a bai file will be generated for each input fastq file. These are the input for the next step in the analysis.

The next step is to count and compare reads at each gene locus and determine whether read count values are significantly different between samples. Search Apps using cuffdiff and select Cuffdiff2 to run. Under Input data, give a clear name to each sample. Click Add and select the bam files from the Tophat2 output. Multiple files can be added at one time. Do this for all samples. Add the Reference Genome and the Reference Annotation, a GTF (Gene Transfer Format) file that lists genes by chromosome location. The analysis is limited to the known genes in this annotation.

Analysis Options: if the samples are a time series, checking "treat sample files as a time series" will compare only adjacent time points. The False Discovery Rate (FDR) can be reduced to 0.01 if more stringency is needed. Launch Analysis! Wait until Completed…
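
To illustrate what tightening the FDR does, here is a minimal sketch of a Benjamini-Hochberg style adjustment, a common way to turn p-values into FDR-adjusted q-values. It is assumed here for illustration only and is not taken from the Cuffdiff App itself; the p-values are invented.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return FDR-adjusted q-values, in the same order as the input p-values."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Invented p-values for four genes.
p = [0.01, 0.04, 0.03, 0.005]
q = benjamini_hochberg(p)
for fdr in (0.05, 0.01):
    called = sum(1 for value in q if value <= fdr)
    print(f"FDR {fdr}: {called} of {len(q)} genes called significant")
```

With these invented values, all four genes pass at an FDR of 0.05 and none pass at 0.01, which is the trade-off to keep in mind when asking for more stringency.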

When complete, click on the analysis name to get to its folder within the analyses window. Open the cuffdiff_out folder. A number of files appear. The gene_exp.diff file will be the most important. It should be large (MB). This file can be shared with colleagues. Click on gene_exp.diff to view the data.

The columns of the gene_exp.diff file include the gene name, the comparison being made (important if you have more than two samples), the median RPKM values for the three sample replicates, and the log2(fold change), which is positive if sample 2 is greater than sample 1. The p-value and q-value are shown; if the q-value is ≤ 0.05, a yes appears in the significant column. Instructions for downloading gene_exp.diff files and for subsequent Gene Ontology analyses can be found in the Life After Cuffdiff PowerPoint.
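
Once gene_exp.diff has been downloaded, the significant genes can also be pulled out with a short script. The sketch below assumes a tab-delimited file with a header row containing the column names gene, sample_1, sample_2, log2(fold_change) and q_value; check these against your own file. The example rows are invented and only show the shape of the data.

```python
import csv
import io


def significant_genes(handle, q_cutoff: float = 0.05):
    """Return (gene, sample_1, sample_2, log2 fold change) for rows with q <= cutoff."""
    reader = csv.DictReader(handle, delimiter="\t")
    hits = []
    for row in reader:
        if float(row["q_value"]) <= q_cutoff:
            hits.append((row["gene"], row["sample_1"], row["sample_2"],
                         float(row["log2(fold_change)"])))
    return hits


# Invented rows; a real file has more columns and many more genes.
EXAMPLE = (
    "gene\tsample_1\tsample_2\tlog2(fold_change)\tq_value\n"
    "geneA\ts1\ts2\t2.0\t0.001\n"
    "geneB\ts1\ts2\t-0.3\t0.40\n"
)

for gene, s1, s2, lfc in significant_genes(io.StringIO(EXAMPLE)):
    direction = "higher" if lfc > 0 else "lower"
    print(f"{gene}: {direction} in {s2} than in {s1} (log2 fold change {lfc})")
```

For a real file, replace the io.StringIO(EXAMPLE) handle with open("gene_exp.diff", newline="").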