WS9: RNA-Seq Analysis with Galaxy (non-model organism )

Slides:



Advertisements
Similar presentations
IMGS 2012 Bioinformatics Workshop: RNA Seq using Galaxy
Advertisements

TOPHAT Next-Generation Sequencing Workshop RNA-Seq Mapping
RNA-Seq based discovery and reconstruction of unannotated transcripts in partially annotated genomes 3 Serghei Mangul*, Adrian Caciula*, Ion.
RNA-seq Analysis in Galaxy
Bacterial Genome Assembly | Victor Jongeneel Radhika S. Khetani
Before we start: Align sequence reads to the reference genome
NGS Analysis Using Galaxy
An Introduction to RNA-Seq Transcriptome Profiling with iPlant
Introduction to RNA-Seq and Transcriptome Analysis
Expression Analysis of RNA-seq Data
Bioinformatics and OMICs Group Meeting REFERENCE GUIDED RNA SEQUENCING.
Genomics Virtual Lab: analyze your data with a mouse click Igor Makunin School of Agriculture and Food Sciences, UQ, April 8, 2015.
Transcriptome analysis With a reference – Challenging due to size and complexity of datasets – Many tools available, driven by biomedical research – GATK.
An Introduction to Designing and Executing Workflows with Taverna Katy Wolstencroft University of Manchester.
Introduction to RNA-Seq & Transcriptome Analysis
Galaxy for Bioinformatics Analysis An Introduction TCD Bioinformatics Support Team Fiona Roche, PhD Date: 31/08/15.
Regulatory Genomics Lab Saurabh Sinha Regulatory Genomics Lab v1 | Saurabh Sinha1 Powerpoint by Casey Hanson.
NGS data analysis CCM Seminar series Michael Liang:
TopHat Mi-kyoung Seo. Today’s paper..TopHat Cole Trapnell at the University of Washington's Department of Genome Sciences Steven Salzberg Center.
RNA-Seq in Galaxy Igor Makunin QAAFI, Internal Workshop, April 17, 2015.
An Introduction to RNA-Seq Transcriptome Profiling with iPlant.
Introductory RNA-seq Transcriptome Profiling. Before we start: Align sequence reads to the reference genome The most time-consuming part of the analysis.
Introduction to RNA-Seq
The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop RNA-Seq using the Discovery Environment And COGE.
Welcome to DNA Subway Classroom-friendly Bioinformatics.
Chip – Seq Peak Calling in Galaxy Lisa Stubbs Chip-Seq Peak Calling in Galaxy | Lisa Stubbs | PowerPoint by Casey Hanson.
IPlant Collaborative Discovery Environment RNA-seq Basic Analysis Log in with your iPlant ID; three orange icons.
Introductory RNA-seq Transcriptome Profiling. Before we start: Align sequence reads to the reference genome The most time-consuming part of the analysis.
RNA-Seq Transcriptome Profiling. Before we start: Align sequence reads to the reference genome The most time-consuming part of the analysis is doing the.
Regulatory Genomics Lab Saurabh Sinha Regulatory Genomics | Saurabh Sinha | PowerPoint by Casey Hanson.
Introduction to RNAseq
Bioinformatics for biologists Dr. Habil Zare, PhD PI of Oncinfo Lab Assistant Professor, Department of Computer Science Texas State University Presented.
The iPlant Collaborative
The iPlant Collaborative
Chip – Seq Peak Calling in Galaxy Lisa Stubbs Lisa Stubbs | Chip-Seq Peak Calling in Galaxy1.
TRACKSTER &CIRCSTER DEMO Slides: /g/funcgen/trainings/visualization/Demos/Trackster+Circster.ppt Galaxy: Galaxy Dev:
Introduction to Exome Analysis in Galaxy Carol Bult, Ph.D. Professor Deputy Director, JAX Cancer Center Short Course Bioinformatics Workshops 2014 Disclaimer…I.
Canadian Bioinformatics Workshops
IGV Demo Slides:/g/funcgen/trainings/visualization/Demos/IGV_demo.ppt Galaxy Dev: 0.
Canadian Bioinformatics Workshops
RNA Seq Analysis Aaron Odell June 17 th Mapping Strategy A few questions you’ll want to ask about your data… - What organism is the data from? -
Introductory RNA-seq Transcriptome Profiling of the hy5 mutation in Arabidopsis thaliana.
Canadian Bioinformatics Workshops
RNA-Seq with the Tuxedo Suite Monica Britton, Ph.D. Sr. Bioinformatics Analyst September 2015 Workshop.
Introductory RNA-seq Transcriptome Profiling
NGS File formats Raw data from various vendors => various formats
GCC Workshop 9 RNA-Seq with Galaxy
Is the end of RNA-Seq alignment?
RNA Sequencing Day 7 Wooohoooo!
Integrative Genomics Viewer (IGV)
Using RNA-seq data to improve gene annotation
RNA-Seq analysis in R (Bioconductor)
Chip – Seq Peak Calling in Galaxy
S1 Supporting information Bioinformatic workflow and quality of the metrics Number of slides: 10.
Canadian Bioinformatics Workshops
Canadian Bioinformatics Workshops
Introductory RNA-Seq Transcriptome Profiling
Rod Eyles1, John Juma1, Morag Ferguson1, Trushar Shah1 1 IITA, Nairobi
Kallisto: near-optimal RNA seq quantification tool
The PATRIC RNASeq Service
Yonglan Zheng Galaxy Hands-on Demo Step-by-step Yonglan Zheng
Maximize read usage through mapping strategies
Yating Liu July 2018 G-OnRamp workshop
Regulatory Genomics Lab
Additional file 2: RNA-Seq data analysis pipeline
Transcriptomics – towards RNASeq – part III
Introduction to RNA-Seq & Transcriptome Analysis
Regulatory Genomics Lab
Chip – Seq Peak Calling in Galaxy
RNA-Seq Data Analysis UND Genomics Core.
Presentation transcript:

WS9: RNA-Seq Analysis with Galaxy (non-model organism ) David Crossman, Ph.D. UAB Heflin Center for Genomic Science GCC2012 Wednesday, July 25, 2012

Galaxy Splash Page

Random Galaxy icons/colors Download/Save Queued Running Completed Failed Icons Display data in browser Edit attributes Delete Edit dataset annotation View details Run this job again View in Trackster Edit dataset tags

Edit files in History *

Upload/Import Data Click “Get Data” Click “Upload File” 1 2 3a 3b-1 3b-2 Click “Get Data” Click “Upload File” Boxes to be aware of: File Format File to be uploaded: File from computer URL/text FTP Genome Click “Execute” 3b-3 3c 3d

Shared Data Click on “Shared Data” (located on top toolbar) 1 2 3 Click on “Shared Data” (located on top toolbar) Drop down box appears; click on “Data Libraries” Will see this Data Library. Click on it to expand (as shown)

Import Shared Data to Current History 1 2 3 Check boxes of files you want to import Choose “Import to current history” and then click “Go” Will see the files in the right-hand pane of the Galaxy window

Quality Control of raw fastq reads 1 2 3b * * 4 Click on “NGS: QC and manipulation” Click on “Fastqc: Fastqc QC Select options: This is what the window looks like when first opened Choose fastq file and give it a useful name Click “Execute” Do the exact same thing for the other 3 fastq files

FastQC Output Report This data looks awful because this is filtered data from a much larger fastq file. Better results when using entire file!

RNA-Seq Analysis Pipeline Aligns raw reads to reference genome, identifies splice junctions between exons. TopHat Cufflinks Assembles transcripts, and estimates their abundances. Cuffmerge Merges together several Cufflinks files. Finds significant changes in transcript expression, splicing, and promoter use. Cuffdiff

TopHat Click on “NGS: RNA Analysis” Click on “Tophat for Illumina” 1 3 2 Click on “NGS: RNA Analysis” Click on “Tophat for Illumina” Default window with options appears

TopHat 1 2a Select forward fastq read file Select reference genome: Choose “Use one from the history” Select the reference genome fasta Select “Paired-end” Select reverse fastq read file Input “150” (ask sequencing center for this info) Can choose “Default settings” or “Full parameter list” Click “Execute” Do the exact same thing for the other sample 2b 3 4 5 6 7

Note about FASTA files not already indexed in Galaxy If a FASTA is not indexed in Galaxy, then it is easy to upload the appropriate FASTA file into Galaxy. (Get Data -> Upload File) However, it can take up to 5 hours extra to run TopHat because Bowtie has to index your uploaded FASTA file (best to have your own instance of Galaxy) each time you run TopHat! Where do I go to get a non-model organism FASTA file? NCBI: http://www.ncbi.nlm.nih.gov/genome Ensembl: http://useast.ensembl.org/info/data/ftp/index.html iGenome: http://cufflinks.cbcb.umd.edu/igenomes.html Your favorite species website: http://www...

TopHat output files

Cufflinks Click on “NGS: RNA Analysis” Click on “Cufflinks” 1 3 2 Click on “NGS: RNA Analysis” Click on “Cufflinks” Default window with options appears

Cufflinks 1 Choose TopHat accepted hits file Perform quartile normalization (for this demo sample, choose “No”) Reference Annotation: For genomes in scaffolds, choose “Use reference annotation” Choose GTF file from history Perform Bias Correction (for this demo, choose “No”) Click “Execute” Do the exact same thing for the other TopHat accepted hits file 2 3a 3b 4

Note about GTF files for Cuff* If you use a GTF file from Ensembl, then you need to convert the chromosome column (column 1) to include ‘chr’ in front of the chromosome #. You can do this by: Using Jeremy’s published workflow “Make Ensembl GTF compatible with Cufflinks” in Galaxy: https://main.g2.bx.psu.edu/u/jeremy/w/make-ensembl-gtf-compatible-with-cufflinks Use ‘awk’ to add ‘chr’ to column 1 (if using Mac or Linux) Where do I go to get a GTF file? NCBI: http://www.ncbi.nlm.nih.gov/genome Ensembl: http://useast.ensembl.org/info/data/ftp/index.html iGenome: http://cufflinks.cbcb.umd.edu/igenomes.html Your favorite species website: http://www...

Cufflinks output files

Cuffmerge Click on “NGS: RNA Analysis” Click on “Cuffmerge” 1 2 Click on “NGS: RNA Analysis” Click on “Cuffmerge” Default window with options appears

Cuffmerge 1 Choose GTF file produced by Cufflinks Additional GTF Input Files: Click on “Add new Additional GTF Input Files” Choose other GTF file produced by Cufflinks Reference Annotation: Select “Yes” to Use Reference Annotation Choose GTF Reference Annotation file from history Sequence Data: Slect “Yes” to Use Sequence Data Choose “History” Choose FASTA file from history Click “Excecute” 2b 2a 3a 3b 4a 4b 4c 5

Cuffmerge output files

Cuffdiff Click on “NGS: RNA Analysis” Click on “Cuffdiff” 1 2 Click on “NGS: RNA Analysis” Click on “Cuffdiff” Default window with options appears

Cuffdiff Choose GTF transcript file from either Cuffmerge or Cuffcompare Perform replicate analysis: Choose “Yes” Click “Add new Group” Select a name to give the Group Choose TopHat accepted hits file associated with this Group If you have more than one TopHat accepted hits file associated with this Group, then click “Add new Replicate” Click “Add new Group” if you have another Group you want to add Select a False Discovery Rate cutoff Select the minimum # of reads that will align to a locus in order to perform significant testing Perform quartile normalization (for this demo, choose “No”) Perform bias correction (for this demo, choose “No”) Click “Execute” 1 2a 2c 2d 2e 2g 2h 2i 2b, 2f, 2j 3 4 5 6 7

Cuffdiff output files

Transcript differential expression testing output Gene differential expression testing output

Create genome in IGV 1 2 Click on “File” Choose “Import Genome”

Create genome in IGV Required Optional Need: FASTA file (required) Cytoband file Gene file (can use GTF) Alias file

Create genome in IGV Save the *.genome file

Create genome in IGV

Load aligned BAM files into IGV 1 2 Click “File” Choose “Load from File” Choose *.bam files (*.bai files need to be in the same folder

IGV

References and web links TopHat Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq.Bioinformatics doi:10.1093/bioinformatics/btp120 http://tophat.cbcb.umd.edu/ Bowtie Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25. http://bowtie-bio.sourceforge.net/index.shtml Cufflinks Trapnell C, Williams BA, Pertea G, Mortazavi AM, Kwan G, van Baren MJ, Salzberg SL, Wold B, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation Nature Biotechnology doi:10.1038/nbt.1621 Roberts A, Trapnell C, Donaghey J, Rinn JL, Pachter L. Improving RNA-Seq expression estimates by correcting for fragment bias Genome Biology doi:10.1186/gb-2011-12-3-r22 Roberts A, Pimentel H, Trapnell C, Pachter L.Identification of novel transcripts in annotated genomes using RNA-Seq Bioinformatics doi:10.1093/bioinformatics/btr355 http://cufflinks.cbcb.umd.edu/ TopHat and Cufflinks protocol Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks Nature Protocols 7, 562-578 (2012) doi:10.1038/nprot.2012.016 IGV http://www.broadinstitute.org/igv/

Thanks! Questions? Contact info: David K. Crossman, Ph.D. Bioinformatics Director Heflin Center for Genomic Science University of Alabama at Birmingham http://www.heflingenetics.uab.edu dkcrossm@uab.edu