Short Read Sequencing Analysis Workshop

Slides:



Advertisements
Similar presentations
MCB Lecture #15 Oct 23/14 De novo assemblies using PacBio.
Advertisements

IMGS 2012 Bioinformatics Workshop: RNA Seq using Galaxy
RNA-seq data analysis Project
RNA-seq Analysis in Galaxy
Bacterial Genome Assembly | Victor Jongeneel Radhika S. Khetani
Before we start: Align sequence reads to the reference genome
NGS Analysis Using Galaxy
Introduction to RNA-Seq and Transcriptome Analysis
Bioinformatics and OMICs Group Meeting REFERENCE GUIDED RNA SEQUENCING.
Genomics Virtual Lab: analyze your data with a mouse click Igor Makunin School of Agriculture and Food Sciences, UQ, April 8, 2015.
Transcriptome analysis With a reference – Challenging due to size and complexity of datasets – Many tools available, driven by biomedical research – GATK.
June 11, 2013 Intro to Bioinformatics – Assembling a Transcriptome Tom Doak Carrie Ganote National Center for Genome Analysis Support.
Introduction to RNA-Seq & Transcriptome Analysis
NGS data analysis CCM Seminar series Michael Liang:
Next Generation DNA Sequencing
Transcriptome Analysis
RNA-seq workshop ALIGNMENT
Introductory RNA-seq Transcriptome Profiling. Before we start: Align sequence reads to the reference genome The most time-consuming part of the analysis.
The iPlant Collaborative
Linux & Shell Scripting Small Group Lecture 3 How to Learn to Code Workshop group/ Erin.
Introductory RNA-seq Transcriptome Profiling. Before we start: Align sequence reads to the reference genome The most time-consuming part of the analysis.
RNA-Seq Transcriptome Profiling. Before we start: Align sequence reads to the reference genome The most time-consuming part of the analysis is doing the.
Trinity College Dublin, The University of Dublin GE3M25: Data Analysis, Class 4 Karsten Hokamp, PhD Genetics TCD, 07/12/2015
The iPlant Collaborative
__________________________________________________________________________________________________ Fall 2015GCBA 815 __________________________________________________________________________________________________.
User-friendly Galaxy interface and analysis workflows for deep sequencing data Oskari Timonen and Petri Pölönen.
Short Read Workshop Day 1 - Experimental Design Example 1: How to log in to vieques.
Canadian Bioinformatics Workshops
Overview of Genomics Workflows
RNA Seq Analysis Aaron Odell June 17 th Mapping Strategy A few questions you’ll want to ask about your data… - What organism is the data from? -
Introductory RNA-seq Transcriptome Profiling of the hy5 mutation in Arabidopsis thaliana.
Setting up visualization. Make output folder for visualization files Log into vieques $ ssh
Canadian Bioinformatics Workshops
Short Read Workshop Day 1 - Experimental Design Example 1: How to log in to vieques.
From Reads to Results Exome-seq analysis at CCBR
Short Read Workshop Day 1 - Experimental Design Video 1- Why to short read sequence (or not)
Centralizing Bioinformatics Services: Analysis Pipelines, Opportunities, and Challenges with Large- scale –Omics, and other BigData High-Performance Computing.
Advanced Computing Facility Introduction
Welcome to Indiana University Clusters
Introductory RNA-seq Transcriptome Profiling
Using command line tools to process sequencing data
NGS File formats Raw data from various vendors => various formats
Day 5 Mapping and Visualization
Short Read Sequencing Analysis Workshop
Placental Bioinformatics
Cancer Genomics Core Lab
Dowell Short Read Class Phillip Richmond
RNA Sequencing Day 7 Wooohoooo!
Welcome to Indiana University Clusters
Integrative Genomics Viewer (IGV)
Dr. Christoph W. Sensen und Dr. Jung Soh Trieste Course 2017
S1 Supporting information Bioinformatic workflow and quality of the metrics Number of slides: 10.
Canadian Bioinformatics Workshops
Canadian Bioinformatics Workshops
Introductory RNA-Seq Transcriptome Profiling
GE3M25: Data Analysis, Class 4
Short Read Sequencing Analysis Workshop
ChIP-Seq Data Processing and QC
Material for today’s workshop is at:
Epigenetics System Biology Workshop: Introduction
Maximize read usage through mapping strategies
Garbage In, Garbage Out: Quality control on sequence data
Computational Pipeline Strategies
Introduction to RNA-Seq & Transcriptome Analysis
Campus and Phoenix Resources
Short Read Sequencing Analysis Workshop

RNA-Seq Data Analysis UND Genomics Core.
Quality Control & Nascent Sequencing
The Variant Call Format
Presentation transcript:

Short Read Sequencing Analysis Workshop Day 1 Introduction to the Workshop

Schedule for Week 1 Day 1: Introduction Day 2: Introduction to Linux Workshop syllabus and schedule Basic considerations for sequencing – depth, read length, format, etc. Test Logins Day 2: Introduction to Linux Understand how to use local machine to connect to a server/compute cluster Basic Linux commands, how to set up a command line VIM text editor Day 3: Cluster Usage and Data Transfer Head node vs. child nodes Job queues, modules Troubleshooting failed jobs Transferring/downloading files to the server Day 4: Sequencing QC Evaluating quality of sequencing data with FastQC Trimming reads, why and how Day 5: Skills Assessment

Schedule for Week 2 Day 6: Intro to Read Mapping and Visualization Understand how & why we map reads SAM/BAM file formats, Samtools Introduction to IGV for visualization Day 7: Resequencing and Variant Calling GATK best practices for variant detection Running GATK Post-processing vcf files Day 8: ChIP-Seq Analysis Different types of ChIP-Seq experiments and analysis ENCODE best practices Peak calling and motif calling tools Day 9: RNA-Seq and differential expression RNA-Seq basic considerations Splice-aware aligners Differential expression analysis Day 10: Using multiple datasets BEDtools

Reverse Classroom Before class: During class (9-11 am) After class: Watch Videos http://bficores.colorado.edu/biofrontiers-core-facility-workshops/workshops-in-the-series/short-read-2017-course-materials) During class (9-11 am) Examples with example files (fastq, sam, bam, bed) After class: Homework Get help in A104 1-3pm

You will be learning “command line” This is not a GUI Spelling and Capitalization matter! Google can help you learn “command line unix” The names for commands are not always intuitive (you will learn many this week)

Programs we will be using Text editor – VIM Sequencing quality – FastQC Read filtering/trimming – Trimmomatic Read Mapping – Bowtie2 Visualization – IGV Variant Calling – GATK toolkit, PicardTools, VCFTools Peak Calling (ChIP-Seq) – MACS Motif calling – MEME Suite, TomTom RNA-Seq mapping (splice-aware) – Tophat2 Differential Expression – HTseq, DESeq, Cufflinks Other programs: Samtools, BEDTools

What this workshop won’t cover De novo assembly (genome or transcriptome) 16s metagenomics Shotgun metagenomics Long-read data analysis Specialized analysis beyond the basics Coding Biostatistics How to analyze YOUR data

If you have a problem on Tortuga Do you need to be logged into the VPN? Check for typos. Check your input directories and output directories. Did you load the necessary modules? Check for typos again.

If you have a problem on Tortuga Still having trouble? Email bit-help@colorado.edu to submit a ticket Put “SR2017” in the subject line Include your IdentiKey UserID somewhere in the email If you are referencing a specific job (running or past), include the job number in the ticket and PBS script used to submit the job Be as specific as possible description of the problem Include the exit and error files or the path on vieques to the e and o files. Please be courteous. BIT graciously gives their time for this workshop, we couldn’t do it without them

Important Considerations for Bioinformatics Keep a digital notebook: What programs and version are being used What options or variables are being used What order programs are being called in the pipeline What compute environment If you don’t know these answers and report them in publication, your work will not be repeatable!

Bioinformatics is not “free” Compute Resources Storage (and Backup) Analysis time Expertise Extracting meaningful information requires more than basic bioinformatics And we haven’t even discussed biostatistics... http://massgenomics.org/2015/10/ngs-analysis-not-free.html

There is no One Right Answer Every algorithm/pipeline is different Different assumptions Different order of processing can have significant effect Different options can have a significant effect Most projects will require some customization or further processing to get useful data Zhang ZH, et al. (2014) PLoS ONE 9(8)

When to think about your analysis Don’t wait until you have the data to think about bioinformatics Important experimental considerations Controls, replicates, sequencing depth, library prep, etc. all effect what type of analysis and how powerful it is to tell you something But if you did... All is not necessarily lost Many bad data sets have useful information, can be harder to find and may limit what you can find

Additional Acknowledgments Acknowledgements Mary Allen Next Generation Seq Core Funding: BioFrontiers Institute and Colorado Office of Economic Development and International Trade Compute Resources: BioFrontiers IT Staff Additional Acknowledgments Robin Dowell and Dowell Lab Volunteers ©2017