Variant Calling Workshop Chris Fields Variant Calling Workshop v2 | Chris Fields1 Powerpoint by Casey Hanson.

Slides:



Advertisements
Similar presentations
Enrichment Map GSEA Tutorial
Advertisements

PayDox applications All features can be used independently.
1 Notice Last Date to submit your project: 29 th December 2007 Major Exam 2: 31 st December Lab time Material: Chapters 5-10 ( 5,6,7,8,9 and 10)
Linux & Shell Scripting Small Group Lecture 4 How to Learn to Code Workshop group/ Erin.
Working with SharePoint Document Libraries. What are document libraries? Document libraries are collections of files that you can share with team members.
Bacterial Genome Assembly | Victor Jongeneel Radhika S. Khetani
Using Macs and Unix Nancy Griffeth January 6, 2014 Funding for this workshop was provided by the program “Computational Modeling and Analysis of Complex.
NGS Analysis Using Galaxy
Refworks Presented by Margaret Clark, Reference Librarian FSU College of Law Library September 20, 2005.
Introduction to UNIX/Linux Exercises Dan Stanzione.
Introduction to RNA-Seq and Transcriptome Analysis
Copyright 2007, Information Builders. Slide 1 Maintain & JavaScript: Two Great Tools that Work Great Together Mark Derwin and Mark Rawls Information Builders.
Polymorphism and Variant Analysis Lab
Customized cloud platform for computing on your terms !
Variant Calling Workshop Chris Fields Variant Calling Workshop | Chris Fields | PowerPoint by Casey Hanson.
Bacterial Genome Assembly C. Victor Jongeneel Bacterial Genome Assembly | C. Victor Jongeneel | PowerPoint by Casey Hanson.
National Center for Supercomputing Applications University of Illinois at Urbana-Champaign Variant Calling Workshop.
Guideline for ClinLabGeneticist tool Jinlian Wang
®® Microsoft Windows 7 for Power Users Tutorial 13 Using the Command-Line Environment.
Nancy Severe-Barnett Program Coordinator, SCIS
ECT 250: Survey of E-Commerce Technology FrontPage Publishing pages Unix.
Introduction to RNA-Seq & Transcriptome Analysis
Galaxy for Bioinformatics Analysis An Introduction TCD Bioinformatics Support Team Fiona Roche, PhD Date: 31/08/15.
C# Tutorial -1 ASP.NET Web Application with Visual Studio 2005.
Polymorphism & Variant Analysis Lab Saurabh Sinha Polymorphism and Variant Analysis Lab v1 | Saurabh Sinha 1 Powerpoint by Casey Hanson.
Regulatory Genomics Lab Saurabh Sinha Regulatory Genomics Lab v1 | Saurabh Sinha1 Powerpoint by Casey Hanson.
Tools Menu and Other Concepts Alerts Event Log SLA Management Search Address Space Search Syslog Download NetIIS Standalone Application.
Basic Microbiome Analysis with QIIME
Diagnostic Pathfinder for Instructors. Diagnostic Pathfinder Local File vs. Database Normal operations Expert operations Admin operations.
Chip – Seq Peak Calling in Galaxy Lisa Stubbs Chip-Seq Peak Calling in Galaxy | Lisa Stubbs | PowerPoint by Casey Hanson.
ChrGeneticist introduction for reviewer Jinlian Wang 10/8/2014.
A Genomics View of Unix. General Unix Tips To use the command line start X11 and type commands into the “xterm” window A few things about unix commands:
An Introduction to Designing, Executing and Sharing Workflows with Taverna Katy Wolstencroft myGrid University of Manchester IMPACT/Taverna Hackathon 2011.
National Center for Supercomputing Applications University of Illinois at Urbana-Champaign Variant Calling Workshop.
The UCSC Table Browser & Custom Tracks Advanced searching and discovery using the UCSC Table Browser and Custom Tracks Osvaldo Graña CNIO Bioinformatics.
Regulatory Genomics Lab Saurabh Sinha Regulatory Genomics | Saurabh Sinha | PowerPoint by Casey Hanson.
GVS: Genome Variation Server Materials prepared by: Warren C. Lathe, PhD Updated: Q Version 2.
XP New Perspectives on Microsoft Office FrontPage 2003 Tutorial 7 1 Microsoft Office FrontPage 2003 Tutorial 8 – Integrating a Database with a FrontPage.
Using This PowerPoint This PowerPoint presentation assumes your Computer Science teacher has provided you with the InstallingJava folder, which contains.
Lesson 3-Touring Utilities and System Features. Overview Employing fundamental utilities. Linux terminal sessions. Managing input and output. Using special.
Tutorial 6 High Throughput Sequencing. HTS tools and analysis Review of resequencing pipeline Visualization - IGV Analysis platform – Galaxy Tuning up.
Bioinformatics for biologists Dr. Habil Zare, PhD PI of Oncinfo Lab Assistant Professor, Department of Computer Science Texas State University Presented.
Copyright OpenHelix. No use or reproduction without express written consent1.
Computing on TSCC Make a folder for the class and move into it –mkdir –p /oasis/tscc/scratch/username/biom262_harismendy –cd /oasis/tscc/scratch/username/biom262_harismendy.
Welcome to the combined BLAST and Genome Browser Tutorial.
Chip – Seq Peak Calling in Galaxy Lisa Stubbs Lisa Stubbs | Chip-Seq Peak Calling in Galaxy1.
Active-HDL Server Farm Course 11. All materials updated on: September 30, 2004 Outline 1.Introduction 2.Advantages 3.Requirements 4.Installation 5.Architecture.
Learning Unix/Linux Based on slides from: Eric Bishop.
IGV Demo Slides:/g/funcgen/trainings/visualization/Demos/IGV_demo.ppt Galaxy Dev: 0.
Visualizing data from Galaxy
Protein Sequence, Structure, and Function Lab Gustavo Caetano - Anolles Protein Sequence, Structure, and Function Lab v1 | Gustavo Caetano - Anolles 1.
Bacterial Genome Assembly Tutorial: C. Victor Jongeneel Bacterial Genome Assembly v9 | C. Victor Jongeneel1 Powerpoint: Casey Hanson.
Canadian Bioinformatics Workshops
Day 5 Mapping and Visualization
Dowell Short Read Class Phillip Richmond
Integrative Genomics Viewer (IGV)
NGS Analysis Using Galaxy
Regulatory Genomics Lab
Bacterial Genome Assembly
Variant Calling Workshop
Getting Started with R.
How to access your work from home or another computer
Bacterial Genome Assembly
Linux + Galaxy Server Tutorial
Yonglan Zheng Galaxy Hands-on Demo Step-by-step Yonglan Zheng
Using the Omega3P Eigensolver
Regulatory Genomics Lab
Linux + Genome Assembly Tutorial
Introduction to RNA-Seq & Transcriptome Analysis
Regulatory Genomics Lab
Presentation transcript:

Variant Calling Workshop Chris Fields Variant Calling Workshop v2 | Chris Fields1 Powerpoint by Casey Hanson

Introduction In this lab, we will do the following: 1.Perform variant calling analysis on the IGB biocluster. 2.Visualize our results on the desktop using the Integrative Genomics Viewer (IGV) tool. Variant Calling Workshop v2 | Chris Fields2

Step 0A: Accessing the IGB Biocluster Open Putty.exe In the hostname textbox type: biocluster.igb.Illinois.edu Click Open If popup appears, Click Yes Enter login credentials assigned to you; example, user class45. Variant Calling Workshop v2 | Chris Fields3 Now you are all set!

Step 0B: Lab Setup The lab is located in the following directory: ~/mayo/fields This directory contains the finished version of the lab (i.e. the version of the lab after the tutorial). Consult it if you unsure about your runs. You don’t have write permissions to the lab directory. Create a working directory of this lab in your home directory for your output to be stored. Note ~ is a symbol in Unix paths referring to your home directory. Copy the necessary shell files (.sh) files from ~/mayo/fields to your working directory. Note: in this lab, we will NOT login to a node on the biocluster. Instead, we will submit jobs to the biocluster. Variant Calling Workshop v2 | Chris Fields4

Step 0C: Lab Setup Variant Calling Workshop v2 | Chris Fields5 $ mkdir ~/fields # Make working directory in your home directory $ cp ~/mayo/fields/*.sh ~/fields # Copy shell files to your working directory. Create a working directory called ~/fields in your home directory. Copy all shell files (.sh) from ~/mayo/fields to your working directory. Copied Files annotate_snpeff.sh call_variants_ug.sh hard_filtering.sh post_annotate.sh

Step 0D: Shared Desktop Directory For viewing and manipulating files on the classroom computers, we provide a shared directory in the following folder on the desktop: classes/mayo In today’s lab, we will be using the following folder in the shared directory: classes/mayo/fields Variant Calling Workshop v2 | Chris Fields6

Variant Calling Setup In this exercise, we will use data from the 1000 Genomes project (WGS, 60x coverage) to call variants, in particular single nucleotide polymorphisms. The initial part of the GATK pipeline (alignment, local realignment, base quality score recalibration) has been done, and the BAM file has been reduced for a portion of human chromosome 20. This is the data we will be working with in this exercise. Variant Calling Workshop v2 | Chris Fields7

Step 1A: Running a Variant Calling Job In this step, we will start a variant calling job using the qsub command. Additionally, we will gather statistics about our job using the qstat command. Variant Calling Workshop v2 | Chris Fields8 $ cd ~/fields # Change directory to your working directory. $ qsub -q classroom –l ncpus=4 call_variants_ug.sh # This will execute call_variants_ug.sh on the biocluster. $ qstat –u # Get statistics on your submitted job

Step 1B: Output of Variant Calling Job Periodically, call qstat to see if your job has finished. You should have 4 files when it has completed. Discussion What did we just do? We ran the GATK UnifiedGenotyper to call variants. Look at file structure. Variant Calling Workshop v2 | Chris Fields9 Files raw_indels.sh raw_indels.vcf.idx raw_snps.vcf raw_snps.vcf.idx

Step 1C: SNP and Indel Counting In this step, we will count the # of SNPS and Indels identified in the raw_snps.vcf and raw_indels.vcf files. We will use the program grep, which is a text matching program. Variant Calling Workshop v2 | Chris Fields10 $ grep –c –v ‘^#’ raw_snps.vcf # Get the number of SNPs. # -v Tells grep to show all lines not beginning with # in raw_snps.vcf. # -c Tells grep to return the total number of returned lines. # Output should be $ grep –c –v ‘^#’ raw_indels.vcf # Get the number of indels. # Output should be 1070.

Step 1D: SNP and Indel Counting in dbSNP In this step, we will count the number of SNPs and Indels in dbSNP. dbSNP SNPs and Indels have the rs# identifier where # is a number. Example: rs1000 Variant Calling Workshop v2 | Chris Fields11 $ grep –c ‘rs[0-9]*’ raw_snps.vcf # Get the number of dbSNP SNPs. # Return all lines in raw_snps.vcf containing rs followed by a number. # -c Tells grep to return the total number of returned lines. # Output should be $ grep –c ‘rs[0-9]*’ raw_indels.vcf # Get the number of dbSNP indels. # Output should be 1019.

Step 2A: Hard Filtering Variant Calls We need to filter these variant calls in some way. In general, we would filter on quality scores. However, since we have a very small set of variants, we will use hard filtering. Variant Calling Workshop v2 | Chris Fields12 $ qsub -q classroom –l ncpus=4 hard_filtering.sh # This will execute hard_filtering.sh on the biocluster. $ qstat –u Output Files hard_filtered_snps.v cf hard_filtered_indels. vcf Periodically, call qstat to see if your job has finished.

Step 2B: Hard Filtering Variants Calls In this step, we will count the # of filtered SNPs and Indels. Variant Calling Workshop v2 | Chris Fields13 $ grep –c ‘PASS’ hard_filtered_snps.vcf # Count # of passes # Output 8270 $ grep –c ‘PASS’ hard_filtered_indels.vcf # Count # of PASSES # Output 1041 Discussion 1.Did we lose any variants? 2.How many PASSED the filter? 3.What is the difference in the filtered and raw input?

Step 3A: Annotating Variants With SnpEff Wit our filtered variants, we now need to annotate them with SnpEff. SnpEff adds information about where variants are in relation to specific genes. Variant Calling Workshop v2 | Chris Fields14 $ qsub -q classroom –l ncpus=4 annotate_snpeff.sh # This will execute snpeff.sh on the biocluster. $ qstat –u Output Files hard_filtered_snps_annotated.vcf hard_filtered_indels_annotate d.vcf Periodically, call qstat to see if your job has finished.

Step 3B: Annotating Variants With SnpEff The IDs for the human assembly version we us are from Ensemble. The Ensemble format is ENSGXXXXXXXXXXX. Example: FOXA2’s Ensemble ID is ENSG In this step, we would like to see if there are any variants of FOXA2. Variant Calling Workshop v2 | Chris Fields15 $ grep –c ‘ENSG ’ hard_filtered_snps_annotated.vcf # Get the number of SNPS in FOXA2, ENSG # Output should be 3. $ grep –c ‘ENSG ’ hard_filtered_indels_annotated.vcf # Get the number of Indels in FOXA2, ENSG # Output should be 0.

Step 4: GATK Variant Annotator SnpEff adds a lot of information to the VCF. GATK Variant Annotator helps remove a lot of the extraneous information. Variant Calling Workshop v2 | Chris Fields16 $ qsub -q classroom –l ncpus=4 post_annotate.sh # This will execute post_annotate.sh on the biocluster. $ qstat –u Periodically, call qstat to see if your job has finished.

Step 5A: Visualization With IGV Variant Calling Workshop v2 | Chris Fields17 Switch the genome to Human (b37).

Step 5B: Viewing Results in IGV Load the VCF files marked ‘final’ and Click on the ‘20’ (chr 20). Variant Calling Workshop v2 | Chris Fields18

Step 5C: Viewing Results in IGV Click and drag from the ‘20 mb’ mark to roughly the centromeric region on the chromosome (~2.6 mb) Variant Calling Workshop v2 | Chris Fields19

Step 5D: Viewing Results in IGV The result should look similar to the screenshot below: Variant Calling Workshop v2 | Chris Fields20

Step 5E: Viewing Results in IGV Right click on the track to bring up a menu, and then select Set Feature Visibility Window. Set to ‘ ’ (10 million, or 10 Mb) Variant Calling Workshop v2 | Chris Fields21

Step 5F: Viewing Results in IGV In the search box, enter in ‘FOXA2’. Variant Calling Workshop v2 | Chris Fields22

Checkpoint: Viewing Results in IGV 1.How many SNPs are here? 2.How many Indels are here? 3.How many SNPs are ‘hets’? Variant Calling Workshop v2 | Chris Fields23

Step 6A: Viewing Results in IGV Now we’ll load in a ‘small’ BAM file.This is the same BAM file you analyzed on the cluster Load the following BAM file in classes/mayo/fields: NA12878.HiSeq.WGS.bwa.cleaned.recal.b37.20_arm1.bam What is happening here? Variant Calling Workshop v2 | Chris Fields24

Step 6B: Viewing Results in IGV Right click on BAM track and select ‘Show coverage track’. Note colors in coverage track. Variant Calling Workshop v2 | Chris Fields25

Step 6C: Viewing Results in IGV Zoom in to regions with SNPs Mouse-over the SNP regions in the VCF tracks to get more information (including annotation from SnpEff and VariantAnnotator) Variant Calling Workshop v2 | Chris Fields26

Browse around… How are filtered SNPs are displayed? How about low-quality bases? Can you change the display to sort BAM reads by strand? Variant Calling Workshop v2 | Chris Fields27

Step 7: SnpEff Results SnpEff gives a nice summary HTML file that should have been downloaded with your result (one for SNP, one for indel results) Open those up in a browser. Variant Calling Workshop v2 | Chris Fields28

Step 8: Viewing Results in IGV This gives us raw reads, but we need to calculate coverage. Let’s use igvtools. Set command to ‘Count’, then select input file (BAM file). Hit ‘Run’ Variant Calling Workshop v2 | Chris Fields29