Genome STRiP ASHG Workshop demo materials

1 Genome STRiP ASHG Workshop demo materials
Bob Handsaker October 19, 2014

2 Running Genome STRiP directly on AWS

3 Cloud demo: Genome STRiP command line
[Diagram labels: StarCluster, Cloud Storage, Sequencing data, Amazon Web Services, Genome STRiP]

4 Cloud computing scenarios
Why are people interested in Genome STRiP on the cloud?
Increase compute and storage capacity for large-scale processing
  Large genome studies
  Economical and with short lead time
Utilize data sets that are stored in the cloud
  Public data sets (e.g. 1000 Genomes)
  Data sharing with collaborators
  No need to download bulky data to each site

5 Cookbook recipe: Genotyping in 1000 Genomes Phase 1
Inputs
  A site VCF file describing the variants (e.g. large deletions) to genotype
Outputs
  Genotype VCF file
  Plots for quality control
1000 Genomes Data
  You choose the BAM file location:
    Cached copy on Amazon S3 storage
    HTTP from NCBI or EBI
StarCluster
  Uses the StarCluster software from MIT for Amazon EC2 provisioning

6 Demo
Show input vcf file in local directory
starcluster put gs-cluster example.vcf example.vcf
starcluster sshmaster gs-cluster
./genotype-sites.sh example.vcf run1
(show output)
(log out)
starcluster get gs-cluster run1 run1
Show vcf in textedit
Show genotyping plot pdf

7 Cloud computing support in Genome STRiP
Remote BAM file access
  Support for multiple file access protocols in addition to local files
    HTTP / HTTPS
    FTP
    Amazon S3 protocol
Pre-computed metadata for 1000 Genomes Phase 1 and Phase 3
  Eliminates the need to run Genome STRiP preprocessing
  Avoids the need to download the 1000 Genomes BAM files
  Metadata is relatively compact: 5 GB (Phase 1) and 13 GB (Phase 3)
  ftp://ftp.broadinstitute.org/pub/svtoolkit/public_metadata/
Cookbook recipes for common scenarios
  Genotyping variants in 1000 Genomes samples
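As a rough illustration of the protocol list above, the sketch below sorts a BAM location into one of the access methods named on the slide. The function name `bam_access_protocol`, the example locations, and the `s3://` spelling for Amazon S3 locations are assumptions made for this example; the slides do not show how Genome STRiP itself expects such locations to be written.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Genome STRiP): report which of the access
# methods listed on the slide a BAM location would use.
bam_access_protocol() {
    case "$1" in
        http://*)  echo "HTTP" ;;
        https://*) echo "HTTPS" ;;
        ftp://*)   echo "FTP" ;;
        s3://*)    echo "S3" ;;           # assumed URI spelling for Amazon S3
        *://*)     echo "unsupported" ;;  # any other scheme
        *)         echo "local" ;;        # no scheme: treat as a local path
    esac
}

# Example locations are placeholders, not real data sets.
bam_access_protocol "s3://example-bucket/sample.bam"
bam_access_protocol "/data/bams/sample.bam"
```

A check like this could be run over a list of BAM locations before launching a cluster, so that a mistyped location is caught locally rather than partway through a run.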

8 Genome STRiP cookbook

9 Sample genotyping output
Standard VCF file with sample genotypes
  ##fileformat=VCFv4.1
  #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG HG00100
  DEL_2_ A <DEL> END= GT:FT:GQ 0/0:PASS:71 0/1:PASS:14
Genotyping plot for visual verification
  Histogram of normalized read depth
  Colors indicate confident calls (gray samples are below 95% confidence)
  Small numbers on plot indicate evidence from read pairs or split reads
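The GT:FT:GQ genotype fields can be screened from the command line. The sketch below lists every genotype whose GQ falls below a threshold. It assumes that GQ is the phred-scaled genotype quality defined by the VCF specification, in which case 95% confidence corresponds to GQ = -10 * log10(0.05), roughly 13; whether the plot's gray threshold is computed exactly this way is an assumption. The function name `low_confidence_calls`, the sample names S1 and S2, and the demo record (coordinates, ID, END value) are hypothetical; only the genotype values 0/0:PASS:71 and 0/1:PASS:14 are taken from the slide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print site ID, sample, genotype and GQ for every call
# with GQ below a threshold (default 13, i.e. about 95% confidence if GQ is
# phred-scaled as in the VCF specification).
low_confidence_calls() {
    local vcf="$1" min_gq="${2:-13}"
    awk -F'\t' -v min_gq="$min_gq" '
        /^##/     { next }                                            # skip meta lines
        /^#CHROM/ { for (i = 10; i <= NF; i++) name[i] = $i; next }   # remember sample names
        {
            # Find the position of GQ inside the FORMAT column (field 9).
            n = split($9, keys, ":"); gq = 0
            for (k = 1; k <= n; k++) if (keys[k] == "GQ") gq = k
            if (gq == 0) next
            for (i = 10; i <= NF; i++) {
                split($i, vals, ":")
                if (vals[gq] != "" && vals[gq] != "." && vals[gq] + 0 < min_gq)
                    print $3 "\t" name[i] "\t" vals[1] "\t" vals[gq]
            }
        }
    ' "$vcf"
}

# Illustrative input: a made-up record carrying the genotype values from the slide.
demo_vcf="$(mktemp)"
printf '##fileformat=VCFv4.1\n' > "$demo_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> "$demo_vcf"
printf '1\t1000\tDEL_EXAMPLE\tA\t<DEL>\t.\tPASS\tEND=2000\tGT:FT:GQ\t0/0:PASS:71\t0/1:PASS:14\n' >> "$demo_vcf"

low_confidence_calls "$demo_vcf" 20   # stricter threshold than the default
```

With the default threshold of 13 both demo genotypes pass, so nothing is printed; raising the threshold to 20 reports the S2 call with GQ 14.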

10 Command summary
starcluster start gs-cluster -s 1                     Launch Amazon compute cluster
starcluster put gs-cluster example.vcf example.vcf    Copy input file from local to cloud
starcluster sshmaster gs-cluster                      Log in to remote cluster
./genotype_sites.sh example.vcf run1                  Run genotyping command script
starcluster get gs-cluster run1 run1                  Copy output files from cloud to local
starcluster terminate gs-cluster                      Shut down compute cluster
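The command summary can be collected into one script. The following is a minimal sketch, not a script shipped with Genome STRiP: the `run` and `genotype_workflow` functions and the `DRY_RUN` switch are hypothetical additions, and the genotyping script is passed to `starcluster sshmaster` as a remote command instead of being typed in an interactive session. The slides spell the script both `genotype-sites.sh` and `genotype_sites.sh`, so the name is kept in a variable to adjust. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch of the workflow from the command summary. DRY_RUN is a hypothetical
# switch: when it is 1 (the default here) commands are printed, not executed.
set -euo pipefail

CLUSTER="gs-cluster"
SCRIPT="./genotype_sites.sh"   # spelled genotype-sites.sh on the demo slide
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

genotype_workflow() {
    local vcf="$1" outdir="$2"
    run starcluster start "$CLUSTER" -s 1                    # launch compute cluster
    run starcluster put "$CLUSTER" "$vcf" "$vcf"             # copy input to the cloud
    run starcluster sshmaster "$CLUSTER" "$SCRIPT $vcf $outdir"   # run genotyping remotely
    run starcluster get "$CLUSTER" "$outdir" "$outdir"       # copy results back
    run starcluster terminate "$CLUSTER"                     # shut down the cluster
}

genotype_workflow example.vcf run1
```

Setting `DRY_RUN=0` would execute the commands for real. In that mode, note that `set -e` stops the script at the first failing step, which would leave the cluster running (and billing) until it is terminated by hand, and that `starcluster terminate` may ask for confirmation interactively.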

11 For more information …
Bonus evening session
  Tonight (Monday) 6:30 – 8:00 PM
  Room 24, Upper Level
Web site
Support forum (Genome STRiP topic in GATK forum)
AWS Support In Genome STRiP
Seva Kashin
Poster 603 T (Tuesday afternoon)
Multi-allelic copy number variation in humans
Early look at upcoming Genome STRiP functionality for duplications and multi-allelic CNVs

13 Intro Slides for Gabor

14 Genome STRiP
Genome STRucture in Populations
Integrates multiple features of sequence data with population-based patterns across many individuals
Handsaker, R.E., Korn, J.M., Nemesh, J. & McCarroll, S.A. Discovery and genotyping of genome structural polymorphism by sequencing on a population scale. Nat Genet 43, (2011)

15 Genome STRiP
Structural variation analysis from sequence data
Integrative
  Combines multiple features of the sequence data (read pairs, read depth, split reads)
  Integrative approaches have consistently shown higher accuracy
Population-aware
  Increases power and accuracy
  Particularly important for low-coverage genomes
Modular architecture
  Discovery of new variants
  Genotyping of newly discovered variants and/or known variants
  Includes tools for QC / analysis
Initial prototype developed for analyses in 1000 Genomes Project
  Low false discovery rate and high sensitivity

16 Demo Slides

