Canadian Bioinformatics Workshops www.bioinformatics.ca.


1 Canadian Bioinformatics Workshops www.bioinformatics.ca

2

3 Module 2 SNP & short-INDEL Discovery

4 SNP and short-INDEL Discovery bioinformatics.ca Genetic Variations: SNPs & INDELs

5 SNP Discovery: Goal (figure labels: sequencing errors; SNP)

6 SNP Discovery: Base Qualities (High quality; Low quality)

7 SNPs & Bayesian Statistics: base quality; # of individuals; allele call in read
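The inputs named on this slide (base quality and the allele call in each read) can be combined into a genotype posterior. The following is a minimal sketch for a single diploid individual, not the method of any particular caller. The function name `genotype_posteriors`, the prior `prior_het`, and the uniform error model (a miscalled base is equally likely to be any of the other three) are assumptions made for illustration. The "# of individuals" input from the slide would enter through the prior in a multi-sample caller and is not modelled here.

```python
import math

def genotype_posteriors(calls, quals, ref, alt, prior_het=0.001):
    """Posterior over the diploid genotypes ref/ref, ref/alt, alt/alt.

    calls: base observed in each read at the site, e.g. "AAAC"
    quals: Phred base quality for each call
    prior_het: assumed prior probability of a heterozygous site (illustrative)
    """
    # Assumed priors; they sum to 1.
    priors = {
        (ref, ref): 1.0 - 1.5 * prior_het,
        (ref, alt): prior_het,
        (alt, alt): prior_het / 2.0,
    }
    log_post = {}
    for genotype, prior in priors.items():
        log_like = 0.0
        for base, q in zip(calls, quals):
            err = 10 ** (-q / 10.0)  # Phred score -> error probability
            # Each read samples one of the two alleles with probability 1/2.
            p = sum(
                0.5 * ((1.0 - err) if base == allele else err / 3.0)
                for allele in genotype
            )
            log_like += math.log(p)
        log_post[genotype] = math.log(prior) + log_like
    # Normalise in log space to avoid underflow at high depth.
    top = max(log_post.values())
    total = sum(math.exp(v - top) for v in log_post.values())
    return {g: math.exp(v - top) / total for g, v in log_post.items()}
```

With four high-quality A reads and four high-quality C reads, the heterozygous genotype dominates despite its small prior; with eight A reads, the homozygous reference genotype does.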

8 Genotyping & Consensus Generation
Haploid: strain 1 [A]; strain 2 [C]; strain 3 [A]
Diploid: individual 1 [A/C]; individual 2 [C/C]; individual 3 [A/A]
Sequences: AACGTTAGCATA; AACGTTCGCATA; AACGTTAGCATA; AACGTTCGCATA; AACGTTAGCATA

9 Handling Trios
– Take advantage of duplicate data
– De novo mutation rate
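One way to use the redundancy in a trio is a Mendelian consistency check: the child's genotype should contain one allele from each parent. A site that fails the check is either a genotyping error or a candidate de novo mutation. This is a sketch under the assumption of an autosomal, biallelic or multiallelic site with hard genotype calls; the function name is hypothetical.

```python
def is_mendelian_consistent(child, mother, father):
    """Return True if the child's genotype can be inherited from the parents.

    Each genotype is a 2-tuple of alleles, e.g. ("A", "C").
    Assumes an autosomal site and unphased genotypes.
    """
    a, b = child
    # One child allele must come from the mother and the other from the father.
    return (a in mother and b in father) or (b in mother and a in father)
```

For example, a child called A/C with both parents called A/A fails the check, so the C allele is either an error or a de novo candidate.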

10 1000G Consortium, July 2010

11 The power of imputation: # of variant genotype calls; # of incorrect variant genotype calls (1000G Consortium, July 2010)

12 Nielsen et al., June 2011

13 Adapted from Mark DePristo, Broad Institute, February 2010
File format: BAM files; Raw variants (VCF); Filtered variants (VCF)
File size: 200 GB; 1 GB
Tools: samtools, GATK unified genotyper, freeBayes, glfMultiples; expert user judgment, GATK variant filtration
Time: 10 hours; days; 30 min
– BAM files: recalibrated BQ, duplicates removed
– Raw variants (VCF): sites with non-reference bases are genotyped
– Filtered variants (VCF): separate true segregating variation from machine/alignment artifacts

14 QC: HapMap & dbSNP
International HapMap Project (phase III)
– 1301 individuals in 11 populations genotyped
– ~1 SNP per 2 kb
– Proxy for false negatives
dbSNP (build 130)
– 14 million SNPs in human genome
– Varying quality
– Proxy for false positives
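The two proxies above can be computed with simple set arithmetic once the call set and the reference panels are reduced to site coordinates. This is an illustrative sketch only: the function name, the argument names, and the representation of a site as a `(chrom, pos)` tuple are assumptions, and the HapMap input is assumed to contain only sites that are variant in the sequenced samples.

```python
def qc_proxies(called_sites, hapmap_variant_sites, dbsnp_sites):
    """Compute two rough QC proxies for a SNP call set.

    Sites are hashable keys such as (chrom, pos) tuples.
    """
    called = set(called_sites)
    hapmap = set(hapmap_variant_sites)
    dbsnp = set(dbsnp_sites)
    missed = hapmap - called   # known variant sites that were not called
    novel = called - dbsnp     # calls that are absent from dbSNP
    return {
        # Proxy for false negatives: share of HapMap variant sites missed.
        "hapmap_miss_rate": len(missed) / len(hapmap) if hapmap else float("nan"),
        # Proxy for false positives: share of calls not already in dbSNP.
        "non_dbsnp_fraction": len(novel) / len(called) if called else float("nan"),
    }
```

Both numbers are proxies rather than error rates: a call missing from dbSNP may be a real novel variant, and dbSNP itself is of varying quality.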

15 QC: Coverage (Auton & Hernandez, Cornell University, June 2009)

16 QC: Inter-SNP Distance
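The slide gives only the metric name, so the following is a sketch of one plausible way to compute it, assuming calls are available as `(chrom, pos)` tuples. The function name is hypothetical. An excess of very short distances is commonly inspected as a possible sign of alignment artifacts, but the threshold for "too close" is left to the analyst.

```python
from collections import defaultdict

def inter_snp_distances(sites):
    """Distances between consecutive SNP positions, per chromosome.

    sites: iterable of (chrom, pos) tuples, in any order.
    Returns a dict mapping chrom -> list of gaps between sorted positions.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in sites:
        by_chrom[chrom].append(pos)
    distances = {}
    for chrom, positions in by_chrom.items():
        positions.sort()
        # Pair each position with its successor and take the difference.
        distances[chrom] = [b - a for a, b in zip(positions, positions[1:])]
    return distances
```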

17 QC: Hardy-Weinberg Violations (Auton & Hernandez, Cornell University, June 2009). HapMap sites in red, other sites in blue. CEU, P(seg)>0.5, coverage 2-5x
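A Hardy-Weinberg violation at a biallelic site can be flagged with a one-degree-of-freedom chi-square test on genotype counts. The sketch below is a textbook version, not the procedure behind the figure on this slide; the function name is hypothetical. It assumes hard genotype calls, which are uncertain at the 2-5x coverage noted above, so the result should be read as a rough screen.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test for Hardy-Weinberg equilibrium at a biallelic site.

    n_aa, n_ab, n_bb: observed counts of the three genotypes.
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele a
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Counts of 25/50/25 match the expectation exactly, while 50/0/50 (no heterozygotes at allele frequency 0.5) gives a very small p-value.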

18 QC: Other metrics
P(SNP)
– Determining the optimal P(SNP) threshold
Transitions:transversions
– Adjusting filters so that the ratio approaches 2
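The transition:transversion ratio can be computed directly from the REF and ALT alleles of the called SNPs. A minimal sketch, assuming biallelic single-base substitutions supplied as `(ref, alt)` pairs; the function name is hypothetical.

```python
BASES = {"A", "C", "G", "T"}
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def ts_tv_ratio(snps):
    """Transition:transversion ratio for an iterable of (ref, alt) pairs.

    Pairs that are not single-base substitutions are skipped.
    Returns float("inf") if there are no transversions.
    """
    ts = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue  # skip indels, missing alleles, and non-variant rows
        if (ref, alt) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else float("inf")
```

Recomputing this ratio after each filter change shows whether the filtered call set is moving toward the target of about 2.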

19 Using multiple QC metrics (Mark DePristo, Broad Institute, February 2010)

20 VCF
##fileformat=VCFv4.0
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
FORMAT fields: GT = genotype; GQ = genotype quality; DP = read depth; HQ = haplotype qualities
INFO fields: NS = # samples; DP = combined depth; AF = allele frequency; DB = in dbSNP?; H2 = in HapMap2?
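To make the field layout concrete, here is a small sketch that parses the example record from this slide into Python dictionaries. It is not a full VCF parser: the function name is hypothetical, the line is assumed to be tab-delimited as in a real VCF file, and header parsing, multiple ALT alleles and type conversion of INFO values are left out.

```python
def parse_vcf_record(line, sample_names):
    """Parse one tab-delimited VCF data line into a dict.

    sample_names: sample IDs taken from the #CHROM header line.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info, fmt = fields[:9]
    info_dict = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        # Flags such as DB and H2 carry no value; store them as True.
        info_dict[key] = value if sep else True
    keys = fmt.split(":")
    samples = {
        name: dict(zip(keys, column.split(":")))
        for name, column in zip(sample_names, fields[9:])
    }
    return {
        "chrom": chrom, "pos": int(pos), "id": vid, "ref": ref, "alt": alt,
        "qual": None if qual == "." else float(qual),
        "filter": flt, "info": info_dict, "samples": samples,
    }

# The record shown on the slide, rebuilt with tab separators.
EXAMPLE = "\t".join([
    "20", "14370", "rs6054257", "G", "A", "29", "PASS",
    "NS=3;DP=14;AF=0.5;DB;H2", "GT:GQ:DP:HQ",
    "0|0:48:1:51,51", "1|0:48:8:51,51", "1/1:43:5:.,.",
])
record = parse_vcf_record(EXAMPLE, ["NA00001", "NA00002", "NA00003"])
```

Here `record["info"]["DB"]` is `True` because DB is a flag, and `record["samples"]["NA00003"]["GT"]` is the unphased homozygous genotype `1/1`.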

21 VCF (Mark DePristo, Broad Institute, February 2010)

22 Experimental Design: Tools
– BAM files
– BQ recalibration
– Duplicate filtering
– SNP discovery (samtools)
– SNP discovery (GATK)
– View SNPs and INDELs (igv)

23 SNP and short-INDEL Discovery bioinformatics.ca

