BST 775 Lecture PLINK – A Popular Toolset for GWAS Guodong Wu SSG, Department of Biostatistics University of Alabama at Birmingham September 24, 2013.

1 BST 775 Lecture PLINK – A Popular Toolset for GWAS Guodong Wu SSG, Department of Biostatistics University of Alabama at Birmingham September 24, 2013

2 Overview
- PLINK is designed for GWAS and population-based linkage analysis.
- Developed by Shaun Purcell*, current version V
- Why is the toolset so popular?
  - It stores GWAS data sets, which are often too large for SAS, R, or other statistical packages.
  - It offers well-developed guidelines and tools for data set management and quality control.
  - It is a platform for a wide range of association methods.
* Purcell et al. 2007, AJHG

3 Overview
- Data management
- Summary statistics
- Quality Control
- Association Test

4 PLINK in GWAS workflow
[Figure: workflow diagram. Recoverable box labels: Experimental Design & Sample Collection; GeneChip Scanner; Cell Intensity Files for each chip; Phenotype, sex and other covariates; Summary statistics and quality control; Whole genome SNP-based association; Assessment of population stratification; Further exploration of 'hits'; Visualization and follow-up]

5 Data Format
PED and MAP format (rows are people, columns are SNPs):
P1 A A A C C G T T A A T T
P2 A C A A C G G T A C T T
P3 C C A C G G T T A A T T
P4 C C A A G G G T A A T T
MAP file, one row per SNP (only the chromosome codes 1, X, Y, XY, MT and the label "snp" survive from the slide).
Transposed format (rows are SNPs, columns are people):
S1 A A A C C C C C
S2 A C A A A C A A
S3 C G C G G G G G
S4 T T C G T T G T
S5 A A G T A A A A
S6 T T A C T T T T
Compact binary format: separate SNP information and people information (P1 …, P2 …, P3 …, P4 …, P5 …).
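The relationship between the person-major (PED) layout and the SNP-major (transposed) layout can be sketched in a few lines of Python. This is only an illustration using the genotype rows from the slide; the function name `transpose` and the `S1`, `S2`, ... labels are assumptions, and real PED files carry additional family, sex, and phenotype columns that are omitted here.

```python
# Genotype rows copied from the slide: one line per person, two alleles per SNP.
ped_rows = {
    "P1": "A A A C C G T T A A T T",
    "P2": "A C A A C G G T A C T T",
    "P3": "C C A C G G T T A A T T",
    "P4": "C C A A G G G T A A T T",
}

def transpose(rows):
    """Turn person-major genotype rows into SNP-major rows."""
    people = list(rows)
    alleles = {p: rows[p].split() for p in people}
    n_snps = len(alleles[people[0]]) // 2
    out = {}
    for s in range(n_snps):
        pairs = []
        for p in people:
            # Each SNP occupies two consecutive allele columns.
            pairs.extend(alleles[p][2 * s: 2 * s + 2])
        out[f"S{s + 1}"] = " ".join(pairs)
    return out

tped = transpose(ped_rows)
for snp, line in tped.items():
    print(snp, line)
```

Rows S1 to S3 produced this way agree with the transposed rows on the slide; the slide's S4 to S6 rows differ in the P2 column, so they are not reproduced exactly by a straight transposition of the PED rows.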

6 Data management
- Recode dataset (A,C,G,T → 1,2)
- Reorder, reformat dataset
- Flip DNA strand
- Extract/remove individuals/SNPs
- Supply new phenotypes and covariates as extra files
- Merge 2 or more data sets

7 Summary and QC
- Hardy-Weinberg test
- Mendel errors
- Missing genotypes
- Allele frequencies
- Tests of non-random missingness, by phenotype and by (unobserved) genotype
- Sex check
- Pairwise IBD estimates

8 Hardy-Weinberg test
plink --file data --hardy
- An exact test is used by default.
- In a case-control study, the control group typically needs a more lenient threshold (e.g. P-value < 1e-3).
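To make "exact test" concrete, here is a minimal sketch of one standard exact test for Hardy-Weinberg equilibrium: conditional on the allele counts, it sums the probabilities of all heterozygote counts that are no more likely than the observed one. It is not PLINK's code, and the function name `hwe_exact_p` is an assumption.

```python
from math import exp, lgamma, log

def hwe_exact_p(n_aa, n_ab, n_bb):
    """Two-sided exact HWE p-value from the three genotype counts."""
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab  # copies of allele A
    n_b = 2 * n_bb + n_ab  # copies of allele B

    def log_prob(het):
        # P(het | n, n_a) = 2^het * n! / (hom_a! het! hom_b!) * n_a! n_b! / (2n)!
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (het * log(2) + lgamma(n + 1)
                - lgamma(hom_a + 1) - lgamma(het + 1) - lgamma(hom_b + 1)
                + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1))

    observed = log_prob(n_ab)
    total = 0.0
    # Heterozygote counts must have the same parity as the allele count.
    for het in range(n_a % 2, min(n_a, n_b) + 1, 2):
        lp = log_prob(het)
        if lp <= observed + 1e-9:
            total += exp(lp)
    return min(total, 1.0)

print(hwe_exact_p(25, 50, 25))  # genotype counts close to HWE proportions
print(hwe_exact_p(50, 0, 50))   # no heterozygotes at all
```

A SNP would then be flagged when its p-value falls below whatever threshold the analyst has chosen for that group.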

9 Mendel errors
plink --file data --mendel
- A Mendel error is a genotyping error in which the child's genotype could not have been inherited from the parents under Mendel's laws.
- The output gives the error rate for each SNP and each individual.
Error codes:
Code  Pat, Mat -> Offspring
1     AA, AA -> AB
2     BB, BB -> AB
3     BB, ** -> AA
4     **, BB -> AA
5     BB, BB -> AA
6     AA, ** -> BB
7     **, AA -> BB
8     AA, AA -> BB
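The error-code table can be read as a small lookup on a father/mother/child trio. The sketch below encodes that table directly; it assumes that `**` means "any genotype" and that the codes where both parents are homozygous (5 and 8) take priority over the single-parent codes. The function name `mendel_code` is hypothetical, and this is not PLINK's implementation.

```python
def mendel_code(pat, mat, child):
    """Return the slide's error code for a trio, or None if no error applies.

    Genotypes are "AA", "AB" or "BB".
    """
    if child == "AB":
        if pat == "AA" and mat == "AA":
            return 1
        if pat == "BB" and mat == "BB":
            return 2
        return None
    if child == "AA":
        # Assumed priority: both parents BB (code 5) before one parent BB.
        if pat == "BB" and mat == "BB":
            return 5
        if pat == "BB":
            return 3
        if mat == "BB":
            return 4
        return None
    if child == "BB":
        # Assumed priority: both parents AA (code 8) before one parent AA.
        if pat == "AA" and mat == "AA":
            return 8
        if pat == "AA":
            return 6
        if mat == "AA":
            return 7
        return None
    return None

print(mendel_code("AA", "AA", "AB"))  # both parents AA, heterozygous child
print(mendel_code("AB", "AB", "AA"))  # consistent with Mendelian inheritance
```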

10 Missingness and Allele Frequency
plink --file data --missing
- Outputs the missing rate per SNP and per individual.
plink --file data --freq
- Outputs each SNP's allele frequency.
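What these two summaries compute can be sketched on a toy data set. The genotypes below are invented for illustration, `0 0` is assumed to mark a missing genotype, and the function names are hypothetical; this is not a reimplementation of PLINK's output files.

```python
from collections import Counter

MISSING = "0"  # assumed missing-allele code

# Toy genotypes (invented): one list entry per SNP for each person.
genotypes = {
    "P1": ["A A", "A C", "0 0"],
    "P2": ["A C", "0 0", "G G"],
    "P3": ["C C", "A C", "C G"],
    "P4": ["C C", "A A", "G G"],
}

def snp_missing_rates(data):
    """Fraction of people with a missing genotype, one value per SNP."""
    people = list(data)
    n_snps = len(data[people[0]])
    return [
        sum(data[p][s].split()[0] == MISSING for p in people) / len(people)
        for s in range(n_snps)
    ]

def person_missing_rates(data):
    """Fraction of SNPs with a missing genotype, one value per person."""
    return {
        p: sum(g.split()[0] == MISSING for g in calls) / len(calls)
        for p, calls in data.items()
    }

def allele_frequencies(data, snp):
    """Allele frequencies at one SNP, ignoring missing genotypes."""
    counts = Counter()
    for calls in data.values():
        for allele in calls[snp].split():
            if allele != MISSING:
                counts[allele] += 1
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}

print(snp_missing_rates(genotypes))
print(person_missing_rates(genotypes))
print(allele_frequencies(genotypes, 0))
```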

11 Is the missingness random?
plink --file data --test-missing
- Tests whether missingness at a SNP differs between cases and controls.
plink --file data --test-mishap
- Tests whether a SNP is missing at random given the observed genotypes of nearby SNPs.
- Assumes dense SNP genotyping.
- Uses haplotype and LD information in the tests.
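One natural way to compare missingness between cases and controls is a two-sided Fisher's exact test on the 2x2 table of missing versus called genotypes. The sketch below shows that idea for a single SNP; the counts are invented, the function name is hypothetical, and it should be read as an illustration of the kind of test involved rather than as PLINK's exact procedure.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Invented counts: missing / called genotypes in cases, then in controls.
p_value = fisher_exact_two_sided(12, 88, 2, 98)
print(p_value)
```

A small p-value here would suggest that genotype calling failed more often in one group, which can create spurious association signals.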

12 Sex Check
plink --file data --check-sex
- Uses X chromosome heterozygosity rates to infer each individual's sex, then compares it with the recorded sex.
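The idea behind the check is that males carry one X chromosome, so their X-linked genotypes should appear homozygous. A rough sketch, assuming an inbreeding-style coefficient F = (observed homozygotes - expected homozygotes) / (SNPs - expected homozygotes) and cutoffs of 0.2 and 0.8, is shown below. The cutoffs, the simple expected-homozygosity formula, and the function names are assumptions for illustration and do not reproduce PLINK's exact calculation.

```python
def x_inbreeding_f(genotypes, allele_freqs):
    """F statistic from X chromosome genotypes.

    genotypes: list of (allele1, allele2) tuples, or None when missing.
    allele_freqs: frequency of one allele at each SNP.
    """
    observed_hom = 0
    expected_hom = 0.0
    n = 0
    for genotype, p in zip(genotypes, allele_freqs):
        if genotype is None:
            continue
        a1, a2 = genotype
        observed_hom += a1 == a2
        expected_hom += 1 - 2 * p * (1 - p)  # chance of a homozygote under HWE
        n += 1
    return (observed_hom - expected_hom) / (n - expected_hom)

def infer_sex(f, female_max=0.2, male_min=0.8):
    """Call sex from F; values between the cutoffs are left ambiguous."""
    if f > male_min:
        return "male"
    if f < female_max:
        return "female"
    return "ambiguous"

freqs = [0.5] * 10
all_homozygous = [("A", "A")] * 10
half_heterozygous = [("A", "A")] * 5 + [("A", "B")] * 5
print(infer_sex(x_inbreeding_f(all_homozygous, freqs)))
print(infer_sex(x_inbreeding_f(half_heterozygous, freqs)))
```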

13 Pairwise IBD sharing (relatedness)
[Figure: two individuals with genotypes AB and AC, traced back through their parents to a most recent common ancestor from a homogeneous random mating population. For this pair, IBS = 1 and IBD = 0.]
PLINK tutorial, October 2006; Shaun Purcell

14 Relatedness Check
plink --file data --genome
- Uses genome-wide information, but typically does not need every genotyped SNP.
- About 100K independent SNPs are typically enough.
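Relatedness estimation starts from identity-by-state (IBS) counts: how many alleles two people share at each SNP. The sketch below computes those counts and a simple IBS-based similarity, (IBS2 + 0.5 * IBS1) / number of SNPs. The genotypes are invented and the function names are hypothetical; the IBD estimation step that PLINK layers on top of these counts is not shown.

```python
from collections import Counter

def ibs_count(g1, g2):
    """Number of alleles (0, 1 or 2) shared by state between two genotypes."""
    # Multiset intersection: AB vs AC shares one allele, AA vs AA shares two.
    return sum((Counter(g1) & Counter(g2)).values())

def ibs_summary(person1, person2):
    """Counts of SNPs with IBS 0/1/2 and the resulting similarity score."""
    counts = [0, 0, 0]
    for g1, g2 in zip(person1, person2):
        counts[ibs_count(g1, g2)] += 1
    n = sum(counts)
    similarity = (counts[2] + 0.5 * counts[1]) / n
    return counts, similarity

print(ibs_count("AB", "AC"))  # the pair from the IBS/IBD figure
print(ibs_summary(["AB", "AA", "AA"], ["AC", "AA", "BB"]))
```

Because IBS sharing can arise by chance between unrelated people, as in the AB/AC example where IBS = 1 but IBD = 0, the counts are only the raw input to an IBD estimate.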

15

16 Association methods in PLINK
Population-based
- Allelic, trend, genotypic, Fisher's exact
- Stratified tests (Cochran-Mantel-Haenszel, Breslow-Day)
- Linear & logistic regression models: multiple covariates, interactions, joint tests, etc.
Family-based
- Disease traits: TDT / sib-TDT
- Continuous traits: QFAM (between/within model, QTDT)
Permutation procedures
- "adaptive", max(T), gene-dropping, between/within, rank-based, within-cluster
Multilocus tests
- Haplotype estimation, set-based tests, Hotelling's T^2, epistasis

17 An Example: Logistic Regression
plink --maf --exclude nonautosomalSNPs.txt --out AllAssoc --bfile bdata --remove exclusions.txt --logistic --hide-covar --pheno IChipCovs.txt --pheno-name cas_con --covar IChipCovs.txt --covar-name Sex,EurAdmix
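For readers who want to see what a per-SNP logistic test estimates, the following sketch fits logit P(case) = b0 + b1 * genotype by Newton-Raphson for one SNP. It deliberately leaves out the covariates (Sex, EurAdmix) used in the command, the data are invented toy counts, and `fit_logistic` is a hypothetical helper, not PLINK's fitting routine.

```python
from math import exp

def fit_logistic(x, y, iters=25):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1*x; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # information matrix (negative Hessian)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (information matrix) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Toy data (invented): 40 non-carriers with 10 cases, 40 carriers with 20 cases.
genotype = [0] * 40 + [1] * 40
status = [1] * 10 + [0] * 30 + [1] * 20 + [0] * 20
b0, b1 = fit_logistic(genotype, status)
odds_ratio = exp(b1)
print(odds_ratio)
```

With a single 0/1 predictor, the fitted odds ratio equals the odds ratio of the 2x2 table, which gives a convenient sanity check on the fit.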

18 An Example: logistic Regression Result

19 Cardinal rules in PLINK
- Always consult the log file and console output.
- Also consult the web documentation, regularly.
- PLINK has no memory: each run loads the data anew, and previous filters are lost.
- Exact syntax and spelling are important: "minus minus" (--) …
PLINK tutorial, October 2006; Shaun Purcell
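The "minus minus" rule is easy to break when commands are copied from slides or word processors, which often turn `--` into a typographic dash. A small, hypothetical pre-flight check along these lines can catch that before a run; the function names are assumptions and the check knows nothing about which options PLINK actually accepts.

```python
import shlex

BAD_DASHES = "\u2013\u2014\u2212"  # typographic dashes and the minus sign

def is_number(token):
    """True for numeric arguments such as -9, which are not options."""
    try:
        float(token)
        return True
    except ValueError:
        return False

def check_plink_command(command):
    """Return a list of dash-related problems found in a command line."""
    problems = []
    for token in shlex.split(command):
        if any(ch in BAD_DASHES for ch in token):
            problems.append(f"non-ASCII dash in {token!r}")
        elif (token.startswith("-") and not token.startswith("--")
              and not is_number(token)):
            problems.append(f"single dash in {token!r}")
    return problems

print(check_plink_command("plink --file data --check-sex"))
print(check_plink_command("plink --file data \u2013check-sex"))
```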

