BST 775 Lecture PLINK – A Popular Toolset for GWAS


1 BST 775 Lecture PLINK – A Popular Toolset for GWAS
Guodong Wu SSG, Department of Biostatistics University of Alabama at Birmingham September 24, 2013

2 Overview
Designed for GWAS and population-based linkage analysis.
Developed by Shaun Purcell*; current version is v1.07.
Why is the toolset so popular?
Stores GWAS data sets, which are too large for SAS, R, or other statistical packages.
Well-developed guidelines and tools for data set management and quality control.
A platform for various association methods.
* Purcell et al. 2007, AJHG

3 Overview
Data management
Summary statistics
Quality control
Association test

4 PLINK in GWAS workflow
Experimental design & sample collection
Cell intensity files for each chip
GeneChip scanner
Summary statistics and quality control
Phenotype, sex and other covariates
Assessment of population stratification
Whole-genome SNP-based association
Further exploration of ‘hits’
Visualization and follow-up

5 Data Format
PED and MAP format (one row per person, one column pair per SNP); SNP information is kept separately (chromosome codes shown: 1, X, Y, XY, MT)
      S1    S2    S3    S4    S5    S6
P1    A A   A C   C G   T T   A A   T T
P2    A C   A A   C G   G T   A C   T T
P3    C C   A C   G G   T T   A A   T T
P4    C C   A A   G G   G T   A A   T T
Transposed format (one row per SNP, one column pair per person); people information (P1 … P2 … P3 … P4 … P5 …) is kept separately
      P1    P2    P3    P4
S1    A A   A C   C C   C C
S2    A C   A A   A C   A A
S3    C G   C G   G G   G G
S4    T T   C G   T T   G T
S5    A A   G T   A A   A A
S6    T T   A C   T T   T T
Compact binary format
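The relationship between the PED layout and the transposed layout can be sketched in a few lines of Python. This is only an illustration, not PLINK's own code: the function names are hypothetical, and the family IDs, sex and phenotype values in the sample rows are made up. The genotypes are the first three SNPs of P1 to P3 from the slide, and the six leading columns follow the standard PED layout (family ID, individual ID, paternal ID, maternal ID, sex, phenotype).

```python
def parse_ped_line(line, n_meta=6):
    """Split one PED row into its leading ID columns and a list of allele pairs."""
    fields = line.split()
    meta, alleles = fields[:n_meta], fields[n_meta:]
    if len(alleles) % 2:
        raise ValueError("each SNP needs two allele columns")
    genotypes = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
    return meta, genotypes


def transpose(genotype_rows):
    """Turn per-person genotype lists into per-SNP genotype lists."""
    return [list(column) for column in zip(*genotype_rows)]


# Hypothetical ID, sex and phenotype columns; genotypes taken from the slide.
ped_lines = [
    "F1 P1 0 0 1 1  A A  A C  C G",
    "F2 P2 0 0 2 1  A C  A A  C G",
    "F3 P3 0 0 1 2  C C  A C  G G",
]
rows = [parse_ped_line(line)[1] for line in ped_lines]
by_snp = transpose(rows)
for index, genotypes in enumerate(by_snp, start=1):
    print(f"S{index}", " ".join(a + " " + b for a, b in genotypes))
```

Each printed row holds one SNP across all people, which is the orientation of the transposed format.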

6 Data management
Recode dataset (A,C,G,T → 1,2)
Reorder, reformat dataset
Flip DNA strand
Extract/remove individuals/SNPs
New phenotypes, covariates as extra file
Merge 2 or more data sets
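Two of these operations, strand flipping and allele recoding, are simple enough to sketch in Python. The sketch below is an assumption-laden illustration rather than PLINK's implementation: the function names are hypothetical, "0" is treated as the missing-allele code, and which letter becomes 1 or 2 is passed in explicitly instead of being derived from allele frequencies.

```python
# Complement of each base on the opposite DNA strand; "0" (missing) stays missing.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "0": "0"}


def flip_strand(genotype):
    """Return the genotype as read from the opposite strand."""
    return tuple(COMPLEMENT[allele] for allele in genotype)


def recode_12(genotype, allele_map):
    """Recode letter alleles to 1/2 using an explicit, caller-supplied mapping."""
    return tuple("0" if allele == "0" else allele_map[allele] for allele in genotype)


flipped = flip_strand(("A", "C"))
recoded = recode_12(("A", "C"), {"A": "1", "C": "2"})
print(flipped, recoded)
```

Flipping matters mostly when merging data sets that were genotyped against different strands.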

7 Summary and QC
Hardy-Weinberg test
Mendel errors
Missing genotypes
Allele frequencies
Tests of non-random missingness by phenotype and by (unobserved) genotype
Sex check
Pairwise IBD estimates

8 Hardy-Weinberg test
plink --file data --hardy
An exact test by default.
In a case-control study, the control group typically needs a more lenient threshold (e.g., P-value < 1e-3).
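To make the idea of an exact Hardy-Weinberg test concrete, here is a minimal Python sketch. It conditions on the observed allele counts, enumerates every possible heterozygote count, and sums the probabilities of outcomes no more likely than the observed one. It is a textbook-style illustration with a hypothetical function name; whether it matches PLINK's computation in every detail is an assumption.

```python
from math import exp, lgamma, log


def _log_factorial(n):
    return lgamma(n + 1)


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact HWE p-value from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab  # copies of allele A
    n_b = 2 * n_bb + n_ab  # copies of allele B

    def log_prob(het):
        # Probability of `het` heterozygotes given the allele counts.
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (het * log(2) + _log_factorial(n)
                - _log_factorial(hom_a) - _log_factorial(het) - _log_factorial(hom_b)
                + _log_factorial(n_a) + _log_factorial(n_b) - _log_factorial(2 * n))

    observed = log_prob(n_ab)
    p_value = 0.0
    # The heterozygote count must have the same parity as the allele counts.
    for het in range(n_a % 2, min(n_a, n_b) + 1, 2):
        lp = log_prob(het)
        if lp <= observed + 1e-9:
            p_value += exp(lp)
    return min(p_value, 1.0)


p_balanced = hwe_exact_p(25, 50, 25)    # counts close to HWE expectation
p_no_hets = hwe_exact_p(50, 0, 50)      # no heterozygotes at all
print(p_balanced, p_no_hets)
```

A SNP like the second example, with two common alleles and no heterozygotes, is the kind of pattern that usually points to a genotyping problem.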

9 Mendel errors
plink --file data --mendel
A Mendel error is a genotyping error in which the child’s genotype could not have been inherited from the parents according to Mendel’s laws.
Outputs the error rate for each SNP and each individual.
Code  Pat , Mat -> Offspring
1     AA , AA -> AB
2     BB , BB -> AB
3     BB , ** -> AA
4     ** , BB -> AA
5     BB , BB -> AA
6     AA , ** -> BB
7     ** , AA -> BB
8     AA , AA -> BB
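The consistency rule behind this table can be sketched in Python: a child's genotype is valid only if it can be formed from one paternal and one maternal allele. The sketch assumes autosomal SNPs with no missing genotypes, and the function names are hypothetical; it does not reproduce PLINK's error codes.

```python
from itertools import product


def is_mendel_consistent(father, mother, child):
    """True if the child could carry one allele from each parent."""
    possible = {tuple(sorted(pair)) for pair in product(father, mother)}
    return tuple(sorted(child)) in possible


def mendel_error_rate(trios):
    """Fraction of (father, mother, child) genotype trios that are inconsistent."""
    errors = sum(not is_mendel_consistent(*trio) for trio in trios)
    return errors / len(trios)


trios = [
    ("AA", "AA", "AB"),  # inconsistent: no parent carries B
    ("BB", "AB", "AA"),  # inconsistent: father cannot transmit A
    ("AB", "AB", "AA"),  # consistent
    ("AA", "BB", "AB"),  # consistent
]
rate = mendel_error_rate(trios)
print(rate)
```

Applying the same check per SNP across families, or per family across SNPs, gives the two error rates mentioned on the slide.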

10 Missingness and Allele Frequency
plink --file data --missing
Outputs the missing rate per SNP and per individual.
plink --file data --freq
Outputs each SNP’s allele frequency.
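Both summaries reduce to simple counting, as the following Python sketch shows for a single SNP. The function name is hypothetical and "0" is assumed to be the missing-allele code. The first four genotypes are S1 from the data-format slide; the fifth, missing, genotype is added purely for illustration.

```python
from collections import Counter

MISSING = "0"


def snp_summary(genotypes):
    """Return (missing rate, allele frequencies, minor allele frequency) for one SNP."""
    total = len(genotypes)
    called = [g for g in genotypes if MISSING not in g]
    missing_rate = (total - len(called)) / total if total else 0.0
    counts = Counter(allele for g in called for allele in g)
    n_alleles = sum(counts.values())
    freqs = {a: c / n_alleles for a, c in counts.items()} if n_alleles else {}
    # A monomorphic SNP has no minor allele, so its MAF is reported as 0.
    maf = min(freqs.values()) if len(freqs) > 1 else 0.0
    return missing_rate, freqs, maf


genotypes = [("A", "A"), ("A", "C"), ("C", "C"), ("C", "C"), ("0", "0")]
missing_rate, freqs, maf = snp_summary(genotypes)
print(missing_rate, freqs, maf)
```

The per-individual missing rate is the same calculation applied along a row of the PED file instead of down a column.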

11 Is the missingness random?
plink --file data --test-missing
Tests whether missingness at a SNP differs between cases and controls.
plink --file data --test-mishap
Tests whether missingness at a SNP is random with respect to the observed genotypes of nearby SNPs. Assumes dense SNP genotyping and uses haplotype and LD information in the tests.
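The case/control comparison amounts to a 2×2 table of missing versus called genotypes. The Python sketch below applies a two-sided Fisher's exact test to such a table. The function name and the counts are hypothetical, and using Fisher's test here is an assumption for illustration rather than a description of PLINK's internals.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(low, high + 1)
                if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, total)


# Rows: cases, controls. Columns: missing genotypes, called genotypes.
p_similar = fisher_exact_two_sided(10, 90, 10, 90)
p_skewed = fisher_exact_two_sided(20, 80, 0, 100)
print(p_similar, p_skewed)
```

A small p-value, as in the second table, means genotype calls fail more often in one group, which can create spurious association signals.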

12 Sex Check
plink --file data --check-sex
Uses X chromosome heterozygosity rates to infer sex, then compares the inferred sex with the reported sex.
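The reasoning is that males carry one X chromosome and so should show essentially no heterozygous X genotypes. A rough Python sketch of that idea follows. The function names and the 0.05 cutoff are assumptions chosen for illustration, and PLINK's actual decision rule is not reproduced here. Sex is coded 1 for male and 2 for female, as in PED files.

```python
def x_het_rate(x_genotypes):
    """Fraction of called X chromosome genotypes that are heterozygous."""
    called = [g for g in x_genotypes if "0" not in g]
    if not called:
        return None
    return sum(a != b for a, b in called) / len(called)


def sex_mismatch(x_genotypes, reported_sex, max_male_het=0.05):
    """True if the sex inferred from X heterozygosity disagrees with the report.

    max_male_het is a hypothetical cutoff, not a PLINK default.
    """
    rate = x_het_rate(x_genotypes)
    if rate is None:
        return None
    inferred = 1 if rate <= max_male_het else 2
    return inferred != reported_sex


hom_only = [("A", "A"), ("C", "C"), ("G", "G"), ("T", "T")]
with_hets = [("A", "C"), ("C", "C"), ("G", "T"), ("T", "T")]
print(sex_mismatch(hom_only, 1), sex_mismatch(with_hets, 1))
```

A mismatch usually signals a sample swap or a data-entry error rather than a biological finding, so flagged samples are worth checking against the lab records.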

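13 Pairwise IBD sharing (relatedness)
[Figure: pedigree diagram. Parents with genotypes AB and AC trace back to a most recent common ancestor from a homogeneous random-mating population. Two individuals with genotypes AB and AC have IBS = 1 but IBD = 0.]
PLINK tutorial, October 2006; Shaun Purcell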

14 Relatedness Check
plink --file data --genome
Uses genome-wide information, but typically does not need every SNP in the genome; about 100K independent SNPs are usually enough.
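Relatedness estimates start from identity by state (IBS), the number of alleles two people share at each SNP. The Python sketch below computes only that observable quantity; estimating IBD additionally requires allele frequencies and is not attempted here. The function names are hypothetical, and the genotypes are P1 and P2 from the data-format slide.

```python
def ibs_count(g1, g2):
    """Number of alleles (0, 1 or 2) shared identical by state at one SNP."""
    remaining = list(g2)
    shared = 0
    for allele in g1:
        if allele in remaining:
            remaining.remove(allele)
            shared += 1
    return shared


def mean_ibs_proportion(person1, person2):
    """Average proportion of alleles shared IBS over SNPs called in both people."""
    pairs = [(g1, g2) for g1, g2 in zip(person1, person2)
             if "0" not in g1 and "0" not in g2]
    if not pairs:
        return None
    return sum(ibs_count(g1, g2) for g1, g2 in pairs) / (2 * len(pairs))


p1 = [("A", "A"), ("A", "C"), ("C", "G"), ("T", "T"), ("A", "A"), ("T", "T")]
p2 = [("A", "C"), ("A", "A"), ("C", "G"), ("G", "T"), ("A", "C"), ("T", "T")]
sharing = mean_ibs_proportion(p1, p2)
print(ibs_count(("A", "B"), ("A", "C")), sharing)
```

The first printed value reproduces the IBS = 1 case for genotypes AB and AC from the previous slide.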


16 Association methods in PLINK
Population-based
  Allelic, trend, genotypic, Fisher’s exact
  Stratified tests (Cochran-Mantel-Haenszel, Breslow-Day)
  Linear & logistic regression models: multiple covariates, interactions, joint tests, etc.
Family-based
  Disease traits: TDT / sib-TDT
  Continuous traits: QFAM (between/within model, QTDT)
Permutation procedures
  “adaptive”, max(T), gene-dropping, between/within, rank-based, within-cluster
Multilocus tests
  Haplotype estimation, set-based tests, Hotelling’s T², epistasis
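The simplest test in this list, the allelic test, compares allele counts between cases and controls with a 1-degree-of-freedom chi-square statistic. A minimal Python sketch follows, with a hypothetical function name and made-up counts; it illustrates the statistic and is not PLINK's code.

```python
from math import erfc, sqrt


def allelic_test(case_a1, case_a2, ctrl_a1, ctrl_a2):
    """Return (chi-square, p-value, odds ratio) for a 2x2 allele-count table."""
    row_case, row_ctrl = case_a1 + case_a2, ctrl_a1 + ctrl_a2
    col_a1, col_a2 = case_a1 + ctrl_a1, case_a2 + ctrl_a2
    denominator = row_case * row_ctrl * col_a1 * col_a2
    if denominator == 0 or case_a2 * ctrl_a1 == 0:
        raise ValueError("table has an empty margin or an undefined odds ratio")
    n = row_case + row_ctrl
    chi2 = n * (case_a1 * ctrl_a2 - case_a2 * ctrl_a1) ** 2 / denominator
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    odds_ratio = (case_a1 * ctrl_a2) / (case_a2 * ctrl_a1)
    return chi2, p_value, odds_ratio


# Hypothetical allele counts: 50 cases and 50 controls, two alleles each.
chi2, p_value, odds_ratio = allelic_test(60, 40, 40, 60)
print(chi2, p_value, odds_ratio)
```

In a genome-wide scan this test is repeated for every SNP, so the p-values need a multiple-testing correction before any SNP is called a hit.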

17 An Example: Logistic Regression
plink --maf --exclude nonautosomalSNPs.txt --out AllAssoc --bfile bdata --remove exclusions.txt --logistic --hide-covar --pheno IChipCovs.txt --pheno-name cas_con --covar IChipCovs.txt --covar-name Sex,EurAdmix

18 An Example: Logistic Regression Result

19 Cardinal rules in PLINK
Always consult the log file and console output.
Also consult the web documentation regularly.
PLINK has no memory: each run loads the data anew, and previous filters are lost.
Exact syntax and spelling are important: “minus minus” …
PLINK tutorial, October 2006; Shaun Purcell
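Because no state carries over between runs, one way to avoid forgetting a filter is to keep the shared filters in one place and attach them to every command. The Python sketch below only builds and prints command lines; it does not run PLINK. The helper name and the output prefixes AllFreq and AllMissing are hypothetical, while the file names and flags are those used earlier in the lecture.

```python
import shlex


def plink_command(bfile, out, filters=(), analysis=()):
    """Build one PLINK argument list with the shared filters included."""
    return ["plink", "--bfile", bfile, *filters, *analysis, "--out", out]


# Filters that every run must repeat, since PLINK does not remember them.
filters = ["--remove", "exclusions.txt", "--exclude", "nonautosomalSNPs.txt"]

freq_cmd = plink_command("bdata", "AllFreq", filters, ["--freq"])
miss_cmd = plink_command("bdata", "AllMissing", filters, ["--missing"])

for command in (freq_cmd, miss_cmd):
    print(shlex.join(command))
```

An alternative is to apply the filters once and write a new, filtered binary data set, then point later runs at that file.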
