BST 775 Lecture PLINK – A Popular Toolset for GWAS


BST 775 Lecture: PLINK – A Popular Toolset for GWAS
Guodong Wu, SSG, Department of Biostatistics, University of Alabama at Birmingham
September 24, 2013

Overview
Designed for GWAS and population-based linkage analysis.
Developed by Shaun Purcell*; current version v1.07. http://pngu.mgh.harvard.edu/~purcell/plink/
Why is the toolset so popular?
It stores GWAS data sets that are too large for SAS, R, or other statistical packages.
It offers well-developed guidelines and tools for data set management and quality control.
It serves as a platform for various association methods.
* Purcell et al. 2007, AJHG

Overview: data management; summary statistics; quality control; association tests

PLINK in GWAS workflow
Experimental design and sample collection
Cell intensity files for each chip
GeneChip scanner
Summary statistics and quality control
Phenotype, sex and other covariates
Assessment of population stratification
Whole genome SNP-based association
Further exploration of ‘hits’
Visualization and follow-up

Data Format
PED and MAP format
  SNP information (one row per SNP):
    1  snp1 0 1000
    X  snp2 0 1000
    Y  snp3 0 1000
    XY snp4 0 1000
    MT snp5 0 1000
  Genotypes (people in rows, SNPs across columns):
    P1 A A A C C G T T A A T T
    P2 A C A A C G G T A C T T
    P3 C C A C G G T T A A T T
    P4 C C A A G G G T A A T T
Transposed format
  Genotypes (SNPs in rows, people across columns):
    S1 A A A C C C C C
    S2 A C A A A C A A
    S3 C G C G G G G G
    S4 T T C G T T G T
    S5 A A G T A A A A
    S6 T T A C T T T T
  People information (one row per person):
    P1 …
    P2 …
    P3 …
    P4 …
    P5 …
Compact binary format
  Genotypes stored as bits, alongside separate SNP information and people information:
    0101010010101010101
    1010011101010101010
    1101110101001010101
    1101001011101101010
    1101010101010111010
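The relationship between the people-by-row layout and the transposed layout can be sketched in a few lines of Python. This is only an illustration using the four toy genotype rows from the slide. The helper name `transpose_genotypes` is hypothetical, and the rows omit the leading family and phenotype columns that a real PED file carries.

```python
# Toy PED-style rows from the slide: a person label followed by allele pairs.
ped_rows = [
    "P1 A A A C C G T T A A T T",
    "P2 A C A A C G G T A C T T",
    "P3 C C A C G G T T A A T T",
    "P4 C C A A G G G T A A T T",
]


def transpose_genotypes(rows):
    """Return (people, per_snp); per_snp[j] lists every person's genotype at SNP j."""
    people, genotypes = [], []
    for row in rows:
        label, *alleles = row.split()
        if len(alleles) % 2:
            raise ValueError(f"odd number of alleles for {label}")
        people.append(label)
        # Pair consecutive alleles into one genotype per SNP.
        genotypes.append(list(zip(alleles[0::2], alleles[1::2])))
    n_snps = len(genotypes[0])
    if any(len(g) != n_snps for g in genotypes):
        raise ValueError("rows have different numbers of SNPs")
    per_snp = [[person[j] for person in genotypes] for j in range(n_snps)]
    return people, per_snp


people, per_snp = transpose_genotypes(ped_rows)
for j, snp in enumerate(per_snp, start=1):
    # One line per SNP, with the genotypes of P1..P4 side by side.
    print(f"S{j}", " ".join(f"{a} {b}" for a, b in snp))
```

Each printed line holds one SNP with the people laid out across the columns, which is the orientation of the transposed format.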

Data management
Recode dataset (A,C,G,T → 1,2)
Reorder, reformat dataset
Flip DNA strand
Extract/remove individuals/SNPs
New phenotypes, covariates as extra file
Merge 2 or more data sets
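Two of these operations, strand flipping and recoding letters to 1/2, are simple enough to sketch directly. The Python below is illustrative only and is not PLINK's implementation. The function names are hypothetical, "0" is assumed to be the missing-allele code, and the less frequent allele is assumed to be coded 1, with ties broken alphabetically.

```python
from collections import Counter

# Complementary bases; "0" (assumed missing code) is left unchanged.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "0": "0"}


def flip_strand(genotype):
    """Return the genotype as read from the opposite DNA strand."""
    return tuple(COMPLEMENT[allele] for allele in genotype)


def recode_12(genotypes):
    """Recode one SNP's letter alleles as "1" (rarer) and "2" (more common)."""
    counts = Counter(a for g in genotypes for a in g if a != "0")
    if len(counts) > 2:
        raise ValueError("more than two alleles observed at this SNP")
    # Rarer allele first; ties broken alphabetically (an assumption).
    ordered = sorted(counts, key=lambda a: (counts[a], a))
    # A monomorphic SNP keeps only code "2".
    codes = ["1", "2"][-len(ordered):] if ordered else []
    mapping = dict(zip(ordered, codes))
    mapping["0"] = "0"
    return [tuple(mapping[a] for a in g) for g in genotypes]


snp1 = [("A", "A"), ("A", "C"), ("C", "C"), ("C", "C")]
print(recode_12(snp1))
print([flip_strand(g) for g in snp1])
```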

Summary and QC Hardy-Weinberg test Mendel errors Missing genotypes Allele frequencies Tests of non-random missingness by phenotype and by (unobserved) genotype Sex Check Pairwise IBD estimates

Hardy-Weinberg test
plink --file data --hardy
An exact test by default.
In a case-control study, the control group typically needs a more lenient threshold (e.g. P-value < 1e-3).
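For intuition about what an exact Hardy-Weinberg test computes, the following Python sketch enumerates every heterozygote count compatible with the observed allele counts. It then sums the probabilities of the outcomes that are no more likely than the observed one. This is a minimal illustration of the general idea, not PLINK's code, and the function name `hwe_exact_p` is hypothetical.

```python
from fractions import Fraction
from math import factorial


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact HWE p-value from genotype counts (AA, AB, BB), as a Fraction."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes")
    n_a = 2 * n_aa + n_ab   # copies of allele A
    n_b = 2 * n - n_a       # copies of allele B

    def weight(het):
        # Number of allele arrangements that give this heterozygote count.
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (2 ** het) * factorial(n) // (
            factorial(hom_a) * factorial(het) * factorial(hom_b)
        )

    # The heterozygote count has the same parity as the count of allele A.
    het_counts = range(n_a % 2, min(n_a, n_b) + 1, 2)
    weights = [weight(h) for h in het_counts]
    observed = weight(n_ab)
    return Fraction(sum(w for w in weights if w <= observed), sum(weights))


print(float(hwe_exact_p(30, 40, 30)))
```

Exact integer arithmetic keeps the comparison `w <= observed` free of rounding issues, at the cost of speed for very large samples.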

Mendel errors
plink --file data --mendel
A genotyping error is flagged when a child’s genotype cannot have been inherited from the parents according to Mendel’s laws.
Outputs the error rate for each SNP and each individual.
  Code  Pat , Mat -> Offspring
  1     AA , AA -> AB
  2     BB , BB -> AB
  3     BB , ** -> AA
  4     ** , BB -> AA
  5     BB , BB -> AA
  6     AA , ** -> BB
  7     ** , AA -> BB
  8     AA , AA -> BB
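The rule behind the error codes can be stated as a one-line check: a child's genotype is consistent only if it can be formed from one paternal allele and one maternal allele. A small Python sketch of that check follows. The function name is hypothetical, and genotypes are written as two-letter strings as on the slide.

```python
from itertools import product


def is_mendel_consistent(father, mother, child):
    """True if child can be built from one paternal and one maternal allele."""
    return any(
        sorted((pat, mat)) == sorted(child)
        for pat, mat in product(father, mother)
    )


trios = [
    ("AA", "AA", "AB"),  # code 1 on the slide
    ("BB", "AB", "AA"),  # code 3, here with a heterozygous mother
    ("AB", "AB", "AA"),  # consistent
    ("AA", "BB", "AB"),  # consistent
]
for father, mother, child in trios:
    status = "ok" if is_mendel_consistent(father, mother, child) else "Mendel error"
    print(father, mother, "->", child, ":", status)
```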

Missingness and Allele Frequency
plink --file data --missing
Outputs the missing rate per SNP and per individual.
plink --file data --freq
Outputs each SNP’s allele frequency.
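The quantities reported by these two commands are straightforward to compute by hand on a small example. The Python sketch below uses made-up genotypes, with "0" assumed to mark a missing allele, and the function names are hypothetical. It shows the arithmetic only and does not reproduce PLINK's output format.

```python
from collections import Counter

# Hypothetical toy data: person -> genotypes at three SNPs ("0" = missing).
genotypes = {
    "P1": [("A", "A"), ("0", "0"), ("C", "G")],
    "P2": [("A", "C"), ("A", "A"), ("0", "0")],
    "P3": [("C", "C"), ("A", "C"), ("G", "G")],
    "P4": [("C", "C"), ("0", "0"), ("G", "G")],
}


def is_missing(genotype):
    return "0" in genotype


def missing_rate_per_individual(data):
    return {person: sum(map(is_missing, gs)) / len(gs) for person, gs in data.items()}


def missing_rate_per_snp(data):
    columns = zip(*data.values())  # one tuple of genotypes per SNP
    return [sum(map(is_missing, col)) / len(col) for col in columns]


def allele_frequencies(data):
    freqs = []
    for col in zip(*data.values()):
        # Count alleles over the called (non-missing) genotypes only.
        counts = Counter(a for g in col if not is_missing(g) for a in g)
        total = sum(counts.values())
        freqs.append({allele: n / total for allele, n in counts.items()})
    return freqs


print(missing_rate_per_individual(genotypes))
print(missing_rate_per_snp(genotypes))
print(allele_frequencies(genotypes))
```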

Is the missingness random?
plink --file data --test-missing
Tests whether a SNP’s missingness differs between cases and controls.
plink --file data --test-mishap
Tests whether a SNP’s missingness is random with respect to the observed genotypes at nearby SNPs. Assumes dense SNP genotyping; uses haplotype and LD information in the tests.
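One standard way to compare missingness between cases and controls at a single SNP is a two-sided Fisher exact test on the 2×2 table of missing versus called genotypes. The Python sketch below illustrates that test. The lecture does not say which statistic PLINK uses internally, so the choice of test is an assumption, and the function name and counts are hypothetical.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Rows: cases, controls. Columns: missing, called.
    """
    row1, row2, col1 = a + b, c + d, a + c

    def weight(x):
        # Hypergeometric numerator for x missing genotypes among cases.
        return comb(row1, x) * comb(row2, col1 - x)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return Fraction(extreme, comb(row1 + row2, col1))


# Hypothetical counts: 8 of 10 cases missing versus 1 of 6 controls missing.
print(float(fisher_exact_two_sided(8, 2, 1, 5)))
```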

Sex Check
plink --file data --check-sex
Uses X chromosome heterozygosity rates to infer sex, and then compares the result with the recorded sex.
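The idea can be sketched with an inbreeding-style coefficient F on X chromosome SNPs: males carry one X and so look fully homozygous (F near 1), whereas females do not. The Python below is a simplified illustration, not PLINK's implementation. The function names are hypothetical, the allele frequencies are assumed to be known, and the 0.2 and 0.8 cut-offs are assumptions to check against the PLINK documentation.

```python
def x_inbreeding_f(genotypes, allele_freqs):
    """F = (observed hom - expected hom) / (N - expected hom) over called X SNPs."""
    observed_hom = 0
    expected_hom = 0.0
    n_called = 0
    for (a1, a2), p in zip(genotypes, allele_freqs):
        if "0" in (a1, a2):  # skip missing genotypes
            continue
        n_called += 1
        observed_hom += a1 == a2
        expected_hom += 1 - 2 * p * (1 - p)  # expected homozygosity under HWE
    if n_called == expected_hom:
        raise ValueError("no informative SNPs")
    return (observed_hom - expected_hom) / (n_called - expected_hom)


def infer_sex(f, female_max=0.2, male_min=0.8):
    """Classify by F; both cut-offs are assumed values, not taken from the lecture."""
    if f > male_min:
        return "male"
    if f < female_max:
        return "female"
    return "ambiguous"


all_hom = [("A", "A"), ("C", "C"), ("G", "G"), ("T", "T")]
half_het = [("A", "C"), ("C", "C"), ("G", "T"), ("T", "T")]
freqs = [0.5, 0.5, 0.5, 0.5]
print(infer_sex(x_inbreeding_f(all_hom, freqs)))
print(infer_sex(x_inbreeding_f(half_het, freqs)))
```

An inferred sex that disagrees with the recorded one, or an ambiguous F, is the kind of discrepancy this check is meant to surface.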

Pairwise IBD sharing (relatedness)
[Pedigree diagram: most recent common ancestor from a homogeneous random mating population; parents; two individuals with genotypes AB and AC, for which IBS = 1 and IBD = 0]
PLINK tutorial, October 2006; Shaun Purcell, shaun@pngu.mgh.harvard.edu

Relatedness Check
plink --file data --genome
Estimates genome-wide pairwise sharing. The full set of whole-genome SNPs is typically not needed; about 100K independent SNPs are usually enough.
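The identity-by-state (IBS) count that underlies such pairwise comparisons is easy to compute directly. The Python sketch below counts shared alleles per SNP and averages them over a pair of individuals. It is a simple IBS summary only, not the IBD estimation that PLINK performs. The function names are hypothetical and "0" is assumed to mark a missing allele.

```python
from collections import Counter


def ibs_state(g1, g2):
    """Number of alleles (0, 1 or 2) that two genotypes share by state."""
    return sum((Counter(g1) & Counter(g2)).values())


def mean_ibs(person1, person2):
    """Average proportion of alleles shared by state across called SNPs."""
    shared = [
        ibs_state(a, b)
        for a, b in zip(person1, person2)
        if "0" not in a and "0" not in b
    ]
    if not shared:
        raise ValueError("no SNPs called in both individuals")
    return sum(shared) / (2 * len(shared))


print(ibs_state("AB", "AC"))  # the pair shown on the IBD slide
print(mean_ibs(["AA", "AB", "AA"], ["AA", "AC", "BB"]))
```

The AB/AC pair shares one allele by state, which says nothing by itself about whether that allele was inherited from a common ancestor.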

Association methods in PLINK
Population-based: allelic, trend, genotypic, Fisher’s exact; stratified tests (Cochran-Mantel-Haenszel, Breslow-Day); linear & logistic regression models (multiple covariates, interactions, joint tests, etc.)
Family-based: disease traits (TDT / sib-TDT); continuous traits (QFAM: between/within model, QTDT)
Permutation procedures: “adaptive”, max(T), gene-dropping, between/within, rank-based, within-cluster
Multilocus tests: haplotype estimation, set-based tests, Hotelling’s T², epistasis
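As a concrete instance of the simplest population-based test, the allelic test compares allele counts between cases and controls with a 1-degree-of-freedom chi-square statistic. The Python sketch below shows the textbook calculation on hypothetical counts. The function name is made up, and no continuity correction is applied.

```python
from math import erfc, sqrt


def allelic_chi2(case_a, case_b, ctrl_a, ctrl_b):
    """Chi-square statistic (1 df) and p-value for a 2x2 table of allele counts."""
    row_case, row_ctrl = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b
    if min(row_case, row_ctrl, col_a, col_b) == 0:
        raise ValueError("a row or column total is zero")
    n = row_case + row_ctrl
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / (
        row_case * row_ctrl * col_a * col_b
    )
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: allele A is enriched in cases.
chi2, p_value = allelic_chi2(30, 10, 10, 30)
print(chi2, p_value)
```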

An Example: Logistic Regression
plink --maf 0.05 --exclude nonautosomalSNPs.txt --out AllAssoc --bfile bdata --remove exclusions.txt --logistic --hide-covar --pheno IChipCovs.txt --pheno-name cas_con --covar IChipCovs.txt --covar-name Sex,EurAdmix

An Example: logistic Regression Result

Cardinal rules in PLINK
Always consult the log file and console output.
Also consult the web documentation regularly.
PLINK has no memory: each run loads the data anew, and previous filters are lost.
Exact syntax and spelling are important: “minus minus” …