Washington State University


1 Washington State University
Workshop: Assessment of statistical power, false positive rate and type I error of GWAS
Zhiwu Zhang
Washington State University

2 Objectives
Simulation of phenotypes
True and false positives
Effect of population structure
Power, FDR and type I error
Comparison of methods
Experimental design

3 Complex traits
Controlled by multiple genes
Influenced by environment
Also known as quantitative traits
Most traits are continuous, e.g. yield and height
Some are categorical, e.g. node number, score of disease resistance
Some binary traits are still quantitative traits, e.g. diabetes
Economically important

4 Dissecting phenotype
Y = G + E + GxE + Residual
G = Additive + Dominance + Epistasis
E: Environment, e.g. year and location
Residual: e.g. measurement error

5 Distribution of QTN effect
Normal distribution
Geometric distribution

6 Theoretical geometric distribution
The probability distribution of the number X of Bernoulli trials needed to get one success:
$\Pr(X = k) = (1 - p)^{k-1}\, p$
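The formula above can be checked numerically in base R. This is only a small sketch: the value of p is arbitrary, and note that R's `dgeom()` counts failures before the first success, so k trials correspond to k - 1 failures.

```r
# Geometric pmf: probability that the first success occurs on trial k
p <- 0.3            # success probability (arbitrary value for illustration)
k <- 1:10           # number of trials needed for the first success
prob_formula <- (1 - p)^(k - 1) * p
# dgeom() is parameterised by the number of failures before the first success
prob_dgeom <- dgeom(k - 1, prob = p)
print(all.equal(prob_formula, prob_dgeom))
```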

7 Approximated geometric distribution
$\mathrm{Effect}(X = k) = p^{k}$
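A minimal sketch of the approximated geometric effects, assuming a hypothetical decay parameter p = 0.9 and 10 QTNs (the QTN count matches NQTN=10 used later in the demo; the value of p is not from the slides).

```r
# Approximated geometric QTN effects: the k-th QTN has effect p^k
p <- 0.9                 # hypothetical decay parameter
n_qtn <- 10              # number of QTNs
effect <- p^(1:n_qtn)
# Each effect is p times the previous one, so the first QTNs carry the largest effects
ratio <- effect[-1] / effect[-n_qtn]
print(round(effect, 3))
```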

8 Demo code

9 Preparation for GAPIT
#Import GAPIT
#source("
#biocLite("multtest")
#install.packages("gplots")
#install.packages("scatterplot3d")
#The downloaded link at:
rm(list=ls())
library('MASS') # required for ginv
library(multtest)
library(gplots)
library(compiler) #required for cmpfun
library("scatterplot3d")

10 Data preparation
#Import demo data
myGD=read.table(file="
myGM=read.table(file="
#myGD=read.table(file="~/Dropbox/Current/ZZLab/WSUCourse/CROPS545/Demo/mdp_numeric.txt",head=T)
#myGM=read.table(file="~/Dropbox/Current/ZZLab/WSUCourse/CROPS545/Demo/mdp_SNP_information.txt",head=T)

11 Genotype in Numeric format
myGD=read.table(file="

12 Genetic map myGM=read.table(file="

13 GAPIT.Phenotype.Simulation
#Simulate 10 QTNs on the first half of the chromosomes (chromosomes 1 to 5)
X=myGD[,-1]
index1to5=myGM[,2]<6
X1to5 = X[,index1to5]
taxa=myGD[,1]
set.seed(99164)
GD.candidate=cbind(taxa,X1to5)
mySim=GAPIT.Phenotype.Simulation(GD=GD.candidate,GM=myGM[index1to5,],h2=.5,NQTN=10,QTNDist="normal")
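To make the role of h2, NQTN and the QTN effects concrete, here is a toy simulation in base R. It is a sketch under stated assumptions, not the internals of GAPIT.Phenotype.Simulation: the genotypes are random, and all names (X, qtn_pos, h2_realized) are hypothetical.

```r
# Toy phenotype simulation: y = u + e with a target heritability h2
set.seed(1)
n <- 200; m <- 500                       # individuals and markers (toy sizes)
X <- matrix(rbinom(n * m, 2, 0.3), n, m) # genotypes coded 0/1/2
h2 <- 0.5; n_qtn <- 10
qtn_pos <- sample(m, n_qtn)              # sample QTN positions among the markers
effect <- rnorm(n_qtn)                   # QTN effects drawn from a normal distribution
u <- as.vector(X[, qtn_pos] %*% effect)  # additive genetic value
# Residual variance chosen so that var(u) / (var(u) + var(e)) equals h2 in expectation
var_e <- var(u) * (1 - h2) / h2
e <- rnorm(n, 0, sqrt(var_e))
y <- u + e
h2_realized <- var(u) / (var(u) + var(e))
print(h2_realized)
```

The realized heritability fluctuates around the target because the residuals are a finite random sample.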

14 Simulation object
str(mySim)
List of 5
 $ Y :'data.frame': 281 obs. of 2 variables:
  ..$ GD[, 1]: Factor w/ 281 levels "33-16","38-11",..:
  ..$ V1 : num [1:281]
 $ u : num [1:281, 1]
 $ e : num [1:281]
 $ QTN.position: int [1:10]
 $ effect : num [1:10]

15 QTN positions
plot(myGM[,c(2,3)])
points(myGM[mySim$QTN.position,c(2,3)],type="p",col="red",cex=3)

16 Simulation results
par(mfrow=c(2,2), mar = c(3,4,1,1))
plot(mySim$effect)
plot(mySim$Y[,2],mySim$u)
plot(mySim$Y[,2],mySim$e)
plot(mySim$e,mySim$u)

17 LM for GWAS
Phenotype: Y = SNP + Q (or PCs) + Kinship + e
SNP: fixed effect
Q (or PCs): fixed effect, accounts for population structure
Kinship: random effect, accounts for unequal relatedness
General Linear Model (GLM): Y = SNP + Q (or PCs) + e
Mixed Linear Model (MLM, Q+K): Y = SNP + Q (or PCs) + Kinship + e (Yu et al. 2005, Nature Genetics)
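The GLM part of the model can be sketched in base R as a marker-by-marker regression with principal components as fixed covariates. This is an illustration on random toy data, not GAPIT's implementation; the number of PCs (3) and the causal marker (marker 10) are arbitrary choices. The MLM additionally needs a random kinship term, which base R's `lm()` does not fit.

```r
# Single-marker GLM scan: y ~ SNP + PCs, one marker at a time
set.seed(2)
n <- 150; m <- 100
X <- matrix(rbinom(n * m, 2, 0.4), n, m)   # toy genotypes coded 0/1/2
y <- 0.8 * X[, 10] + rnorm(n)              # marker 10 is the only causal marker
pcs <- prcomp(X)$x[, 1:3]                  # first three PCs as fixed covariates
pvals <- sapply(seq_len(m), function(j) {
  fit <- lm(y ~ X[, j] + pcs)
  summary(fit)$coefficients[2, 4]          # p-value of the SNP term
})
top <- which.min(pvals)                    # index of the most significant marker
print(top)
```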

18 Group by kinship

19 Compression improves power
Zhang et al., Nature Genetics, 2010
[Figure: x-axis: average number of individuals per group]

20 Fit matches power
[Figure: x-axis: average number of individuals per group]

21 Compression is robust across species
[Figure: fit of model and statistical power against compression level, with panels for Human (n=1315), Dog (n=292) and Maize (n=277); curve labels range from 0.04sd (0.03%) to 0.5sd (4.99%)]

22 Compressed MLM is more general
Zhang et al., Nature Genetics, 2010
GLM (1 group): SA, GC, PCA and QTDT
Compressed MLM (s groups), n ≥ s ≥ 1: Sire model, Compressed MLM
Full MLM (n groups): Henderson’s MLM, Unified MLM
Kinship: pedigree based or marker based
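The idea of grouping individuals by kinship and working with s groups instead of n individuals can be sketched as follows. This is a hedged illustration only: the kinship estimator, the use of average-linkage clustering and the averaging of kinship between groups are assumptions made for the example, and GAPIT's own choices may differ.

```r
# Compression sketch: cluster individuals by kinship, then average kinship between groups
set.seed(3)
n <- 60; m <- 300
X <- matrix(rbinom(n * m, 2, 0.5), n, m)   # toy genotypes coded 0/1/2
Z <- scale(X, center = TRUE, scale = FALSE)
K <- tcrossprod(Z) / m                     # simple marker-based kinship (assumed estimator)
s <- 10                                    # number of groups, n >= s >= 1
d <- as.dist(max(K) - K)                   # turn similarity into a distance
groups <- cutree(hclust(d, method = "average"), k = s)
# Group-level kinship: mean individual kinship over each pair of groups
G <- model.matrix(~ factor(groups) - 1)    # n x s group indicator matrix
size <- colSums(G)
K_group <- crossprod(G, K %*% G) / outer(size, size)
print(dim(K_group))
```

With s = 1 every individual falls into one group (the GLM end of the range), and with s = n each individual is its own group (the full MLM end).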

23 ZZLab.Net

24 Modeling in GAPIT
Model   PCA.total   group.from   group.to
t       0           1            1
GLM     >0          1            1
MLM     >0          n            n
CMLM    >0          1            n

25 Run GAPIT
setwd("~/Desktop/temp")
myGAPIT=GAPIT(
Y=mySim$Y,
GD=myGD,
GM=myGM,
QTN.position=mySim$QTN.position,
PCA.total=0,
group.from = 1,
group.to = 1,
group.by = 10,
#sangwich.top="MLM", #options are GLM,MLM,CMLM, FaST and SUPER
#sangwich.bottom="SUPER", #options are GLM,MLM,CMLM, FaST and SUPER
memo="ttest")

26 Manhattan plot

27 Power, type I error and FDR
Power: proportion of QTNs identified
Type I error: proportion of non-QTN SNPs declared significant, from the empirical null distribution of non-QTN SNPs
FDR: proportion of false positives among all positives
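The three quantities above can be computed at a single p-value threshold as in the sketch below. The function name `gwas_stats` and the toy p-values are hypothetical; this is not the GAPIT.FDR.TypeI function.

```r
# Power, type I error and FDR at a single p-value threshold
gwas_stats <- function(pvals, qtn_pos, alpha) {
  is_qtn <- seq_along(pvals) %in% qtn_pos
  positive <- pvals < alpha
  tp <- sum(positive & is_qtn)      # true positives
  fp <- sum(positive & !is_qtn)     # false positives
  c(power = tp / sum(is_qtn),
    typeI = fp / sum(!is_qtn),
    FDR   = if (tp + fp > 0) fp / (tp + fp) else 0)
}
# Toy example: 10 SNPs, QTNs at positions 2 and 7
pvals <- c(0.20, 1e-6, 0.03, 0.50, 0.80, 0.001, 0.30, 0.60, 0.90, 0.70)
res <- gwas_stats(pvals, qtn_pos = c(2, 7), alpha = 0.01)
print(res)
```

Here SNPs 2 and 6 pass the threshold: one is a QTN and one is not, so power is 1/2, type I error is 1/8 and FDR is 1/2.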

28 Mapping resolution
10 kb is really good, 100 kb is OK
Bins with QTNs are used for power
Bins without QTNs are used for type I error
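A bin-based evaluation can be sketched as follows, assuming that a bin counts as a hit when its most significant SNP passes the threshold. The function `bin_stats`, the bin rule and the toy map are hypothetical and only illustrate the idea of scoring bins rather than single SNPs.

```r
# Bin-based evaluation: a bin is a hit if its most significant SNP passes the threshold
bin_stats <- function(chr, pos, pvals, qtn_idx, ws, alpha) {
  bin <- paste(chr, pos %/% ws, sep = "_")   # bin label for each SNP
  bin_p <- tapply(pvals, bin, min)           # best p-value in each bin
  qtn_bins <- unique(bin[qtn_idx])           # bins that contain a QTN
  has_qtn <- names(bin_p) %in% qtn_bins
  hit <- bin_p < alpha
  c(power = sum(hit & has_qtn) / sum(has_qtn),
    typeI = sum(hit & !has_qtn) / sum(!has_qtn))
}
chr  <- c(1, 1, 1, 1, 2, 2)
pos  <- c(100, 900, 5200, 5900, 300, 20500)
pval <- c(0.5, 1e-5, 0.4, 0.3, 1e-4, 0.6)
res  <- bin_stats(chr, pos, pval, qtn_idx = 1, ws = 1000, alpha = 0.001)
print(res)
```

In this toy map the QTN itself (position 100) is not significant, but its neighbour at position 900 in the same 1 kb bin is, so the QTN bin still counts toward power.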

29 GAPIT.FDR.TypeI Function
myStat=GAPIT.FDR.TypeI(
WS=c(1e0,1e3,1e4,1e5),
GM=myGM,
seqQTN=mySim$QTN.position,
GWAS=myGAPIT$GWAS)
str(myStat)

30 Return

31 Area Under Curve (AUC)
par(mfrow=c(1,2),mar = c(5,2,5,2))
plot(myStat$FDR[,1],myStat$Power,type="b")
plot(myStat$TypeI[,1],myStat$Power,type="b")
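The area under a power curve can be approximated with the trapezoidal rule. The helper `auc_trapezoid` and the curve values below are made up for illustration and are not part of GAPIT.

```r
# Trapezoidal area under a power-vs-FDR (or power-vs-type-I-error) curve
auc_trapezoid <- function(x, y) {
  o <- order(x)                      # sort points by the x-axis value
  x <- x[o]; y <- y[o]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}
fdr   <- c(0, 0.25, 0.5, 1)          # made-up values for illustration
power <- c(0, 0.5, 0.75, 1)
auc <- auc_trapezoid(fdr, power)
print(auc)
```

A larger area means more power at the same FDR or type I error, which gives a single number for comparing methods.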

32 Replicates
nrep=5
set.seed(99164)
statRep=replicate(nrep,{
mySim=GAPIT.Phenotype.Simulation(GD=GD.candidate,GM=myGM[index1to5,],h2=.5,NQTN=10,QTNDist="norm")
myGAPIT=GAPIT(
Y=mySim$Y,
GD=myGD,
GM=myGM,
QTN.position=mySim$QTN.position,
PCA.total=0,
group.from = 1,
group.to = 1,
group.by = 10,
#sangwich.top="MLM", #options are GLM,MLM,CMLM, FaST and SUPER
#sangwich.bottom="SUPER", #options are GLM,MLM,CMLM, FaST and SUPER
file.output = F,
memo="ttest")
myStat=GAPIT.FDR.TypeI(WS=c(1e0,1e3,1e4,1e5),GM=myGM,seqQTN=mySim$QTN.position,GWAS=myGAPIT$GWAS)
})

33 str(statRep)

34 Means over replicates
power=statRep[[2]]
#FDR
s.fdr=seq(3,length(statRep),7)
fdr=statRep[s.fdr]
fdr.mean=Reduce ("+", fdr) / length(fdr)
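The `Reduce("+", ...) / length(...)` idiom used above takes the element-wise mean of a list of equally sized matrices. A small self-contained check with made-up numbers:

```r
# Element-wise mean of matrices collected over replicates
rep_list <- list(
  matrix(c(0.1, 0.2, 0.3, 0.4), nrow = 2),
  matrix(c(0.3, 0.4, 0.5, 0.6), nrow = 2),
  matrix(c(0.2, 0.3, 0.1, 0.2), nrow = 2)
)
rep_mean <- Reduce("+", rep_list) / length(rep_list)
print(rep_mean)
```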

35 Plots of power vs. FDR
theColor=rainbow(4)
plot(fdr.mean[,1],power , type="b", col=theColor [1],xlim=c(0,1))
for(i in 2:ncol(fdr.mean)){
lines(fdr.mean[,i], power , type="b", col= theColor [i])
}

36 Compare methods

37 Experimental design
Methods: t, GLM, MLM, CMLM…
Sample size
Populations: Association, RILs...
Marker density
Heritability
Number of genes
Major genes

