1
Washington State University
Workshop: Assessment of statistical power, false positive rate and type I error of GWAS
Zhiwu Zhang, Washington State University
2
Objectives
Simulation of phenotypes
True and false positives
Effect of population structure
Power, FDR and type I error
Comparison of methods
Experimental design
3
Complex traits
Controlled by multiple genes
Influenced by environment
Also known as quantitative traits
Most traits are continuous, e.g. yield and height
Some are categorical, e.g. node number, score of disease resistance
Some binary traits are still quantitative traits, e.g. diabetes
Economically important
4
Dissecting phenotype
Y = G + E + GxE + Residual
G = Additive + Dominance + Epistasis
E: Environment, e.g. year and location
Residual: e.g. measurement error
5
Distribution of QTN effect
Normal distribution
Geometric distribution
6
Theoretical geometric distribution
The probability distribution of the number X of Bernoulli trials needed to get one success:
$\mathrm{Prob}(X = k) = (1 - p)^{k-1}\, p$
7
Approximated geometric distribution
$\mathrm{Effect}(X = k) = p^{k}$
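The two forms can be compared numerically. The following is a minimal base R sketch, not GAPIT code; the value of p and the range of k are illustrative assumptions.

```r
# Theoretical geometric distribution: Prob(X = k) = (1 - p)^(k - 1) * p
p <- 0.9      # assumed value, for illustration only
k <- 1:10     # trial number, or rank of the QTN

prob_theory <- (1 - p)^(k - 1) * p
# dgeom() counts failures before the first success, hence k - 1
stopifnot(isTRUE(all.equal(prob_theory, dgeom(k - 1, prob = p))))

# Approximated geometric distribution of QTN effects: Effect(X = k) = p^k
effect_approx <- p^k

# Both sequences decay by a constant ratio: (1 - p) for the theoretical
# probabilities and p for the approximated effects
round(cbind(k, prob_theory, effect_approx), 4)
```

With p close to 1 the approximated effects shrink slowly, so many QTNs keep a sizeable effect; with p close to 0 the first few QTNs dominate.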
8
Demo code
9
Preparation for GAPIT
#Import GAPIT
#source("
#biocLite("multtest")
#install.packages("gplots")
#install.packages("scatterplot3d")
#The downloaded link at:
rm(list=ls())
library('MASS') # required for ginv
library(multtest)
library(gplots)
library(compiler) #required for cmpfun
library("scatterplot3d")
10
Data preparation
#Import demo data
myGD=read.table(file="
myGM=read.table(file="
#myGD=read.table(file="~/Dropbox/Current/ZZLab/WSUCourse/CROPS545/Demo/mdp_numeric.txt",head=T)
#myGM=read.table(file="~/Dropbox/Current/ZZLab/WSUCourse/CROPS545/Demo/mdp_SNP_information.txt",head=T)
11
Genotype in Numeric format
myGD=read.table(file="
12
Genetic map
myGM=read.table(file="
13
GAPIT.Phenotype.Simulation
#Simulate 10 QTN on the first half of the chromosomes (chromosomes 1 to 5)
X=myGD[,-1]
index1to5=myGM[,2]<6
X1to5 = X[,index1to5]
taxa=myGD[,1]
set.seed(99164)
GD.candidate=cbind(taxa,X1to5)
mySim=GAPIT.Phenotype.Simulation(GD=GD.candidate,GM=myGM[index1to5,],h2=.5,NQTN=10,QTNDist="normal")
14
Simulation object
str(mySim)
List of 5
 $ Y           :'data.frame': 281 obs. of 2 variables:
  ..$ GD[, 1]: Factor w/ 281 levels "33-16","38-11",..:
  ..$ V1     : num [1:281]
 $ u           : num [1:281, 1]
 $ e           : num [1:281]
 $ QTN.position: int [1:10]
 $ effect      : num [1:10]
15
QTN positions
plot(myGM[,c(2,3)])
points(myGM[mySim$QTN.position,c(2,3)],type="p",col="red",cex=3)
16
Simulation results
par(mfrow=c(2,2), mar = c(3,4,1,1))
plot(mySim$effect)
plot(mySim$Y[,2],mySim$u)
plot(mySim$Y[,2],mySim$e)
plot(mySim$e,mySim$u)
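The components plotted here (effect, u, e and the phenotype) can be reproduced in principle with base R. The sketch below is an illustration under stated assumptions, not the GAPIT.Phenotype.Simulation implementation: the genotypes are random 0/1/2 values rather than the demo data, and the number of markers m is arbitrary.

```r
set.seed(99164)
n <- 281      # individuals (same count as the demo data)
m <- 500      # markers (arbitrary, for illustration)
NQTN <- 10
h2 <- 0.5

# Toy numeric genotypes coded 0/1/2 (random, not the demo data)
X <- matrix(sample(0:2, n * m, replace = TRUE), nrow = n)

QTN.position <- sample(m, NQTN)               # columns chosen as QTNs
effect <- rnorm(NQTN)                         # normally distributed QTN effects
u <- as.vector(X[, QTN.position] %*% effect)  # additive genetic effect

# Residual variance chosen so that var(u) / (var(u) + var(e)) = h2
ve <- var(u) * (1 - h2) / h2
e <- rnorm(n, mean = 0, sd = sqrt(ve))
y <- u + e

# The realized ratio fluctuates around h2 because e is sampled
var(u) / (var(u) + var(e))
```

Solving h2 = var(u) / (var(u) + var(e)) for var(e) gives the residual variance used above, which is why a higher heritability leads to a tighter relationship between phenotype and genetic effect.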
17
LM for GWAS
Phenotype: Y = SNP + Q (or PCs) + Kinship + e
SNP: fixed effect
Q (or PCs): population structure, fixed effect
Kinship: unequal relatedness, random effect
General Linear Model (GLM): Y = SNP + Q (or PCs) + e
Mixed Linear Model (MLM), the Q+K model: Y = SNP + Q (or PCs) + Kinship + e (Yu et al. 2005, Nature Genetics)
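The GLM part of this model (SNP and principal components as fixed effects, no kinship term) can be sketched with base R. This is a toy illustration, not what GAPIT runs internally: the genotypes, the causal SNP in column 7 and its effect of 0.8 are all assumptions made for the example.

```r
set.seed(1)
n <- 200      # individuals
m <- 50       # markers
X <- matrix(sample(0:2, n * m, replace = TRUE), n, m)  # toy genotypes

# First three principal components of the genotypes, used as Q
PC <- prcomp(X)$x[, 1:3]

# Toy phenotype with a single causal SNP (column 7, assumed effect 0.8)
y <- 0.8 * X[, 7] + rnorm(n)

# GLM: test each SNP as a fixed effect, with the PCs as fixed-effect covariates
pval <- apply(X, 2, function(snp) {
  fit <- lm(y ~ snp + PC)
  summary(fit)$coefficients["snp", "Pr(>|t|)"]
})

which.min(pval)   # index of the most significant SNP
```

The MLM adds kinship as a random effect on top of this, which requires a mixed-model solver and is not covered by this sketch.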
18
Group by kinship
19
Compression improves power
Zhang et al., Nature Genetics, 2010 Average number of individuals per group
20
Fit matches power
[Figure; x-axis: average number of individuals per group]
21
Compression is robust across species
[Figure: fit of model and statistical power against compression level for human (n=1315), dog (n=292) and maize (n=277); series labeled from 0.04sd (0.03%) to 0.5sd (4.99%)]
22
Compressed MLM is more general
Zhang et al., Nature Genetics, 2010
GLM (1 group): SA, GC, PCA and QTDT
Compressed MLM (s groups), n ≥ s ≥ 1: sire model
Full MLM (n groups): Henderson's MLM (pedigree based kinship), Unified MLM (marker based kinship)
23
ZZLab.Net
24
Modeling in GAPIT
Model   PCA.total   group.from   group.to
t       0           1            1
GLM     >0          1            1
MLM     >0          n            n
CMLM    >0          1            n
25
Run GAPIT
setwd("~/Desktop/temp")
myGAPIT=GAPIT(
  Y=mySim$Y,
  GD=myGD,
  GM=myGM,
  QTN.position=mySim$QTN.position,
  PCA.total=0,
  group.from = 1,
  group.to = 1,
  group.by = 10,
  #sangwich.top="MLM", #options are GLM,MLM,CMLM, FaST and SUPER
  #sangwich.bottom="SUPER", #options are GLM,MLM,CMLM, FaST and SUPER
  memo="ttest")
26
Manhattan plot
27
Power, type I error and FDR
Power: Proportion of QTNs identified Type I error: empirical null distribution of non QTN SNPs FDR: Proportion of false positives
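These three quantities can be computed at a single significance threshold when the true QTNs are known, as they are in a simulation. The helper below, gwas_rates, is a hypothetical name introduced for illustration and is not a GAPIT function; the p-values are made up.

```r
# Hypothetical helper (not part of GAPIT): power, type I error and FDR
# at one p-value threshold, given which SNPs are true QTNs
gwas_rates <- function(pval, is.qtn, threshold) {
  detected <- pval < threshold
  n.false  <- sum(detected & !is.qtn)
  c(power = sum(detected & is.qtn) / sum(is.qtn),   # QTNs identified
    typeI = n.false / sum(!is.qtn),                 # non-QTN SNPs called
    FDR   = if (any(detected)) n.false / sum(detected) else 0)
}

# Toy example: five SNPs, the first two are QTNs
pval   <- c(0.001, 0.2, 0.004, 0.5, 0.03)
is.qtn <- c(TRUE, TRUE, FALSE, FALSE, FALSE)
gwas_rates(pval, is.qtn, threshold = 0.01)
```

At this threshold one of two QTNs and one of three non-QTN SNPs are detected, so power is 1/2, type I error is 1/3 and FDR is 1/2. Type I error and FDR share a numerator but differ in the denominator, which is why they can diverge sharply when most SNPs are null.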
28
Mapping resolution
10Kb is really good, 100Kb is OK
Bins with QTNs for power
Bins without QTNs for type I error
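One way to picture the binning is to cut each chromosome into windows of a given size and classify each window by whether it contains a QTN. The sketch below uses a hypothetical helper, bin_key, and invented positions; it is an assumption about how such bins could be formed, not the GAPIT.FDR.TypeI implementation.

```r
# Hypothetical helper: label each SNP with a "chromosome_bin" key
# for a window size ws given in base pairs
bin_key <- function(chr, pos, ws) paste(chr, floor(pos / ws), sep = "_")

# Toy map: chromosome, position (bp) and a flag for the true QTNs
chr <- c(1, 1, 1, 1, 2, 2)
pos <- c(1200, 8500, 15000, 98000, 3000, 250000)
qtn <- c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE)

# Count both kinds of bins at 10Kb and 100Kb resolution
bin_counts <- sapply(c(1e4, 1e5), function(ws) {
  key <- bin_key(chr, pos, ws)
  qtn.bins <- unique(key[qtn])
  c(ws = ws,
    with.QTN = length(qtn.bins),
    without.QTN = length(setdiff(unique(key), qtn.bins)))
})
bin_counts
```

Wider windows merge more non-QTN SNPs into QTN bins, so a signal near but not at the QTN is counted as a hit. That makes power look better at coarse resolution and leaves fewer bins for estimating type I error.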
29
GAPIT.FDR.TypeI Function
myStat=GAPIT.FDR.TypeI( WS=c(1e0,1e3,1e4,1e5), GM=myGM, seqQTN=mySim$QTN.position, GWAS=myGAPIT$GWAS) str(myStat)
30
Return
31
Area Under Curve (AUC)
par(mfrow=c(1,2),mar = c(5,2,5,2))
plot(myStat$FDR[,1],myStat$Power,type="b")
plot(myStat$TypeI[,1],myStat$Power,type="b")
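The area under a power curve can be approximated with the trapezoidal rule. The helper auc_trapezoid below is a hypothetical name and the FDR and power values are invented; whether GAPIT computes its AUC this way is not stated in the slides.

```r
# Hypothetical helper: area under a curve by the trapezoidal rule
auc_trapezoid <- function(x, y) {
  o <- order(x)            # sort points along the x-axis first
  x <- x[o]
  y <- y[o]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Toy power-versus-FDR curve
fdr   <- c(0, 0.1, 0.3, 0.6, 1)
power <- c(0, 0.4, 0.7, 0.9, 1)
auc_trapezoid(fdr, power)
```

A method whose power rises faster at low FDR or low type I error yields a larger area, so the AUC condenses the whole curve into one number for comparing methods.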
32
Replicates
nrep=5
set.seed(99164)
statRep=replicate(nrep,{
  mySim=GAPIT.Phenotype.Simulation(GD=GD.candidate,GM=myGM[index1to5,],h2=.5,NQTN=10,QTNDist="normal")
  myGAPIT=GAPIT(
    Y=mySim$Y,
    GD=myGD,
    GM=myGM,
    QTN.position=mySim$QTN.position,
    PCA.total=0,
    group.from = 1,
    group.to = 1,
    group.by = 10,
    #sangwich.top="MLM", #options are GLM,MLM,CMLM, FaST and SUPER
    #sangwich.bottom="SUPER", #options are GLM,MLM,CMLM, FaST and SUPER
    file.output = FALSE,
    memo="ttest")
  myStat=GAPIT.FDR.TypeI(WS=c(1e0,1e3,1e4,1e5),GM=myGM,seqQTN=mySim$QTN.position,GWAS=myGAPIT$GWAS)
})
33
str(statRep)
34
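Means over replicates
#Power (second element of the first replicate)
power=statRep[[2]]
#FDR (third element of every replicate, 7 elements per replicate)
s.fdr=seq(3,length(statRep),7)
fdr=statRep[s.fdr]
fdr.mean=Reduce("+", fdr) / length(fdr)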
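The indexing with seq() relies on how replicate() stores lists. The toy example below mimics that layout with a list of three elements per replicate (an assumption made for brevity; the stride of 7 in the slide presumably reflects the number of elements GAPIT.FDR.TypeI returns).

```r
set.seed(99164)
nrep <- 5

# Each replicate returns a list of fixed length, so replicate() simplifies
# the result to a list-matrix: one row per element, one column per replicate
toyRep <- replicate(nrep, {
  list(P = runif(3),
       Power = seq(0.1, 1, 0.1),
       FDR = matrix(runif(40), 10, 4))
})
dim(toyRep)

# In column-major order the third element of each replicate sits at
# positions 3, 6, 9, ...; the stride equals the number of elements per list
s.fdr <- seq(3, length(toyRep), 3)
fdr <- toyRep[s.fdr]

# Element-wise mean of the FDR matrices across replicates
fdr.mean <- Reduce("+", fdr) / length(fdr)
dim(fdr.mean)
```

Reduce("+", ...) adds the matrices one by one, so dividing by the number of replicates gives the mean for every cell without reshaping into an array.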
35
Plots of power vs. FDR
theColor=rainbow(4)
plot(fdr.mean[,1],power,type="b",col=theColor[1],xlim=c(0,1))
for(i in 2:ncol(fdr.mean)){
  lines(fdr.mean[,i],power,type="b",col=theColor[i])
}
36
Compare methods
37
Experimental design
Methods: t, GLM, MLM, CMLM…
Sample size
Populations: Association, RILs...
Marker density
Heritability
Number of genes
Major genes