Genome-Wide Pharmacogenomic Study on Methadone Maintenance Treatment

Analysis Flow of the Study
1. Data preprocessing
2. Genome-wide single locus association test
3. Manhattan plot & Q-Q plot
4. False discovery rate (FDR) correction
5. Regional association plot
6. Analysis of the proportion of variation explained by significant SNPs

Data Availability
GSE78098_series_matrix.txt.gz is downloaded from GSE78098:
    wget ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE78nnn/GSE78098/matrix/GSE78098_series_matrix.txt.gz
    gunzip *gz
GPL txt is downloaded from GSE78098 (download the full table from the platform page).
GSE78098_MMT_stage_discovery_postqc.txt.gz is downloaded from GSE78098:
    wget ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE78nnn/GSE78098/suppl/GSE78098_MMT_stage_discovery_postqc.txt.gz

File Documentation
GSE78098_series_matrix.xlsx (phenotype & covariates)
    Row 30 is sample ID
    Row 41 is r_met_plasma_concentration
    Row 42 is s_met_plasma_concentration
    Row 43 is r_eddp_plasma_concentration
    Row 44 is s_eddp_plasma_concentration
    Row 45 is age
    Row 46 is gender
    Row 47 is bmi
    First 344 samples: discovery stage
    Last 76 samples: replication stage
    Data transformation to normality was performed
GPL txt (SNP information)
    Rows 31 onward are SNP information
    Col 1 is ID, Col 3 is SNP ID, Col 5 is chromosome, Col 6 is physical position
GSE78098_MMT_stage_discovery_postqc.txt (genotype)
    Row 1 is sample ID, Col 1 is SNP ID
    Statistical quality control procedures were performed using PLINK software

Preprocessed Data
Covariates: covariates.txt
    Col 1 is sample ID, Col 2 is age, Col 3 is gender, Col 4 is bmi
Phenotype data: phenotype.txt
    Col 1 is sample ID, Col 2 is trait 1, Col 3 is trait 2, Col 4 is trait 3, Col 5 is trait 4

Summary Statistics
Summarize covariates and transformed data of quantitative traits by gender (using R)
    # Read data
    pheno=read.table("phenotype.txt", header=T)
    covar=read.table("covariates.txt", header=T)
    df=data.frame(covar$gender, covar$age, covar$bmi, pheno[,2:5])
    names(df)=c("gender", "age", "bmi", "r_met", "s_met", "r_eddp", "s_eddp")
    # Sample size, mean and standard deviation by gender
    library(plyr)
    ddply(df, ~gender, summarise,
          count=length(r_met[!is.na(r_met)]),
          mean=mean(r_met[!is.na(r_met)]),
          sd=sd(r_met[!is.na(r_met)]))
    # Normality test (Kolmogorov-Smirnov)
    install.packages("fBasics")
    library(fBasics)
    ksnormTest(df$r_met[!is.na(df$r_met)])
    # Histogram
    hist(df$r_met[!is.na(df$r_met)], xlab="r_met", ylab="histogram of r_met")

Summary Statistics Of Covariates And The Transformed Data Of Quantitative Traits By Gender
Columns: Characteristics; Male (sample size, mean ± SD); Female (sample size, mean ± SD); Normality test (p value)
Age (years): 281 ± 63 ± -
BMI (kg/m2): 278 ± ±
Transformed plasma R-methadone/dose (ng/ml/mg): ± ± 0.6255
Transformed plasma S-methadone/dose (ng/ml/mg): ± ± 0.0802
Transformed plasma R-EDDP/dose (ng/ml/mg): 272 ± ± 0.7903
Transformed plasma S-EDDP/dose (ng/ml/mg): 277 ± ± 0.1876

Covariates Adjustments
Replace missing values in phenotype and covariates with the mean of the variable (using R)
    # Phenotype: drop the sample ID column, then impute each trait with its mean
    pheno=read.table("phenotype.txt", header=T)
    pheno0=pheno[,-1]
    n=dim(pheno0)[1]
    p=dim(pheno0)[2]
    for (i in 1:p){
      pheno0[is.na(pheno0[,i]),i]=mean(pheno0[!is.na(pheno0[,i]),i])
    }
    write.table(pheno0, "phenotype0.txt", row.names=F, col.names=F, quote=F, sep=" ")
    # Covariates: same procedure
    covar=read.table("covariates.txt", header=T)
    covar0=covar[,-1]
    n=dim(covar0)[1]
    p=dim(covar0)[2]
    for (i in 1:p){
      covar0[is.na(covar0[,i]),i]=mean(covar0[!is.na(covar0[,i]),i])
    }
    write.table(covar0, "covariates0.txt", row.names=F, col.names=F, quote=F, sep=" ")

Covariates Adjustments
Covariates adjustments (using R)
    pheno=read.table("phenotype0.txt")
    covar=read.table("covariates0.txt")
    pheno=as.matrix(pheno)
    covar=as.matrix(covar)
    n=dim(pheno)[1]
    p=dim(pheno)[2]
    fit=list()
    residpheno=matrix(0,n,p)
    # Regress each trait on all covariates and keep the residuals
    for (i in 1:p){
      fit[[i]]=lm(pheno[,i]~covar)
      residpheno[,i]=resid(fit[[i]])
    }
    write.table(residpheno, "resid_phenotype0.txt", row.names=F, col.names=F, quote=F, sep=" ")
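The adjustment step can be checked on simulated data. The sketch below is illustrative only: the covariates and traits are randomly generated stand-ins, not the study data. It shows that the residuals carry no linear signal from the covariates, and that lm() also accepts a matrix response, which gives the same residuals as the column-by-column loop.

```r
# Simulated stand-ins for covariates0.txt and phenotype0.txt (not the study data)
set.seed(1)
n <- 50
covar <- cbind(age = rnorm(n, 40, 8),
               gender = rbinom(n, 1, 0.2),
               bmi = rnorm(n, 23, 3))
pheno <- cbind(t1 = 0.05 * covar[, "age"] + rnorm(n),
               t2 = 0.30 * covar[, "bmi"] + rnorm(n))

# Column-by-column adjustment, as in the loop above
resid_loop <- sapply(seq_len(ncol(pheno)),
                     function(i) resid(lm(pheno[, i] ~ covar)))

# lm() also accepts a matrix response and fits every column in one call
resid_all <- resid(lm(pheno ~ covar))

# Residuals are uncorrelated with every covariate (up to rounding error)
max_cor <- max(abs(cor(resid_all, covar)))
print(max_cor)
```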

PLINK
PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner (see the PLINK website). We will use PLINK to perform genome-wide single-locus association analysis.

PLINK Input Files
PED file
    Col 1: Family ID
    Col 2: Individual ID
    Col 3: Paternal ID
    Col 4: Maternal ID
    Col 5: Sex (1=male; 2=female; other character=unknown)
    Col 6: Phenotype (the missing phenotype value for quantitative traits is, by default, -9)
    Col 7-: Genotypes
Example:
    FAM A A G G A C C C
    FAM A A A G 0 0 A C

PLINK Input Files
MAP file
    Col 1: Chromosome (1-22, X, Y or 0 if unplaced)
    Col 2: rs# or SNP identifier
    Col 3: Genetic distance (morgans)
    Col 4: Base-pair position (bp units)
Example:
    1 rs
    1 rs
    1 rs
    1 rs

PLINK Input Files
Alternate phenotype files (to specify an alternate phenotype for analysis, other than the one in the PED file)
    Col 1: Family ID
    Col 2: Individual ID
    Col 3: Phenotype A
    Col 4: Phenotype B
    Col 5: Phenotype C
    Col 6: Phenotype D
    ……
Example:
    FAM
    FAM

PLINK-Ready Files
PED file: genotype.ped
MAP file: genotype.map
Alternate phenotype file: resid_phenotype.txt
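The residual file written in the covariate adjustment step has no ID columns, while an alternate phenotype file needs Family ID and Individual ID in its first two columns. A minimal sketch of that conversion follows. The sample IDs and residual values are hypothetical, and setting Family ID equal to Individual ID is an assumption that should be checked against the actual genotype.ped.

```r
# Hypothetical sample IDs and residuals; in practice these would come from
# phenotype.txt (Col 1) and resid_phenotype0.txt, in the same row order
ids <- c("S001", "S002", "S003")
resid_pheno <- matrix(c( 0.12, -0.40, 0.28,
                        -1.10,  0.35, 0.75), nrow = 3)

# Assumption: each sample is its own family, so FID is set equal to IID
alt <- data.frame(FID = ids, IID = ids, resid_pheno)

# Written to a temporary file here; the real target would be resid_phenotype.txt
out <- file.path(tempdir(), "resid_phenotype_example.txt")
write.table(alt, out, row.names = FALSE, col.names = FALSE, quote = FALSE, sep = " ")

print(readLines(out))
```

The file is written without a header line, which appears consistent with the P1 to P4 labels in the output file names used later.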

Genome-Wide Single Locus Association Test
See the PLINK reference page.
Run PLINK:
    plink --noweb --file genotype --assoc --adjust --pheno resid_phenotype.txt --all-pheno --out younameit
Usage
    --file specifies the .ped and .map files
    --assoc performs case/control or QTL association
    --adjust generates a file of adjusted significance values that correct for all tests performed, and other metrics
    --pheno specifies an alternate phenotype file
    --all-pheno performs association for all phenotypes in the file
    --out specifies the output filename

This will generate the files younameit.P1.qassoc, younameit.P2.qassoc, younameit.P3.qassoc and younameit.P4.qassoc, with fields as follows:
    CHR    Chromosome number
    SNP    SNP identifier
    BP     Physical position (base-pair)
    NMISS  Number of non-missing genotypes
    BETA   Regression coefficient
    SE     Standard error
    R2     Regression r-squared
    T      Wald test (based on t-distribution)
    P      Wald test asymptotic p-value
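To make the output fields concrete, the sketch below reproduces them for a single simulated SNP with an ordinary linear regression in R. It assumes an additive coding (the count of minor alleles, 0, 1 or 2); the genotypes and trait are simulated, and this is an illustration of what the fields mean rather than a description of PLINK's internal code.

```r
# Simulated genotypes (count of minor alleles: 0, 1 or 2) and a quantitative trait
set.seed(2)
n <- 200
geno <- rbinom(n, 2, 0.3)
trait <- 0.4 * geno + rnorm(n)

# Simple linear regression of the trait on allele count
fit <- lm(trait ~ geno)
coefs <- summary(fit)$coefficients

qassoc_row <- c(
  NMISS = sum(!is.na(geno) & !is.na(trait)),  # non-missing observations
  BETA  = coefs["geno", "Estimate"],          # regression coefficient
  SE    = coefs["geno", "Std. Error"],        # standard error of BETA
  R2    = summary(fit)$r.squared,             # regression r-squared
  T     = coefs["geno", "t value"],           # Wald statistic, BETA / SE
  P     = coefs["geno", "Pr(>|t|)"]           # p-value of the Wald test
)
print(qassoc_row)
```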

--adjust generates the files younameit.P1.qassoc.adjusted, younameit.P2.qassoc.adjusted, younameit.P3.qassoc.adjusted and younameit.P4.qassoc.adjusted, which contain the following fields:
    CHR       Chromosome number
    SNP       SNP identifier
    UNADJ     Unadjusted p-value
    GC        Genomic-control corrected p-values
    BONF      Bonferroni single-step adjusted p-values
    HOLM      Holm (1979) step-down adjusted p-values
    SIDAK_SS  Sidak single-step adjusted p-values
    SIDAK_SD  Sidak step-down adjusted p-values
    FDR_BH    Benjamini & Hochberg (1995) step-up FDR control
    FDR_BY    Benjamini & Yekutieli (2001) step-up FDR control
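Several of these adjustments can be reproduced in base R with p.adjust(), which is a convenient cross-check on a handful of SNPs. The p-values below are made up for illustration, and the single-step Sidak column is computed directly from its formula, 1 - (1 - p)^m for m tests.

```r
# Made-up unadjusted p-values for illustration (not study results)
p <- c(4.7e-09, 4.0e-07, 5.7e-07, 1.1e-06, 0.002, 0.03, 0.2, 0.5, 0.8, 0.95)
m <- length(p)

adjusted <- data.frame(
  UNADJ    = p,
  BONF     = p.adjust(p, method = "bonferroni"),  # min(1, p * m)
  HOLM     = p.adjust(p, method = "holm"),        # step-down Bonferroni
  SIDAK_SS = 1 - (1 - p)^m,                       # single-step Sidak
  FDR_BH   = p.adjust(p, method = "BH"),          # Benjamini & Hochberg
  FDR_BY   = p.adjust(p, method = "BY")           # Benjamini & Yekutieli
)
print(adjusted)

# SNPs that would pass an FDR threshold of 0.05
which(adjusted$FDR_BH < 0.05)
```

The FDR columns are never larger than the Bonferroni column, which is why an FDR threshold typically retains more SNPs than a Bonferroni threshold at the same nominal level.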

qqman R Package
qqman is an R package for creating Q-Q and Manhattan plots from GWAS results. See the reference page package-for-qq-and-manhattan-plots-for-gwas-results.html. The qqman R package assumes you have columns named SNP, CHR, BP, and P, corresponding to the SNP name (rs number), chromosome number, base-pair position, and p-value. Here is what the data looks like:
    SNP CHR BP P
    rs
    rs
    rs
    rs
    rs

Manhattan Plot and Q-Q Plot
Prepare qqman R input files (CHR, SNP, BP, P)
    awk '{print $1,$2,$3,$9}' younameit.P1.qassoc > P1.qassoc
    awk '{print $1,$2,$3,$9}' younameit.P2.qassoc > P2.qassoc
    awk '{print $1,$2,$3,$9}' younameit.P3.qassoc > P3.qassoc
    awk '{print $1,$2,$3,$9}' younameit.P4.qassoc > P4.qassoc

Manhattan Plot and Q-Q Plot
Create Manhattan plots and Q-Q plots (using R)
    traits=c("r_met", "s_met", "r_eddp", "s_eddp")
    traits=as.matrix(traits)
    library(qqman)
    i=1 #i=2/i=3/i=4
    qassoc=read.table(paste0("P", i, ".qassoc"), header=T)
    # Drop unplaced SNPs (chromosome 0)
    qassoc=qassoc[qassoc$CHR!=0,]
    png(filename=paste0("Manhattan_Plot_for_", traits[i], ".png"), type="cairo")
    manhattan(qassoc, col=c("green4", "red"), suggestiveline=F, genomewideline=F)
    dev.off()
    png(filename=paste0("Q-Q_Plot_for_", traits[i], ".png"), type="cairo")
    qq(qassoc$P)
    dev.off()
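A Q-Q plot compares the observed p-values with those expected if no SNP were associated. The base-R sketch below builds the two axes by hand and adds the genomic inflation factor (lambda), a summary often read alongside the plot. The p-values are simulated uniform draws standing in for the P column of a .qassoc file, so lambda should come out close to 1.

```r
# Simulated null p-values standing in for the P column of a .qassoc file
set.seed(3)
p <- runif(10000)

# Observed vs expected -log10(p): the two axes of a Q-Q plot
observed <- -log10(sort(p))
expected <- -log10(ppoints(length(p)))

# Genomic inflation factor: median observed 1-df chi-square statistic
# divided by the median of the chi-square distribution under the null
chisq  <- qchisq(p, df = 1, lower.tail = FALSE)
lambda <- median(chisq) / qchisq(0.5, df = 1)
print(lambda)

# Draw to a null device so the sketch runs without writing a file;
# replace with png(...) to save the figure
pdf(file = NULL)
plot(expected, observed,
     xlab = "Expected -log10(p)", ylab = "Observed -log10(p)")
abline(0, 1, col = "red")
invisible(dev.off())
```

Points that rise above the diagonal only in the upper tail suggest true associations, while a lambda well above 1 suggests inflation across the whole genome, for example from population stratification.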

Manhattan Plot of Genome-Wide Single Locus Association Test for R-Methadone and S-Methadone

Q-Q Plot of Genome-Wide Single Locus Association Test for R-methadone and S-Methadone

Identify Significant SNPs After a Multiple-Test Correction of a False Discovery Rate (FDR)
Prepare R input files (CHR, SNP, UNADJ, FDR_BH)
    awk '{print $1,$2,$3,$9}' younameit.P1.qassoc.adjusted > P1.qassoc.adjusted
    awk '{print $1,$2,$3,$9}' younameit.P2.qassoc.adjusted > P2.qassoc.adjusted
    awk '{print $1,$2,$3,$9}' younameit.P3.qassoc.adjusted > P3.qassoc.adjusted
    awk '{print $1,$2,$3,$9}' younameit.P4.qassoc.adjusted > P4.qassoc.adjusted

Identify Significant SNPs After a Multiple-Test Correction of a False Discovery Rate (FDR)
Significant SNPs after a multiple-test correction of FDR (using R)
    traits=c("r_met", "s_met", "r_eddp", "s_eddp")
    traits=as.matrix(traits)
    for (i in 1:4){
      qassoc.adjusted=read.table(paste0("P", i, ".qassoc.adjusted"), header=T)
      sigidx=which(qassoc.adjusted$FDR_BH<0.05) #index of significant SNPs
      sigSNP=qassoc.adjusted[sigidx,]
      write.table(sigSNP, paste0("significant_SNPs_for_", traits[i], ".txt"), row.names=F, col.names=T, quote=F, sep=" ")
    }

The Significant SNPs Identified by Genome-Wide Single Locus Association Analysis
The genome-wide single locus association analysis identified only one SNP, rs17180299 (Chr 9), as significantly associated with the plasma concentration of R-methadone after a multiple-test correction of the false discovery rate (raw p=4.692e-09).

Usage of make.fancy.locus.plot
make.fancy.locus.plot is an R function for highlighting the statistical strength of an association in the context of the association results for surrounding markers, gene annotations, estimated recombination rates, and pairwise correlations between the surrounding markers and the putative associated variant (see the reference page).
You have to provide a file that contains the following data for every SNP across the region of interest: position, p-value, a label indicating whether the SNP is "typed" or "imputed", and the r-squared between that SNP and the putative associated variant.
All SNPs in this file are plotted with their P-values (as -log10 values) as a function of chromosomal position. SNPs that are "typed" are plotted as diamonds; "imputed" SNPs are plotted as circles. Estimated recombination rates are plotted to reflect the local LD structure around the associated SNP. The colour of each SNP reflects its correlation with the associated SNP (bright red indicating highly correlated, faint red indicating weakly correlated).
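The r-squared column is a measure of linkage disequilibrium between each SNP and the putative associated variant. One common genotype-based approximation is the squared Pearson correlation of allele counts. The sketch below shows that calculation on simulated genotypes. The SNP names lead, near and far are hypothetical, and this is one possible way to obtain such a column, not necessarily the one used in the study.

```r
# Simulated allele counts (0/1/2) for a lead SNP and two neighbouring SNPs
set.seed(4)
n <- 300
lead <- rbinom(n, 2, 0.25)
near <- ifelse(runif(n) < 0.9, lead, rbinom(n, 2, 0.25))  # mostly tracks the lead SNP
far  <- rbinom(n, 2, 0.25)                                # independent of the lead SNP

geno <- cbind(lead = lead, near = near, far = far)

# Squared Pearson correlation of allele counts with the lead SNP,
# a genotype-based approximation of LD r-squared
rsqr <- as.vector(cor(geno, geno[, "lead"])^2)
names(rsqr) <- colnames(geno)
print(round(rsqr, 3))
```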

Significant SNPs in a Regional Association Plot
Obtain regional SNPs (using R)
    qassoc=read.table("younameit.P1.qassoc", header=T, stringsAsFactors=F)
    # Keep chromosome 9 SNPs inside the region of interest that have a SNP ID
    idx=which(qassoc$CHR==9 & qassoc$BP>= & qassoc$BP<= & qassoc$SNP!="---")
    TYPE=rep("typed", length(idx))
    region=data.frame(qassoc$SNP[idx], qassoc$BP[idx], qassoc$P[idx], TYPE, qassoc$R2[idx])
    write.table(region, "regional_SNPs.txt", row.names=F, col.names=c("SNP", "POS", "PVAL", "TYPE", "RSQR"), quote=F, sep=" ")
The source code "regional_association_plot.r", the estimated recombination rate from HapMap and the gene annotations from the UCSC genome browser (using Build 35 coordinates) should be available in the same folder.

Significant SNPs in a Regional Association Plot
Create regional association plot (using R)
    source("regional_association_plot.r")
    locus=read.table("regional_SNPs.txt", header=T, row.names=1)
    pdf("assocplot_rs pdf", width=8, height=6)
    make.fancy.locus.plot("rs ", "rs ", "9", locus, 9, 4.69e-9)
    dev.off()

Regional Association Plot of rs17180299

Distribution of Plasma Concentration of R-Methadone for the Genotypes of rs17180299
Phenotype data: r_met.txt
Genotype data for significant SNPs: rs txt
Create box plot (using R)
    pheno=read.table("r_met.txt")
    geno=read.table("rs txt", sep="\t")
    pheno=as.matrix(pheno)
    geno=as.matrix(geno)
    table(geno) #number of individuals having AA, AG and GG
    boxplot(pheno~geno, xlab="Genotype", ylab="R-methadone")

Distribution of Plasma Concentration of R-Methadone for Three Genotypes of rs17180299

Proportion of Variation Explained by Significant SNPs
Starting from the variable(s) or covariate(s) already in the regression model, the next SNP was included if it produced the maximal increment in model R2. Model R2 is the coefficient of determination of the full regression model containing one or more SNPs. In addition, the marginal R2 was calculated for each SNP from the regression model that contained only that SNP.
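The selection procedure described here can be sketched as a small forward-selection loop. The example below uses three simulated SNPs with hypothetical names (snpA, snpB, snpC) and a simulated trait. It computes the marginal R2 of each SNP, then adds SNPs one at a time, always choosing the one that gives the largest model R2. It is a sketch of the idea under these assumptions, not the study's exact code.

```r
# Simulated trait and allele counts for three hypothetical SNPs
set.seed(5)
n <- 250
snps <- data.frame(snpA = rbinom(n, 2, 0.3),
                   snpB = rbinom(n, 2, 0.4),
                   snpC = rbinom(n, 2, 0.2))
trait <- 0.6 * snps$snpA + 0.3 * snps$snpB + rnorm(n)

# Marginal R2: each SNP alone in the model
marginal_r2 <- sapply(snps, function(g) summary(lm(trait ~ g))$r.squared)

# Forward selection: at each step add the SNP giving the largest model R2
selected  <- character(0)
model_r2  <- numeric(0)
remaining <- names(snps)
while (length(remaining) > 0) {
  r2 <- sapply(remaining, function(s) {
    summary(lm(trait ~ ., data = snps[, c(selected, s), drop = FALSE]))$r.squared
  })
  best      <- names(which.max(r2))
  selected  <- c(selected, best)
  model_r2  <- c(model_r2, max(r2))
  remaining <- setdiff(remaining, best)
}

print(round(marginal_r2, 4))
print(data.frame(step = seq_along(selected), snp = selected, model_r2 = model_r2))
```

With a single significant SNP, as in this study, the model R2 and the marginal R2 coincide; the loop only matters when several SNPs are carried forward.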

Analysis of the Proportion of Variation Explained by Significant SNPs
Significant SNPs: rs
Genotype data of significant SNPs: rs txt
Phenotype data: r_met.txt

Analysis of the Proportion of Variation Explained by Significant SNPs
Calculate marginal R2 (using R)
    pheno=read.table("r_met.txt")
    geno=read.table("rs txt")
    pheno=as.matrix(pheno)
    geno=as.matrix(geno)
    fit=lm(pheno~geno)
    summary(fit)$r.squared
    [1]

Genome-Wide Case/Control Association Test

PLINK Input Files
We convert the four continuous traits to binary traits based on the sign of the value:
    Values greater than 0 are coded as 1
    Values less than 0 are coded as 2
Phenotype data: binary_phenotype.txt
Genotype data: genotype.ped, genotype.map
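The sign-based conversion can be sketched in a few lines of R. The trait values below are hypothetical. The rule itself does not say what happens to exact zeros or missing values, so coding them as -9 (PLINK's default missing phenotype code) is an assumption. In PLINK's case/control convention, 1 is read as unaffected and 2 as affected.

```r
# Hypothetical residual trait values, including a zero and a missing value
trait <- c(0.8, -1.2, 0.0, NA, 2.5, -0.3)

# Sign-based coding as stated above: > 0 becomes 1, < 0 becomes 2.
# Assumption: zeros and missing values are set to -9 (PLINK's missing code)
binary <- ifelse(is.na(trait) | trait == 0, -9,
                 ifelse(trait > 0, 1, 2))
print(binary)
```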

Run PLINK:
    plink --noweb --file genotype --assoc --adjust --pheno binary_phenotype.txt --all-pheno --out younameit
This will generate the files younameit.P1.assoc, younameit.P2.assoc, younameit.P3.assoc and younameit.P4.assoc, with fields as follows:
    CHR    Chromosome
    SNP    SNP ID
    BP     Physical position (base-pair)
    A1     Minor allele name (based on whole sample)
    F_A    Frequency of this allele in cases
    F_U    Frequency of this allele in controls
    A2     Major allele name
    CHISQ  Basic allelic test chi-square (1df)
    P      Asymptotic p-value for this test
    OR     Estimated odds ratio (for A1, i.e. A2 is reference)
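The case/control fields can be illustrated with a 2x2 table of allele counts. The counts below are made up. The sketch computes the allele frequencies, the 1-df allelic chi-square test without continuity correction, and the odds ratio for A1 relative to A2, matching the field definitions above.

```r
# Made-up allele counts (not study data): rows are cases/controls,
# columns are minor (A1) and major (A2) allele counts
counts <- matrix(c(120, 280,
                    90, 310),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("case", "control"), c("A1", "A2")))

F_A <- counts["case", "A1"] / sum(counts["case", ])        # A1 frequency in cases
F_U <- counts["control", "A1"] / sum(counts["control", ])  # A1 frequency in controls

# Basic allelic test: 1-df chi-square on the 2x2 table, no continuity correction
test  <- chisq.test(counts, correct = FALSE)
CHISQ <- unname(test$statistic)
P     <- test$p.value

# Odds ratio for A1, with A2 as the reference allele
OR <- (counts["case", "A1"] * counts["control", "A2"]) /
      (counts["case", "A2"] * counts["control", "A1"])

print(c(F_A = F_A, F_U = F_U, CHISQ = CHISQ, P = P, OR = OR))
```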

--adjust generates the files younameit.P1.assoc.adjusted, younameit.P2.assoc.adjusted, younameit.P3.assoc.adjusted and younameit.P4.assoc.adjusted, which contain the following fields:
    CHR       Chromosome number
    SNP       SNP identifier
    UNADJ     Unadjusted p-value
    GC        Genomic-control corrected p-values
    BONF      Bonferroni single-step adjusted p-values
    HOLM      Holm (1979) step-down adjusted p-values
    SIDAK_SS  Sidak single-step adjusted p-values
    SIDAK_SD  Sidak step-down adjusted p-values
    FDR_BH    Benjamini & Hochberg (1995) step-up FDR control
    FDR_BY    Benjamini & Yekutieli (2001) step-up FDR control

Manhattan Plot of Genome-Wide Case/Control Association Test for Binary R-Methadone and Binary S-Methadone

Q-Q Plot of Genome-Wide Case/Control Association Test for Binary R-Methadone and Binary S-Methadone

The Significant SNPs Identified by Genome-Wide Case/Control Association Analysis
The genome-wide case/control association analysis did not identify any SNP to be significantly associated with any of the four binary traits after a multiple-test correction of a false discovery rate.

Top 10 SNPs for Binary R-Methadone and Continuous R-Methadone
Binary R-Methadone
    CHR  SNP  P
    16   rs   1.14E-06
    14   rs   1.18E-06
    9    rs   3.75E-06
    7    rs   6.77E-06
         rs   7.10E-06
         rs   9.38E-06
    17   rs
         rs   1.06E-05
         rs   1.32E-05
    18   rs   1.46E-05
Continuous R-Methadone
    CHR  SNP      P
    9    rs       4.69E-09
         rs       3.96E-07
         rs       5.74E-07
    5    rs26411  1.14E-06
    21   rs       4.38E-06
    12   rs       5.28E-06
         rs       6.49E-06
    11   rs       6.86E-06
         rs       7.17E-06
         rs       8.70E-06
