Presentation is loading. Please wait.

Presentation is loading. Please wait.

GWAS Analysis Pipeline

Similar presentations

Presentation on theme: "GWAS Analysis Pipeline"— Presentation transcript:

1 GWAS Analysis Pipeline
GWAS Data Less time Less errors Easier Module 1 Module 2 Module 3 Analysis Options (universal interface) Results

2 Pipeline Structure Input (data + analysis options) job1 job2 ….. Job N
gwaOption_R => R module gwaOption_sas => SAS module gwaOption_C => C module Perl, Java, Matlab, … Result 1 Result 2 ….. Result N Final Results

3 An Example options.gwa gwas.R job1 job2 ….. Job N
sas -sysparm ‘optionfile.jobi’ (single machine mode) bsub –J jobname sas -sysparm ‘optionfile.jobi’ (cluster mode) Result 1 Result 2 ….. Result N Final Results

4 options.gwa [DATA] database=SAS
genotype_dir=/dsg1/gwas/fhsgeno genotype_file=chr6_2 phenotype_dir=/dsg1/gwas/fhspheno phenotype_file=fhs100 markerinfo_dir= /dsg1/gwas/fhsmap markerinfo_file=mapall marker_selection=MAF<0.01 pedigree_dir= /dsg1/gwas/fhspedi pedigree_file=pediall subjectID=subject pedgreeID=famid markername=snp globbing=*/1:22 or 1,2,3,5:10 [ANALYSIS] phenolist_file= pheno_list=bmi/qt covariates= program=SASGLM analysis=mixed [OUTPUT] output_dir=/dsguser/qunyuan/fhs/bmi output_file= output_replace=no [RUN] clusterjobname=bmimixed memsize=1000M #!/bin/sh OPFILE=$1 R --no-save < /dsguser/qunyuan/gwas/code/gwas.R $OPFILE > gwas.rlog & genotype_dir=IN_FILE genotype_file=genolist.txt Pheno type covar program analysis run Bmi qt age,sex SASGLM mixed YES Obes ql NA SASGLM gee YES HD ql age SASGLM gee NO Age … Sex … Program language location Maintainer SASGLM SAS /dsg1/code/sas/ Q.Zhang GSTAT R /dsg1/code/R/gstat.R Q.Zhang sasPLINK SAS /dsg1/code/sas/ J. Czajkowski

5 gwas.R sasOption.job1 sasOption.job2 sasOption.job3 … sasOption.jobN
%let genotype_dir=/dsguser/qunyuan/gwas/cluster/impudata/; %let genotype_file=mldosechr6_12; %let phenotype_dir=/dsguser/qunyuan/gwas/cluster/data/; %let phenotype_file=pheno100; %let markerinfo_dir=; %let markerinfo_file=; %let marker_selection=MAF<0.01; %let pedigree_dir=; %let pedigree_file=; %let subjectID=subject; %let pedgreeID=famid; %let markername=snp; %let phenolist_file=; %let phenolist=bmi; %let phenodata_type=qt; %let covariates=; %let program=SASGLM; %let analysis=mixed; %let output_dir=/dsguser/qunyuan/gwas/run/run2/mixck/; %let output_replace=yes; %let clusterjobname=; %let memsize=1000M; source("/dsguser/qunyuan/gwas/code/gwaslib.R") option.file=commandArgs()[3] read.table(option.file,sep="=",stringsAsFactors=F)->x x$V1=gsub(" ","",x$V1) x$V2=gsub(" ","",x$V2) rownames(x)=x$V1 gf=x["genotype_file",2] if (nchar(gf)>0) strsplit(gf,split=",")[[1]]->gfs if (nchar(gf)==0) { gfs=dir(x["genotype_dir",2]) if (x["output_replace",2]=="no") outfs=dir(x["output_dir",2]) gfs=gfs[!(gfs %in% outfs)] } gfs=gsub(".sas7bdat","",gfs) (dir.create(x["output_dir",2],recursive=T)) for (fi in gfs) x["genotype_file",2]=fi xi=paste("%let ", x$V1,"=", x$V2, ";",sep="") optfile=paste(fi,".opt",sep="") write.table(xi,file=optfile,quote=F,row.names=F,col.names=F) cmd=paste("sas -memsize",x["memsize",2],"-sysparm",paste("'",optfile,"'",sep=""),x["program",2]) if (nchar(x["clusterjobname",2])>0) cmd=paste ("bsub -J",x["clusterjobname",2],cmd) system(cmd) sasOption.job1 sasOption.job2 sasOption.job3 sasOption.jobN sas –memsize 1000 M-sysparm ‘sasOption.job1’ /dsg1/code/sas/ & bsub –J jobname sas –sysparm ‘sasOption.job1’ /dsg1/code/sas/ %include “&sysparm”; .

6 Programming Tips Get all information that your program needs from one option file, pass the option file to your program via command line (e.g. sas –sysparm ‘optionfile’, or at least define all the options using variables at the top your script. For any information that may change in different analyses, instead of fixed names, use variables. (e.g. set my data(keep=&markname) ) Keep a good structure, use functionally independent macros and/or functions to organize your code. Don’t put every thing in one macro/function. %MACRO main(); %macro1(); *get p value for markers; %macro2(); *merge with marker positions & allele frequencies; %MEND main; %main(); Avoid name conflict. Name your variables and datasets using unique prefixes (e.g. QZ_temp). Write as little as you can. chrom loop; pheno loop; geno loop; yourmacro(indat=,y=,x=,covar=,method=,outdat=); end;end;end; Error handling.

Download ppt "GWAS Analysis Pipeline"

Similar presentations

Ads by Google