Presentation is loading. Please wait.

Presentation is loading. Please wait.

GWAS Analysis Pipeline GWAS Data Analysis Options (universal interface) Results  Less time  Less errors  Easier Module 1 Module 2 Module 3 …

Similar presentations


Presentation on theme: "GWAS Analysis Pipeline GWAS Data Analysis Options (universal interface) Results  Less time  Less errors  Easier Module 1 Module 2 Module 3 …"— Presentation transcript:

1 GWAS Analysis Pipeline GWAS Data Analysis Options (universal interface) Results  Less time  Less errors  Easier Module 1 Module 2 Module 3 …

2 Pipeline Structure Final Results job1 job2 …..Job N Input (data + analysis options) gwaOption_R => R module gwaOption_sas => SAS module gwaOption_C => C module Perl, Java, Matlab, … Result 1 Result 2 ….. Result N

3 An Example Final Results job1 job2 …..Job N gwas.R sas -sysparm ‘optionfile.jobi’ glm.sas (single machine mode) bsub –J jobname sas -sysparm ‘optionfile.jobi’ glm.sas (cluster mode) Result 1 Result 2 ….. Result N gwas.sh options.gwa

4 #!/bin/sh OPFILE=$1 R --no-save gwas.rlog & [DATA] database=SAS genotype_dir=/dsg1/gwas/fhsgeno genotype_file=chr6_2 phenotype_dir=/dsg1/gwas/fhspheno phenotype_file=fhs100 markerinfo_dir= /dsg1/gwas/fhsmap markerinfo_file=mapall marker_selection=MAF<0.01 pedigree_dir= /dsg1/gwas/fhspedi pedigree_file=pediall subjectID=subject pedgreeID=famid markername=snp globbing=*/1:22 or 1,2,3,5:10 [ANALYSIS] phenolist_file= pheno_list=bmi/qt covariates= program=SASGLM analysis=mixed [OUTPUT] output_dir=/dsguser/qunyuan/fhs/bmi output_file= output_replace=no [RUN] clusterjobname=bmimixed memsize=1000M … genotype_dir=IN_FILE genotype_file=genolist.txt Phenotypecovarprogramanalysis run Bmiqtage,sexSASGLMmixed YES Obes qlNASASGLMgee YES HD qlageSASGLMgee NO Age… Sex… … Programlanguagelocation Maintainer SASGLMSAS/dsg1/code/sas/glm.sasQ.Zhang GSTATR/dsg1/code/R/gstat.RQ.Zhang sasPLINKSAS /dsg1/code/sas/plink.sas J. Czajkowski …

5 gwas.R source("/dsguser/qunyuan/gwas/code/gwaslib.R") option.file=commandArgs()[3] read.table(option.file,sep="=",stringsAsFactors=F)->x x$V1=gsub(" ","",x$V1) x$V2=gsub(" ","",x$V2) rownames(x)=x$V1 gf=x["genotype_file",2] if (nchar(gf)>0) strsplit(gf,split=",")[[1]]->gfs if (nchar(gf)==0) { gfs=dir(x["genotype_dir",2]) if (x["output_replace",2]=="no") { outfs=dir(x["output_dir",2]) gfs=gfs[!(gfs %in% outfs)] } gfs=gsub(".sas7bdat","",gfs) } (dir.create(x["output_dir",2],recursive=T)) for (fi in gfs) { x["genotype_file",2]=fi xi=paste("%let ", x$V1,"=", x$V2, ";",sep="") optfile=paste(fi,".opt",sep="") write.table(xi,file=optfile,quote=F,row.names=F,col.n ames=F) cmd=paste("sas -memsize",x["memsize",2],"- sysparm",paste("'",optfile,"'",sep=""),x["program",2]) if (nchar(x["clusterjobname",2])>0) cmd=paste ("bsub -J",x["clusterjobname",2],cmd) system(cmd) } sasOption.job1 sasOption.job2 sasOption.job3 … sasOption.jobN %let genotype_dir=/dsguser/qunyuan/gwas/cluster/impudata/; %let genotype_file=mldosechr6_12; %let phenotype_dir=/dsguser/qunyuan/gwas/cluster/data/; %let phenotype_file=pheno100; %let markerinfo_dir=; %let markerinfo_file=; %let marker_selection=MAF<0.01; %let pedigree_dir=; %let pedigree_file=; %let subjectID=subject; %let pedgreeID=famid; %let markername=snp; %let phenolist_file=; %let phenolist=bmi; %let phenodata_type=qt; %let covariates=; %let program=SASGLM; %let analysis=mixed; %let output_dir=/dsguser/qunyuan/gwas/run/run2/mixck/; %let output_replace=yes; %let clusterjobname=; %let memsize=1000M; sas –memsize 1000 M-sysparm ‘sasOption.job1’ /dsg1/code/sas/glm.sas & bsub –J jobname sas –sysparm ‘sasOption.job1’ /dsg1/code/sas/glm.sas %include “&sysparm”;.

6 Programming Tips Get all information that your program needs from one option file, pass the option file to your program via command line (e.g. sas –sysparm ‘optionfile’ yourscript.sas), or at least define all the options using variables at the top your script. For any information that may change in different analyses, instead of fixed names, use variables. (e.g. set my data(keep=&markname) ) Keep a good structure, use functionally independent macros and/or functions to organize your code. Don’t put every thing in one macro/function. %MACRO main(); %macro1(); *get p value for markers; %macro2(); *merge with marker positions & allele frequencies; %MEND main; %main(); Avoid name conflict. Name your variables and datasets using unique prefixes (e.g. QZ_temp). Write as little as you can. chrom loop; pheno loop; geno loop; yourmacro(indat=,y=,x=,covar=,method=,outdat=); end;end;end; Error handling.


Download ppt "GWAS Analysis Pipeline GWAS Data Analysis Options (universal interface) Results  Less time  Less errors  Easier Module 1 Module 2 Module 3 …"

Similar presentations


Ads by Google