Associating Genomic Variations with Phenotypes: Model comparison, rare variants, and analysis pipeline. Qunyuan Zhang, Division of Statistical Genomics.


1 Associating Genomic Variations with Phenotypes
Model comparison, rare variants, and analysis pipeline
Qunyuan Zhang
Division of Statistical Genomics & Genome Institute
Washington University School of Medicine

2 Data & Question
Relationship between X and Y?
Genotypes (X): SNP, insertion, deletion, duplication, inversion, translocation, …
Phenotypes (Y): quantitative, categorical

3 Linkage & Association
Association: (Y, X)
Linkage: (Y, Q); Q is unobservable
[Diagram: genotypes, phenotype, and a putative QTL Q, with r1 and r2 on either side of Q]

4 A Fixed-effect Mixture Model For Linkage
Commonly used in plant genetics
[Diagram: cross P1 x P2, then F1, then F2; SNP A, r1, Q, r2, SNP B]

5 A Variance-component Model For Linkage
Commonly used in human genetics
[Diagram: SNP A, r1, Q, r2, SNP B]
[Equation labels: background IBD matrix, QTL IBD matrix, diagonal unit matrix]

6 Variance-component Model = Random-effect Linear Model
[Equation label: random effects]
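The equation for this slide did not survive in the transcript. A standard way to write a variance-component linkage model as a random-effect linear model is sketched below. The symbols are assumptions, not the author's notation: \Phi is the background IBD matrix, \Pi the QTL IBD matrix, I the diagonal unit matrix, g and q the background and QTL random effects.

```latex
\begin{aligned}
y &= X\beta + g + q + e, \\
g &\sim N(0,\ \Phi\,\sigma_g^2), \quad
q \sim N(0,\ \Pi\,\sigma_q^2), \quad
e \sim N(0,\ I\,\sigma_e^2), \\
\operatorname{Var}(y) &= \Phi\,\sigma_g^2 + \Pi\,\sigma_q^2 + I\,\sigma_e^2 .
\end{aligned}
```

Under this form, a test for linkage is a test of H0: \sigma_q^2 = 0.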

7 From Linkage to Association
[Equation labels: linkage model, QTL effect(s); family-based association model, marker effect(s), fixed effect(s)]

8 A Simple Association Model For Unrelated Subjects
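The model on this slide is not in the transcript. As a minimal sketch, assuming the usual single-marker linear model y = b0 + b1*x + e with x coded as the allele count (0/1/2), the marker effect and its t statistic can be computed as follows. The function name and the toy data are illustrative only.

```python
import math

def simple_association(genotypes, phenotypes):
    """Least-squares fit of y = b0 + b1*x + e for one marker.

    Returns (b1, standard error of b1, t statistic).
    Assumes the genotypes vary and the fit is not perfect (residual > 0).
    """
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(genotypes, phenotypes))
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    # Residual sum of squares with n - 2 degrees of freedom.
    rss = sum((y - beta0 - beta1 * x) ** 2
              for x, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta1, se, beta1 / se

# Toy data: six unrelated subjects, one SNP, one quantitative trait.
snp = [0, 0, 1, 1, 2, 2]
trait = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
beta1, se, t_stat = simple_association(snp, trait)
```

The t statistic is compared with a t distribution on n - 2 degrees of freedom to test H0: b1 = 0.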

9 Covariate(s): Adjusting For Confounder(s)
Observed confounders: age, sex, etc.
Hidden confounders: population structure
Population structure can be estimated by:
-PCA
-Clustering
-Admixture/ancestry
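To illustrate the PCA option, the sketch below extracts the first principal component of a small genotype matrix by power iteration on the subject-by-subject cross-product matrix. It is a toy stand-in for a real PCA routine; the function name, the fixed iteration count, and the data are assumptions for illustration.

```python
import math

def first_pc(genotypes, iterations=200):
    """First principal-component scores, one per subject.

    genotypes: list of subjects, each a list of 0/1/2 allele counts.
    Assumes the subjects are not all identical (otherwise the norm is 0).
    """
    n, m = len(genotypes), len(genotypes[0])
    col_means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centered = [[row[j] - col_means[j] for j in range(m)] for row in genotypes]
    # Subject-by-subject cross-product (Gram) matrix of centered genotypes.
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in centered]
            for ri in centered]
    v = [float(i + 1) for i in range(n)]  # deterministic starting vector
    for _ in range(iterations):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Two toy subpopulations with different allele frequencies.
geno = [
    [2, 2, 0, 0], [2, 1, 0, 0], [1, 2, 0, 0],   # group A
    [0, 0, 2, 2], [0, 0, 2, 1], [0, 0, 1, 2],   # group B
]
pc1 = first_pc(geno)
```

The resulting scores separate the two groups by sign, and such scores can then enter the association model as covariates.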

10 Modeling Hidden Genetic Correlation Between Subjects
[Equation labels: marker fixed effect(s), covariate fixed effect(s), genetic background random effects]
Family data, pedigree => IBD matrix
Population data, hidden, marker data => IBS matrix
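As a rough illustration of the "marker data => IBS matrix" step, the sketch below computes the average proportion of alleles shared identical-by-state for each pair of subjects. This is one simple IBS definition among several; the function name and data are hypothetical.

```python
def ibs_matrix(genotypes):
    """Pairwise IBS similarity from 0/1/2 allele counts.

    For one marker, two subjects share 2 - |g_i - g_j| alleles by state;
    the matrix entry is that count averaged over markers and divided by 2.
    """
    n, m = len(genotypes), len(genotypes[0])
    k = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            shared = sum(2 - abs(a - b)
                         for a, b in zip(genotypes[i], genotypes[j]))
            k[i][j] = k[j][i] = shared / (2 * m)
    return k

geno = [[0, 1, 2], [0, 1, 2], [2, 1, 0]]
K = ibs_matrix(geno)
```

In a mixed model, a matrix of this kind plays the role that the pedigree-derived IBD matrix plays for family data.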

11 Modeling Rare Variants
Common variants: tested individually, H0: β1 = 0. One p-value per variant.
Rare variants: tested as an entire group (burden test), usually by gene, H0: β1 = β2 = … = βk = 0. One p-value per group of variants.
-Can be combined with variable selection, using loose criteria
-β can be treated as random effects (variance-components test) and can be weighted by prior information

12 Collapsing Model
Collapsing multiple variables into one
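The slide's formula is missing from the transcript. One common form of collapsing, sketched here as an assumption, replaces the k rare-variant genotypes in a group with a single 0/1 indicator of carrying at least one rare allele.

```python
def collapse(genotypes):
    """Collapse each subject's rare-variant genotypes into one indicator.

    genotypes: list of subjects, each a list of 0/1/2 rare-allele counts
    for the variants in one group (for example, one gene).
    """
    return [1 if any(g > 0 for g in row) else 0 for row in genotypes]

gene_genotypes = [[0, 0, 0], [0, 1, 0], [2, 0, 1], [0, 0, 0]]
carrier = collapse(gene_genotypes)
```

The collapsed variable is then tested like a single marker, giving one p-value for the whole group.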

13 Weighted Sum Model
Weighted sum score
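A weighted sum score can be sketched as s_i = sum_j w_j * g_ij. The default weights below, 1/sqrt(p(1-p)) with a pseudo-count in the allele frequency p, are one frequency-based choice that up-weights rarer variants; they are an assumption for illustration, not necessarily the weights used in the talk.

```python
import math

def weighted_sum_scores(genotypes, weights=None):
    """Per-subject weighted sum of rare-allele counts.

    genotypes: list of subjects, each a list of 0/1/2 allele counts.
    weights: one weight per variant; if omitted, frequency-based weights
    1/sqrt(p*(1-p)) are used, with p estimated with a pseudo-count so
    that it is never exactly 0 or 1.
    """
    n, m = len(genotypes), len(genotypes[0])
    if weights is None:
        weights = []
        for j in range(m):
            alt = sum(row[j] for row in genotypes)
            p = (alt + 1) / (2 * n + 2)
            weights.append(1 / math.sqrt(p * (1 - p)))
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]

geno = [[0, 1], [1, 0], [2, 1]]
scores = weighted_sum_scores(geno, weights=[1.0, 2.0])
default_scores = weighted_sum_scores(geno)
```

With binary 0/1 weights chosen by an allele-frequency threshold, the same function gives an unweighted burden count.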

14 Weighting Variants
-Based on allele frequency: continuous or binary (0,1) weight, variable threshold;
-Based on functional annotation/prediction;
-Based on sequencing quality (coverage, mapping quality, genotyping quality, validated or not, etc.);
-Data-driven: using both genotype and phenotype data, learning weights (including effect directions) from data; requires a permutation test;
-Any combination …
Grouping Variants
-By gene
-By transcript
-By exon
-By gene set / pathway
-By protein domain
-…

15 Modeling More Data Types
Generalized Linear (Mixed) Model
Link function
For binary Y, logistic model
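The slide's equations are not in the transcript. In standard notation (the symbols here are assumptions), a generalized linear mixed model applies a link function g to the conditional mean, and the logistic model is the case where g is the logit:

```latex
\begin{aligned}
g\big(E[\,Y \mid X, u\,]\big) &= X\beta + Zu, \qquad u \sim N(0,\ G), \\
\text{binary } Y:\quad
\log\frac{P(Y=1)}{1 - P(Y=1)} &= X\beta + Zu .
\end{aligned}
```

Dropping the random-effect term Zu gives the ordinary generalized linear model.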

16 Longitudinal Data (quantitative)
-Fixed effect, time as covariate
-Repeated measures, random effect, correlation within subjects
[Figure axis: Time]

17 Longitudinal Data (binary)
-Linear model, time as covariate
-Survival analysis, CoxPH model, etc.
[Figure axis: Time]

18 Tools
SAS Procedures: REG, LOGISTIC, GENMOD, MIXED, HPMIXED, GLIMMIX, PHREG/LIFETEST
R Functions/Packages: lm(), glm(), gee, nlme, kinship2/coxme, lme4, survival
Other Programs: SOLAR, MMAP, EMMA, EMMAX, SKAT

19 Pipeline
[Flowchart]
Input (data + options)
Job generating/submitting module (LSF bsub) => job 1, job 2, …, job N
  Options.jobi => self-programmed modules (SAS, R, …)
  Options.jobi => external program modules (MMAP, SKAT, …)
Job number controlling module
Result 1, Result 2, …, Result N
Job status monitoring module (all done?)
  No: wait …
  Yes: result summarizing module
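The control flow of this flowchart can be sketched in a few lines. The sketch below is not the author's pipeline and does not call LSF bsub: a local thread pool stands in for the cluster, max_jobs plays the role of the job number controlling module, and run_job is a hypothetical placeholder for a SAS, R, or external program module.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_pipeline(job_options, run_job, max_jobs=4):
    """Run one job per options entry, at most max_jobs at a time.

    job_options: dict mapping job name -> options for that job.
    run_job: callable taking one job's options and returning its result.
    Returns a dict mapping job name -> result once all jobs are done.
    """
    results = {}
    # The pool size caps how many jobs run concurrently.
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        futures = {pool.submit(run_job, opts): name
                   for name, opts in job_options.items()}
        # Monitoring step: collect results as jobs finish.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results

def fake_job(opts):
    """Placeholder analysis module: returns twice the input value."""
    return opts["x"] * 2

jobs = {"job1": {"x": 1}, "job2": {"x": 2}, "job3": {"x": 3}}
all_results = run_pipeline(jobs, fake_job, max_jobs=2)
summary = sorted(all_results.items())  # result summarizing step
```

On a real cluster, the submit and monitor steps would instead wrap the scheduler's own commands.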

20 gwas.sh options.gwa

gwas.sh:
  #!/bin/sh
  OPFILE=$1
  ...
  …

options.gwa:
  [DATA]
  database=SAS
  genotype_dir=/dsg1/gwas/fhsgeno
  genotype_file=
  phenotype_file=fhs100
  markerinfo_file=mapall
  marker_selection=MAF>0.01
  pedigree_file=pediall
  subjectID=subject
  pedgreeID=famid
  markername=snp
  …
  [ANALYSIS]
  phenolist_file=
  pheno_list=bmi/qt
  covariates=
  program=SASGLM
  analysis=mixed
  [OUTPUT]
  output_dir=/dsguser/qunyuan/fhs/bmi
  output_file=
  output_replace=no
  [RUN]
  clusterjobname=bmimixed
  memsize=1000M
  maxjobn=300
  …

Phenotype list:
  Phenotype   covar     program   analysis   run
  Bmi qt      age,sex   SASGLM    mixed      YES
  Obes ql     NA        SASGLM    gee        YES
  HD ql       age       SASGLM    gee        NO
  Age …
  Sex …
  …

Program list:
  Program   language   location                 Maintainer
  SASGLM    SAS        /dsg1/code/sas/glm.sas   Q.Zhang
  GSTAT     R          /dsg1/code/R/gstat.R     Q.Zhang
  MMAP      C          /dsg1/code/sas/mmap.sh   J. Czajkowski
  …
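The options file has an INI-like layout ([SECTION] headers with key=value lines), so it can be read with a standard INI parser. The sketch below uses Python's configparser on an abbreviated copy of the options shown on the slide. How gwas.sh actually parses the file is not shown in the talk, so this is only an illustration; read_options is a hypothetical helper.

```python
import configparser

# Abbreviated copy of the options.gwa contents shown on the slide.
OPTIONS_TEXT = """\
[DATA]
database=SAS
genotype_dir=/dsg1/gwas/fhsgeno
genotype_file=
phenotype_file=fhs100
marker_selection=MAF>0.01
subjectID=subject
[ANALYSIS]
pheno_list=bmi/qt
program=SASGLM
analysis=mixed
[RUN]
clusterjobname=bmimixed
memsize=1000M
maxjobn=300
"""

def read_options(text):
    """Parse INI-style options into {section: {key: value}}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case, e.g. subjectID
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}

options = read_options(OPTIONS_TEXT)
max_jobs = int(options["RUN"]["maxjobn"])
# pheno_list appears to hold name/type pairs (an assumption).
phenotype, pheno_type = options["ANALYSIS"]["pheno_list"].split("/")
```

Empty entries such as genotype_file= come back as empty strings, which a caller can treat as "not set".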

21 Thanks!

