Structural Equation Modeling analysis for causal inference from multiple -omics datasets
So-Youn Shin, Ann-Kristin Petersen, Christian Gieger, Nicole Soranzo

[J. L. Griffin and J. P. Shockcor (2004) Nature Reviews Cancer]

[K. Suhre, S.-Y. Shin, et al. (2011) Nature]

Integrative Analysis for multiple -omics
1. Motivations
   - To dissect biological and genetic determinants of normal phenotypic variation and disease states
   - To validate results of individual -omics levels by reducing false positives caused by technical and methodological biases
2. Several analytical challenges
   - High dimensional, highly correlated datasets
     - Normalization and missing value estimation/imputation
     - Biologically relevant dimension reduction
   - Methodologies
     - Correlation → Causation
     - Linearity → Nonlinearity (e.g. interaction)
     - Multiple testing correction, validation and replication

Causal inference
Study aim: To dissect mediation at serum lipid loci using metabolomics
DNA variation -> Metabolomics -> Lipids ?
[A.-K. Petersen, S.-Y. Shin, et al. (2011) Under Review]

Study design
Model selection (KORA, N=~1,800)
- Linear models on 95 LipidSNPs, 151 Metabolites (and ~10,000 ratios), 4 Lipids
- Significance thresholds for the pairwise associations among LipidSNP, Metabolite and Lipid: P ≤ 3.4x10^-6, P ≤ 8.7x10^-5, P ≤ 0.05
Model testing (Structural Equation Modeling): LipidSNP -> Metabolite -> Lipid
- Replication (TwinsUK, N=~800)
Model testing (Structural Equation Modeling): LipidSNP -> Metabolite PC -> Lipid
- 50 Principal Components (97% Variance)
- Replication (TwinsUK, N=~800)
- Interpretation of Principal Components
[A.-K. Petersen, S.-Y. Shin, et al. (2011) Under Review]
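The "50 Principal Components (97% Variance)" step can be illustrated with a minimal sketch that counts how many components are needed to reach a target share of variance. This is not the authors' pipeline: the function name, the simulated matrix and the choice to centre without scaling are all assumptions made for illustration.

```python
import numpy as np

def n_components_for_variance(X, target=0.97):
    """Smallest number of principal components whose cumulative
    explained variance reaches `target` (hypothetical helper)."""
    Xc = X - X.mean(axis=0)                    # centre each variable (column)
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values of centred data
    explained = s**2 / np.sum(s**2)            # variance share per component
    cumulative = np.cumsum(explained)
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, len(cumulative)), cumulative

# Simulated stand-in for a samples x metabolites matrix (assumption):
# three latent factors with orthonormal loadings plus a little noise.
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 3)) * np.array([3.0, 2.0, 1.5])
loadings, _ = np.linalg.qr(rng.normal(size=(20, 3)))
X = factors @ loadings.T + 0.01 * rng.normal(size=(200, 20))

k, cumulative = n_components_for_variance(X, target=0.97)
print("components needed:", k)
```

With real metabolite data the variables would usually be normalized first, which changes how the variance is shared among components.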

Structural Equation Modeling
[Path diagram; node labels: M_A, A, M_B, B, M_P]
R package "sem"

Structural Equation Modeling
[Figure: ten candidate path diagrams, Model 1 to Model 10, each over the variables SNP, MET and LIP]
All possible models -> Best fit (p-value, BIC)
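The idea of fitting all candidate models and keeping the best one by BIC can be sketched as follows. This is only an illustration under stated assumptions: it uses ordinary least squares on simulated data rather than the R package "sem", it treats each model as a recursive path model with Gaussian errors (so the likelihood factorizes into one regression per response), and the four candidate models are hypothetical examples, not the ten models shown on the slide.

```python
import numpy as np

def ols_loglik(y, X):
    """Maximised Gaussian log-likelihood of y ~ intercept + X,
    plus the number of free parameters (coefficients + variance)."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / n                 # ML estimate of error variance
    loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    return loglik, design.shape[1] + 1

def path_model_bic(data, equations):
    """BIC of a recursive path model given as {response: [parents]}."""
    n = len(next(iter(data.values())))
    loglik, n_params = 0.0, 0
    for response, parents in equations.items():
        X = (np.column_stack([data[p] for p in parents])
             if parents else np.empty((n, 0)))
        ll, k = ols_loglik(data[response], X)
        loglik += ll
        n_params += k
    return -2.0 * loglik + n_params * np.log(n)

# Simulated data in which the SNP acts on the lipid only through the
# metabolite (effect sizes are arbitrary assumptions).
rng = np.random.default_rng(1)
n = 2000
snp = rng.binomial(2, 0.3, size=n).astype(float)
met = 0.5 * snp + rng.normal(size=n)
lip = 0.7 * met + rng.normal(size=n)
data = {"SNP": snp, "MET": met, "LIP": lip}

candidates = {
    "SNP->MET->LIP": {"MET": ["SNP"], "LIP": ["MET"]},
    "SNP->LIP->MET": {"LIP": ["SNP"], "MET": ["LIP"]},
    "SNP->MET, SNP->LIP": {"MET": ["SNP"], "LIP": ["SNP"]},
    "SNP->MET->LIP, SNP->LIP": {"MET": ["SNP"], "LIP": ["MET", "SNP"]},
}
bic = {name: path_model_bic(data, eqs) for name, eqs in candidates.items()}
best = min(bic, key=bic.get)
print("lowest BIC:", best)
```

The SNP is exogenous in every candidate, so its own distribution contributes the same term to each model and is left out of the comparison.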

Structural Equation Modeling
Assumptions
- Statistical assumptions (like any regression models)
- Causal assumptions (based on biological knowledge)
Pros
- Flexible hypotheses: Both direct and indirect effects are allowed. (vs. Mendelian Randomization)
- A variable can be both predictor and response simultaneously. (vs. Bayesian network analysis)
Cons
- Nonlinearity cannot be detected.
- Hidden confounders or measurement errors can mislead causal inference. (same with biological experiments)

Causal inference
- We tested 95 loci associated with serum lipid levels.
- We applied SEM to test causal hypotheses, on METs or PCs.
- 260 association sets met our criteria for significant edges in SNP -> MET -> Lipid at 3 loci (FADS1, GCKR, APOA1).
- METs and PCs showed similar results.
- We suggest that SEM is an appropriate statistical instrument to dissect the contribution of intermediate phenotypes to complex biological pathways.
DNA variation -> Metabolomics -> Lipids
[A.-K. Petersen, S.-Y. Shin, et al. (2011) Under Review]

Our ongoing project
TwinsUK
- ~600k SNPs
- ~48k Probes
- ~32k Metabolic traits
- Overlapping N = ~600

Missing Values
Issues in multivariate analyses: Ignore vs. Impute
How to impute
- Impute with mean (row mean)
- K-nearest-neighbors (kNN)
- Transform based methods (SVD, Bayesian PCA)
BPCA and GMC (Gaussian mixtures) seemed to perform better than SVD, row mean and kNN [R. Jornsten et al. (2005) Bioinformatics]
BPCA and LSA (least squares adaptive) appeared to be the best [S. Oh et al. (2011) Bioinformatics]
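The two simplest options listed above, row mean and kNN imputation, can be sketched in a few lines. This is a minimal illustration, assuming rows are features and columns are samples, with Euclidean distance over the columns both rows have observed. The function names are hypothetical, and the transform based methods (SVD, BPCA) are not shown.

```python
import numpy as np

def impute_row_mean(X):
    """Replace each missing value with the mean of its row."""
    X = np.array(X, dtype=float)               # copy, leave the input intact
    row_means = np.nanmean(X, axis=1)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = row_means[rows]
    return X

def impute_knn(X, k=3):
    """Replace each missing value with the average of that column over
    the k nearest rows that have the column observed."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)
    for i in np.where(missing.any(axis=1))[0]:
        # Distance from row i to every other row on shared observed columns.
        dists = np.full(X.shape[0], np.inf)
        for r in range(X.shape[0]):
            shared = ~missing[i] & ~missing[r]
            if r != i and shared.any():
                diff = X[i, shared] - X[r, shared]
                dists[r] = np.sqrt(np.mean(diff**2))
        order = np.argsort(dists)
        for j in np.where(missing[i])[0]:
            donors = [r for r in order
                      if np.isfinite(dists[r]) and not missing[r, j]][:k]
            if donors:
                filled[i, j] = X[donors, j].mean()
            else:
                filled[i, j] = np.nanmean(X[i])  # fall back to the row mean
    return filled

# Tiny made-up matrix: the first row is missing its last value.
demo = np.array([[1.0, 2.0, np.nan],
                 [1.0, 2.0, 3.0],
                 [1.1, 2.1, 3.2],
                 [10.0, 20.0, 30.0]])
print(impute_row_mean(demo)[0])
print(impute_knn(demo, k=2)[0])
```

Row mean imputation uses only the feature's own values, while kNN borrows from similar rows, which is why the two can give quite different fills for the same gap.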

Test of Bayesian PCA
R package "pcaMethods"

Dimension Reduction Methods
- Principal Component Analysis
  - kernel PCA
- Factor Analysis, Multidimensional Scaling
- Canonical Correlation Analysis
  - regularized CCA
  - kernel CCA
- Partial Least Squares (canonical mode)
  - Sparse PLS

Test of rCCA
- No significant cross-correlation between the two datasets before rCCA
- rCCA extracts features (canonical covariates) while maximizing the correlation between the two datasets
- Significant cross-correlation after rCCA: is this biologically meaningful or not?
R package "CCA" or "mixOmics"
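One way to probe whether a correlation found by rCCA is meaningful is to compare it with the correlation of the same canonical covariates on held-out samples. The sketch below is an assumption-laden illustration, not the "CCA" or "mixOmics" implementation: it uses a plain ridge-regularized CCA written in NumPy, two simulated datasets that are independent noise by construction, and an arbitrary regularization value.

```python
import numpy as np

def rcca_first_pair(X, Y, lam=0.1):
    """First pair of canonical weight vectors from ridge-regularized CCA.
    Returns the weights and the training means used for centring."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + lam * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + lam * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Lx = np.linalg.cholesky(Cxx)               # Cxx = Lx @ Lx.T
    Ly = np.linalg.cholesky(Cyy)
    # Whitened cross-covariance: Lx^{-1} Cxy Ly^{-T}
    K = np.linalg.solve(Ly, np.linalg.solve(Lx, Cxy).T).T
    U, s, Vt = np.linalg.svd(K)
    a = np.linalg.solve(Lx.T, U[:, 0])         # map back to original space
    b = np.linalg.solve(Ly.T, Vt[0])
    return a, b, x_mean, y_mean

def variate_correlation(X, Y, a, b, x_mean, y_mean):
    """Pearson correlation between the two canonical covariates."""
    u = (X - x_mean) @ a
    v = (Y - y_mean) @ b
    return float(np.corrcoef(u, v)[0, 1])

# Two unrelated noise datasets (assumption): few samples, many variables.
rng = np.random.default_rng(2)
X_train, Y_train = rng.normal(size=(60, 40)), rng.normal(size=(60, 40))
X_test, Y_test = rng.normal(size=(500, 40)), rng.normal(size=(500, 40))

a, b, x_mean, y_mean = rcca_first_pair(X_train, Y_train, lam=0.1)
r_train = variate_correlation(X_train, Y_train, a, b, x_mean, y_mean)
r_test = variate_correlation(X_test, Y_test, a, b, x_mean, y_mean)
print("in-sample correlation:", round(r_train, 3))
print("held-out correlation:", round(r_test, 3))
```

Because the method maximizes correlation directly, an in-sample value can be large even when the datasets are unrelated, so a held-out or permutation check is a reasonable companion to any rCCA result.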

Integrative Analysis for multiple -omics
Open questions remain about how best to integrate multiple -omics datasets to understand underlying biological mechanisms and infer causal pathways.

Acknowledgements
Wellcome Trust Sanger Institute: Nicole Soranzo, Yasin Memari, Aparna Radhakrishnan, Panos Deloukas, Elin Grunberg
KORA: Ann-Kristin Petersen, Christian Gieger, Karsten Suhre
TwinsUK: Tim Spector, Massimo Mangino, Guangju Zhai, Kerrin Small
