A Short Tutorial on Causal Network Modeling and Discovery


1 A Short Tutorial on Causal Network Modeling and Discovery
Greg Cooper
Department of Biomedical Informatics
University of Pittsburgh
Modeling the World’s Systems
5/22/2018

2 Outline
Brief background on causal network discovery
Introduction to a method for causal network discovery
Software tools for causal network discovery

3 Causal modeling and discovery are at the core of much of science, engineering, medicine, business, and other domains

4 Current Circumstances Favorable for Causal Network Discovery from Data
Abundance of data
Dramatic increases in computing power
Algorithmic advances in causal network discovery

5 Basic Causal Discovery Workflow
[Diagram with the labels: Data, Prior Knowledge, Causal Analysis, Causal Networks, Causal Hypotheses, Experiments]

6 Types of Data Include …
Experimental data – controlled manipulation of some variables and observation of the others
Observational data – observation only, with no manipulation

8 Basic Components Needed to Learn Causal Networks from Observational Data
Causal network representation
Causal network search
Causal network evaluation

9 Causal Bayesian Networks (CBNs)
A directed acyclic graph
Nodes represent variables
Arcs represent direct causation
A variable is modeled as independent of its non-effects, given its causal parents
Example:
CBN structure: A → B → C
CBN parameters: the structure implies a factorization of the joint probability distribution
P(A, B, C) = P(A) P(B | A) P(C | B)
Causal knowledge is important because it lets us estimate the effects of possible actions, which can guide which actions we choose to take. The concept is a general one: it includes, for example, predicting the response of a cell to a drug and predicting how a given patient is likely to respond to alternative surgical procedures.
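As a rough illustration of the factorization on this slide, the sketch below builds the joint distribution for the chain A → B → C from three probability tables and checks that C does not depend on A once B is known. The numeric parameters and the function names are made up for illustration; they are not from the presentation.

```python
from itertools import product

# Hypothetical CBN parameters for binary variables (illustrative values only).
p_a = {0: 0.7, 1: 0.3}                      # P(A)
p_b_given_a = {0: {0: 0.9, 1: 0.1},         # P(B | A=0)
               1: {0: 0.2, 1: 0.8}}         # P(B | A=1)
p_c_given_b = {0: {0: 0.95, 1: 0.05},       # P(C | B=0)
               1: {0: 0.4, 1: 0.6}}         # P(C | B=1)

def joint(a, b, c):
    """P(A, B, C) = P(A) P(B | A) P(C | B), the factorization implied by A -> B -> C."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

def p_c_given_ab(c, a, b):
    """P(C=c | A=a, B=b), recovered from the joint by normalizing over C."""
    return joint(a, b, c) / sum(joint(a, b, k) for k in (0, 1))

# The factorized joint sums to 1 over all eight configurations.
total = sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3))

# Given its causal parent B, C is independent of its non-effect A.
same_for_both_a = abs(p_c_given_ab(1, 0, 1) - p_c_given_ab(1, 1, 1)) < 1e-12
```

Here `same_for_both_a` is the independence statement from the slide in numeric form: once B is fixed, changing A leaves the distribution of C unchanged.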

10 Methods for Learning CBNs from Observational Data
Constraint-based
Bayesian
Other

12 The Constraint-Based Method
Determine constraints that hold among the nodes (e.g., independence conditions based on statistical tests)
Use the patterns of constraints to narrow the causal possibilities

13 A Hypothetical Example of the Constraint-Based Method
Three binary variables X, Y, Z
The following is known:
X occurs before Y
X occurs before Z
For instance:
X: gene mutation status
Y: gene expression level
Z: disease status
Question: Does Y cause Z?

14 A Hypothetical Example of the Constraint-Based Method
Suppose statistical testing yields the following constraints:
dep(X, Y), dep(Y, Z), dep(X, Z), ind(X, Z | Y)
Consider the consistency of these constraints with respect to the following causal networks:
[Figure: candidate causal networks over X, Y, Z, some of which include a hidden variable H]
94 additional causal networks: none of these satisfy all four tests
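The consistency check on this slide can be sketched in code. The example below is a simplification: instead of running statistical tests on data, it evaluates the four constraints exactly on joint distributions generated by two candidate networks, the chain X → Y → Z and a network in which X causes both Y and Z with no Y → Z arc. The parameter values and all function names are hypothetical.

```python
from itertools import product

VALS = (0, 1)
X, Y, Z = 0, 1, 2  # positions of the variables in each (x, y, z) key

def marginal(joint, keep):
    """Sum the joint over every variable whose index is not in `keep`."""
    out = {}
    for xyz, p in joint.items():
        key = tuple(xyz[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def independent(joint, i, j, given=(), tol=1e-9):
    """Check P(i, j, g) P(g) == P(i, g) P(j, g) for every value combination."""
    p_ijg = marginal(joint, (i, j) + given)
    p_ig = marginal(joint, (i,) + given)
    p_jg = marginal(joint, (j,) + given)
    p_g = marginal(joint, given)
    for vi, vj in product(VALS, repeat=2):
        for g in product(VALS, repeat=len(given)):
            lhs = p_ijg.get((vi, vj) + g, 0.0) * p_g.get(g, 0.0)
            rhs = p_ig.get((vi,) + g, 0.0) * p_jg.get((vj,) + g, 0.0)
            if abs(lhs - rhs) > tol:
                return False
    return True

def satisfies_constraints(joint):
    """dep(X, Y), dep(Y, Z), dep(X, Z), ind(X, Z | Y)."""
    return (not independent(joint, X, Y)
            and not independent(joint, Y, Z)
            and not independent(joint, X, Z)
            and independent(joint, X, Z, given=(Y,)))

# Hypothetical parameters (illustrative values only).
p_x = {0: 0.6, 1: 0.4}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}
p_z_given_parent = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.25, 1: 0.75}}

# Chain X -> Y -> Z: the parent of Z is Y.
chain = {(x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_parent[y][z]
         for x, y, z in product(VALS, repeat=3)}

# X -> Y and X -> Z with no Y -> Z arc: the parent of Z is X.
fork = {(x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_parent[x][z]
        for x, y, z in product(VALS, repeat=3)}

chain_ok = satisfies_constraints(chain)
fork_ok = satisfies_constraints(fork)
```

With these parameters the chain passes all four constraints, while the second network fails because X and Z remain dependent after conditioning on Y. With real data, each exact check would be replaced by a statistical test, which can err in either direction.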

15 The Three Accepted Causal Networks Are No Longer Accepted If There Is Hidden Confounding between Y and Z
[Figure: the three networks redrawn with a hidden variable H confounding Y and Z]

16 The Only Three Networks Consistent with the Four Constraints
[Figure: three causal networks over X, Y, Z, two of which include a hidden variable H]

18 Summary of the Constraint-Based Causal Discovery Method
Reduces a large number of causal network possibilities to just those networks consistent with the constraints obtained from the data
Looks for causal relationships that are common across those networks (e.g., Y → Z)
Generalizes to many variables (1000s) and to more complex patterns of constraints that support causal relationships

19 A Real Application of the Simple Constraint-Based Method*
354 rheumatoid arthritis (RA) cases and 337 controls from Sweden
X: SNPs measured with Illumina Human Hap chip
Y: Differentially methylated positions (DMPs) measured with an Illumina HumanMethylation450 array
Z: RA status (yes/no)
Core discovery algorithm: For all combinations of SNPs (X) and DMPs (Y), output Y → Z whenever the 4 statistical tests (above) hold
Results:
Found 9 DMPs (Y) that the data support as causally influencing RA (Z)
A validation study using a separate set of 24 total cases found changes in the 9 DMPs that were consistent with the original study
[Figure: X → Y → Z]
* Epigenome-wide association data implicate DNA methylation as an intermediary of genetic risk in rheumatoid arthritis. Liu Y, Aryee MJ, Padyukov L, et al. Nature Biotechnology 31 (2013)
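The core discovery loop described on this slide might look roughly like the sketch below. It is not the cited study's implementation: the function names, the toy data, and the threshold-based `dependent` and `cond_independent` checks are all assumptions, and the threshold checks are only a crude stand-in for proper statistical tests on binary columns.

```python
def rate_diff(a, b):
    """P(a=1 | b=1) - P(a=1 | b=0) for 0/1 columns; 0.0 if either group is empty."""
    ones = [ai for ai, bi in zip(a, b) if bi == 1]
    zeros = [ai for ai, bi in zip(a, b) if bi == 0]
    if not ones or not zeros:
        return 0.0
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)

def dependent(a, b, threshold=0.05):
    """Stand-in for a dependence test: the rate difference exceeds a threshold."""
    return abs(rate_diff(a, b)) > threshold

def cond_independent(a, b, given, threshold=0.05):
    """Stand-in for a conditional independence test: no association within any stratum."""
    for g in (0, 1):
        a_g = [ai for ai, gi in zip(a, given) if gi == g]
        b_g = [bi for bi, gi in zip(b, given) if gi == g]
        if abs(rate_diff(a_g, b_g)) > threshold:
            return False
    return True

def discover(snps, dmps, z):
    """Return (X, Y) pairs for which dep(X,Y), dep(Y,Z), dep(X,Z), ind(X,Z|Y) all hold."""
    hits = []
    for x_name, x in snps.items():
        for y_name, y in dmps.items():
            if (dependent(x, y) and dependent(y, z) and dependent(x, z)
                    and cond_independent(x, z, y)):
                hits.append((x_name, y_name))
    return hits

# Toy data: 32 rows whose counts follow a chain X -> Y -> Z exactly.
counts = {(1, 1, 1): 9, (1, 1, 0): 3, (1, 0, 1): 1, (1, 0, 0): 3,
          (0, 1, 1): 3, (0, 1, 0): 1, (0, 0, 1): 3, (0, 0, 0): 9}
rows = [xyz for xyz, n in counts.items() for _ in range(n)]
snp = [r[0] for r in rows]
dmp_a = [r[1] for r in rows]
status = [r[2] for r in rows]
dmp_noise = [i % 2 for i in range(len(rows))]  # unrelated to the SNP in this toy data

hits = discover({"snp_1": snp}, {"dmp_a": dmp_a, "dmp_noise": dmp_noise}, status)
```

On this toy data `hits` contains only the pair `("snp_1", "dmp_a")`; `dmp_noise` is rejected at the first test because it shows no association with the SNP. A real analysis at this scale would also need to account for multiple testing across the many SNP and DMP pairs.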

20 Center for Causal Discovery

23 Suggested Reading
Lagani V, Triantafillou S, Ball G, Tegner J, Tsamardinos I. Probabilistic computational causal discovery for systems biology. In: Uncertainty in Biology: A Computational Modeling Approach. Editors: Geris L, Gomez-Cabrero D (Springer, 2016). Available at: mensxmachina.org under Publications for 2016

24 Acknowledgements
The Center for Causal Discovery is supported by grant U54HG008540 awarded by the National Human Genome Research Institute through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative. The content of this presentation is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

25 Thank you
