Workshop Files www.phil.cmu.edu/projects/tetrad_download/download/ccd.2018.workshop.


1 Workshop Files

2 Center for Causal Discovery: Summer Short Course/Datathon - 2018
June 11-15, 2018 Carnegie Mellon University

3 Goals
Basic working knowledge of graphical causal models
Basic working knowledge of Tetrad 6
Basic understanding of search algorithms
“Fully started” on using CCD algorithms/tools on real data, preferably your own
Grow the community of researchers, users, and students interested in causal discovery in biomedical research

4 Monday: Basics of Graphical Causal Models, Tetrad
Morning: 9 AM – Noon, Baker Hall A51: Giant Eagle Auditorium
Introduction – Greg Cooper
Representing/Modeling Causal Systems
Causal Graphs/Interventions
Parametric Models
Instantiated Models
Afternoon: 1:30 PM – 4 PM, Baker Hall A51: Giant Eagle Auditorium
Estimation, Inference, and Model Fit
Real data examples: Charitable Giving, College Plans
Dinner: On your own

5 Tuesday: Basics of Search, Break-out Sessions
Morning: 9 AM – Noon, Baker Hall A51: Giant Eagle Auditorium
Searching for Causal Systems
Model Equivalence
Basic Search Algorithms
Hands-on Work
Afternoon: 1:30 PM – 4 PM, Baker Hall A51 → breakout rooms
CCD Software and Bridges Supercomputer (Jeremy Espino)
Break-out Sessions 1: Brain/fMRI; Cancer; Protein signaling: single cell cytometry
4 – 6 PM: Reception, hors d’oeuvres etc., Lobby outside A51
Dinner: On your own

6 Wednesday: Latent Variables, etc., Break-out Sessions
Morning: 9 AM – Noon, Baker Hall A51: Giant Eagle Auditorium
Latent Variable Model Search
Mixed Data (Continuous and Discrete)
Afternoon: 1:30 PM – 3:30 PM, Baker Hall A51 → breakout rooms
Break-out Sessions 2
Dinner: On your own

7 Thursday: Research Area Overviews, Datathon
Morning: 9 AM – Noon, Baker Hall A51: Giant Eagle Auditorium
fMRI – Brain
Cancer: Genomic Drivers
Protein Signaling Pathways: Single Cell Data
Consultation sign-up
Short Course: Ends at Noon
Datathon: Noon – 8 PM, Baker Hall A51: Giant Eagle Auditorium

8 Friday: Datathon
Morning: 9 AM – Noon, Baker Hall A51: Giant Eagle Auditorium
Breakfast with Q&A (9 – 10 AM)
Data Hacking (10 – 12)
Lunch, 12 – 1: on your own
Afternoon: 1:00 PM – 4 PM, Giant Eagle Auditorium: Datathon
1:00 – 3:00 Data Hacking & Visualization
3:00 Participant Presentations

9 Questions?

10 Overview of CCD Greg Cooper, MD, PhD

11 Causal Discovery

12 25 Years of Advances
1992 to 2017: tremendous progress in:
Mathematically representing causal networks
Discovery algorithms for finding causal networks from a combination of data and knowledge
These methods apply to biomedical data.
Statistics, Computer Science, Philosophy, etc.
America, Europe, Japan

13 Modern Theory of Statistical Causal Models
Diagram labels: Graphical Models; Intervention & Manipulation; Potential Outcome Models; Testable Constraints (e.g., Independence); Counterfactuals

14 Causal Inference Requires More than Probability
Prediction from Observation ≠ Prediction from Intervention
P(Lung Cancer 1960 = yes | Tar-stained fingers 1950 = yes)
P(Lung Cancer 1960 = yes | do(Tar-stained fingers 1950 = yes))
In general: P(Y=y | X=x, Z=z) ≠ P(Y=y | do(X=x), Z=z)
Causal Prediction vs. Statistical Prediction:
Diagram labels: Non-experimental data (observational study); Background Knowledge; P(Y,X,Z); P(Y=y | X=x, Z=z); Causal Structure; P(Y=y | do(X=x), Z=z)

15 P(Stained Teeth = yes) = .5

16 P(Smoker= yes) = .5

17 P(Lung Cancer = yes) = .5

18 Conditioning on Stained Teeth = no

19 P(Lung Cancer = yes | Stained Teeth = no) = .25

20 Lung Cancer not _||_ Stained Teeth (P(Lung Cancer = yes | Stained Teeth = no) = .25 ≠ .5 = P(Lung Cancer = yes))

21 Manipulating White Teeth

22 Pm(Lung Cancer = yes | do(Stained Teeth = no)) = .5

23 Lung Cancer _||_m Stained Teeth
P(LC = yes) = .5 = Pm(LC = yes | do(Stained Teeth = no)), so Lung Cancer _||_m Stained Teeth
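The contrast between conditioning and manipulating on the preceding slides can be reproduced with a small enumeration. This is a minimal sketch: the slides give only the marginals (.5, .5, .5) and the two answers (.25 and .5), so the conditional probability tables below are hypothetical values chosen to match those numbers, and the graph Smoker → Stained Teeth, Smoker → Lung Cancer is assumed from the later Smoking example.

```python
from itertools import product

# Hypothetical CPTs, chosen only so that the results match the slides' numbers.
P_SMOKER = {True: 0.5, False: 0.5}
P_STAINED_GIVEN_SMOKER = {True: 1.0, False: 0.0}  # P(Stained = yes | Smoker)
P_LC_GIVEN_SMOKER = {True: 0.75, False: 0.25}     # P(LC = yes | Smoker)


def joint(smoker, stained, lc, do_stained=None):
    """Joint probability of one configuration.

    If do_stained is given, Stained Teeth is set by the intervention and the
    factor P(Stained | Smoker) is dropped.
    """
    p = P_SMOKER[smoker]
    if do_stained is None:
        p_st = P_STAINED_GIVEN_SMOKER[smoker]
        p *= p_st if stained else 1.0 - p_st
    else:
        p *= 1.0 if stained == do_stained else 0.0
    p_lc = P_LC_GIVEN_SMOKER[smoker]
    return p * (p_lc if lc else 1.0 - p_lc)


def p_lc_given_stained(stained, do_stained=None):
    """P(LC = yes | Stained Teeth = stained), observational or manipulated."""
    num = sum(joint(s, stained, True, do_stained) for s in (True, False))
    den = sum(
        joint(s, stained, lc, do_stained)
        for s, lc in product((True, False), repeat=2)
    )
    return num / den


observed = p_lc_given_stained(False)                       # conditioning
manipulated = p_lc_given_stained(False, do_stained=False)  # intervening
print(observed, manipulated)
```

Under these assumed tables, conditioning on Stained Teeth = no selects non-smokers and lowers the lung cancer probability, while forcing Stained Teeth = no leaves the smoker mix unchanged.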

24

25 Causal Estimation vs. Causal Search
Estimation (Potential Outcomes)
Causal Question: Effect of Zidovudine on Survival among HIV-positive men (Hernán et al., 2000)
Problem: confounders (CD4 lymphocyte count) vary over time, and they are dependent on previous treatment with Zidovudine
Estimation method discussed: marginal structural models
Assumptions:
Treatment measured reliably
Measured covariates sufficient to capture major sources of confounding
Model of treatment given the past is accurate
Output: Effect estimate with confidence intervals
Fundamental Problem: estimation/inference is conditional on the model

26 Causal Estimation vs. Causal Search
Search (Causal Graphical Models)
Causal Question: which genes regulate flowering in Arabidopsis?
Problem: over 25,000 potential genes
Method: graphical model search
Assumptions:
RNA microarray measurement is a reasonable proxy for gene expression
Causal Markov assumption
Etc.
Output: Suggestions for follow-up experiments
Fundamental Problem: model space grows super-exponentially with the number of variables

27 The Number of Causal Model Structures as a Function of the Number of Measured Variables*
Number of variables (nodes)    Number of Causal Model Structures
1                              1
2                              3
3                              25
4                              543
5                              29,281
6                              3,781,503
7                              1.1 x 10^9
8                              7.8 x 10^11
9                              1.2 x 10^15
10                             4.2 x 10^18
* Assumes there are no latent variables and no directed cycles.
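The counts on this slide are the numbers of labeled directed acyclic graphs on n nodes. The slide does not say how they were computed; the sketch below uses Robinson's recurrence, a standard formula for this sequence, and the function name `count_dags` is illustrative.

```python
from math import comb


def count_dags(n):
    """Number of labeled DAGs on n nodes, via Robinson's recurrence:

    a(m) = sum_{k=1..m} (-1)^(k+1) * C(m, k) * 2^(k*(m-k)) * a(m-k),  a(0) = 1
    """
    counts = [1]  # a(0) = 1
    for m in range(1, n + 1):
        total = 0
        for k in range(1, m + 1):
            sign = 1 if k % 2 == 1 else -1
            total += sign * comb(m, k) * 2 ** (k * (m - k)) * counts[m - k]
        counts.append(total)
    return counts[n]


for n in range(1, 11):
    print(n, count_dags(n))
```

Python integers are arbitrary precision, so the exact values for 7 to 10 nodes come out without overflow.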

28 Causal Search
Causal Search:
Find/compute all the causal models that are indistinguishable given background knowledge and data
Represent features common to all such models
Multiple Regression is often the wrong tool for Causal Search:
Example: Foreign Investment & Democracy

29 Foreign Investment
Does Foreign Investment in 3rd World Countries inhibit Democracy?
Timberlake, M. and Williams, K. (1984). Dependence, political exclusion, and government repression: Some cross-national evidence. American Sociological Review 49.
N = 72
PO: degree of political exclusivity
CV: lack of civil liberties
EN: energy consumption per capita (economic development)
FI: level of foreign investment

30 Foreign Investment
Correlations
      po      fi      en      cv
po    1.0
fi   -.175    1.0

31 Case Study: Foreign Investment
Regression Results
po = *fi *en *cv
SE (.058) (.059) (.060)
t
P
Interpretation: foreign investment increases political repression

32 Case Study: Foreign Investment Alternative Models
There is no model with testable constraints (df > 0) that is not rejected by the data, in which FI has a positive effect on PO.

33 Outline
Representing/Modeling Causal Systems
Causal Graphs
Parametric Models
Bayes Nets
Structural Equation Models
Generalized SEMs

34 Causal Graphs
Causal Graph G = {V,E}
Each edge X → Y represents a direct causal claim: X is a direct cause of Y relative to V
Figure nodes: Years of Education, Income (first graph); Years of Education, Skills and Knowledge, Income (second graph)
Notes:
1. don’t define causality, but will introduce axioms to connect probability to causality
2. many fields proceed without agreement on definition: probability, “force” in mechanics, interpretation of quantum mechanics, etc.
3. a number of different kinds of graphs represent probability distributions and independence; the advantage of directed graphs is that they also represent causal relations
4. will introduce several extensions

35 Omitted Common Causes
Causal Graphs
Figure labels: Omitted Causes; Not Cause Complete; Common Cause Complete; Omitted Common Causes
Figure nodes: Years of Education, Skills and Knowledge, Income

36 Tetrad: Complete Causal Modeling Tool

37 Tetrad Demo & Hands-On
Figure nodes: Smoking, Stained_Teeth, LC
Build and save two acyclic causal graphs:
Build the Smoking graph pictured above
Build your own graph with 4 variables
Note: build 2 causal graphs, one for smoking, yf, lc

38 Modeling Ideal Interventions
Interventions on the Effect
Figure labels: Pre-experimental System; Post; Room Temperature; Sweaters On

39 Modeling Ideal Interventions
Interventions on the Cause
Figure labels: Pre-experimental System; Post; Sweaters On; Room Temperature

40 Interventions & Causal Graphs
Model an ideal intervention by adding an “intervention” variable outside the original system as a direct cause of its target.
Figure labels: Pre-intervention graph; Intervene on Income; “Hard” Intervention; “Soft” Intervention
Note: Fat Hand - intervention - cholesterol drug -- arrhythmia

41 Interventions & Causal Graphs
Pre-intervention Graph (nodes X1 to X6)
Intervention: hard intervention on both X1, X4; soft intervention on X3
Post-Intervention Graph? (nodes X1 to X6, with intervention variables I and S)
Note: Fat Hand - intervention - cholesterol drug -- arrhythmia

42 Interventions & Causal Graphs
Pre-intervention Graph (nodes X1 to X6)
Intervention: hard intervention on both X1, X4; soft intervention on X3
Post-Intervention Graph? (nodes X1 to X6, with intervention variables I and S)
Note: Fat Hand - intervention - cholesterol drug -- arrhythmia

43 Interventions & Causal Graphs
Pre-intervention Graph (nodes X1 to X6)
Intervention: hard intervention on X3; soft interventions on X6, X4
Post-Intervention Graph? (nodes X1 to X6, with intervention variables I and S)
Note: Fat Hand - intervention - cholesterol drug -- arrhythmia

44 Interventions & Causal Graphs
Pre-intervention Graph (nodes: Smoking, Stained_Teeth, LC)
Trek between Stained_Teeth and LC in Pre-Intervention Graph? Yes
Treks → Association, so Stained_Teeth and LC are not independent (not _||_)
Post-Intervention Graph (nodes: Smoking, Paint Teeth White, Stained_Teeth, LC)
Trek between Stained_Teeth and LC in Post-Intervention Graph? No
No treks → no association, so Stained_Teeth _||_m LC
Note: Fat Hand - intervention - cholesterol drug -- arrhythmia
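The trek check on this slide can be sketched in a few lines. A trek between two nodes is a path with no collider, which is equivalent to the two nodes sharing a common source (possibly one of the nodes themselves). The edge list below assumes the graph Smoking → Stained_Teeth, Smoking → LC implied by the factorization used later in the deck; the function names are illustrative and are not part of Tetrad.

```python
def ancestors_or_self(graph, node):
    """Return node plus every node that has a directed path into it.

    graph maps each cause to the list of its direct effects.
    """
    parents = {}
    for cause, effects in graph.items():
        for effect in effects:
            parents.setdefault(effect, set()).add(cause)
    seen, stack = {node}, [node]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def has_trek(graph, x, y):
    """True if x and y share a common source, i.e. a trek connects them."""
    return bool(ancestors_or_self(graph, x) & ancestors_or_self(graph, y))


def hard_intervene(graph, target, intervention):
    """Graph surgery: drop every edge into target, add intervention -> target."""
    surgered = {
        cause: [e for e in effects if e != target]
        for cause, effects in graph.items()
    }
    surgered[intervention] = [target]
    return surgered


pre = {"Smoking": ["Stained_Teeth", "LC"]}
post = hard_intervene(pre, "Stained_Teeth", "Paint_Teeth_White")

print(has_trek(pre, "Stained_Teeth", "LC"))   # trek through Smoking
print(has_trek(post, "Stained_Teeth", "LC"))  # edge into Stained_Teeth removed
```

A soft intervention would keep the edge Smoking → Stained_Teeth, so the trek, and the association, would survive.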

45 Parametric Models

46 Instantiated Models

47 Causal Bayes Networks The Joint Distribution Factors
According to the Causal Graph, P(S, YF, LC) = P(S) P(YF | S) P(LC | S)

48 Causal Bayes Networks The Joint Distribution Factors
According to the Causal Graph, P(S) P(YF | S) P(LC | S) = f(θ)
All variables binary [0,1]: θ = {θ1, θ2, θ3, θ4, θ5}
P(S = 0) = θ1
P(S = 1) = 1 - θ1
P(YF = 0 | S = 0) = θ2
P(YF = 1 | S = 0) = 1 - θ2
P(YF = 0 | S = 1) = θ3
P(YF = 1 | S = 1) = 1 - θ3
P(LC = 0 | S = 0) = θ4
P(LC = 1 | S = 0) = 1 - θ4
P(LC = 0 | S = 1) = θ5
P(LC = 1 | S = 1) = 1 - θ5

49 Causal Bayes Networks The Joint Distribution Factors
According to the Causal Graph:
P(S, YF, LC) = P(S) P(YF | S) P(LC | S) = f(θ)
All variables binary [0,1]: θ = {θ1, θ2, θ3, θ4, θ5}
P(S, YF, LC) = P(S) P(YF | S) P(LC | YF, S) = f(θ)
All variables binary [0,1]: θ = {θ1, θ2, θ3, θ4, θ5, θ6, θ7}
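The parameter counts on this slide (5 versus 7) follow from counting one free probability per parent configuration for each binary variable. A minimal sketch, with an illustrative helper name `free_parameters` that is not a Tetrad function:

```python
def free_parameters(parents, n_categories):
    """Count free parameters of a discrete Bayes net.

    Each variable contributes (categories - 1) free probabilities
    for every joint configuration of its parents.
    """
    total = 0
    for var, pars in parents.items():
        rows = 1
        for p in pars:
            rows *= n_categories[p]
        total += rows * (n_categories[var] - 1)
    return total


binary = {"S": 2, "YF": 2, "LC": 2}

# S -> YF, S -> LC
model_a = {"S": [], "YF": ["S"], "LC": ["S"]}
# S -> YF, S -> LC, YF -> LC
model_b = {"S": [], "YF": ["S"], "LC": ["YF", "S"]}

print(free_parameters(model_a, binary), free_parameters(model_b, binary))
```

Adding the edge YF → LC doubles the number of rows in the table for LC, which accounts for the two extra parameters.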

50 Causal Bayes Networks The Joint Distribution Factors
According to the Causal Graph, P(S, YF, LC) = P(S) P(YF | S) P(LC | S)
P(S = 0) = .7
P(S = 1) = .3
P(YF = 0 | S = 0) = .99
P(YF = 1 | S = 0) = .01
P(YF = 0 | S = 1) = .20
P(YF = 1 | S = 1) = .80
P(LC = 0 | S = 0) = .95
P(LC = 1 | S = 0) = .05
P(LC = 0 | S = 1) = .80
P(LC = 1 | S = 1) = .20
P(S=1, YF=1, LC=1) = ?

51 Causal Bayes Networks The Joint Distribution Factors
According to the Causal Graph, P(S, YF, LC) = P(S) P(YF | S) P(LC | S)
P(S = 0) = .7
P(S = 1) = .3
P(YF = 0 | S = 0) = .99
P(YF = 1 | S = 0) = .01
P(YF = 0 | S = 1) = .20
P(YF = 1 | S = 1) = .80
P(LC = 0 | S = 0) = .95
P(LC = 1 | S = 0) = .05
P(LC = 0 | S = 1) = .80
P(LC = 1 | S = 1) = .20
P(S=1, YF=1, LC=1) = P(S=1) P(YF=1 | S=1) P(LC=1 | S=1)
P(S=1, YF=1, LC=1) = .3 * .80 * .20 = .048
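The factorization can be checked numerically with the tables from this slide. The sketch below is plain Python rather than Tetrad; the dictionary layout and the name `joint` are illustrative choices.

```python
from itertools import product

# Conditional probability tables from the slide.
P_S = {0: 0.7, 1: 0.3}
P_YF_GIVEN_S = {0: {0: 0.99, 1: 0.01}, 1: {0: 0.20, 1: 0.80}}  # [s][yf]
P_LC_GIVEN_S = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.80, 1: 0.20}}  # [s][lc]


def joint(s, yf, lc):
    """P(S=s, YF=yf, LC=lc) = P(S) P(YF | S) P(LC | S)."""
    return P_S[s] * P_YF_GIVEN_S[s][yf] * P_LC_GIVEN_S[s][lc]


p_111 = joint(1, 1, 1)
total = sum(joint(s, yf, lc) for s, yf, lc in product((0, 1), repeat=3))
print(round(p_111, 6), round(total, 6))
```

Summing the joint over all eight configurations is a quick sanity check that the tables define a proper distribution.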

52 Calculating the effect of a hard intervention
P(YF, S, LC) = P(S) P(YF | S) P(LC | S)
Pm(YF, S, LC) = P(S) P(LC | S) P(YF | I)

53 Calculating the effect of a hard intervention
P(S, YF, LC) = P(S) P(YF | S) P(LC | S)
P(S=1, YF=1, LC=1) = .3 * .80 * .20 = .048
P(YF = 1 | I) = .5
Pm(S=1, do(YF=1), LC=1) = ?
Pm(S=1, do(YF=1), LC=1) = P(S) P(YF | I) P(LC | S)
Pm(S=1, do(YF=1), LC=1) = .3 * .5 * .20 = .03
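The hard-intervention calculation replaces the factor P(YF | S) with P(YF | I) and leaves the other factors alone. A minimal sketch using the slide's numbers (P(S=1) = .3, P(LC=1 | S=1) = .20, P(YF=1 | I) = .5); the remaining table entries are taken from the earlier Bayes net slide, and the function names are illustrative.

```python
P_S = {0: 0.7, 1: 0.3}
P_YF_GIVEN_S = {0: {0: 0.99, 1: 0.01}, 1: {0: 0.20, 1: 0.80}}  # [s][yf]
P_LC_GIVEN_S = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.80, 1: 0.20}}  # [s][lc]
P_YF_GIVEN_I = {0: 0.5, 1: 0.5}  # the intervention sets YF by a fair coin


def joint_observed(s, yf, lc):
    """Pre-intervention: P(S) P(YF | S) P(LC | S)."""
    return P_S[s] * P_YF_GIVEN_S[s][yf] * P_LC_GIVEN_S[s][lc]


def joint_hard(s, yf, lc):
    """Post-intervention: P(S) P(YF | I) P(LC | S); YF no longer depends on S."""
    return P_S[s] * P_YF_GIVEN_I[yf] * P_LC_GIVEN_S[s][lc]


before = joint_observed(1, 1, 1)
after = joint_hard(1, 1, 1)
print(round(before, 6), round(after, 6))
```

Only the factor for the intervened variable changes, which is why the result moves from .048 to .03.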

54 Calculating the effect of a soft intervention
P(YF, S, LC) = P(S) P(YF | S) P(LC | S)
Pm(YF, S, LC) = P(S) P(LC | S) P(YF | S, Soft)
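A soft intervention changes the conditional distribution of YF but keeps its dependence on S. The slide gives no numbers for P(YF | S, Soft), so the table `P_YF_GIVEN_S_SOFT` below is entirely hypothetical; P(S) and P(LC | S) are the values from the earlier Bayes net slide.

```python
P_S = {0: 0.7, 1: 0.3}
P_LC_GIVEN_S = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.80, 1: 0.20}}  # [s][lc]
# Hypothetical post-intervention table P(YF | S, Soft): still depends on S.
P_YF_GIVEN_S_SOFT = {0: {0: 0.70, 1: 0.30}, 1: {0: 0.10, 1: 0.90}}  # [s][yf]


def joint_soft(s, yf, lc):
    """Pm(S, YF, LC) = P(S) P(LC | S) P(YF | S, Soft)."""
    return P_S[s] * P_LC_GIVEN_S[s][lc] * P_YF_GIVEN_S_SOFT[s][yf]


def p_lc1_given_yf1():
    """Pm(LC=1 | YF=1) under the soft intervention."""
    num = sum(joint_soft(s, 1, 1) for s in (0, 1))
    den = sum(joint_soft(s, 1, lc) for s in (0, 1) for lc in (0, 1))
    return num / den


p_lc1 = sum(joint_soft(s, yf, 1) for s in (0, 1) for yf in (0, 1))
print(round(p_lc1, 6), round(p_lc1_given_yf1(), 6))
```

With these assumed values YF still carries information about S, so YF and LC remain associated after the soft intervention, unlike in the hard case.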

55 Tetrad Demo & Hands-On
Use the DAG you built for Smoking, YF, and LC
Define the Bayes PM (# and values of categories for each variable)
Attach a Bayes IM to the Bayes PM
Fill in the Conditional Probability Tables (make the values plausible)

56 Updating

57 Tetrad Demo Use the IM just built of Smoking, YF, LC
Update LC on evidence: YF = yes Update LC on evidence: do(YF = yes)
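The two updates in this demo can also be worked by hand. The sketch below assumes the conditional probability tables from the earlier Bayes net slide (your own IM in Tetrad will have whatever values you entered) and treats do(YF = yes) as a hard intervention that sets YF with probability 1.

```python
P_S = {0: 0.7, 1: 0.3}
P_YF_GIVEN_S = {0: {0: 0.99, 1: 0.01}, 1: {0: 0.20, 1: 0.80}}  # [s][yf]
P_LC_GIVEN_S = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.80, 1: 0.20}}  # [s][lc]


def p_lc_given_yf_observed(yf=1, lc=1):
    """P(LC | YF): seeing YF = yes shifts belief about S, hence about LC."""
    num = sum(P_S[s] * P_YF_GIVEN_S[s][yf] * P_LC_GIVEN_S[s][lc] for s in (0, 1))
    den = sum(P_S[s] * P_YF_GIVEN_S[s][yf] for s in (0, 1))
    return num / den


def p_lc_given_do_yf(lc=1):
    """P(LC | do(YF)): setting YF says nothing about S, so this is just P(LC)."""
    return sum(P_S[s] * P_LC_GIVEN_S[s][lc] for s in (0, 1))


seeing = p_lc_given_yf_observed()
doing = p_lc_given_do_yf()
print(round(seeing, 4), round(doing, 4))
```

Observing YF = yes roughly doubles the probability of LC under these tables, while setting YF = yes leaves it at its marginal value.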

58 Structural Equation Models
Causal Graph
Structural Equations: for each variable X ∈ V, an assignment equation X := f_X(immediate-causes(X), ε_X)
Exogenous Distribution: joint distribution over the exogenous variables: P(ε)
Notes:
1. example of recursive structural equation model without correlated errors
2. can show that assumption of independence of errors guarantees correctness of probabilistic interpretation
3. this represents both probability and causality

59 Linear Structural Equation Models
Path diagram / Causal Graph (figure)
Equations:
Education := ε_Education
Income := β1 Education + ε_Income
Longevity := β2 Education + ε_Longevity
Exogenous Distribution: P(ε_Education, ε_Income, ε_Longevity)
- for all i≠j, ε_i _||_ ε_j (pairwise independence)
- no variance is zero
Structural Equation Model: V = BV + E
E.g. (ε_Education, ε_Income, ε_Longevity) ~ N(0, Σ²), Σ² diagonal, no variance is zero
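For a linear SEM written as V = BV + E with independent errors, the implied covariance matrix is (I - B)^-1 Ω (I - B)^-T, where Ω is the diagonal error covariance. The sketch below works this out for the Education, Income, Longevity model in plain Python. The coefficient values 0.5 and 0.8 and the unit error variances are hypothetical, chosen only for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(row) for row in zip(*a)]


def implied_covariance(B, omega):
    """Sigma = (I - B)^-1 Omega (I - B)^-T for an acyclic linear SEM V = BV + E."""
    n = len(B)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # For an acyclic graph B is nilpotent, so (I - B)^-1 = I + B + ... + B^(n-1).
    inv, power = identity, identity
    for _ in range(n - 1):
        power = matmul(power, B)
        inv = [[inv[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return matmul(matmul(inv, omega), transpose(inv))


# Variable order: Education, Income, Longevity. B[i][j] is the effect of j on i.
b_income, b_longevity = 0.5, 0.8  # hypothetical coefficients
B = [
    [0.0, 0.0, 0.0],
    [b_income, 0.0, 0.0],
    [b_longevity, 0.0, 0.0],
]
omega = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # hypothetical variances

sigma = implied_covariance(B, omega)
for row in sigma:
    print([round(x, 4) for x in row])
```

Income and Longevity have no edge between them, yet their implied covariance is the product of the two coefficients times the variance of Education, because Education is a common cause.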

60 Hands On Parameterize a DAG as a SEM. Add an IM
Make all coefficients positive
Examine “Implied Covariance Matrix” and “Implied Correlation Matrix”
Attach another IM → Standardized SEM

61 Lunch

