Presentation is loading. Please wait.

Presentation is loading. Please wait.

Workshop Files www.phil.cmu.edu/projects/tetrad_download/download/acis.2018.workshop.

Similar presentations


Presentation on theme: "Workshop Files www.phil.cmu.edu/projects/tetrad_download/download/acis.2018.workshop."— Presentation transcript:

1 Workshop Files

2 Goals Basic working knowledge of graphical causal models
Basic working knowledge of Tetrad 6 Basic understanding of search algorithms

3 Causal Estimation vs. Causal Search
Estimation (Potential Outcomes) Causal Question: Effect of Zidovudine on Survival among HIV-positive men (Hernan, et al., 2000) Problem: confounders (CD4 lymphocyte count) vary over time, and they are dependent on previous treatment with Zidovudine Estimation method discussed: marginal structural models Assumptions: Treatment measured reliably Measured covariates sufficient to capture major sources of confounding Model of treatment given the past is accurate Output: Effect estimate with confidence intervals Fundamental Problem: estimation/inference is conditional on the model 1

4 Causal Estimation vs. Causal Search
Search (Causal Graphical Models) Causal Question: which genes regulate flowering in Arbidopsis Problem: over 25,000 potential genes. Method: graphical model search Assumptions: RNA microarray measurement reasonable proxy for gene expression Causal Markov assumption Etc. Output: Suggestions for follow-up experiments Fundamental Problem: model space grows super-exponentially with the number of variables 1

5 Which genes regulate flowering time in Arabidopsis thaliana?
Example* Which genes regulate flowering time in Arabidopsis thaliana? Why is this an interesting question to answer?: Timing of flowering is an important agronomical trait that greatly affects yield. Thus, improved knowledge about genes controlling flowering time is of substantial economic value. * Stekhoven DJ, et al. Causal stability ranking. Bioinformatics 28 (2012)

6 Genetic Regulatory Networks
Micro-array data ~22,000 variables N = 47 gene expression profiles of 4-day old seedlings for which subsequent flowering time was also measured Affymetrix ATH1 arrays Causal Discovery Candidate Regulators of Flowering time Greenhouse experiments on flowering time

7 Candidate Regulators of Flowering Time
Considered the 25 genes that robustly emerged as “candidate causes” according to a customized discovery procedure based on PC 5 of those 25 genes : known regulators of flowering 13 of those 25 genes: not known regulators, and mutant seeds available

8 Greenhouse Experiments
Seeds were plated on Murashige and Skoog medium, stratified for 2 days at 4o C, grown on plates for 10 days, and transferred into soil in Conviron growth chambers Flowering time was measured in days to bolting Seed types yielding 4 or more plants were considered for analysis Flowering time

9 Results Flowering time 9 seed types  4 or more plants
4/9 : shorter flowering time (p < 0.05) than the control, wild-type plants Other methods (correlation, high dimensional regression, etc.) performed essentially at chance

10 Outline Representing/Modeling Causal Systems Causal Graphs
Parametric Models Bayes Nets Linear Structural Equation Models

11 Tetrad: Complete Causal Modeling Tool

12 Tetrad Demo & Hands-On Smoking Stained_Teeth LC
Build and Save two acyclic causal graphs: Build the Smoking graph picture above Build your own graph with 4 variables Build 2 causal graphs - one for smoking, yf, lc

13 Parametric Models

14 Instantiated Models

15 Causal Bayes Networks The Joint Distribution Factors
According to the Causal Graph, P(S,YF, L) = P(S) P(YF | S) P(LC | S)

16 Causal Bayes Networks The Joint Distribution Factors
According to the Causal Graph, P(S) P(YF | S) P(LC | S) = f() All variables binary [0,1]:  = {1, 2,3,4,5, } P(S = 0) = 1 P(S = 1) = 1 - 1 P(YF = 0 | S = 0) = 2 P(LC = 0 | S = 0) = 4 P(YF = 1 | S = 0) = 1- 2 P(LC = 1 | S = 0) = 1- 4 P(YF = 0 | S = 1) = 3 P(LC = 0 | S = 1) = 5 P(YF = 1 | S = 1) = 1- 3 P(LC = 1 | S = 1) = 1- 5

17 Causal Bayes Networks The Joint Distribution Factors
According to the Causal Graph, P(S,YF, LC) = P(S) P(YF | S) P(LC | S) = f() All variables binary [0,1]:  = {1, 2,3,4,5, } P(S,YF, LC) = P(S) P(YF | S) P(LC | YF, S) = f() All variables binary [0,1]:  = {1, 2,3,4,5, 6,7, }

18 Causal Bayes Networks The Joint Distribution Factors
According to the Causal Graph, P(S,YF, L) = P(S) P(YF | S) P(LC | S) P(S = 0) = .7 P(S = 1) = .3 P(YF = 0 | S = 0) = .99 P(LC = 0 | S = 0) = .95 P(YF = 1 | S = 0) = .01 P(LC = 1 | S = 0) = .05 P(YF = 0 | S = 1) = .20 P(LC = 0 | S = 1) = .80 P(YF = 1 | S = 1) = .80 P(LC = 1 | S = 1) = .20 P(S=1,YF=1, LC=1) = ?

19 Causal Bayes Networks The Joint Distribution Factors
According to the Causal Graph, P(S,YF, L) = P(S) P(YF | S) P(LC | S) P(S = 0) = .7 P(S = 1) = .3 P(YF = 0 | S = 0) = .99 P(LC = 0 | S = 0) = .95 P(YF = 1 | S = 0) = .01 P(LC = 1 | S = 0) = .05 P(YF = 0 | S = 1) = .20 P(LC = 0 | S = 1) = .80 P(YF = 1 | S = 1) = .80 P(LC = 1 | S = 1) = .20 P(S=1,YF=1, LC=1) = P(S=1) P(YF=1 | S=1) P(LC = 1 | S=1) P(S=1,YF=1, LC=1) = * * .20 = .048

20 Tetrad Demo & Hands-On Use the DAG you built for Smoking, YF, and LC
Define the Bayes PM (# and values of categories for each variable) Attach a Bayes IM to the Bayes PM Fill in the Conditional Probability Tables (make the values plausible). Build 2 causal graphs - one for smoking, yf, lc

21 Structural Equation Models
Causal Graph Structural Equations For each variable X  V, an assignment equation: X := fX(immediate-causes(X), eX) 1. example of recursive structural equation model without correlated errors 2. can show that assumption of independence of errors guarantees correctness of probabilitic interpretation 3. this represents both probability and causality Exogenous Distribution: Joint distribution over the exogenous vars : P(e)

22 Linear Structural Equation Models
Path diagram Causal Graph Equations: Education := Education Income :=Educationincome Longevity :=EducationLongevity Exogenous Distribution: P(ed, Income,Income ) - i≠j ei  ej (pairwise independence) - no variance is zero 1. example of recursive structural equation model without correlated errors 2. can show that assumption of independence of errors guarantees correctness of probabilitic interpretation 3. this represents both probability and causality Structural Equation Model: V = BV + E E.g. (ed, Income,Income ) ~N(0,2) 2 diagonal, - no variance is zero

23 Standardized SEMs Attach a SEM PM to your 3-4 variable graph
Attach a SEM IM to the SEM PM Change the coefficient values. Attach a Standardized SEM IM to the SEM PM, or the SEM IM Compare the Implied Matrices Build 2 causal graphs - one for smoking, yf, lc

24 Estimation

25 Estimation

26 Tetrad Demo and Hands-on
Build the standardized SEM IM shown below Generate simulated data N=1000 Estimate model. Save session as “Estimate1”

27 Estimation

28 Coefficient inference vs. Model Fit
Coefficient Inference: Null: coefficient = 0, e.g., bX1 X3 = 0 p-value = p(Estimated value bX1 X3 ≥ | bX1 X3 = 0 & rest of model correct) Reject null (coefficient is “significant”) when p-value < a, a usually = .05

29 Coefficient inference vs. Model Fit
Coefficient Inference: Null: coefficient = 0, e.g., bX1 X3 = 0 p-value = p(Estimated value bX1 X3 ≥ | bX1 X3 = 0 & rest of model correct) Reject null (coefficient is “significant”) when p-value < < a, a usually = .05, Model fit: Null: Model is correctly specified (constraints true in population) p-value = p(f(Deviation(Sml,S)) ≥ | Model correctly specified)

30 Tetrad Demo and Hands-on
Create two DAGs with the same variables – each with one edge flipped, and attach a SEM PM to each new graph (copy and paste by selecting nodes, Ctl-C to copy, and then Ctl-V to paste) Estimate each new model on the data produced by original graph Check p-values of: Edge coefficients Model fit Save session as: “estimation2”

31 Charitable Giving What influences giving? Sympathy? Impact?
"The Donor is in the Details", Organizational Behavior and Human Decision Processes, Issue 1, 15-23, C. Cryder, with G. Loewenstein, R. Scheines. N = 94 TangibilityCondition [1,0] Randomly assigned experimental condition Imaginability [1..7] How concrete scenario I Sympathy [1..7] How much sympathy for target Impact [1..7] How much impact will my donation have AmountDonated [0..5] How much actually donated 1

32 Theoretical Hypothesis

33 Tetrad Demo and Hands-on
Load charity.txt (tabular – not covariance data) Build graph of theoretical hypothesis Build SEM PM from graph Estimate PM, check results

34 Break

35 Search Models  Data Representing/Modeling Causal Systems
Estimation and Model fit Hands on with Real Data Models  Data PC & FGES (causally sufficient case) FCI (allow latent confounding) Regression vs. Casual Search

36 Search Methods Scoring Searches Constraint Based Searches
PC, FCI Pointwise, but not uniformly consistent Scoring Searches FGES Scores: BIC, AIC, etc. Search: Hill Climb, Genetic Alg., Simulated Annealing Difficult to extend to latent variable models Meek and Chickering Greedy Equivalence Class (GES) Latent Variable Psychometric Model Search BPC, MIMbuild, etc. Linear non-Gaussian models (Lingam) Models with cycles Images, and more!!!

37 Tetrad Demo and Hands On

38 Tetrad Demo and Hands-on
Go to “estimation2.tet” Add Search node (from Simulation1) - “forbid latent common causes” - choose “FGES” Add a “Graph” node to search result: choose “Dag in Pattern” Add a PM to Graph Estimate the PM on the data Compare model-fit to model fit for true model

39 Backround Knowledge Tetrad Demo and Hands-on
Create new session Select “Simulate from a given graph, and then Search” from Pipeline menu Build graph below, PM, IM, and generate sample data N=1,000. Execute PC search, a = .05

40 Backround Knowledge Tetrad Demo and Hands-on
Add “Knowledge” node Create “Tiers” as shown below. Execute PC search again, a = .05 Compare results (Search2) to previous search (Search1)

41 Backround Knowledge Direct and Indirect Consequences
True Graph PC Output Background Knowledge PC Output No Background Knowledge

42 Backround Knowledge Direct and Indirect Consequences
True Graph Direct Consequence Of Background Knowledge Indirect Consequence Of Background Knowledge PC Output Background Knowledge PC Output No Background Knowledge

43 College Plans (Search)
Load in College_plans_data Use file: College_plans_data Set Variables = discrete Enter Background Knowledge: Tier 1: Sex , SES Tier 2: IQ Tier 3: pe, cp Add search node Choose FCI (Allow Latent Common Causes) Add another search node Set Bootstrap = 100

44 Regression vs. Causal Search
Multiple Regression (X = putative cause , Y = putative effect) Identify all potential confounders Z, such that Z is prior to X Z is associated with X, and Z is associated with Y Regress Y on X, Z Coefficient on X is evidence of X  Y Causal Search: Find/compute all the causal models that are indistinguishable given background knowledge and data Represent features common to all such models Multiple Regression is the wrong tool for Causal Discovery

45 Regression vs. Causal Search

46 Foreign Investment Does Foreign Investment in 3rd World Countries inhibit Democracy? Timberlake, M. and Williams, K. (1984). Dependence, political exclusion, and government repression: Some cross-national evidence. American Sociological Review 49, N = 72 PO degree of political exclusivity CV lack of civil liberties EN energy consumption per capita (economic development) FI level of foreign investment 1

47 Foreign Investment Correlations po fi en cv po 1.0 fi -.175 1.0

48 Case Study: Foreign Investment
Regression Results po = *fi *en *cv SE (.058) (.059) (.060) t P Interpretation: foreign investment increases political repression 1

49 Case Study: Foreign Investment Alternative Models
There is no model with testable constraints (df > 0) that is not rejected by the data, in which FI has a positive effect on PO.

50 Charitable Giving (Search)
Load in charity data Add search node Enter Background Knowledge: Tangibility is exogenous Amount Donated is endogenous only Tangibility  Imaginability is required Choose and execute one of the “Pattern searches” Add a “Graph” node to search result: “choose Dag in Pattern” Add a PM to Graph Estimate the PM on the data Compare model-fit to hypothetical model

51 For More https://www.ccd.pitt.edu/


Download ppt "Workshop Files www.phil.cmu.edu/projects/tetrad_download/download/acis.2018.workshop."

Similar presentations


Ads by Google