Workshop Files www.phil.cmu.edu/projects/tetrad_download/download/acis.2018.workshop.

Workshop Files

Goals Basic working knowledge of graphical causal models
Basic working knowledge of Tetrad 6 Basic understanding of search algorithms

Causal Estimation vs. Causal Search
Estimation (Potential Outcomes) Causal Question: Effect of Zidovudine on Survival among HIV-positive men (Hernan, et al., 2000) Problem: confounders (CD4 lymphocyte count) vary over time, and they are dependent on previous treatment with Zidovudine Estimation method discussed: marginal structural models Assumptions: Treatment measured reliably Measured covariates sufficient to capture major sources of confounding Model of treatment given the past is accurate Output: Effect estimate with confidence intervals Fundamental Problem: estimation/inference is conditional on the model 1

Causal Estimation vs. Causal Search
Search (Causal Graphical Models) Causal Question: which genes regulate flowering in Arbidopsis Problem: over 25,000 potential genes. Method: graphical model search Assumptions: RNA microarray measurement reasonable proxy for gene expression Causal Markov assumption Etc. Output: Suggestions for follow-up experiments Fundamental Problem: model space grows super-exponentially with the number of variables 1

Which genes regulate flowering time in Arabidopsis thaliana?
Example* Which genes regulate flowering time in Arabidopsis thaliana? Why is this an interesting question to answer?: Timing of flowering is an important agronomical trait that greatly affects yield. Thus, improved knowledge about genes controlling flowering time is of substantial economic value. * Stekhoven DJ, et al. Causal stability ranking. Bioinformatics 28 (2012)

Genetic Regulatory Networks
Micro-array data ~22,000 variables N = 47 gene expression profiles of 4-day old seedlings for which subsequent flowering time was also measured Affymetrix ATH1 arrays Causal Discovery Candidate Regulators of Flowering time Greenhouse experiments on flowering time

Candidate Regulators of Flowering Time
Considered the 25 genes that robustly emerged as “candidate causes” according to a customized discovery procedure based on PC 5 of those 25 genes : known regulators of flowering 13 of those 25 genes: not known regulators, and mutant seeds available

Greenhouse Experiments
Seeds were plated on Murashige and Skoog medium, stratified for 2 days at 4o C, grown on plates for 10 days, and transferred into soil in Conviron growth chambers Flowering time was measured in days to bolting Seed types yielding 4 or more plants were considered for analysis Flowering time

Results Flowering time 9 seed types  4 or more plants
4/9 : shorter flowering time (p < 0.05) than the control, wild-type plants Other methods (correlation, high dimensional regression, etc.) performed essentially at chance

Outline Representing/Modeling Causal Systems Causal Graphs
Parametric Models Bayes Nets Linear Structural Equation Models

Tetrad: Complete Causal Modeling Tool

Tetrad Demo & Hands-On Smoking Stained_Teeth LC
Build and Save two acyclic causal graphs: Build the Smoking graph picture above Build your own graph with 4 variables Build 2 causal graphs - one for smoking, yf, lc

Parametric Models

Instantiated Models

Causal Bayes Networks The Joint Distribution Factors
According to the Causal Graph, P(S,YF, L) = P(S) P(YF | S) P(LC | S)

According to the Causal Graph, P(S) P(YF | S) P(LC | S) = f() All variables binary [0,1]:  = {1, 2,3,4,5, } P(S = 0) = 1 P(S = 1) = 1 - 1 P(YF = 0 | S = 0) = 2 P(LC = 0 | S = 0) = 4 P(YF = 1 | S = 0) = 1- 2 P(LC = 1 | S = 0) = 1- 4 P(YF = 0 | S = 1) = 3 P(LC = 0 | S = 1) = 5 P(YF = 1 | S = 1) = 1- 3 P(LC = 1 | S = 1) = 1- 5

According to the Causal Graph, P(S,YF, LC) = P(S) P(YF | S) P(LC | S) = f() All variables binary [0,1]:  = {1, 2,3,4,5, } P(S,YF, LC) = P(S) P(YF | S) P(LC | YF, S) = f() All variables binary [0,1]:  = {1, 2,3,4,5, 6,7, }

Tetrad Demo & Hands-On Use the DAG you built for Smoking, YF, and LC
Define the Bayes PM (# and values of categories for each variable) Attach a Bayes IM to the Bayes PM Fill in the Conditional Probability Tables (make the values plausible). Build 2 causal graphs - one for smoking, yf, lc

Structural Equation Models
Causal Graph Structural Equations For each variable X  V, an assignment equation: X := fX(immediate-causes(X), eX) 1. example of recursive structural equation model without correlated errors 2. can show that assumption of independence of errors guarantees correctness of probabilitic interpretation 3. this represents both probability and causality Exogenous Distribution: Joint distribution over the exogenous vars : P(e)

Linear Structural Equation Models
Path diagram Causal Graph Equations: Education := Education Income :=Educationincome Longevity :=EducationLongevity Exogenous Distribution: P(ed, Income,Income ) - i≠j ei  ej (pairwise independence) - no variance is zero 1. example of recursive structural equation model without correlated errors 2. can show that assumption of independence of errors guarantees correctness of probabilitic interpretation 3. this represents both probability and causality Structural Equation Model: V = BV + E E.g. (ed, Income,Income ) ~N(0,2) 2 diagonal, - no variance is zero

Standardized SEMs Attach a SEM PM to your 3-4 variable graph
Attach a SEM IM to the SEM PM Change the coefficient values. Attach a Standardized SEM IM to the SEM PM, or the SEM IM Compare the Implied Matrices Build 2 causal graphs - one for smoking, yf, lc

Estimation

Tetrad Demo and Hands-on
Build the standardized SEM IM shown below Generate simulated data N=1000 Estimate model. Save session as “Estimate1”

Estimation

Coefficient inference vs. Model Fit
Coefficient Inference: Null: coefficient = 0, e.g., bX1 X3 = 0 p-value = p(Estimated value bX1 X3 ≥ | bX1 X3 = 0 & rest of model correct) Reject null (coefficient is “significant”) when p-value < a, a usually = .05

Coefficient inference vs. Model Fit
Coefficient Inference: Null: coefficient = 0, e.g., bX1 X3 = 0 p-value = p(Estimated value bX1 X3 ≥ | bX1 X3 = 0 & rest of model correct) Reject null (coefficient is “significant”) when p-value < < a, a usually = .05, Model fit: Null: Model is correctly specified (constraints true in population) p-value = p(f(Deviation(Sml,S)) ≥ | Model correctly specified)

Create two DAGs with the same variables – each with one edge flipped, and attach a SEM PM to each new graph (copy and paste by selecting nodes, Ctl-C to copy, and then Ctl-V to paste) Estimate each new model on the data produced by original graph Check p-values of: Edge coefficients Model fit Save session as: “estimation2”

Charitable Giving What influences giving? Sympathy? Impact?
"The Donor is in the Details", Organizational Behavior and Human Decision Processes, Issue 1, 15-23, C. Cryder, with G. Loewenstein, R. Scheines. N = 94 TangibilityCondition [1,0] Randomly assigned experimental condition Imaginability [1..7] How concrete scenario I Sympathy [1..7] How much sympathy for target Impact [1..7] How much impact will my donation have AmountDonated [0..5] How much actually donated 1

Theoretical Hypothesis

Load charity.txt (tabular – not covariance data) Build graph of theoretical hypothesis Build SEM PM from graph Estimate PM, check results

Search Models  Data Representing/Modeling Causal Systems
Estimation and Model fit Hands on with Real Data Models  Data PC & FGES (causally sufficient case) FCI (allow latent confounding) Regression vs. Casual Search

Search Methods Scoring Searches Constraint Based Searches
PC, FCI Pointwise, but not uniformly consistent Scoring Searches FGES Scores: BIC, AIC, etc. Search: Hill Climb, Genetic Alg., Simulated Annealing Difficult to extend to latent variable models Meek and Chickering Greedy Equivalence Class (GES) Latent Variable Psychometric Model Search BPC, MIMbuild, etc. Linear non-Gaussian models (Lingam) Models with cycles Images, and more!!!

Tetrad Demo and Hands On

Go to “estimation2.tet” Add Search node (from Simulation1) - “forbid latent common causes” - choose “FGES” Add a “Graph” node to search result: choose “Dag in Pattern” Add a PM to Graph Estimate the PM on the data Compare model-fit to model fit for true model

Backround Knowledge Tetrad Demo and Hands-on
Create new session Select “Simulate from a given graph, and then Search” from Pipeline menu Build graph below, PM, IM, and generate sample data N=1,000. Execute PC search, a = .05

Backround Knowledge Tetrad Demo and Hands-on
Add “Knowledge” node Create “Tiers” as shown below. Execute PC search again, a = .05 Compare results (Search2) to previous search (Search1)

Backround Knowledge Direct and Indirect Consequences
True Graph PC Output Background Knowledge PC Output No Background Knowledge

Backround Knowledge Direct and Indirect Consequences
True Graph Direct Consequence Of Background Knowledge Indirect Consequence Of Background Knowledge PC Output Background Knowledge PC Output No Background Knowledge

College Plans (Search)
Load in College_plans_data Use file: College_plans_data Set Variables = discrete Enter Background Knowledge: Tier 1: Sex , SES Tier 2: IQ Tier 3: pe, cp Add search node Choose FCI (Allow Latent Common Causes) Add another search node Set Bootstrap = 100

Regression vs. Causal Search
Multiple Regression (X = putative cause , Y = putative effect) Identify all potential confounders Z, such that Z is prior to X Z is associated with X, and Z is associated with Y Regress Y on X, Z Coefficient on X is evidence of X  Y Causal Search: Find/compute all the causal models that are indistinguishable given background knowledge and data Represent features common to all such models Multiple Regression is the wrong tool for Causal Discovery

Regression vs. Causal Search

Foreign Investment Does Foreign Investment in 3rd World Countries inhibit Democracy? Timberlake, M. and Williams, K. (1984). Dependence, political exclusion, and government repression: Some cross-national evidence. American Sociological Review 49, N = 72 PO degree of political exclusivity CV lack of civil liberties EN energy consumption per capita (economic development) FI level of foreign investment 1

Foreign Investment Correlations po fi en cv po 1.0 fi -.175 1.0

Case Study: Foreign Investment
Regression Results po = *fi *en *cv SE (.058) (.059) (.060) t P Interpretation: foreign investment increases political repression 1

Case Study: Foreign Investment Alternative Models
There is no model with testable constraints (df > 0) that is not rejected by the data, in which FI has a positive effect on PO.

Charitable Giving (Search)
Load in charity data Add search node Enter Background Knowledge: Tangibility is exogenous Amount Donated is endogenous only Tangibility  Imaginability is required Choose and execute one of the “Pattern searches” Add a “Graph” node to search result: “choose Dag in Pattern” Add a PM to Graph Estimate the PM on the data Compare model-fit to hypothetical model

For More https://www.ccd.pitt.edu/

Workshop Files www.phil.cmu.edu/projects/tetrad_download/download/acis.2018.workshop.

Similar presentations

Presentation on theme: "Workshop Files www.phil.cmu.edu/projects/tetrad_download/download/acis.2018.workshop."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Workshop Files www.phil.cmu.edu/projects/tetrad_download/download/acis.2018.workshop.

Similar presentations

Presentation on theme: "Workshop Files www.phil.cmu.edu/projects/tetrad_download/download/acis.2018.workshop."— Presentation transcript:

Similar presentations

About project

Feedback