The TETRAD Project: Computational Aids to Causal Discovery. Peter Spirtes, Clark Glymour, Richard Scheines, and many others. Penn State, March 23, 2007.


1 The TETRAD Project: Computational Aids to Causal Discovery. Peter Spirtes, Clark Glymour, Richard Scheines, and many others. Department of Philosophy, Carnegie Mellon.

2 Agenda
1. Morning I: Theoretical Overview (Representation, Axioms, Search)
2. Morning II: Research Problems
3. Afternoon: TETRAD Demo / Workshop

3 Part I: Agenda
1. Motivation
2. Representation
3. Connecting Causation to Probability (Independence)
4. Searching for Causal Models
5. Improving on Regression for Causal Inference

4 1. Motivation
Non-experimental evidence. Typical predictive question: can we predict aggressiveness from the amount of violent TV watched? Causal question: does watching violent TV cause aggression? That is, if we intervene to change TV watching, will the level of aggression change?

5 Causal Estimation
When and how can we use non-experimental data to tell us about the effect of an intervention, i.e., to recover the manipulated probability P(Y | X set= x, Z = z) from the unmanipulated probability P(Y, X, Z)?

6 Spartina in the Cape Fear Estuary

7 What Factors Directly Influence Spartina Growth in the Cape Fear Estuary?
pH, salinity, sodium, phosphorus, magnesium, ammonia, zinc, potassium, ... what? The data: 14 variables for 45 samples of Spartina from the Cape Fear Estuary. A biologist concluded that salinity must be a factor. Bayes net analysis says only pH directly affects Spartina biomass. The biologist's subsequent greenhouse experiment agreed: if pH is controlled for, variations in salinity do not affect growth; but if salinity is controlled for, variations in pH do affect growth.

8 2. Representation
1. Association & causal structure, qualitatively
2. Interventions
3. Statistical causal models: Bayes networks and structural equation models

9 Causation & Association
X is a cause of Y iff ∃ x1 ≠ x2 such that P(Y | X set= x1) ≠ P(Y | X set= x2). Causation is asymmetric: X → Y does not imply Y → X.
X and Y are associated iff ∃ x1 ≠ x2 such that P(Y | X = x1) ≠ P(Y | X = x2). Association is symmetric: X is associated with Y iff Y is associated with X.

10 Direct Causation
X is a direct cause of Y relative to S iff ∃ z, x1 ≠ x2 such that P(Y | X set= x1, Z set= z) ≠ P(Y | X set= x2, Z set= z), where Z = S - {X, Y}.

11 Causal Graphs
A causal graph G = {V, E}. Each edge X → Y represents a direct causal claim: X is a direct cause of Y relative to V. (Example graph on slide: Chicken Pox.)

12 Causal Graphs
Causal graphs need not be cause complete, but they must be common cause complete.

13 Modeling Ideal Interventions: Interventions on the Effect
(Slide figures: the pre-experimental system and the post-intervention system, using the Sweaters On / Room Temperature example.)

14 Modeling Ideal Interventions: Interventions on the Cause
(Slide figures: the pre-experimental system and the post-intervention system, using the Sweaters On / Room Temperature example.)

15 Ideal Interventions & Causal Graphs
Model an ideal intervention by adding an "intervention" variable outside the original system, then erasing all arrows pointing into the variable intervened upon. (Slide figures: pre-intervention and post-intervention graphs for an intervention on Inf.)

16 Conditioning vs. Intervening
P(Y | X = x1) vs. P(Y | X set= x1). (Teeth slides.)

17 Causal Bayes Networks
The joint distribution factors according to the causal graph: for all X in V, P(V) = ∏ P(X | immediate causes of X). Example, with S = Smoking, YF = Yellow Fingers, LC = Lung Cancer:
P(S = 0) = .7, P(S = 1) = .3
P(YF = 0 | S = 0) = .99, P(YF = 1 | S = 0) = .01
P(YF = 0 | S = 1) = .20, P(YF = 1 | S = 1) = .80
P(LC = 0 | S = 0) = .95, P(LC = 1 | S = 0) = .05
P(LC = 0 | S = 1) = .80, P(LC = 1 | S = 1) = .20
P(S, YF, LC) = P(S) P(YF | S) P(LC | S)
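The factorization on this slide can be checked directly in code. A minimal sketch in Python, using the slide's CPT numbers for S, YF, and LC (the graph S → YF, S → LC is the one implied by the factorization):

```python
from itertools import product

# CPT numbers from the slide (S = Smoking, YF = Yellow Fingers,
# LC = Lung Cancer, all binary).
P_S = {0: 0.7, 1: 0.3}
P_YF_S = {(0, 0): 0.99, (1, 0): 0.01, (0, 1): 0.20, (1, 1): 0.80}  # keys (yf, s)
P_LC_S = {(0, 0): 0.95, (1, 0): 0.05, (0, 1): 0.80, (1, 1): 0.20}  # keys (lc, s)

def joint(s, yf, lc):
    # The joint factors according to the causal graph S -> YF, S -> LC.
    return P_S[s] * P_YF_S[(yf, s)] * P_LC_S[(lc, s)]

# Sanity check: the factored joint sums to 1.
total = sum(joint(s, yf, lc) for s, yf, lc in product((0, 1), repeat=3))
print(round(total, 10))  # -> 1.0
# Any marginal follows by summing the factorization, e.g. P(YF = 1):
p_yf1 = sum(joint(s, 1, lc) for s, lc in product((0, 1), repeat=2))
print(round(p_yf1, 10))  # -> 0.247
```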

18 Structural Equation Models
A structural equation model pairs a causal graph with (1) structural equations and (2) statistical constraints; together these constitute the statistical model.

19 Structural Equation Models
Structural equations: one equation for each variable V in the graph, V = f(parents(V), error_V); for a linear SEM (linear regression), f is a linear function.
Statistical constraints: a joint distribution over the error terms.

20 Structural Equation Models
Equations:
Education = ε_Education
Income = β1 · Education + ε_Income
Longevity = β2 · Education + ε_Longevity
Statistical constraints: (ε_Education, ε_Income, ε_Longevity) ~ N(0, Σ²), with Σ² diagonal and no variance zero.
(Slide shows the causal graph and the SEM graph, i.e., the path diagram.)
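A quick simulation illustrates this SEM; the coefficient values and error variances below are invented for illustration, not from the talk:

```python
import numpy as np

# Simulate the slide's linear SEM (hypothetical parameters).
rng = np.random.default_rng(0)
n = 100_000
b1, b2 = 1.5, 0.8                      # Income <- Education, Longevity <- Education
education = rng.normal(0.0, 1.0, n)    # Education = eps_Education
income = b1 * education + rng.normal(0.0, 1.0, n)
longevity = b2 * education + rng.normal(0.0, 1.0, n)

# OLS recovers the structural coefficient b1 from the simulated data.
b1_hat = np.cov(income, education)[0, 1] / np.var(education, ddof=1)
print(abs(b1_hat - b1) < 0.05)  # -> True

# Income and Longevity are correlated through their common cause Education,
# but become (nearly) uncorrelated once Education is partialled out.
b2_hat = np.cov(longevity, education)[0, 1] / np.var(education, ddof=1)
resid_inc = income - b1_hat * education
resid_lon = longevity - b2_hat * education
print(abs(np.corrcoef(resid_inc, resid_lon)[0, 1]) < 0.02)  # -> True
```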

21 3. Connecting Causation to Probability

22 Causal Structure → Statistical Predictions
The Causal Markov Axiom (the Markov Condition) carries causal graphs to independence constraints, e.g., X _||_ Z | Y, i.e., P(X | Y) = P(X | Y, Z).

23 Causal Markov Axiom
If G is a causal graph and P a probability distribution over the variables in G, then in P every variable V is independent of its non-effects, conditional on its immediate causes.

24 Causal Markov Condition
Two intuitions:
1) Immediate causes make effects independent of remote causes (Markov).
2) Common causes make their effects independent (Salmon).

25 Causal Markov Condition
1) Immediate causes make effects independent of remote causes (Markov): E _||_ S | I, where E = Exposure to Chicken Pox, I = Infected, S = Symptoms.

26 Causal Markov Condition
2) Common causes make their effects independent: YF _||_ LC | S.
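This claim can be verified numerically with the CPT numbers from the Causal Bayes Networks slide: marginally, YF and LC are associated through their common cause S, but conditioning on S renders them independent. A sketch:

```python
from itertools import product

# CPT numbers from the Causal Bayes Networks slide
# (S = Smoking, YF = Yellow Fingers, LC = Lung Cancer).
P_S = {0: 0.7, 1: 0.3}
P_YF_S = {(0, 0): 0.99, (1, 0): 0.01, (0, 1): 0.20, (1, 1): 0.80}
P_LC_S = {(0, 0): 0.95, (1, 0): 0.05, (0, 1): 0.80, (1, 1): 0.20}

def joint(s, yf, lc):
    return P_S[s] * P_YF_S[(yf, s)] * P_LC_S[(lc, s)]

def marg(yf=None, lc=None, s=None):
    # Probability of an event, summing out any unspecified variables.
    return sum(joint(s2, yf2, lc2)
               for s2, yf2, lc2 in product((0, 1), repeat=3)
               if (s is None or s2 == s) and (yf is None or yf2 == yf)
               and (lc is None or lc2 == lc))

# Marginally, the common cause S makes YF and LC dependent:
print(abs(marg(yf=1, lc=1) - marg(yf=1) * marg(lc=1)) > 0.01)  # -> True
# Conditional on S, they are independent (Causal Markov):
for s in (0, 1):
    lhs = marg(yf=1, lc=1, s=s) / marg(s=s)
    rhs = (marg(yf=1, s=s) / marg(s=s)) * (marg(lc=1, s=s) / marg(s=s))
    assert abs(lhs - rhs) < 1e-12
print("YF _||_ LC | S holds")
```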

27 Causal Structure → Statistical Data

28 Causal Markov Axiom
In SEMs, d-separation follows from assuming independence among error terms that have no connection in the path diagram, i.e., from assuming that the model is common cause complete.

29 Causal Markov and D-Separation
In acyclic graphs, the two are equivalent.
In cyclic linear SEMs with uncorrelated errors, d-separation is correct but the Markov condition is incorrect.
In cyclic discrete-variable Bayes nets at equilibrium, d-separation is correct but the Markov condition is incorrect.

30 D-separation: Conditioning vs. Intervening
Conditioning: P(X3 | X2) ≠ P(X3 | X2, X1), i.e., X3 and X1 are dependent given X2.
Intervening: P(X3 | X2 set= x2) = P(X3 | X2 set= x2, X1), i.e., X3 _||_ X1 | X2 set= x2.
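A small numeric example illustrates the contrast, assuming (as the asymmetry here suggests) that X2 is a common effect of X1 and X3; the noisy-OR parameters are hypothetical:

```python
from itertools import product

# Collider structure X1 -> X2 <- X3, with hypothetical binary CPTs.
P_X1 = {0: 0.5, 1: 0.5}
P_X3 = {0: 0.5, 1: 0.5}
def p_x2(x2, x1, x3):
    # X2 is a noisy OR of X1 and X3 (illustrative numbers).
    p1 = 1.0 - 0.9 ** (x1 + x3) * 0.95
    return p1 if x2 == 1 else 1.0 - p1

def prob(x1=None, x3=None, x2=None):
    # Probability of an event, summing out unspecified variables.
    return sum(P_X1[a] * P_X3[b] * p_x2(c, a, b)
               for a, b, c in product((0, 1), repeat=3)
               if (x1 is None or a == x1) and (x3 is None or b == x3)
               and (x2 is None or c == x2))

# Conditioning on the common effect X2 makes X1 and X3 dependent:
p_obs = prob(x1=1, x2=1) / prob(x2=1)
p_obs_given_x3 = prob(x1=1, x3=1, x2=1) / prob(x3=1, x2=1)
print(abs(p_obs - p_obs_given_x3) > 0.01)  # -> True: dependent given X2

# Intervening on X2 cuts the arrows into it, so X1 keeps its prior and
# stays independent of X3: P(X1=1 | X2 set= 1) = P(X1=1) = 0.5.
print(P_X1[1])  # -> 0.5
```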

31 4. Search: From Statistical Data to Probability to Causation

32 Causal Discovery
From statistical data, background knowledge (e.g., X2 comes before X3; no unmeasured common causes), and statistical inference, to causal structure.

33 Faithfulness

34 Faithfulness Assumption
Statistical constraints arise from causal structure, not coincidence: all independence relations holding in a probability distribution P generated by a causal structure G are entailed by d-separation applied to G.

35 Faithfulness Assumption
Revenues = a · Rate + c · Economy + ε_Rev
Economy = b · Rate + ε_Econ
Faithfulness requires a ≠ -bc; if a = -bc, the direct effect of Rate on Revenues exactly cancels the indirect effect through Economy, producing an independence that d-separation does not entail.
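The cancellation is easy to reproduce by simulation; the coefficient values below are made up, chosen so that a = -bc exactly:

```python
import numpy as np

# Faithfulness violation: with Economy = b*Rate + eps and
# Revenues = a*Rate + c*Economy + eps, Cov(Revenues, Rate) is
# proportional to (a + b*c). Setting a = -b*c hides the causal paths.
rng = np.random.default_rng(1)
n = 200_000
b, c = 2.0, 0.3
a = -b * c                              # exact cancellation: a = -0.6
rate = rng.normal(0.0, 1.0, n)
economy = b * rate + rng.normal(0.0, 1.0, n)
revenues = a * rate + c * economy + rng.normal(0.0, 1.0, n)

r = np.corrcoef(revenues, rate)[0, 1]
print(abs(r) < 0.01)  # -> True: Revenues looks independent of Rate

# With any a != -b*c the dependence reappears, as faithfulness expects.
revenues2 = 0.5 * rate + c * economy + rng.normal(0.0, 1.0, n)
print(abs(np.corrcoef(revenues2, rate)[0, 1]) > 0.3)  # -> True
```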

36 Representations of D-separation Equivalence Classes
We want the representations to:
1. characterize the independence relations entailed by the equivalence class;
2. represent causal features that are shared by every member of the equivalence class.

37 Patterns & PAGs
Patterns (Verma and Pearl, 1990): a graphical representation of an acyclic d-separation equivalence class with no latent variables.
PAGs (Richardson, 1994): a graphical representation of an equivalence class, including latent-variable models and sample-selection bias, whose members are d-separation equivalent over a set of measured variables X.

38 Patterns

39 Patterns: What the Edges Mean

40 Patterns

41 PAGs: Partial Ancestral Graphs. What PAG edges mean.

42 PAGs: Partial Ancestral Graphs

43 Overview of Search Methods
Constraint-based searches: TETRAD.
Scoring searches: scores such as BIC and AIC; search by hill climbing, genetic algorithms, or simulated annealing; difficult to extend to latent-variable models. See Heckerman, Meek, and Cooper (1999), "A Bayesian Approach to Causal Discovery," ch. 4 in Computation, Causation, and Discovery, ed. C. Glymour and G. Cooper, MIT Press, pp. 141-166.

44 Search - Illustration

45 Search: Adjacency

47 Search: Orientation in Patterns

48 Search: Orientation
X1 _||_ X2
X1 _||_ X4 | X3
X2 _||_ X4 | X3
(Slide shows the graph after the orientation phase.)
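These three independence facts are enough to recover the pattern mechanically. Below is a simplified, PC-style sketch (not TETRAD's actual implementation) that takes the independence facts as an oracle, builds the skeleton, orients the unshielded collider, and then applies the no-new-collider rule:

```python
from itertools import combinations

# Independence oracle for the slide's example: each independent pair
# maps to its separating set.
SEPSETS = {
    frozenset({"X1", "X2"}): set(),     # X1 _||_ X2 (empty conditioning set)
    frozenset({"X1", "X4"}): {"X3"},    # X1 _||_ X4 | X3
    frozenset({"X2", "X4"}): {"X3"},    # X2 _||_ X4 | X3
}
VARS = ["X1", "X2", "X3", "X4"]

def pc_sketch(variables, sepsets):
    # Adjacency phase: start complete, drop any pair that has a separating set.
    adj = {frozenset(p) for p in combinations(variables, 2)} - set(sepsets)
    # Orientation, step 1: for each unshielded triple a - c - b, orient
    # a -> c <- b when c is not in the separating set of (a, b).
    directed = set()
    for a, b in combinations(variables, 2):
        key = frozenset({a, b})
        if key in sepsets:
            for c in variables:
                if (frozenset({a, c}) in adj and frozenset({b, c}) in adj
                        and c not in sepsets[key]):
                    directed |= {(a, c), (b, c)}
    # Orientation, step 2 (Meek rule 1): if a -> b and b - c with a, c
    # nonadjacent, orient b -> c to avoid creating a new collider.
    changed = True
    while changed:
        changed = False
        for a, b in list(directed):
            for c in variables:
                e = frozenset({b, c})
                if (e in adj and (b, c) not in directed and (c, b) not in directed
                        and frozenset({a, c}) not in adj and c != a):
                    directed.add((b, c))
                    changed = True
    return adj, directed

adjacencies, arrows = pc_sketch(VARS, SEPSETS)
print(sorted(arrows))  # -> [('X1', 'X3'), ('X2', 'X3'), ('X3', 'X4')]
```

The output reproduces the slide's result: a collider X1 → X3 ← X2, with X3 → X4 then forced by the no-new-collider rule.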

49 The Theory of Interventions, Simplified
Start with a graphical causal model without feedback. Simplest problem: predict the probability distribution of the other represented variables resulting from an intervention that forces a value x on a variable X (e.g., everybody has to smoke) but does not otherwise alter the causal structure.

50 First Thing
Remember: the probability distribution for values of Y conditional on X = x is not, in general, the same as the probability distribution for values of Y under an intervention that sets X = x. Recent work by Waldmann gives evidence that adults are sensitive to the difference.

51 Example
(Graph: X → Y → Z → W, with X → W.) Because X influences Y, the value of X gives information about the value of Y, and vice versa: X and Y are dependent in probability. But an intervention that forces a value Y = y on Y, and otherwise does not disturb the system, should not change the probability distribution for values of X. It should, necessarily, make the value of Y independent of X: informally, the value of Y should give no information about the value of X, and vice versa.

52 Representing a Simple Manipulation
(Slide figures: the observed structure, and the structure upon manipulating Yellow Fingers.)

53 Intervention Calculations
(Graph: X → Y → Z → W, with X → W.)
1. Set Y = y.
2. Do "surgery" on the graph: eliminate the edges into Y.
3. Use the Markov factorization of the resulting graph and probability distribution to compute the probability distribution for X, Z, W. Various effective rules for such computations are incorporated in what Pearl calls the "do" calculus.

54 Intervention Calculations
Original Markov factorization:
Pr(X, Y, Z, W) = Pr(W | X, Z) Pr(Z | Y) Pr(Y | X) Pr(X)
The factorization after the intervention Y = y (the Pr(Y | X) factor is removed):
Pr(X, Z, W | Do(Y = y)) = Pr(W | X, Z) Pr(Z | Y = y) Pr(X)
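The two factorizations can be compared numerically. In this sketch the binary CPTs for the graph X → Y, Y → Z, Z → W, X → W are hypothetical; note how conditioning on Y = y shifts the distribution of X, while intervening on Y leaves X at its prior:

```python
from itertools import product

# Hypothetical binary CPTs for the graph X -> Y, Y -> Z, Z -> W, X -> W;
# the numbers are illustrative, not from the talk.
def p_x(x):      return [0.6, 0.4][x]
def p_y_x(y, x): return [[0.8, 0.2], [0.3, 0.7]][x][y]
def p_z_y(z, y): return [[0.9, 0.1], [0.25, 0.75]][y][z]
def p_w_xz(w, x, z):
    p0 = {(0, 0): 0.7, (0, 1): 0.4, (1, 0): 0.5, (1, 1): 0.1}[(x, z)]
    return p0 if w == 0 else 1.0 - p0

def joint(x, y, z, w):
    # Original Markov factorization: Pr(W|X,Z) Pr(Z|Y) Pr(Y|X) Pr(X).
    return p_w_xz(w, x, z) * p_z_y(z, y) * p_y_x(y, x) * p_x(x)

def post_do(x, z, w, y):
    # Truncated factorization after surgery on Y: the Pr(Y|X) factor is gone.
    return p_w_xz(w, x, z) * p_z_y(z, y) * p_x(x)

y = 1
# Conditioning on Y = 1 shifts the distribution of X ...
num = sum(joint(1, y, z, w) for z, w in product((0, 1), repeat=2))
den = sum(joint(x, y, z, w) for x, z, w in product((0, 1), repeat=3))
print(round(num / den, 3))   # -> 0.7   (Pr(X=1 | Y=1), up from the prior 0.4)
# ... while intervening leaves X at its prior: Pr(X=1 | Do(Y=1)) = Pr(X=1).
p_do = sum(post_do(1, z, w, y) for z, w in product((0, 1), repeat=2))
print(round(p_do, 3))        # -> 0.4
```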

55 What's The Point?
Pr(X, Z, W | Do(Y = y)) = Pr(W | X, Z) Pr(Z | Y = y) Pr(X)
The probability distribution on the left-hand side is a prediction of the effects of an intervention; the probabilities on the right are all known before the intervention. So causal structure plus probabilities yields predictions of intervention effects, and provides a basis for planning.

56 Surprising Results
(Graph: X → Y → Z → W, with X → W.) Suppose we know the causal structure and the joint probability for Y, Z, W only, not for X. We CAN predict the effect on Z of an intervention on Y, even though Y and W are confounded by the unobserved X.

57 Surprising Result
(Same graph.) The effect of Z on W is confounded by X, which influences W directly and influences Z through Y. But the probabilistic effect on W of an intervention on Z CAN be computed from the probabilities before the intervention. How? By conditioning on Y.

58 Surprising Result
Pr(W, Y, X | Do(Z = z)) = Pr(W | Z = z, X) Pr(Y | X) Pr(X)   (by surgery)
Pr(W, X | Do(Z = z), Y) = Pr(W | Z = z, X) Pr(X | Y)   (condition on Y)
Pr(W | Do(Z = z), Y) = Σ_X Pr(W | Z = z, X, Y) Pr(X | Y)   (marginalize out X, using W _||_ Y | X, Z)
Pr(W | Do(Z = z), Y) = Pr(W | Z = z, Y)   (obscure probability theorem)
Pr(W | Do(Z = z)) = Σ_Y Pr(W | Z = z, Y) Pr(Y)   (marginalize out Y)
The right-hand side is composed entirely of observed probabilities.
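The derivation can be checked numerically: for the same graph with hypothetical binary CPTs, the final formula built entirely from observed probabilities matches the ground truth computed by graph surgery (which uses the unobserved X):

```python
from itertools import product

# Hypothetical binary CPTs for the graph X -> Y, Y -> Z, Z -> W, X -> W;
# the numbers are illustrative, not from the talk.
def p_x(x):      return [0.6, 0.4][x]
def p_y_x(y, x): return [[0.8, 0.2], [0.3, 0.7]][x][y]
def p_z_y(z, y): return [[0.9, 0.1], [0.25, 0.75]][y][z]
def p_w_xz(w, x, z):
    p0 = {(0, 0): 0.7, (0, 1): 0.4, (1, 0): 0.5, (1, 1): 0.1}[(x, z)]
    return p0 if w == 0 else 1.0 - p0

def joint(x, y, z, w):
    return p_w_xz(w, x, z) * p_z_y(z, y) * p_y_x(y, x) * p_x(x)

z, w = 1, 1
# Ground truth by graph surgery: Pr(W | Do(Z=z)) = sum_x Pr(W | x, z) Pr(x).
truth = sum(p_w_xz(w, x, z) * p_x(x) for x in (0, 1))
# The slide's formula, using observed probabilities only:
# Pr(W | Do(Z=z)) = sum_y Pr(W | Z=z, Y=y) Pr(Y=y).
estimate = 0.0
for y in (0, 1):
    p_y = sum(p_y_x(y, x) * p_x(x) for x in (0, 1))
    num = sum(joint(x, y, z, w) for x in (0, 1))
    den = sum(joint(x, y, z, w2) for x, w2 in product((0, 1), repeat=2))
    estimate += (num / den) * p_y
print(abs(truth - estimate) < 1e-9)  # -> True
```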

59 Pearl's "Do" Calculus
Provides rules that permit one to avoid the probability calculations we just went through: graphical properties determine whether the effects of an intervention can be predicted.

60 Applications
Spartina grass; parenting among single Black mothers; pneumonia; photosynthesis; lead and IQ; college retention; corn exports; rock classification; college plans; political exclusion; satellite calibration; naval readiness.

61 References
Spirtes, P., Glymour, C., and Scheines, R. (2000). Causation, Prediction, and Search, 2nd edition. MIT Press.
Glymour, C. and Cooper, G., eds. (1999). Computation, Causation, & Discovery. MIT Press.
McKim, V. and Turner, S., eds. (1997). Causality in Crisis? University of Notre Dame Press.
TETRAD IV: www.phil.cmu.edu/projects/tetrad
Web Course on Causal and Statistical Reasoning: www.phil.cmu.edu/projects/csr/
Causality Lab: www.phil.cmu.edu/projects/causality-lab

