# Peter Spirtes, Jiji Zhang 1. Faithfulness comes in several flavors and is a kind of principle that selects simpler (in a certain sense) over more complicated.

## Presentation transcript

Peter Spirtes, Jiji Zhang

Faithfulness comes in several flavors and is a kind of principle that selects simpler (in a certain sense) over more complicated models. We show how to weaken the assumption of standard faithfulness so that it needs to be applied in fewer circumstances. We show how to weaken the assumption of strong (ε-)faithfulness so that it does not prohibit the existence of weak edges. We show how to modify the causal search algorithms so that they make fewer mind changes as the sample size grows.

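True graph over X, Y, Z, W (X → Z ← Y, Z → W), given by the structural equations:

$$
\begin{aligned}
W &= aZ + \varepsilon_W \\
Z &= bX + cY + \varepsilon_Z \\
X &= \varepsilon_X \\
Y &= \varepsilon_Y
\end{aligned}
$$

Independence relations: $I_P(W, X \mid Z) = 0$, $I_P(W, Y \mid Z) = 0$, $I_P(X, Y \mid \emptyset) = 0$.

[Figure: four further graphs over X, Y, Z, W; their edges are not preserved in the transcript.]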
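
The three independence relations can be checked directly from the structural equations. The sketch below is an illustration, not part of the slides: it assumes independent unit-variance noise terms and arbitrary coefficient values, treats the relations as vanishing partial correlations (appropriate for a Gaussian model), and uses hypothetical helper names (`implied_covariance`, `partial_corr`).

```python
import math

def implied_covariance(a, b, c):
    """Covariances over (X, Y, Z, W) implied by the structural equations,
    assuming independent noise terms with unit variance (an assumption)."""
    var_z = b * b + c * c + 1.0
    return {
        ("X", "X"): 1.0, ("Y", "Y"): 1.0, ("X", "Y"): 0.0,
        ("X", "Z"): b, ("Y", "Z"): c, ("Z", "Z"): var_z,
        ("X", "W"): a * b, ("Y", "W"): a * c, ("Z", "W"): a * var_z,
        ("W", "W"): a * a * var_z + 1.0,
    }

def cov(sigma, u, v):
    """Symmetric lookup in the covariance dictionary."""
    return sigma[(u, v)] if (u, v) in sigma else sigma[(v, u)]

def partial_corr(sigma, u, v, given=None):
    """Correlation of u and v, optionally partialling out one variable."""
    c_uv, c_uu, c_vv = cov(sigma, u, v), cov(sigma, u, u), cov(sigma, v, v)
    if given is not None:
        g = cov(sigma, given, given)
        c_uv -= cov(sigma, u, given) * cov(sigma, v, given) / g
        c_uu -= cov(sigma, u, given) ** 2 / g
        c_vv -= cov(sigma, v, given) ** 2 / g
    return c_uv / math.sqrt(c_uu * c_vv)

sigma = implied_covariance(a=0.8, b=0.5, c=0.7)  # illustrative values only
r_wx_z = partial_corr(sigma, "W", "X", "Z")      # zero up to rounding
r_wy_z = partial_corr(sigma, "W", "Y", "Z")      # zero up to rounding
r_xy = partial_corr(sigma, "X", "Y")             # exactly zero
r_xy_z = partial_corr(sigma, "X", "Y", "Z")      # nonzero: Z is a collider
```

Conditioning on the collider Z makes X and Y dependent, which is why the last quantity is nonzero even though X and Y are marginally independent.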

S1. Form the complete undirected graph H on the given set of variables V.

S2. For each pair of variables X and Y in V, search for a subset S of V\{X, Y} such that X and Y are independent conditional on S. Remove the edge between X and Y in H if and only if such a set is found.

S3. Let K be the graph resulting from S2. For each unshielded triple ⟨X, Y, Z⟩ (i.e., X and Y are adjacent, Y and Z are adjacent, but X and Z are not adjacent), if X and Z are independent conditional on some subset of V\{X, Z} that does not contain Y, then orient the triple as a collider: X → Y ← Z.

S4. Execute the entailed orientation rules.
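
Steps S1 to S3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: independence is answered by a hand-written oracle (`FACTS`, hypothetical) that lists the conditional independencies of the graph X → Z ← Y, Z → W, the search over subsets is exhaustive, and the orientation rules of S4 are omitted.

```python
from itertools import combinations

def subsets(items):
    """Yield every subset of items as a frozenset."""
    items = sorted(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def sgs_skeleton_and_colliders(variables, independent):
    # S1: start from the complete undirected graph.
    adjacent = {frozenset(pair) for pair in combinations(variables, 2)}
    # S2: remove an edge iff some conditioning set separates the pair.
    for x, y in combinations(variables, 2):
        rest = set(variables) - {x, y}
        if any(independent(x, y, s) for s in subsets(rest)):
            adjacent.discard(frozenset((x, y)))
    # S3: orient unshielded triples x - y - z as colliders when x and z
    # are separated by some set that does not contain y.
    colliders = set()
    for y in variables:
        neighbours = [v for v in variables if frozenset((v, y)) in adjacent]
        for x, z in combinations(neighbours, 2):
            if frozenset((x, z)) in adjacent:
                continue  # shielded triple
            rest = set(variables) - {x, y, z}
            if any(independent(x, z, s) for s in subsets(rest)):
                colliders.add((x, y, z))
    return adjacent, colliders

# Hypothetical oracle: the independencies of X -> Z <- Y, Z -> W.
FACTS = {
    (frozenset("XY"), frozenset()),
    (frozenset("XW"), frozenset("Z")),
    (frozenset("XW"), frozenset("YZ")),
    (frozenset("YW"), frozenset("Z")),
    (frozenset("YW"), frozenset("XZ")),
}

def oracle(x, y, s):
    return (frozenset((x, y)), frozenset(s)) in FACTS

skeleton, colliders = sgs_skeleton_and_colliders(["X", "Y", "Z", "W"], oracle)
```

With this oracle the skeleton keeps the edges X – Z, Y – Z, and Z – W, and the only triple oriented as a collider is X → Z ← Y.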

Causal Markov Assumption: For a set of variables for which there are no unmeasured common causes, each variable is independent of its non-effects conditional on its direct causes.

Non-obvious equivalent formulation: If $I_G(X, Y \mid Z)$ holds (X and Y are d-separated given Z) in a causal DAG G with no unmeasured common causes, then $I_P(X, Y \mid Z) = 0$.

Causal Faithfulness Assumption, the converse of the Causal Markov Assumption: If $I_P(X, Y \mid Z) = 0$, then $I_G(X, Y \mid Z)$ holds in the causal DAG G.

If $I_P(X, Y \mid Z)$ is a rational function of the parameters, then violations have Lebesgue measure 0.
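
A worked illustration of the measure-zero claim, using a hypothetical linear model that is not on the slides: edges X → Y, Y → Z, and X → Z with coefficients a, b, c, and independent unit-variance noise terms.

```latex
\begin{aligned}
X &= \varepsilon_X, \\
Y &= aX + \varepsilon_Y, \\
Z &= bY + cX + \varepsilon_Z, \\
\operatorname{Cov}(X, Z) &= ab + c .
\end{aligned}
```

X and Z are adjacent, yet they are uncorrelated exactly when c = −ab. That equation picks out a two-dimensional surface in the three-dimensional parameter space (a, b, c), and such a surface has Lebesgue measure zero.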

Reduction of Underdetermination: If $I(A, B \mid \emptyset)$, then prefer A → C ← B to A → C → B.

Computational Efficiency: If A – C – B and $I(A, B \mid \emptyset)$, then there is no need to check $I(A, B \mid C)$.

Statistical Efficiency: The Markov equivalence class can be found without testing independence conditional on any set larger than the maximum degree of any variable in the true causal graph.

Under causal sufficiency and the Causal Markov and Causal Faithfulness Assumptions, there exist pointwise consistent estimators of the Markov equivalence class: SGS, PC, and GES (Gaussian, multinomial).

Under only the Causal Markov Assumption and causal sufficiency, there are no pointwise consistent estimators of the Markov equivalence class, whether the family of distributions is Gaussian, multinomial, or unrestricted.

Under causal sufficiency and the Causal Markov and Causal Faithfulness Assumptions, there is no uniformly consistent estimator of the Markov equivalence class, whether the family of distributions is Gaussian, multinomial, or unrestricted.

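(A4: ε-faithfulness) The partial correlations between $X^{(i)}$ and $X^{(j)}$ given $\{X^{(r)};\ r \in \mathbf{k}\}$ for some set $\mathbf{k} \subseteq \{1, \ldots, p_n\} \setminus \{i, j\}$ are denoted by $r_{n;i,j \mid \mathbf{k}}$. Their absolute values are bounded from below and above: [formula not preserved in transcript]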

Uhler et al.: (A4) tends to be violated fairly often if the parameter values are assigned randomly and ε is not very small. There are two ways to get very small partial correlations: almost-cancellations and very weak edges. (A4) forbids both; in particular, it entails that there are no very weak edges.
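
The two routes to a very small partial correlation can be seen numerically. The sketch below is an illustration under assumptions that are not on the slides: a linear model X → Y → Z with an extra edge X → Z, coefficients a, b, c, independent unit-variance noise, and the marginal correlation of X and Z (a partial correlation given the empty set) as the quantity of interest.

```python
import math

def corr_xz(a, b, c):
    """Correlation of X and Z when X = e_X, Y = a*X + e_Y, Z = b*Y + c*X + e_Z
    and all noise terms are independent with unit variance (an assumption)."""
    cov_xz = a * b + c                         # Var(X) = 1
    var_z = (a * b + c) ** 2 + b ** 2 + 1.0    # X part, e_Y part, e_Z part
    return cov_xz / math.sqrt(var_z)

# Almost-cancellation: the edge X -> Z is strong (c = -0.29), but the
# indirect path through Y (a*b = 0.30) nearly cancels it.
almost_cancellation = corr_xz(a=0.5, b=0.6, c=-0.29)

# Very weak edge: no other path, and the edge itself is tiny.
weak_edge = corr_xz(a=0.0, b=0.0, c=0.01)
```

Both correlations are of order 0.01, so a lower bound on nonzero partial correlations cannot tell the two situations apart; it rules out both.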

[Figure: output graphs over X, Y, Z, W at four sample sizes (Small, Medium−, Medium+, Large), each annotated with the relations $I_P(W, \{X, Y\} \mid Z)$, $I_P(W, \{X, Y\} \mid \emptyset)$, $I_P(X, Y \mid \emptyset)$, and $I_P(W, Z \mid \emptyset)$.]

[Figure: True Graph X → Y → Z → W; Small Sample output X – Y – Z – W; Large Sample output X – Y – Z → W. The panels are annotated with the relations $I_P(X, Z \mid Y)$, $I_P(Y, W \mid \{X, Z\})$, and $I_P(X, W \mid \emptyset)$.]

[Figure: output graphs over X, Y, Z, W at four sample sizes (Small, Medium−, Medium+, Large), each annotated with the relations $I_P(W, \{X, Y\} \mid Z)$, $I_P(W, \{X, Y\} \mid \emptyset)$, $I_P(X, Y \mid \emptyset)$, and $I_P(W, Z \mid \emptyset)$.]

S3*. Let K be the undirected graph resulting from S2. For each unshielded triple ⟨X, Y, Z⟩:

If X and Z are not independent conditional on any subset of V\{X, Z} that contains Y, then orient the triple as a collider: X → Y ← Z.

If X and Z are not independent conditional on any subset of V\{X, Z} that does not contain Y, then mark the triple as a non-collider.

Otherwise, mark the triple as ambiguous (or unfaithful).
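
The three-way decision in S3* can be sketched as a small function. This is an illustration under assumptions, not the authors' code: independence is answered by hypothetical hand-written oracles, the subset search is exhaustive, and the triple is assumed to be unshielded (some set separates its endpoints).

```python
from itertools import combinations

def subsets(items):
    """Yield every subset of items as a frozenset."""
    items = sorted(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def classify_triple(x, y, z, variables, independent):
    """Classify the unshielded triple x - y - z following S3*."""
    candidates = list(subsets(set(variables) - {x, z}))
    sep_with_y = any(independent(x, z, s) for s in candidates if y in s)
    sep_without_y = any(independent(x, z, s) for s in candidates if y not in s)
    if not sep_with_y:
        return "collider"        # no separating set contains y
    if not sep_without_y:
        return "non-collider"    # every separating set contains y
    return "ambiguous"           # separating sets of both kinds exist

def make_oracle(facts):
    return lambda x, y, s: (frozenset((x, y)), frozenset(s)) in facts

# Hypothetical oracle faithful to X -> Z <- Y, Z -> W.
faithful = {
    (frozenset("XY"), frozenset()),
    (frozenset("XW"), frozenset("Z")),
    (frozenset("XW"), frozenset("YZ")),
    (frozenset("YW"), frozenset("Z")),
    (frozenset("YW"), frozenset("XZ")),
}
# Hypothetical unfaithful variant: X and Y are also independent given Z.
unfaithful = faithful | {(frozenset("XY"), frozenset("Z"))}

V = ["X", "Y", "Z", "W"]
label_collider = classify_triple("X", "Z", "Y", V, make_oracle(faithful))
label_chain = classify_triple("X", "Z", "W", V, make_oracle(faithful))
label_unfaithful = classify_triple("X", "Z", "Y", V, make_oracle(unfaithful))
```

Under the faithful oracle the triples come out as a collider and a non-collider. Under the unfaithful variant the first triple is marked ambiguous, where S3 would have oriented it as a collider.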

Adjacency: If X – Y in the causal DAG, then $I_P(X, Y \mid \mathbf{Z}) \neq 0$ for every conditioning set Z.

Triangle: For any three variables X, Y, Z that form a triangle in the causal DAG G:

If Z is a non-collider on the path ⟨X, Z, Y⟩, then X and Y are not independent conditional on any subset of V\{X, Y} that does not contain Z.

If Z is a collider on the path ⟨X, Z, Y⟩, then X and Y are not independent conditional on any subset of V\{X, Y} that contains Z.

Suppose X → Y ← Z and $I_P(X, Z \mid Y) = 0$. This distribution is faithful to X → Y → Z. The violation cannot be detected, so its absence must be assumed.

Three variables X, Y, Z: ¬I(X,Z|∅), ¬I(Y,Z|∅), ¬I(X,Y|Z).

Four variables X, Y, Z, W:

X, Z: ¬I(X,Z|∅), ¬I(X,Z|W), ¬I(X,Z|Y,W)

Y, Z: ¬I(Y,Z|∅), ¬I(Y,Z|W), ¬I(Y,Z|X,W)

X, Y: ¬I(X,Y|Z), ¬I(X,Y|W), ¬I(X,Y|Z,W)

X, W: ¬I(X,W|∅), ¬I(X,W|Z), ¬I(X,W|Y)

Y, W: ¬I(Y,W|∅), ¬I(Y,W|X), ¬I(Y,W|Z)

Z, W: ¬I(Z,W|∅), ¬I(Z,W|X), ¬I(Z,W|Y)

Causal Minimality: the population distribution is not Markov to any proper subDAG of the true causal DAG. Causal Minimality is entailed by the manipulation definition of causation if the distribution is positive.

There is a weaker kind of causal minimality, P-minimality: the population distribution is not Markov to any DAG that entails a proper superset of the conditional independence relations. Is this sufficient for the correctness of VCSGS?

[Figure: True Graph X → Y → Z → W; Small Sample output X – Y – Z – W; Large Sample output X – Y – Z – W. The panels are annotated with the relations $I_P(X, Z \mid Y)$, $I_P(Y, W \mid \{X, Z\})$, and $I_P(X, W \mid \emptyset)$.]

[Figure: True Graph X → Y → Z → W; Small Sample output X – Y – Z – W; Large Sample output X – Y – Z → W. The panels are annotated with the relations $I_P(X, Z \mid Y)$, $I_P(Y, W \mid \{X, Z\})$, and $I_P(X, W \mid \emptyset)$.]

V1. Form the complete undirected graph H on the given set of variables V.

V2. For each pair of variables X and Y in V, search for a subset S of V\{X, Y} such that X and Y are independent conditional on S. Remove the edge between X and Y in H and mark the pair as ‘apparently non-adjacent’ if and only if such a set is found.

V3. Let K be the graph resulting from V2. For each apparently unshielded triple ⟨X, Y, Z⟩ (i.e., X and Y are adjacent, Y and Z are adjacent, but X and Z are apparently non-adjacent):

If X and Z are not independent conditional on any subset of V\{X, Z} that contains Y, then orient the triple as a collider: X → Y ← Z.

If X and Z are not independent conditional on any subset of V\{X, Z} that does not contain Y, then mark the triple as a non-collider.

Otherwise, mark the triple as ambiguous (or unfaithful), and mark the pair as ‘definitely non-adjacent’.

V4. Execute the same orientation rules as in S4, until none of them applies.

V5. Let M be the graph resulting from V4. For each consistent disambiguation of the ambiguous triples in M (i.e., each disambiguation that leads to a pattern), test whether each vertex V in the resulting pattern satisfies the Markov condition. If V and W satisfy the Markov condition in every pattern, then mark the ‘apparently non-adjacent’ pair as ‘definitely non-adjacent’.

Under the Triangle-Faithfulness, Causal Minimality, and Causal Markov Assumptions, VCSGS is a consistent estimator of the extended Markov equivalence class. Is it complete?

V5*. Let M be the graph resulting from V4. For each consistent disambiguation of the ambiguous triples in M (i.e., each disambiguation that leads to a pattern), test whether each vertex V in the resulting pattern satisfies the Markov condition. If V and W satisfy the Markov condition in some pattern, then mark the ‘apparently non-adjacent’ pair as ‘definitely non-adjacent’.

Assumption NVV(J): [formula not preserved in transcript]

Assumption UBC(C): [formula not preserved in transcript]

k-Triangle-Faithfulness Assumption: Given a set of variables V, suppose the true causal model over V is $M = \langle P, G \rangle$, where P is a Gaussian distribution over V and G is a DAG with vertices V. For any three variables X, Y, Z that form a triangle in G (i.e., each pair of vertices is adjacent):

If Y is a non-collider on the path ⟨X, Y, Z⟩, then $|r(X, Z \mid \mathbf{W})| \geq k \times |e_M(X - Z)|$ for all $\mathbf{W} \subseteq \mathbf{V}$ that do not contain Y.

If Y is a collider on the path ⟨X, Y, Z⟩, then $|r(X, Z \mid \mathbf{W})| \geq k \times |e_M(X - Z)|$ for all $\mathbf{W} \subseteq \mathbf{V}$ that do contain Y.

S3* (sample version). Let K be the undirected graph resulting from the adjacency phase. For each unshielded triple ⟨X, Y, Z⟩:

If there is a set W not containing Y such that the test of r(X, Z|W) = 0 returns 0 (i.e., accepts the hypothesis), and for every set U that contains Y, the test of |r(X, Z|U)| = 0 returns 1 (i.e., rejects the hypothesis) and the test of |r(X, Z|U) – r(X, Z|W)| ≥ L returns 0 (i.e., accepts the hypothesis), then orient the triple as a collider: X → Y ← Z.

If there is a set W containing Y such that the test of r(X, Z|W) = 0 returns 0 (i.e., accepts the hypothesis), and for every set U that does not contain Y, the test of |r(X, Z|U)| = 0 returns 1 (i.e., rejects the hypothesis) and the test of |r(X, Z|U) – r(X, Z|W)| ≥ L returns 0 (i.e., accepts the hypothesis), then mark the triple as a non-collider.

Otherwise, mark the triple as ambiguous.
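
The slides do not say which test of zero partial correlation is used. One common choice for Gaussian data is Fisher's z-transform, sketched below as an assumption rather than as the authors' procedure. The function name `fisher_z_test` and the default level `alpha` are hypothetical; the return values follow the convention above (0 accepts the hypothesis, 1 rejects it).

```python
import math
from statistics import NormalDist

def fisher_z_test(r, n, cond_size, alpha=0.05):
    """Two-sided test of a zero partial correlation via Fisher's z-transform.

    r         -- sample partial correlation, strictly between -1 and 1
    n         -- sample size
    cond_size -- number of variables in the conditioning set
    Returns 0 if the hypothesis of zero partial correlation is accepted,
    1 if it is rejected.
    """
    z = 0.5 * math.log((1.0 + r) / (1.0 - r))
    statistic = math.sqrt(n - cond_size - 3) * abs(z)
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return 1 if statistic > critical else 0

# The same weak dependence is missed at n = 100 and detected at n = 10000.
small_sample = fisher_z_test(r=0.05, n=100, cond_size=1)
large_sample = fisher_z_test(r=0.05, n=10000, cond_size=1)
strong_dependence = fisher_z_test(r=0.5, n=100, cond_size=1)
```

The first two calls show why the output of a search can change as the sample grows: a weak edge looks like an independence until the sample is large enough to detect it.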

Say that CSGS(L, n, M) errs if it contains (i) an adjacency not in $G_M$, or (ii) a marked non-collider not in $G_M$, or (iii) an orientation not in $G_M$.

Theorem: Given causal sufficiency of the measured variables V, the Causal Markov, k-Triangle-Faithfulness, NVV(J), and UBC(C) Assumptions, the CSGS algorithm is uniformly consistent in the sense that [formula not preserved in transcript].

For each vertex Z: if every vertex not adjacent to Z is not confirmed to be non-adjacent to Z, return ‘Unknown’ for every edge containing Z; else, for every non-adjacent pair in EP(G), let the estimate be 0.

For each vertex Z such that all of the edges containing Z are oriented in EP(G), if Y is a parent of Z in EP(G), let the estimate be the sample regression coefficient of Y in the regression of Z on its parents in EP(G).
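
The regression step can be sketched in plain Python. This is a simplified illustration, not the authors' algorithm: it handles only the edges from parents into one vertex, takes the confirmation of non-adjacencies as a boolean input, and solves the least-squares normal equations by Gaussian elimination. The names `regression_coefficients` and `estimate_parent_edges` are hypothetical.

```python
def regression_coefficients(target, parents):
    """Least-squares coefficients of target on parents (with intercept),
    computed from centered data via the normal equations."""
    def centered(xs):
        mean = sum(xs) / len(xs)
        return [x - mean for x in xs]

    t = centered(target)
    ps = [centered(p) for p in parents]
    k = len(ps)
    A = [[sum(u * v for u, v in zip(ps[i], ps[j])) for j in range(k)]
         for i in range(k)]
    b = [sum(u * v for u, v in zip(ps[i], t)) for i in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, k):
            factor = A[row][col] / A[col][col]
            for j in range(col, k):
                A[row][j] -= factor * A[col][j]
            b[row] -= factor * b[col]
    # Back substitution.
    beta = [0.0] * k
    for i in reversed(range(k)):
        rest = sum(A[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (b[i] - rest) / A[i][i]
    return beta

def estimate_parent_edges(z, parents, data, nonadjacencies_confirmed):
    """Return an estimate for each edge parent -> z, or 'Unknown'."""
    if not nonadjacencies_confirmed:
        return {(p, z): "Unknown" for p in parents}
    beta = regression_coefficients(data[z], [data[p] for p in parents])
    return {(p, z): coef for p, coef in zip(parents, beta)}

# Noise-free toy data in which Z = 2*X + 3*Y + 1 exactly.
data = {"X": [1.0, 2.0, 3.0, 4.0, 5.0], "Y": [2.0, 1.0, 4.0, 3.0, 6.0]}
data["Z"] = [2.0 * x + 3.0 * y + 1.0 for x, y in zip(data["X"], data["Y"])]
estimates = estimate_parent_edges("Z", ["X", "Y"], data, True)
unknown = estimate_parent_edges("Z", ["X", "Y"], data, False)
```

On the noise-free toy data the estimates recover the coefficients 2 and 3 up to rounding. When the non-adjacencies are not confirmed, every edge into Z is reported as ‘Unknown’ instead of being estimated.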

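Let $M_1$ be an output of the Estimation Algorithm, and $M_2$ be a causal model. We define the structural coefficient distance, $d[M_1, M_2]$, between $M_1$ and $M_2$ to be [formula not preserved in transcript], where by convention [value not preserved in transcript] if [estimate not preserved in transcript] = “Unknown”.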

E1. Run the CSGS algorithm on an i.i.d. sample of size n from $P_M$.

E2. Let the output from E1 be CSGS(L, n, M). Apply step V5 in the VCSGS algorithm (from section 3), using tests of zero partial correlations, and record which non-adjacencies are confirmed.

E3. Apply the Estimation Algorithm to CSGS(L, n, M), the confirmed non-adjacencies, and the sample of size n.

Given causal sufficiency of the measured variables V, the Causal Markov, k-Triangle-Faithfulness, NVV(J), and UBC(C) Assumptions, the Edge Estimation I algorithm is uniformly consistent in the sense that, for every positive threshold, [formula not preserved in transcript].

For a large enough and dense enough graph, this still allows for the possibility of large manipulation errors (due to many small edge errors).

| | X1 | X2 | X3 |
|---|---|---|---|
| X1 | 1.0 | | |
| X2 | 0.01 | 1.0 | |
| X3 | 0.7877781 | 0.612157 | 1.0 |

If k > 0.014, then the k-Triangle-Faithfulness Assumption is violated for models $M_2$ and $M_3$, but not for $M_1$. If 0.008 < k < 0.014, then the k-Triangle-Faithfulness Assumption is violated for model $M_3$, but not for $M_1$ or $M_2$.

E1. Run Edge Estimation Algorithm I.

E2. Set ForbiddenOrientations = {}.

E3. For each maximal clique in CSGS(L, n, M) such that if a vertex in the clique is not adjacent to some vertex not in the clique, it is definitely non-adjacent: (i) for each possible orientation O of all of the unoriented edges in the maximal clique:

Apply the orientation O to each of the unoriented edges.

Apply Meek's orientation rules.

If application of the rules produces a cycle or a new unshielded collider, add O to ForbiddenOrientations.

Add O to ForbiddenOrientations if, for any Y and W such that Y is a non-collider on the path, W ⊆ V, and W does contain Y, [condition not preserved in transcript].

E4. For each unoriented edge X – Y in CSGS(L, n, M), if there is only one orientation X → Y that does not occur in ForbiddenOrientations, and every vertex that Y is not adjacent to is one that Y is definitely not adjacent to, orient the edge as X → Y.

E5. For each vertex V such that some edge containing V in CSGS(L, n, M) is not oriented, if there is only one orientation of all of the edges containing V that is not in ForbiddenOrientations, and every vertex that V is not adjacent to is one that V is definitely not adjacent to, let the estimate of each edge be the sample regression coefficient of V on its parents in the non-forbidden orientation.

Theorem: Given causal sufficiency of the measured variables V, the Causal Markov, k-Triangle-Faithfulness, NVV(J), and UBC(C) Assumptions, the Edge Estimation II algorithm is uniformly consistent in the sense that, for every positive threshold, [formula not preserved in transcript], where O(L, n, M) is the graphical output of the Edge Estimation II algorithm, and [symbol not preserved in transcript] is the output of the Edge Estimation II algorithm.

We weakened the assumption of faithfulness so that fewer inferences from conditional independence to d-separation need to be made. We strengthened the assumption so that it allows one to make inferences from “almost independence” in a probability distribution to d-separation in a causal graph, allowing for the existence of uniformly consistent estimation algorithms.

We changed the concept of correctness to allow for missing weak edges and for saying “don’t know” about some features of Markov equivalence classes. The new simplicity assumption broke up the Markov equivalence class, in the sense that it considers some models in a Markov equivalence class simpler than other models in the same class. This allowed for uniformly consistent estimates of linear coefficients in a causal model, as well as of causal structure.

Can we get similar results for PC, for FCI, for non-linear models, and for increasing numbers of variables and vertex degree with decreasing k (analogous to Kalisch and Bühlmann)?

If parameter values are randomly assigned, how often is k-Triangle-Faithfulness violated as a function of sample size, clique size, parameter distribution, and k?

Kalisch, M., and Bühlmann, P. (2007). Estimating high-dimensional directed acyclic graphs with the PC-algorithm. Journal of Machine Learning Research 8, 613–636.

Spirtes, P., and Zhang, J. (forthcoming). A Uniformly Consistent Estimator of Causal Effects Under the k-Triangle-Faithfulness Assumption. Statistical Science.

Spirtes, P., and Zhang, J. (submitted). Three Faces of Faithfulness. Synthese.
