Inferring Causal Graphs

Inferring Causal Graphs
Computing 882 Simon Fraser University Spring 2002

Applications of Bayes Nets (I)
Windows Office “Paper Clip.” Bill Gates: “The competitive advantage of Microsoft lies in our expertise in Bayes Nets.” UBC Intelligent Tutoring System (ASI X-change).

Applications of Bayes Nets (II)
University Drop-outs: Search program Tetrad II says that higher SAT score would lead to lower drop-out rate. Carnegie Mellon uses this to reduce its drop-out rates. Tetrad II recalibrates Mass Spectrometer on earth satellite. Tetrad II predicts relation between corn exports and exchange rates.

Bayes Nets: Basic Definitions
Defn: A and B are independent iff P(A and B) = P(A) x P(B). Exercise: Prove that A and B are independent iff P(A|B) = P(A). Thus independence implies irrelevance.

Independence Among Variables
Let X,Y,Z be random variables. X is independent of Y iff P(X=x| Y=y) = P(X=x) for all x,y s.t. P(Y=y) > 0. X is independent of Y given Z iff P(X=x|Y=y,Z=z) = P(Z=z) for all y,z s.t. P(Y=y and Z=z) >0. Notation: (X Y|Z). Intuitively: given information Z, Y is irrelevant to X.

Axioms for Informational Relevance
Pearl (2000), p.11. It’s possible to read the  symbol as “irrelevant”. Then we can consider a number of axioms for  as axiomatizations of relevance, for example: Symmetry: if (X  Y|Z) then (Y  X|Z). Decomposition: if (X  YW|Z) then (X  Y|Z).

Markovian Parents In constructing a Bayes net, we look for “direct causes” – variables that “immediately determine” the value of another value. Such direct causes “screen off” other variables. Formally: Let an ordering of variables X1, …, Xn be given. Consider Xj. Let PA be any subset of X1,…,Xj-1. Suppose that P(Xj|PA) = P(Xj|X1,..,Xj) and that no subset of PA has this property. Then PA forms the Markovian parents of Xj. Mention Reichenbach 1956 for screening off. Note connection with Markov decision processes.

Markovian Parents and Bayes Nets
Given an ordering of variables, we can construct a causal graphs by drawing arrows between Markovian parents and children. Note that graphs are suitable for drawing the distinction between “direct” and “intermediate” causes. Exercise: For the variables in figure 1.2, construct a Bayes net in the given ordering. Exercise: Construct a Bayes net along the ordering (X5, X1, X3, X2, X4).

Independence in Bayes Nets
Note how useful irrelevance information is – think of a Prolog-style logical database. A typical problem: Given some information Z, and a query about X, is Y relevant to X? For Bayes nets, the d-separation criterion is a powerful answer.

d-separation In principle, information can flow along any path between two variables X and Y. Provisos: A path is blocked by any collider. Conditioning on a node reverses its status. Conditioning on non-collider makes it block. Conditioning on collider or its descendant makes it unblocked. Examples: In 1.2, X2 and x3 are d-separated by X1. (Check that this holds for other ordering as well). But X2 and X3 are not d-separated by {X1,X5}. Also look at figure 1.3.

d-separation characterizes independence
If X,Y d-separated by Z in a DAG G, then (X Y|Z) in all probability distributions compatible with G. If X,Y not d-separated by Z in a DAG G, then not [(X Y|Z) in all probability distributions compatible with G.].

Observational Equivalence
Suppose we can observe the probabilities of various occurrences (rain vs. umbrellas, smoking vs. lung cancer etc.). How does prob constrain graph? Two causal graphs G1,G2 are compatible with the same probs iff. G1 has the same adjacencies as G2 and the same v-structures (basically, colliders).

Observational Equivalence: Examples (I)
In sprinkler network, cannot tell whether X1 -> X2 or vice versa. But can tell that X2 -> X4 and X4 -> X5. General note: You cannot always tell in machine learning what the correct hypothesis is even if you have all possible data -> need more assumptions or other kinds of data.

Observational Equivalence: Examples (II)
Vancouver sun, March 29, “Adolescents …. Are more likely to turn to violence in their early twenties if they watch more than an hour of television a day… The team tracked more than 700 children and took into account the “chicken and egg” question: Does watching television cause aggression or do people prone to aggression watch more television?” [Science, Dr. Johnson, Columbia U.]

Two Models of Aggressive behaviour
Disposition to aggression TV watching Violent behavour Disposition to aggression TV watching Violent behavour Are these two graphs observationally distinguishable?

Minimal Graphs A graph G is minimal for a probability distribution P iff G is compatible with P, and no subgraph of G is compatible with P. Example: not minimal if A {B,C,D} A C D B

Note on minimality Intuitively, minimality requires that you add an edge between A and B only if there is some dependence between A and B. In statistical tests, dependence is observable but independence is not. So minimality amounts to “assume independence until dependence is observed”. That is exactly the strategy for minimizing mind changes! (“assume reaction is impossible until observed”).

Stable Distributions A distribution P is stable iff there is a graph G such that (X  Y |Z) in P iff X and Y are d-separated given Z in G. Intuitively, stability rules out “exact counterbalance”: two forces both having a causal effect but cancelling out each other exactly in every circumstance.

Inferring Causal Structure: The IC Algorithm
Assume a stable probability distribution P. Find a minimal graph for P with as many edges directed as possible. General idea: First find variables that are “directly causally related”. Connect those. Add arrows as far as possible.

Inferring Causal Structure: The IC Algorithm
For each pair of variables X and Y, look for a “screen off” set S(X,Y) s.t. X  Y| S(X,Y) holds. If there is no such set, add an undirected edge between X and Y. For each pair X,Y with a common neighbour Z, check if Z is part of a “screening off” set S(X,Y). If not, make Z a common consequence of X,Y. Orient edges without creating cycles or v-structures. Do edges in Sprinkler example.

Rules for Orientation Given a  b, b – c add b c if a,c are not linked (no new collider). Given a c b, a – b add a  b (no cycle). Given a – c d and c d  b and a – b add a  b if c,d are not linked (no cycle + no new collider). Given a – c b and a – d b and a – b add a  b if c,d are not linked (no cycle + no new collider).

Inferring Causal Graphs

Similar presentations

Presentation on theme: "Inferring Causal Graphs"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Inferring Causal Graphs

Similar presentations

Presentation on theme: "Inferring Causal Graphs"— Presentation transcript:

Similar presentations

About project

Feedback