Structure Learning Using Causation Rules Raanan Yehezkel PAML Lab. Journal Club March 13, 2003.


1 Structure Learning Using Causation Rules Raanan Yehezkel PAML Lab. Journal Club March 13, 2003

2 Main References
Pearl, J. and Verma, T., "A Theory of Inferred Causation," Proceedings of the Second International Conference on Principles of Knowledge Representation and Reasoning (KR '91), San Francisco, 1991.
Spirtes, P., Glymour, C. and Scheines, R., Causation, Prediction, and Search, 2nd edition, MIT Press, 2000.

3 Simpson's "Paradox" (taken from Judea Pearl's web site)
The sure-thing principle (Savage, 1954): Let a and b be two alternative acts of any sort, and let G be any event. If you would definitely prefer b to a, either knowing that the event G obtained or knowing that it did not obtain, then you definitely prefer b to a.

4 Simpson's "Paradox" (taken from Judea Pearl's web site)
The new treatment is preferred in the male group (G) and in the female group (G'), so by the sure-thing principle the new treatment should be preferred overall; yet the global rates say otherwise.

Local success rate:
          G = male patients     G' = female patients
  Old     5% (50/1000)          50% (5000/10000)
  New     10% (1000/10000)      92% (95/100)

Global success rate (all patients):
  Old     46% (5050/11000)
  New     11% (1095/10100)
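The reversal is easy to verify from the raw counts in the table. A quick Python sanity check, using the counts as printed (note the slide labels the female/new cell 92% although its counts 95/100 give 95%; the global rates below match the counts):

```python
# Success counts (successes, trials) copied from the slide's table.
male   = {"old": (50, 1000),    "new": (1000, 10000)}
female = {"old": (5000, 10000), "new": (95, 100)}

def rate(successes, trials):
    return successes / trials

# Locally, the new treatment wins in both groups...
assert rate(*male["new"]) > rate(*male["old"])
assert rate(*female["new"]) > rate(*female["old"])

# ...but globally the old treatment wins: the reversal.
old = (male["old"][0] + female["old"][0], male["old"][1] + female["old"][1])
new = (male["new"][0] + female["new"][0], male["new"][1] + female["new"][1])
assert rate(*old) > rate(*new)
print(f"old: {rate(*old):.2f}, new: {rate(*new):.2f}")  # prints "old: 0.46, new: 0.11"
```

The aggregation reverses the preference because treatment assignment is heavily skewed by gender, which the next two slides formalize.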

5 Simpson's "Paradox"
Intuitive way of thinking: G and T are independent (DAG: G → S ← T).
P(S,G,T) = P(G) · P(T) · P(S|G,T)
P(S=1 | T=new) = 0.51
P(S=1 | T=old) = 0.27

6 Simpson's "Paradox"
The faithful DAG: gender influences both treatment choice and outcome (G → T, G → S, T → S).
P(S,G,T) = P(G) · P(T|G) · P(S|G,T)
P(S=1 | T=new) = 0.11
P(S=1 | T=old) = 0.46

7 Assumptions
Directed acyclic graphs (Bayesian networks).
All variables are observable.
No errors in conditional-independence (CI) test results.

8 Identifying cause and effect relations Statistical data. Statistical data and temporal information.

9 Identifying cause and effect relations Potential Cause Genuine Cause Spurious Association

10 Intransitive Triplet
I(C1,C2), ~I(C1,E), ~I(C2,E)
(Figure: the collider C1 → E ← C2, and variants in which hidden variables H1, H2 mediate the dependencies.)

11 Potential Cause
X has a potential causal influence on Y if there exist a variable Z and a context S such that:
1. X and Y are dependent in every context.
2. ~I(Z,Y | S_context): Z and Y are dependent given the context S.
3. I(X,Z | S_context): X and Z are independent given the context S.
(Figure: Z → Y ← X, with no link between Z and X.)
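The definition reduces to a predicate over CI-test results. A minimal sketch, assuming an abstract oracle `indep(a, b, s)` that returns True iff I(a,b|s); the function and oracle names are illustrative, not from the paper:

```python
def potential_cause(x, y, z, contexts, indep):
    """x is a potential cause of y, witnessed by z, if x and y are
    dependent in every context, and in some context s the pair (z, y)
    is dependent while (x, z) is independent."""
    if any(indep(x, y, s) for s in contexts):
        return False  # condition 1 fails: x, y must be dependent in every context
    return any(not indep(z, y, s) and indep(x, z, s) for s in contexts)

# Toy oracle for the collider x -> y <- z: only (x, z) is independent, marginally.
collider = lambda a, b, s: {a, b} == {"x", "z"} and not s
print(potential_cause("x", "y", "z", [frozenset()], collider))  # prints "True"
```

Note the symmetry with the intransitive triplet: the independent witness z is what licenses pointing the arrow at y rather than at x.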

12 Genuine Cause
X has a genuine causal influence on Y if there exists a variable Z such that:
1. Z is a potential cause of X.
2. ~I(Z,Y | S_context): Z and Y are dependent given the context S.
3. I(Z,Y | X, S_context): Z and Y become independent given X and the context S.
(Figure: Z → X (potential) and X → Y.)

13 Spurious Association
X and Y are spuriously associated if there exist variables Z1, Z2 and a context S such that:
1. ~I(X,Y | S_context)
2. ~I(Z1,X | S_context)
3. ~I(Z2,Y | S_context)
4. I(Z1,Y | S_context)
5. I(Z2,X | S_context)
(Figure: conditions 1, 2, 4 rule out the orientation X → Y; conditions 1, 3, 5 rule out Y → X.)

14 Genuine Cause with temporal information
X has a genuine causal influence on Y if there exists a variable Z such that:
1. Z and the context S precede X.
2. ~I(Z,Y | S_context): Z and Y are dependent given the context S.
3. I(Z,Y | X, S_context): Z and Y become independent given X and the context S.
(Figure: Z → X → Y.)

15 Spurious Association with temporal information
X and Y are spuriously associated if there exists a variable Z such that:
1. ~I(X,Y | S_context)
2. X precedes Y.
3. I(Z,Y | S_context)
4. ~I(Z,X | S_context)
(Figure: conditions 1 and 2 alone would suggest X → Y; conditions 3 and 4 rule that orientation out.)

16 Algorithms
Inductive Causation (IC).
PC.
Others.

17 Pearl and Verma, 1991: Inductive Causation (IC)
1. For each pair (X,Y), search for a set of nodes S_XY such that I(X,Y | S_XY). If no such set exists, place an undirected link between X and Y.
2. For each pair of non-adjacent nodes (X,Y) with a common neighbor C: if C is not in S_XY, add arrowheads at C: X → C ← Y.
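Steps 1 and 2 can be sketched in Python against an abstract CI oracle `indep(x, y, s)`. The brute-force search over all conditioning sets and the helper names (`ic_skeleton`, `orient_colliders`) are illustrative, not from the paper:

```python
from itertools import chain, combinations

def powerset(items):
    s = list(items)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def ic_skeleton(nodes, indep):
    """IC step 1: link X-Y iff no conditioning set separates them;
    otherwise record the separating set S_XY."""
    edges, sepset = set(), {}
    for x, y in combinations(nodes, 2):
        others = [n for n in nodes if n not in (x, y)]
        s_xy = next((set(s) for s in powerset(others) if indep(x, y, set(s))), None)
        if s_xy is None:
            edges.add(frozenset((x, y)))
        else:
            sepset[frozenset((x, y))] = s_xy
    return edges, sepset

def orient_colliders(nodes, edges, sepset):
    """IC step 2: for non-adjacent X,Y with common neighbour C not in
    S_XY, add arrowheads at C: X -> C <- Y."""
    arrows = set()
    for x, y in combinations(nodes, 2):
        if frozenset((x, y)) in edges:
            continue
        for c in nodes:
            if (frozenset((x, c)) in edges and frozenset((y, c)) in edges
                    and c not in sepset.get(frozenset((x, y)), set())):
                arrows.update({(x, c), (y, c)})
    return arrows

# Toy oracle: the collider X -> Z <- Y (only X,Y independent, marginally).
indep = lambda a, b, s: {a, b} == {"X", "Y"} and not s
edges, sepset = ic_skeleton(["X", "Y", "Z"], indep)
arrows = orient_colliders(["X", "Y", "Z"], edges, sepset)  # recovers the collider
```

The powerset search makes step 1 exponential in the number of variables; PC's level-wise restriction to adjacent sets (later slides) is the practical remedy.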

18 Pearl and Verma, 1991: Inductive Causation (IC)
3. Recursively:
   1. If X–Y and there is a strictly directed path from X to Y, add an arrowhead at Y.
   2. If X and Y are not adjacent but X → C and there is a link Y–C, direct the link as C → Y.
4. Mark uni-directed links X → Y if there is some link with an arrowhead at X.
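The recursive rules of step 3 can be sketched as a fixed-point loop over a partially directed graph, represented here (as an assumption, not the paper's notation) by a set of undirected skeleton edges plus a set of directed arrows:

```python
from itertools import combinations

def directed_path(a, b, arrows):
    """True if a strictly directed path a -> ... -> b exists in arrows."""
    stack, seen = [a], set()
    while stack:
        v = stack.pop()
        if v == b:
            return True
        if v in seen:
            continue
        seen.add(v)
        stack.extend(head for (tail, head) in arrows if tail == v)
    return False

def propagate_orientations(nodes, edges, arrows):
    """Apply the two recursive rules until a fixed point. edges holds the
    skeleton as frozenset pairs; arrows holds (tail, head) orientations."""
    arrows = set(arrows)
    changed = True
    while changed:
        changed = False
        # Rule 1: X-Y undirected and a strictly directed path X => Y exists.
        for e in edges:
            x, y = tuple(e)
            for a, b in ((x, y), (y, x)):
                if ((a, b) not in arrows and (b, a) not in arrows
                        and directed_path(a, b, arrows)):
                    arrows.add((a, b))
                    changed = True
        # Rule 2: X,Y non-adjacent, X -> C, and Y-C undirected: orient C -> Y.
        for x, y in combinations(nodes, 2):
            if frozenset((x, y)) in edges:
                continue
            for c in nodes:
                for s, t in ((x, y), (y, x)):
                    if ((s, c) in arrows and frozenset((t, c)) in edges
                            and (t, c) not in arrows and (c, t) not in arrows):
                        arrows.add((c, t))
                        changed = True
    return arrows

# Example: skeleton A-B, B-C, A-C with A -> B and B -> C given;
# rule 1 then forces A -> C (an undirected A-C plus the path A => C).
edges = {frozenset(("A", "B")), frozenset(("B", "C")), frozenset(("A", "C"))}
arrows = propagate_orientations(["A", "B", "C"], edges, {("A", "B"), ("B", "C")})
```

Both rules only add arrowheads that every DAG in the equivalence class must share, which is why they can be applied greedily to a fixed point.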

19 Example (IC)
(Figure: the true graph over X1,…,X5.)

20 Example (IC)
Step 1: for each pair (X,Y), search for a set S_XY such that I(X,Y | S_XY); if no such set exists, place an undirected link between X and Y.
(Figure: the resulting skeleton over X1,…,X5.)

21 Example (IC)
Step 2: for each pair of non-adjacent nodes (X,Y) with a common neighbor C, if C is not in S_XY, add arrowheads at C: X → C ← Y.
(Figure: the colliders oriented in the example graph.)

22 Example (IC)
Step 3 (recursively):
1. If X–Y and there is a strictly directed path from X to Y, add an arrowhead at Y.
2. If X and Y are not adjacent but X → C and there is a link Y–C, direct the link as C → Y.
(Figure: further links oriented in the example graph.)

23 Example (IC)
Step 4: mark uni-directed links X → Y if there is some link with an arrowhead at X.
(Figure: the marked links in the example graph.)

24 Spirtes and Glymour: PC
1. Form the complete undirected graph C on vertex set V.

25 Spirtes and Glymour: PC
2. n = 0
3. Repeat:
     Repeat:
       Select an ordered pair (X,Y) such that |Adj(C,X)\{Y}| ≥ n, and a subset S ⊆ Adj(C,X)\{Y} with |S| = n.
       If I(X,Y|S) is true, delete edge (X,Y) and record S_XY = S.
     Until all possible pairs and subsets have been tested.
     n = n + 1
   Until |Adj(C,X)\{Y}| < n for all X, Y.

26 Spirtes and Glymour: PC
4. For each triple of vertices (X,Y,Z) such that edge(X,Z) and edge(Y,Z) exist but X and Y are not adjacent, orient X → Z ← Y if and only if Z ∉ S_XY.
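Steps 1 through 4 can be sketched in Python, wiring the worked example's independencies (slides 30 to 32) in as the CI oracle; the function names are illustrative, not from the paper:

```python
from itertools import combinations

def pc_skeleton(nodes, indep):
    """PC steps 1-3: start from the complete graph; at level n, delete
    edge (X,Y) if some size-n subset S of X's other neighbours gives I(X,Y|S)."""
    adj = {v: set(nodes) - {v} for v in nodes}
    sepset = {}
    n = 0
    while any(len(adj[x] - {y}) >= n for x in nodes for y in adj[x]):
        for x in nodes:
            for y in list(adj[x]):
                neighbours = sorted(adj[x] - {y})
                if len(neighbours) < n:
                    continue
                for s in combinations(neighbours, n):
                    if indep(x, y, set(s)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(s)
                        break
        n += 1
    return adj, sepset

def orient_colliders(nodes, adj, sepset):
    """PC step 4: orient X -> Z <- Y for each unshielded triple X-Z-Y
    with Z outside the recorded separating set S_XY."""
    arrows = set()
    for x, y in combinations(nodes, 2):
        if y in adj[x]:
            continue
        for z in adj[x] & adj[y]:
            if z not in sepset.get(frozenset((x, y)), set()):
                arrows.update({(x, z), (y, z)})
    return arrows

# CI oracle wired to the example's independencies (slides 30-32).
_indeps = {
    frozenset(("X1", "X3")): {frozenset(("X2",))},
    frozenset(("X1", "X4")): {frozenset(("X2",))},
    frozenset(("X1", "X5")): {frozenset(("X2",))},
    frozenset(("X3", "X4")): {frozenset(("X2",))},
    frozenset(("X2", "X5")): {frozenset(("X3", "X4"))},
}
indep = lambda a, b, s: frozenset(s) in _indeps.get(frozenset((a, b)), set())

nodes = ["X1", "X2", "X3", "X4", "X5"]
adj, sepset = pc_skeleton(nodes, indep)
arrows = orient_colliders(nodes, adj, sepset)
# Recovers skeleton X1-X2, X2-X3, X2-X4, X3-X5, X4-X5 and the collider X3 -> X5 <- X4.
```

Because the conditioning sets are restricted to current adjacencies, PC avoids IC's exponential powerset search when the true graph is sparse.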

27 Pearl and Verma, 1991: orientation
5. Complete the orientation using the Inductive Causation (IC) rules:
Recursively:
1. If X–Y and there is a strictly directed path from X to Y, add an arrowhead at Y.
2. If X and Y are not adjacent but X → C and there is a link Y–C, direct the link as C → Y.
Mark uni-directed links X → Y if there is some link with an arrowhead at X.

28 Spirtes, Glymour and Scheines: Example (PC)
(Figure: the true graph over X1,…,X5.)

29 Example (PC)
Step 1: form the complete undirected graph C on vertex set V = {X1,…,X5}.

30 Example (PC)
n = 0, |S_XY| = n. Independencies found: none.

31 Example (PC)
n = 1, |S_XY| = n. Independencies found: I(X1,X3 | X2), I(X1,X4 | X2), I(X1,X5 | X2), I(X3,X4 | X2).

32 Example (PC)
n = 2, |S_XY| = n. Independencies found: I(X2,X5 | X3,X4).

33 Example (PC)
Step 4: for each triple of vertices (X,Y,Z) such that edge(X,Z) and edge(Y,Z) exist, orient X → Z ← Y if and only if Z ∉ S_XY.
D-separation sets: S_{3,4} = {X2}, S_{1,3} = {X2}.

34 Possible PC improvements (2)
PC*: test conditional independence between X and Y given subsets S, where S ⊆ {[Adj(X) ∪ Adj(Y)] ∩ path(X,Y)}.
CI-test prioritization: for a given variable X, first test those variables Y that are least dependent on X, conditioned on those subsets of variables that are most dependent on X.

35 Markov Equivalence (Verma and Pearl, 1990)
Two causal models are equivalent if and only if their DAGs have the same links and the same set of uncoupled head-to-head nodes (colliders).
Collider X → Z ← Y:  P = P(X) · P(Y) · P(Z|X,Y)
Fork X ← Z → Y and chain X ← Z ← Y:  P = P(Z) · P(X|Z) · P(Y|Z) = P(Y) · P(Z|Y) · P(X|Z)
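The equality of the fork and chain factorizations can be checked numerically with made-up conditional probability tables for binary variables (all numbers below are illustrative assumptions, not from the slides):

```python
from itertools import product

# Made-up CPTs for the fork Z -> X, Z -> Y (illustrative numbers).
pz = {0: 0.3, 1: 0.7}
px_z = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.4, (1, 1): 0.6}  # P(X=x | Z=z), key (x, z)
py_z = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}  # P(Y=y | Z=z), key (y, z)

# Joint from the fork factorization: P(z) P(x|z) P(y|z).
joint = {(x, y, z): pz[z] * px_z[(x, z)] * py_z[(y, z)]
         for x, y, z in product((0, 1), repeat=3)}

# Re-derive P(y) and P(z|y) for the chain X <- Z <- Y.
py = {y: sum(joint[(x, y, z)] for x in (0, 1) for z in (0, 1)) for y in (0, 1)}
pz_y = {(z, y): sum(joint[(x, y, z)] for x in (0, 1)) / py[y]
        for z in (0, 1) for y in (0, 1)}

# The chain factorization P(y) P(z|y) P(x|z) reproduces the same joint.
for x, y, z in product((0, 1), repeat=3):
    assert abs(joint[(x, y, z)] - py[y] * pz_y[(z, y)] * px_z[(x, z)]) < 1e-12
```

No such re-factorization exists for the collider, which is why colliders are the feature that CI tests can distinguish.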

36 Summary
Algorithms such as PC and IC produce a partially directed graph, which represents a family of Markov-equivalent graphs. The remaining undirected arcs can be oriented arbitrarily (subject to the DAG restrictions) in order to construct a classifier.
The main flaw of the IC and PC algorithms is that they can be unstable in a noisy environment: an error in one CI test for an arc may lead to errors in other arcs, and one erroneous orientation may lead to further erroneous orientations.


