Lecture 6: Causal Discovery Isabelle Guyon


1 Lecture 6: Causal Discovery Isabelle Guyon isabelle@clopinet.com

2 Causal discovery. What affects… your health? …climate change? …the economy? Which actions will have beneficial effects?

3 What is causality? Many definitions: –Science –Philosophy –Law –Psychology –History –Religion –Engineering “Cause is the effect concealed, effect is the cause revealed” (Hindu philosophy)

4 The system Systemic causality External agent

5 Difficulty A lot of “observational” data. Correlation ≠ Causality! Experiments are often needed, but: –Costly –Unethical –Infeasible

6 Formalism: Causal Bayesian networks. Bayesian network: –Graph with random variables X1, X2, …, Xn as nodes. –Dependencies represented by edges. –Allows us to compute P(X1, X2, …, Xn) as ∏i P(Xi | Parents(Xi)). –Edge directions have no meaning. Causal Bayesian network: edge directions indicate causality.
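
A minimal sketch of how this factorization can be evaluated for a small three-node chain (Smoking → Lung Cancer → Coughing, echoing the LUCAS example later in the lecture); all probability values are invented for illustration, not taken from the lecture:

```python
# Hypothetical CPTs for a 3-node chain Smoking -> Lung Cancer -> Coughing,
# to make P(X1,...,Xn) = prod_i P(Xi | Parents(Xi)) concrete.
p_smoking = {True: 0.3, False: 0.7}
p_cancer_given_smoking = {True: {True: 0.2, False: 0.8},
                          False: {True: 0.01, False: 0.99}}
p_cough_given_cancer = {True: {True: 0.9, False: 0.1},
                        False: {True: 0.2, False: 0.8}}

def joint(smoking, cancer, cough):
    """P(Smoking, Cancer, Cough) as a product of per-node conditionals."""
    return (p_smoking[smoking]
            * p_cancer_given_smoking[smoking][cancer]
            * p_cough_given_cancer[cancer][cough])

# Example query: probability of a smoker who has lung cancer and coughs.
print(joint(smoking=True, cancer=True, cough=True))  # 0.3 * 0.2 * 0.9 = 0.054
```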

7 Causal discovery from “observational data”. Example algorithm: PC (Peter Spirtes and Clark Glymour, 1999). Let A, B, C ∈ X and V ⊆ X. Initialize with a fully connected un-oriented graph. 1. Conditional independence: cut the connection A — B if ∃ V s.t. (A ⊥ B | V). 2. Colliders: in triplets A — C — B where A and B are not adjacent, if there is no subset V containing C s.t. A ⊥ B | V, orient the edges as A → C ← B. 3. Constraint propagation: orient edges until no change: (i) if A → B → … → C and A — C, then A → C; (ii) if A → B — C, then B → C.
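
A compact sketch of the skeleton and collider phases described above, assuming a user-supplied conditional-independence test ci_test(a, b, cond_set); this is illustrative code, not the original PC implementation:

```python
from itertools import combinations

def pc_skeleton(variables, ci_test, max_cond_size=2):
    """Start fully connected; cut A - B if some conditioning set V makes
    them independent. Returns the skeleton and the separating sets."""
    adj = {v: set(variables) - {v} for v in variables}
    sepset = {}
    for size in range(max_cond_size + 1):
        for a in variables:
            for b in list(adj[a]):
                others = (adj[a] | adj[b]) - {a, b}
                for cond in combinations(sorted(others), size):
                    if ci_test(a, b, set(cond)):       # A independent of B given V
                        adj[a].discard(b)
                        adj[b].discard(a)
                        sepset[frozenset((a, b))] = set(cond)
                        break
    return adj, sepset

def orient_colliders(adj, sepset):
    """For every unshielded triple A - C - B, orient A -> C <- B when C is
    not in the separating set of (A, B) (the usual sepset shortcut for the
    'no subset containing C' condition)."""
    arrows = set()
    for c in adj:
        for a, b in combinations(sorted(adj[c]), 2):
            unshielded = b not in adj[a]
            if unshielded and c not in sepset.get(frozenset((a, b)), set()):
                arrows.add((a, c))
                arrows.add((b, c))
    return arrows
```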

8 Computational and statistical complexity Computing the full causal graph poses: –Computational challenges (intractable for large numbers of variables). –Statistical challenges (difficulty of estimating conditional probabilities for many variables with few samples). Compromises: –Develop algorithms with good average-case performance, tractable for many real-life datasets. –Abandon learning the full causal graph and instead develop methods that learn a local neighborhood. –Abandon learning the fully oriented causal graph and instead develop methods that learn unoriented graphs.

9 Target Y. A prototypical Markov blanket (MB) algorithm: HITON (Aliferis-Tsamardinos-Statnikov, 2003).

10 Target Y. 1 – Identify variables with direct edges to the target (parents/children). (Aliferis-Tsamardinos-Statnikov, 2003)

11 Target Y. 1 – Identify variables with direct edges to the target (parents/children), continued. Iteration 1: add A. Iteration 2: add B. Iteration 3: remove B because B ⊥ Y | A. Etc. (Aliferis-Tsamardinos-Statnikov, 2003)

12 Target Y. 2 – Repeat the algorithm for the parents and children of Y (get depth-two relatives). (Aliferis-Tsamardinos-Statnikov, 2003)

13 Target Y. 3 – Remove non-members of the MB. A member A of PC(PC(Y)) that is not in PC(Y) is a member of the Markov blanket if there is some member B of PC(Y) such that A becomes conditionally dependent with Y when conditioned on any subset of the remaining variables together with B. (Aliferis-Tsamardinos-Statnikov, 2003)
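
A simplified sketch of the three steps on slides 10-13, in the spirit of HITON but not the authors' own code; the association measure assoc and the conditional-independence test ci_test are assumed to be supplied by the user:

```python
from itertools import combinations

def local_pc(target, candidates, assoc, ci_test, max_cond_size=2):
    """Step 1: greedily add the most associated candidate, then prune any
    member that becomes independent of the target given a subset of the others."""
    pc = []
    queue = sorted(candidates, key=lambda v: assoc(v, target), reverse=True)
    for cand in queue:
        pc.append(cand)
        for member in list(pc):
            rest = [v for v in pc if v != member]
            # remove `member` if some conditioning subset separates it from the target
            if any(ci_test(member, target, set(s))
                   for k in range(min(len(rest), max_cond_size) + 1)
                   for s in combinations(rest, k)):
                pc.remove(member)
    return pc

def markov_blanket(target, candidates, assoc, ci_test):
    """Steps 2-3: expand to depth two (PC of each PC member), then keep the
    spouses that become dependent on the target when conditioning on the
    common child (a simplified version of slide 13's criterion)."""
    pc = local_pc(target, candidates, assoc, ci_test)
    mb = set(pc)
    for child in pc:
        for a in local_pc(child, [v for v in candidates if v != target], assoc, ci_test):
            if a not in mb and not ci_test(a, target, {child}):
                mb.add(a)
    return mb
```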

14 Causality workbench

15 Our approach What is the causal question? Why should we care? What is hard about it? Is this solvable? Is this a good benchmark?

16 Four tasks Toy datasets Challenge datasets

17 On-line feed-back

18 Toy Examples

19 Causality assessment with manipulations. LUCAS0: natural. [Causal graph over: Smoking, Genetics, Lung Cancer, Coughing, Attention Disorder, Allergy, Anxiety, Peer Pressure, Yellow Fingers, Car Accident, Born an Even Day, Fatigue.]

20 Causality assessment with manipulations. LUCAS1: manipulated. [Same causal graph as LUCAS0, with some variables manipulated.]

21 Causality assessment with manipulations. LUCAS2: manipulated. [Same causal graph as LUCAS0, again with manipulated variables.]

22 Goal-driven causality. We define: V = variables of interest (e.g. MB, direct causes, ...). Participants return: S = selected subset (ordered or not). We assess causal relevance: Fscore = f(V, S). [Illustration: a causal graph with nodes numbered 0–11; the highlighted nodes {1, 2, 3, 4, 11} form the selected subset S.]
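
One simple, hypothetical choice of f(V, S), the F1 overlap between the selected subset and the variables of interest; the challenge's actual metric may differ, this is only to make the notation concrete:

```python
def fscore(V, S):
    """F1 overlap between the set of interest V and the selected subset S."""
    V, S = set(V), set(S)
    if not V or not S:
        return 0.0
    precision = len(V & S) / len(S)
    recall = len(V & S) / len(V)
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

# Example: V = Markov blanket of the target, S = participant's selection.
print(fscore(V={"Smoking", "Genetics", "Coughing"}, S={"Smoking", "Coughing", "Allergy"}))
```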

23 Causality assessment without manipulation?

24 Using artificial “probes”. LUCAP0: natural. [LUCAS causal graph augmented with artificial probe variables P1, P2, P3, …, PT.]

25 Using artificial “probes” (continued). LUCAP0: natural. [Same augmented graph as the previous slide.]

26 Using artificial “probes”. LUCAP1&2: manipulated. [Same augmented graph; the probe variables P1, …, PT are manipulated.]

27 Scoring using “probes” What we can compute (Fscore): –Negative class = probes (here, all “non-causes”, all manipulated). –Positive class = other variables (may include causes and non-causes). What we want (Rscore): –Positive class = causes. –Negative class = non-causes. What we get (asymptotically): Fscore = (N_TruePos / N_Real) × Rscore + 0.5 × (N_TrueNeg / N_Real)
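
A small numeric illustration of the asymptotic relation above; all counts are made up:

```python
def expected_fscore(rscore, n_true_pos, n_true_neg):
    """Asymptotic probe-based score from slide 27.
    n_true_pos + n_true_neg = N_Real, the number of real (non-probe) variables."""
    n_real = n_true_pos + n_true_neg
    # Real non-causes end up in the positive class but rank like probes,
    # contributing the 0.5 "chance" term.
    return (n_true_pos / n_real) * rscore + 0.5 * (n_true_neg / n_real)

# Example: a method with Rscore = 0.9 on a dataset where 30 of the 100
# real variables are causes (hypothetical numbers).
print(expected_fscore(rscore=0.9, n_true_pos=30, n_true_neg=70))  # -> 0.62
```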

28 AUC distribution

29 Top ranking methods According to the rules of the challenge: –Yin Wen Chang: SVM => best prediction accuracy on REGED and CINA. –Gavin Cawley: Causal explorer + linear ridge regression ensembles => best prediction accuracy on SIDO and MARTI. According to pairwise comparisons: –Jianxin Yin and Prof. Zhi Geng’s group: Partial Orientation and Local Structural Learning => best on Pareto front, new original causal discovery algorithm.

30 Pairwise comparisons

31 Causal vs. non-causal Jianxin Yin: causal Vladimir Nikulin: non-causal

32 Insensitivity to irrelevant features. Simple univariate predictive model, binary target and features, all relevant features correlate perfectly with the target, all irrelevant features randomly drawn. With 98% confidence, |feature weight| < w and |∑i wi xi| < v. Notation: n_g = number of “good” (relevant) features; n_b = number of “bad” (irrelevant) features; m = number of training examples.
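
A small simulation in the spirit of this slide, using a simple univariate weighting scheme (an assumption made only for illustration, not the slide's exact derivation): weights of irrelevant features concentrate near zero at rate 1/√m, so their total contribution stays bounded.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_g, n_b = 100, 10, 1000                  # training examples, "good" and "bad" features
y = rng.choice([-1, 1], size=m)              # binary target
x_good = np.tile(y[:, None], (1, n_g))       # relevant features: perfectly correlated with y
x_bad = rng.choice([-1, 1], size=(m, n_b))   # irrelevant features: random

# Univariate weights: empirical correlation of each feature with the target.
w_good = (y[:, None] * x_good).mean(axis=0)  # all exactly 1
w_bad = (y[:, None] * x_bad).mean(axis=0)    # concentrate around 0, spread ~ 1/sqrt(m)

print("max |weight| over irrelevant features:", np.abs(w_bad).max())
print("typical scale 1/sqrt(m):", 1 / np.sqrt(m))
```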

33 Conclusion Causal discovery from observational data is not an impossible task, but a very hard one. This points to the need for further research and benchmarks. Don’t miss the “pot-luck challenge”: http://clopinet.com/causality

34 1) Causal Feature Selection I. Guyon, C. Aliferis, A. Elisseeff In “Computational Methods of Feature Selection”, Huan Liu and Hiroshi Motoda Eds., Chapman and Hall/CRC Press, 2007. 2) Design and Analysis of the Causation and Prediction Challenge I. Guyon, C. Aliferis, G. Cooper, A. Elisseeff, J.-P. Pellet, P. Spirtes, A. Statnikov, JMLR workshop proceedings, in press. http://clopinet.com/causality

