Lecture 6: Causal Discovery Isabelle Guyon

Causal discovery
What affects… your health? …climate changes? …the economy?
Which actions will have beneficial effects?

What is causality? Many definitions:
– Science
– Philosophy
– Law
– Psychology
– History
– Religion
– Engineering
"Cause is the effect concealed, effect is the cause revealed" (Hindu philosophy)

The system: systemic causality vs. an external agent acting on the system (diagram).

Difficulty
A lot of "observational" data is available, but correlation ≠ causality! Experiments are often needed, yet they are:
– Costly
– Unethical
– Infeasible

Formalism: Causal Bayesian networks
Bayesian network:
– Graph with random variables X1, X2, …, Xn as nodes.
– Dependencies represented by edges.
– Allows us to compute P(X1, X2, …, Xn) as ∏i P(Xi | Parents(Xi)).
– Edge directions carry no causal meaning.
Causal Bayesian network: edge directions indicate causality.
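The factorization above can be sketched in a few lines of Python. This is a minimal illustration with a hypothetical two-node network A → B and made-up CPT values, not any particular dataset from the lecture:

```python
# Factorization P(X1,...,Xn) = prod_i P(Xi | Parents(Xi))
# for a toy network A -> B with binary variables.

parents = {"A": [], "B": ["A"]}          # graph structure
cpt = {
    "A": {(): {0: 0.7, 1: 0.3}},         # P(A)
    "B": {(0,): {0: 0.9, 1: 0.1},        # P(B | A=0)
          (1,): {0: 0.2, 1: 0.8}},       # P(B | A=1)
}

def joint(assignment):
    """P(assignment) as the product of P(Xi | Parents(Xi)) over all nodes."""
    p = 1.0
    for var, pa in parents.items():
        pa_vals = tuple(assignment[q] for q in pa)
        p *= cpt[var][pa_vals][assignment[var]]
    return p

print(joint({"A": 1, "B": 1}))  # 0.3 * 0.8 = 0.24
```

Summing `joint` over all four assignments returns 1, confirming the factorization defines a proper distribution.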

Causal discovery from "observational data"
Example algorithm: PC (Peter Spirtes and Clark Glymour, 1999).
Let A, B, C ∈ X and V ⊆ X. Initialize with a fully connected, unoriented graph.
1. Conditional independence. Cut the connection A — B if ∃ V s.t. (A ⊥ B | V).
2. Colliders. In triplets A — C — B where A and B are not adjacent, if there is no subset V containing C s.t. A ⊥ B | V, orient the edges as A → C ← B.
3. Constraint propagation. Orient edges until no change:
(i) If A → B → … → C and A — C, then A → C.
(ii) If A → B — C, then B → C.
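Steps 1 and 2 can be sketched in Python. This is an illustrative toy, not the full PC implementation: a hand-coded conditional-independence oracle stands in for statistical tests, and the assumed ground truth is the three-variable collider A → C ← B:

```python
from itertools import combinations

nodes = ["A", "B", "C"]

# Hypothetical CI oracle for the true graph A -> C <- B:
# A and B are marginally independent, but dependent given the collider C.
def indep(x, y, cond):
    pair = frozenset((x, y))
    if pair == frozenset(("A", "B")):
        return "C" not in cond
    return False  # every other pair is dependent

# Step 1: start fully connected; cut an edge when a separating set exists.
edges = {frozenset(p) for p in combinations(nodes, 2)}
sepset = {}
for x, y in combinations(nodes, 2):
    others = [n for n in nodes if n not in (x, y)]
    for k in range(len(others) + 1):
        for cond in combinations(others, k):
            if indep(x, y, set(cond)):
                edges.discard(frozenset((x, y)))
                sepset.setdefault(frozenset((x, y)), set(cond))

# Step 2: orient colliders X -> Z <- Y when Z is not in sepset(X, Y).
arrows = set()
for x, y in combinations(nodes, 2):
    if frozenset((x, y)) in edges:
        continue
    for z in nodes:
        if (z not in (x, y)
                and frozenset((x, z)) in edges
                and frozenset((y, z)) in edges
                and z not in sepset.get(frozenset((x, y)), set())):
            arrows.add((x, z))
            arrows.add((y, z))

print(sorted(arrows))  # [('A', 'C'), ('B', 'C')]: the collider is recovered
```

The key idea: because A ⊥ B marginally but not given C, the separating set excludes C, which is exactly the signature of a collider at C.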

Computational and statistical complexity
Computing the full causal graph poses:
– Computational challenges (intractable for large numbers of variables).
– Statistical challenges (conditional probabilities are hard to estimate for many variables with few samples).
Compromises:
– Develop algorithms with good average-case performance, tractable for many real-life datasets.
– Abandon learning the full causal graph and instead develop methods that learn a local neighborhood.
– Abandon learning the fully oriented causal graph and instead develop methods that learn unoriented graphs.

A prototypical MB algorithm: HITON (Aliferis, Tsamardinos, Statnikov, 2003). Target: Y.

1 – Identify variables with direct edges to the target Y (its parents and children).

1 – Identify variables with direct edges to the target Y (its parents and children).
Iteration 1: add A.
Iteration 2: add B.
Iteration 3: remove B because B ⊥ Y | A.
etc.

2 – Repeat the algorithm for the parents and children of Y (to get the depth-two relatives).

3 – Remove non-members of the MB.
A member A of PC(PC(Y)) that is not in PC(Y) belongs to the Markov blanket if there is some member B of PC(Y) such that A becomes conditionally dependent with Y when conditioning on a subset of the remaining variables together with B.
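The interleaved forward/backward phases of step 1 can be sketched as follows. This is a simplified illustration, not the authors' implementation: a hypothetical independence oracle mirrors the slides' example, where A and B both associate with Y but B ⊥ Y | A:

```python
from itertools import combinations

# Hypothetical CI oracle: B is independent of Y once A is conditioned on.
def indep(v, y, cond):
    return v == "B" and "A" in cond

candidates = ["A", "B"]   # assumed ranked by strength of association with Y
pc = []                   # tentative parents/children of Y
for v in candidates:
    pc.append(v)          # forward: admit the next-strongest candidate
    # backward: drop any member made independent of Y by a subset of the rest
    for w in list(pc):
        rest = [u for u in pc if u != w]
        for k in range(len(rest) + 1):
            if any(indep(w, "Y", set(c)) for c in combinations(rest, k)):
                pc.remove(w)
                break

print(pc)  # ['A']: B is removed because B is independent of Y given A
```

Running the same procedure on each member of `pc` would yield the depth-two relatives of step 2.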

Causality workbench

Our approach
– What is the causal question?
– Why should we care?
– What is hard about it?
– Is this solvable?
– Is this a good benchmark?

Four tasks Toy datasets Challenge datasets

On-line feed-back

Toy Examples

LUCAS 0: natural. Causality assessment with manipulations.
(Causal graph with nodes: Anxiety, Peer Pressure, Smoking, Genetics, Yellow Fingers, Allergy, Lung Cancer, Attention Disorder, Coughing, Fatigue, Car Accident, Born an Even Day.)

LUCAS 1: manipulated. Causality assessment with manipulations.
(Same causal graph as LUCAS 0, shown with some variables manipulated.)

LUCAS 2: manipulated. Causality assessment with manipulations.
(Same causal graph as LUCAS 0, shown under a different manipulation.)

Goal-driven causality
We define: V = variables of interest (e.g. the MB, direct causes, ...).
Participants return: S = selected subset of variables (ordered or not).
We assess causal relevance with a score: Fscore = f(V, S).

Causality assessment without manipulation?

Using artificial "probes"
LUCAP 0: natural.
(The LUCAS causal graph, augmented with artificial probe variables P1, P2, P3, …, PT.)


Using artificial "probes"
LUCAP 1&2: manipulated.
(Same graph as LUCAP 0 with probes P1, P2, P3, …, PT, shown under manipulation.)

Scoring using "probes"
What we can compute (Fscore):
– Negative class = probes (here, all "non-causes": all are manipulated).
– Positive class = the other variables (may include causes and non-causes).
What we want (Rscore):
– Positive class = causes.
– Negative class = non-causes.
What we get (asymptotically): Fscore = (NTruePos/NReal) Rscore + (NTrueNeg/NReal)
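The probe trick turns scoring into a two-class ranking problem: probes (known non-causes) are the negative class, real variables the positive class, and the score is the AUC of the submitted ranking. A minimal sketch, with illustrative variable names and scores:

```python
def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical submitted relevance scores: real variables vs. probes.
variables = {"Smoking": 0.9, "Genetics": 0.8, "Probe1": 0.3, "Probe2": 0.1}
labels = [0 if name.startswith("Probe") else 1 for name in variables]
fscore = auc(list(variables.values()), labels)
print(fscore)  # 1.0: every real variable is ranked above every probe
```

A method that ranks probes highly (i.e., mistakes manipulated non-causes for causes) is penalized, which is what makes Fscore a usable proxy for the unobservable Rscore.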

AUC distribution

Top-ranking methods
According to the rules of the challenge:
– Yin-Wen Chang: SVM => best prediction accuracy on REGED and CINA.
– Gavin Cawley: Causal Explorer + linear ridge regression ensembles => best prediction accuracy on SIDO and MARTI.
According to pairwise comparisons:
– Jianxin Yin and Prof. Zhi Geng's group: Partial Orientation and Local Structural Learning => best on the Pareto front; a new, original causal discovery algorithm.

Pairwise comparisons

Causal vs. non-causal: Jianxin Yin (causal) vs. Vladimir Nikulin (non-causal).

Insensitivity to irrelevant features
Setting: a simple univariate predictive model with binary target and features; all relevant features correlate perfectly with the target; all irrelevant features are randomly drawn. With 98% confidence, |feature weight| < w and ∑i wi xi < v.
ng = number of "good" (relevant) features; nb = number of "bad" (irrelevant) features; m = number of training examples.
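A small simulation illustrates the effect (this is an illustration of the setting, not the slide's exact derivation): under a univariate model where the weight of a feature is its empirical correlation with the target, a perfectly relevant feature keeps weight 1, while a randomly drawn irrelevant feature gets a weight of order 1/√m:

```python
import random

random.seed(0)
m = 400                                    # training examples
y = [random.choice([-1, 1]) for _ in range(m)]

relevant = y[:]                            # correlates perfectly with target
irrelevant = [random.choice([-1, 1]) for _ in range(m)]

def weight(feature):
    """Univariate weight: mean of feature * target (a correlation estimate)."""
    return sum(f * t for f, t in zip(feature, y)) / m

print(weight(relevant))          # exactly 1.0
print(abs(weight(irrelevant)))   # small, roughly of order 1/sqrt(m) = 0.05
```

With many irrelevant features (large nb) and few examples (small m), these small random weights accumulate, which is why the bound on ∑i wi xi matters.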

Conclusion
Causal discovery from observational data is not an impossible task, but a very hard one. This points to the need for further research and benchmarks. Don't miss the "pot-luck challenge"!

1) Causal Feature Selection. I. Guyon, C. Aliferis, A. Elisseeff. In "Computational Methods of Feature Selection", Huan Liu and Hiroshi Motoda, Eds., Chapman and Hall/CRC Press.
2) Design and Analysis of the Causation and Prediction Challenge. I. Guyon, C. Aliferis, G. Cooper, A. Elisseeff, J.-P. Pellet, P. Spirtes, A. Statnikov. JMLR workshop proceedings, in press.