Causal Modeling for Anomaly Detection Andrew Arnold Machine Learning Department, Carnegie Mellon University Summer Project with Naoki Abe Predictive Modeling.

Presentation transcript:

Causal Modeling for Anomaly Detection Andrew Arnold Machine Learning Department, Carnegie Mellon University Summer Project with Naoki Abe Predictive Modeling Group, IBM Rick Lawrence, Manager June 23, 2006

2 Contributions Consistent causal structure can be learned from passive observational data Anomalous examples have a quantitatively differentiable causal structure from normal ones Causal structure is a significant contribution to the standard analysis tools of independence and likelihood

3 Outline Motivation & Problem Causation Definition Causal Discovery Causal Comparison Conclusions & Ongoing Work

4 Motivation Processors: –Detection: Is this wafer good or bad? –Causation: Why is this wafer bad? –Intervention: How can we fix the problem? Business: –Detection: Is this business functioning well or not? –Causation: Why is this business not functioning well? –Intervention: What can IBM do to improve performance?

5 Problem Interventions are expensive and flawed What can passively observed data tell us about the causal structure of a process?

6 Direct Causation X is a direct cause of Y relative to S iff $\exists\, z,\ x_1 \neq x_2$ such that $P(Y \mid X \text{ set}= x_1,\ Z \text{ set}= z) \neq P(Y \mid X \text{ set}= x_2,\ Z \text{ set}= z)$, where $Z = S - \{X, Y\}$ [Scheines (2005)] Asymmetric. Intervene to set Z = z, not just observe Z = z.

7 Causal Graphs Causal Directed Acyclic Graph G = {V,E} Each edge X → Y represents a direct causal claim: X is a direct cause of Y relative to V [Scheines (2005)]

8 Probabilistic Independence X and Y are independent iff $\forall\, x_1 \neq x_2$: $P(Y \mid X = x_1) = P(Y \mid X = x_2)$. X and Y are associated iff X and Y are not independent [Scheines (2005)]
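The independence definition above can be checked directly when the joint distribution is known. The following is a minimal sketch, assuming a discrete joint probability table with `joint[i][j] = P(X = i, Y = j)`; the function name `is_independent_table` and the tolerance `tol` are illustrative choices, not part of the slides.

```python
import numpy as np


def is_independent_table(joint, tol=1e-9):
    """Return True if P(Y | X = x) is the same for every x with P(X = x) > 0.

    joint[i][j] is assumed to hold P(X = i, Y = j).
    """
    joint = np.asarray(joint, dtype=float)
    p_x = joint.sum(axis=1, keepdims=True)      # marginal P(X = i), shape (k, 1)
    support = p_x[:, 0] > 0                     # skip values of X that never occur
    cond = joint[support] / p_x[support]        # rows are P(Y | X = i)
    return bool(np.all(np.abs(cond - cond[0]) <= tol))


# Hypothetical tables: a product distribution and a correlated one.
independent = np.outer([0.3, 0.7], [0.5, 0.5])
associated = np.array([[0.4, 0.1], [0.1, 0.4]])
print(is_independent_table(independent), is_independent_table(associated))
```

With sampled data rather than a known table, the exact comparison would be replaced by a statistical test.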

9 Causal Structure → Probabilistic Independence The Causal Markov Axiom (Markov Condition): in a causal graph, each variable V is independent of its non-effects, conditional on its direct causes. [Scheines (2005)]
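For a directed acyclic graph, the Markov condition is equivalent to a factorization of the joint distribution over each variable's direct causes. The block below writes this out; the symbols $V_1, \dots, V_n$ and $\mathrm{pa}(V_i)$ (the direct causes of $V_i$) and the three-variable chain are illustrative notation, not taken from the slides.

```latex
% Markov factorization over a causal DAG with variables V_1, ..., V_n
P(V_1, \dots, V_n) \;=\; \prod_{i=1}^{n} P\bigl(V_i \mid \mathrm{pa}(V_i)\bigr)

% Example: the chain X_1 \to X_2 \to X_3
P(X_1, X_2, X_3) \;=\; P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_2)
\quad\Longrightarrow\quad X_3 \perp X_1 \mid X_2
```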

10 Causal Structure → Statistical Data [Scheines (2005)]

11 Causal Structure → Statistical Data [Scheines (2005)]

12 Causal Structure → Statistical Data [Scheines (2005)]

13 Causal Discovery Statistical Data → Causal Structure Background Knowledge: - Faithfulness - X_2 before X_3 - no unmeasured common causes Statistical Inference [Scheines (2005)]

14 Causal Discovery Algorithm PC algorithm [Spirtes et al., 2000] –Constraint-based search –Only need to know how to test conditional independence –Do not need to measure all causes –Asymptotically correct

15 PC algorithm Begin with the fully connected undirected graph For each pair of nodes, test their independence conditional on all subsets of their neighbors: –i.e., (X _||_ Y | Z)? If independent for any conditioning –remove edge, record subset conditioned upon If dependent for all conditionings –leave edge Orient edges, where possible
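The adjacency phase described on this slide can be sketched as follows. This is a minimal illustration, not the implementation used in the project: it assumes continuous, roughly linear-Gaussian data and uses a Fisher z-test on partial correlations as the conditional independence test (the slides do not say which test was used). The names `partial_corr`, `is_independent`, `pc_skeleton`, and the threshold `z_crit` are hypothetical. Edge orientation is left out.

```python
import itertools
import math

import numpy as np


def partial_corr(corr, i, j, cond):
    """Partial correlation of variables i and j given the variables in cond."""
    idx = [i, j] + list(cond)
    precision = np.linalg.pinv(corr[np.ix_(idx, idx)])
    return -precision[0, 1] / math.sqrt(precision[0, 0] * precision[1, 1])


def is_independent(corr, n, i, j, cond, z_crit):
    """Fisher z-test (an assumed choice of test) on the partial correlation."""
    r = max(min(partial_corr(corr, i, j, cond), 0.999999), -0.999999)
    z = 0.5 * math.log((1 + r) / (1 - r))
    return abs(z) * math.sqrt(n - len(cond) - 3) < z_crit


def pc_skeleton(data, z_crit=2.576):
    """Return (adjacency sets, separating sets) for the columns of data."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    adj = {i: set(range(p)) - {i} for i in range(p)}  # fully connected start
    sepset = {}
    size = 0  # size of the conditioning subsets tried in this pass
    while any(len(adj[i]) - 1 >= size for i in range(p)):
        for i in range(p):
            for j in list(adj[i]):
                neighbours = sorted(adj[i] - {j})
                for cond in itertools.combinations(neighbours, size):
                    if is_independent(corr, n, i, j, cond, z_crit):
                        # Independent for this conditioning: remove the edge
                        # and record the subset conditioned upon.
                        adj[i].discard(j)
                        adj[j].discard(i)
                        sepset[frozenset((i, j))] = set(cond)
                        break
        size += 1
    return adj, sepset
```

The default `z_crit=2.576` corresponds roughly to a two-sided 1% significance level; the recorded separating sets are what the collider rule would later consult when orienting edges.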

16 Independence Tests [Scheines (2005)]

17 Edge Orientation Rule 1: Colliders [Scheines (2005)]

18 More Orientation Rules: Rule 2: Avoid forming new colliders [Scheines (2005)]

19 More Orientation Rules: Rule 3: Avoid forming cycles If there is an undirected edge between X and Y And there is a directed path from X to Y –Then direct X-Y as X → Y [Figure: three panels over nodes X, Y, Z labeled "Given", "OK", and "BAD (cycle)"]
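Rule 3 can be sketched as a small graph routine. This is an illustrative sketch under assumed data structures: directed edges are `(source, target)` tuples, undirected edges are unordered pairs, and the function names are hypothetical.

```python
def has_directed_path(directed, src, dst):
    """Depth-first search for a directed path from src to dst."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(b for (a, b) in directed if a == node)
    return False


def orient_to_avoid_cycles(directed, undirected):
    """Orient X - Y as X -> Y whenever a directed path X ~> Y already exists."""
    directed = set(directed)
    undirected = {frozenset(edge) for edge in undirected}
    changed = True
    while changed:  # repeat, since each new arrow may enable further orientations
        changed = False
        for edge in list(undirected):
            x, y = tuple(edge)
            for a, b in ((x, y), (y, x)):
                if has_directed_path(directed, a, b):
                    directed.add((a, b))      # orienting b -> a would close a cycle
                    undirected.discard(edge)
                    changed = True
                    break
    return directed, undirected


# Assumed example: X -> Z -> Y with an undirected edge X - Y.
print(orient_to_avoid_cycles({("X", "Z"), ("Z", "Y")}, [("X", "Y")]))
```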

20 Our Example Rule 2: Colliders Rule 3: No new V-structures Truth fully recovered [Scheines (2005)]

21 Patterns Often unable to orient all edges Use patterns [Pearl, 2000]: –Represents an equivalence set of DAGs [Scheines (2005)]

22 How is this causation? Definition of causation Assumptions: –Faithfulness –No common unmeasured causes –Causal Markov condition

23 Results: Key Performance Indicators

24 Results: Key Performance Indicators

25 Causal Structure confirms existing beliefs and suggests new relationships

26 Results: Chip Fabrication

27 Temporal ordering is preserved

28 Using causal structure to explain anomalies Why is one wafer good, and another bad? –Separate data into classes –Form causal graphs on each class –Compare causal structures

29 Classification Support vector machine (SVM): –Max-margin classifier Finds hyperplane maximally separating data Chip data is readily separable: > 95% accuracy on labeled data

30 Form causal graphs [Figure: causal graphs for three data sets labeled Good Train, Good Test, and Bad]

31 How to compare? Similarity Score for graphs A and B over common nodes V: –Consider undirected edges as bi-directed –Of all the ordered pairs of variables (x, y) in V with an arc x → y in either A or B, in what percentage is there also x → y in the other graph? i.e., (Adj_A(x,y) || Adj_B(x,y)) && (Adj_A(x,y) == Adj_B(x,y)) Difference Graph: –If there is an arc x → y in either A or B, but not in both, place the arc x → y in the difference graph –i.e., if (Adj_A(x,y) != Adj_B(x,y)) then Adj_Diff(x,y) = True
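The similarity score and difference graph can be sketched with plain set operations. This is a minimal sketch assuming each graph is given as a set of directed `(x, y)` arcs plus a list of undirected edges; the helper names `to_arcs`, `similarity`, and `difference_graph`, and the choice to score two empty graphs as fully similar, are assumptions made for illustration.

```python
def to_arcs(directed, undirected, nodes=None):
    """Expand a graph into ordered arcs, treating undirected edges as bi-directed."""
    arcs = set(directed)
    for x, y in undirected:
        arcs.add((x, y))
        arcs.add((y, x))
    if nodes is not None:  # restrict to the common node set V
        arcs = {(x, y) for (x, y) in arcs if x in nodes and y in nodes}
    return arcs


def similarity(arcs_a, arcs_b):
    """Fraction of arcs present in either graph that are present in both."""
    either = arcs_a | arcs_b
    if not either:
        return 1.0  # assumed convention: two empty graphs are identical
    return len(arcs_a & arcs_b) / len(either)


def difference_graph(arcs_a, arcs_b):
    """Arcs that appear in exactly one of the two graphs."""
    return arcs_a ^ arcs_b


# Hypothetical graphs: A has a -> b and b - c; B has a -> b and b -> c.
graph_a = to_arcs({("a", "b")}, [("b", "c")])
graph_b = to_arcs({("a", "b"), ("b", "c")}, [])
print(similarity(graph_a, graph_b), difference_graph(graph_a, graph_b))
```

Under this reading the score is the Jaccard index of the two arc sets, so an edge that is oriented in one graph but left undirected in the other counts as a partial match rather than a full mismatch.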

32 Comparison: Good Test vs. Good Train, 59% similar [Figure: difference graph]

33 Comparison: Bad vs. Good Train, 37% similar [Figure: difference graph]

34 Comparison: Bad vs. Good Test, 35% similar [Figure: difference graph]

35 Conclusions Consistent causal structure can be learned from passive observational data Anomalous examples have a quantitatively differentiable causal structure from normal ones Causal structure is a significant contribution to the standard analysis tools of independence and likelihood

36 Ongoing work Comparing to maximum likelihood and minimum description length techniques Looking at time-ordering –How do variables influence each other over time? Using one-class SVM to do clustering –Avoids need for labeled data Relaxing assumptions –Allow latent variables Evaluation is difficult without domain expert Using causal structure to help in clustering

37 References J. Pearl (2000). Causality: Models, Reasoning, and Inference. Cambridge Univ. Press. R. Scheines (2005). Causality Slides. P. Spirtes, C. Glymour, and R. Scheines (2000). Causation, Prediction, and Search, 2nd Edition. MIT Press. Thank You. Questions?