Learning Causality Some slides are from Judea Pearl’s class lecture


A causal model (Example)
The statement 'rain causes mud' implies an asymmetric relationship: rain will create mud, but mud will not create rain. We use '→' to denote such a causal relationship. The absence of an arrow between 'rain' and 'other causes of mud' means that there is no direct causal relationship between them.
[Diagram: Rain → Mud ← Other causes of mud]

Directed (causal) Graphs
- A and B are causally independent;
- C, D, E, and F are causally dependent on A and B;
- A and B are direct causes of C;
- A and B are indirect causes of D, E, and F;
- If C is prevented from changing with A and B, then A and B will no longer cause changes in D, E, and F (see the sketch below).
[Diagram: a six-node DAG over A, B, C, D, E, F]
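
As a concrete illustration, here is a minimal Python sketch of such a graph as an adjacency list. The exact edge set (A → C, B → C, C → D/E/F) is an assumption consistent with the text, since the slide's figure is not preserved; "causally dependent on" then corresponds to reachability along directed edges:

```python
from collections import deque

# Adjacency list for the slide's example DAG (the exact edges are an
# assumption consistent with the text: every path from A or B to
# D, E, F passes through C).
dag = {
    "A": ["C"], "B": ["C"],
    "C": ["D", "E", "F"],
    "D": [], "E": [], "F": [],
}

def descendants(graph, node):
    """All nodes causally dependent on `node` (reachable via directed edges)."""
    seen, queue = set(), deque(graph[node])
    while queue:
        v = queue.popleft()
        if v not in seen:
            seen.add(v)
            queue.extend(graph[v])
    return seen

print(descendants(dag, "A"))                          # {'C', 'D', 'E', 'F'}
print(descendants(dag, "A") & descendants(dag, "B"))  # A and B share all effects
```

Because every directed path from A or B into D, E, F runs through C, holding C fixed severs the causal influence of A and B on those nodes, exactly as the slide states.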

Conditional Independence

Conditional Independence (Notation)

Causal Structure

Causal Structure (cont'd)
A causal structure serves as a blueprint for forming a "causal model" – a precise specification of how each variable is influenced by its parents in the DAG. We assume that Nature is at liberty to impose arbitrary functional relationships between each effect and its causes, and then to perturb these relationships by introducing arbitrary disturbances. These disturbances reflect "hidden" or unmeasurable conditions.
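
A minimal sketch of what such a functional specification looks like, using the rain/mud example; the functional forms, probabilities, and noise model below are illustrative assumptions, not anything given on the slide:

```python
import random

# A minimal structural-causal-model sketch for the rain/mud example.
# Each variable is a deterministic function of its parents plus an
# unobserved disturbance u_*; the specific functions are assumptions.
def sample_once():
    u_rain  = random.random()   # disturbance for rain
    u_other = random.random()   # disturbance for other causes of mud
    u_mud   = random.random()   # disturbance for mud
    rain  = u_rain < 0.3                       # rain  := f_rain(u_rain)
    other = u_other < 0.1                      # other := f_other(u_other)
    mud   = rain or other or (u_mud < 0.05)    # mud   := f_mud(rain, other, u_mud)
    return rain, other, mud

samples = [sample_once() for _ in range(10_000)]
print(sum(m for *_, m in samples) / len(samples))  # empirical P(mud)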

Causal Model

Causal Model (Cont'd)
- Once a causal model M is formed, it defines a joint probability distribution P(M) over the variables in the system;
- This distribution reflects features of the causal structure: each variable must be independent of its grandparents, given the values of its parents (see the sketch below);
- We may be allowed to inspect a selected subset O ⊆ V of "observed" variables and ask questions about P[o], the probability distribution over the observations;
- We may then recover the topology D of the DAG from features of the probability distribution P[o].
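
Concretely, the induced distribution factorizes as a product of each variable's conditional probability given its parents. A minimal sketch for the rain/mud graph, with illustrative (assumed) CPT entries:

```python
# P(rain, other, mud) = P(rain) * P(other) * P(mud | rain, other).
# The numeric CPT entries are illustrative assumptions.
P_rain  = {True: 0.3, False: 0.7}
P_other = {True: 0.1, False: 0.9}
P_mud   = {  # P(mud = True | rain, other)
    (True,  True):  0.99, (True,  False): 0.9,
    (False, True):  0.8,  (False, False): 0.05,
}

def joint(rain, other, mud):
    p_m = P_mud[(rain, other)]
    return P_rain[rain] * P_other[other] * (p_m if mud else 1 - p_m)

# The factored joint sums to 1 over all eight assignments:
total = sum(joint(r, o, m) for r in (True, False)
                           for o in (True, False)
                           for m in (True, False))
print(total)  # ≈ 1.0 (up to floating-point rounding)
```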

Inferred Causation

Latent Structure

Structure Preference

Structure Preference (Cont'd)
- The set of independencies entailed by a causal structure imposes limits on its power to mimic other structures;
- L1 cannot be preferred to L2 if there is even one observable dependency that is permitted by L1 and forbidden by L2;
- L1 is preferred to L2 if L2's independencies are a subset of L1's;
- Thus, tests for preference and equivalence can sometimes be reduced to tests of dependencies, which can be determined from the topology of the DAGs without concern for parameters.

Minimality

Consistency

Inferred Causation

Example
The variables {a,b,c,d} reveal two independencies:
1. a is independent of b;
2. d is independent of {a,b} given c.
Assume further that the data reveal no other independencies. Here a = having a cold; b = having hay fever; c = having to sneeze; d = having to wipe one's nose.

Example (Cont'd)
{a,b,c,d} reveal two independencies:
1. a is independent of b;
2. d is independent of {a,b} given c.
[Diagrams of three candidate structures: one is minimal (allowing arbitrary relations between a and b); one is not minimal (it fails to impose conditional independence between d and {a,b}); one is not consistent with the data (it imposes marginal independence between d and {a,b}).]
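
These two independencies can be verified mechanically by d-separation on the minimal structure a → c ← b, c → d. A sketch using networkx (the function is named is_d_separator in networkx ≥ 3.3; older releases call it d_separated):

```python
import networkx as nx

# The minimal structure consistent with the two observed independencies.
g = nx.DiGraph([("a", "c"), ("b", "c"), ("c", "d")])

print(nx.is_d_separator(g, {"a"}, {"b"}, set()))       # True:  a ╨ b
print(nx.is_d_separator(g, {"d"}, {"a", "b"}, {"c"}))  # True:  d ╨ {a,b} | c
print(nx.is_d_separator(g, {"a"}, {"b"}, {"c"}))       # False: conditioning on
                                                       # the collider c couples a and b
```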

Stability
The stability condition states that, as we vary the parameters from θ to θ′, no independence in P can be destroyed. In other words, if an independency exists, it will always exist.

Stable distribution
A probability distribution P is a faithful (stable) distribution if there exists a directed acyclic graph (DAG) D such that every conditional independence relationship in P is also implied by D, and vice versa.
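
Unstable (unfaithful) distributions arise when the parameters are tuned so that path contributions cancel exactly. A sketch of the classic cancellation case, with assumed linear-Gaussian parameters: in the DAG a → b, a → c, b → c, the direct effect a → c cancels the indirect effect through b, so a and c appear marginally independent even though the graph connects them:

```python
import numpy as np

# Illustrative unfaithful parameterization (coefficients are assumptions):
# the direct effect a -> c (-1) exactly cancels the chain a -> b -> c (+1).
rng = np.random.default_rng(0)
n = 200_000
a = rng.normal(size=n)
b = 1.0 * a + rng.normal(size=n)                # b := a + noise
c = -1.0 * a + 1.0 * b + rng.normal(size=n)     # c := -a + b + noise

print(np.corrcoef(a, c)[0, 1])  # ~ 0: no correlation despite the edge a -> c
```

Perturbing any coefficient slightly restores the dependence, which is why such parameterizations are called unstable.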

IC algorithm (Inductive Causation)
The IC algorithm (Pearl) is based on variable dependencies:
- Find all pairs of variables that are dependent on each other (applying a standard statistical method to the database);
- Eliminate (as much as possible) indirect dependencies;
- Determine the directions of dependencies.

Comparing abduction, deduction and induction
Deduction (A ⇒ B, A ⊢ B):
- major premise: All balls in the box are black
- minor premise: These balls are from the box
- conclusion: These balls are black
Abduction (A ⇒ B, B ⊢ possibly A):
- rule: All balls in the box are black
- observation: These balls are black
- explanation: These balls are from the box
Induction (whenever A then B, but not vice versa ⊢ possibly A ⇒ B):
- case: These balls are from the box
- observation: These balls are black
- hypothesized rule: All balls in the box are black
Induction goes from specific cases to general rules; abduction and deduction both go from part of a specific case to another part of that case, using general rules (in different ways).
Source: http://www.csee.umbc.edu/~ypeng/F02671/lecture-notes/Ch15.ppt

IC Algorithm (Cont'd)
Input: P – a stable distribution on a set V of variables.
Output: a pattern H(P) compatible with P.
A pattern is a partially directed DAG: some edges are directed and some are undirected.

IC Algorithm: Step 1
For each pair of variables a and b in V, search for a set S_ab such that (a ╨ b | S_ab) holds in P – in other words, a and b should be independent in P, conditioned on S_ab. Construct an undirected graph G such that vertices a and b are connected with an edge if and only if no set S_ab can be found.
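
A sketch of this step in Python, treating the conditional-independence test as an oracle callback (in practice a statistical test on the data). The exhaustive search over conditioning sets is exponential; practical variants such as PC restrict it:

```python
from itertools import combinations

def ic_step1(variables, indep):
    """Sketch of IC step 1. `indep(a, b, S)` is an oracle returning True iff
    a is independent of b given S in P. Returns the undirected skeleton and
    the separating set found for each disconnected pair."""
    edges, sepset = set(), {}
    for a, b in combinations(variables, 2):
        rest = [v for v in variables if v not in (a, b)]
        found = None
        for size in range(len(rest) + 1):      # exhaustive search for S_ab
            for S in combinations(rest, size):
                if indep(a, b, set(S)):
                    found = set(S)
                    break
            if found is not None:
                break
        if found is None:
            edges.add(frozenset((a, b)))       # connect a - b: no S_ab exists
        else:
            sepset[frozenset((a, b))] = found
    return edges, sepset
```

With a d-separation oracle over the true DAG of the earlier example, this recovers the skeleton a–c, b–c, c–d with S_ab = {} and S_ad = S_bd = {c}.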

IC Algorithm: Step 2
For each pair of nonadjacent variables a and b with a common neighbor c, check whether c ∈ S_ab. If it is, then continue; else add arrowheads at c, i.e. a → c ← b.
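
Continuing the sketch, step 2 scans common neighbors of nonadjacent pairs and orients v-structures using the separating sets recorded in step 1 (the function assumes the `edges`/`sepset` structures from the step 1 sketch):

```python
def ic_step2(variables, edges, sepset):
    """Sketch of IC step 2: orient v-structures a -> c <- b. Takes the
    skeleton `edges` and separating sets `sepset` from step 1; returns
    the directed arrows as (tail, head) pairs."""
    arrows = set()
    for c in variables:
        nbrs = [v for v in variables if frozenset((v, c)) in edges]
        for i, a in enumerate(nbrs):
            for b in nbrs[i + 1:]:
                if frozenset((a, b)) in edges:
                    continue                    # a and b must be nonadjacent
                if c not in sepset.get(frozenset((a, b)), set()):
                    arrows.add((a, c))          # orient a -> c
                    arrows.add((b, c))          # orient b -> c
    return arrows
```

On the cold/hay-fever example this orients a → c and b → c, leaving c – d undirected for step 3.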

Example
[Diagrams: the rain/mud network before orientation (undirected skeleton) and after Step 2 (Rain → Mud ← Other causes of mud)]

IC Algorithm: Step 3
In the partially directed graph that results, orient as many of the undirected edges as possible, subject to two conditions:
- the orientation should not create a new v-structure;
- the orientation should not create a directed cycle.

Rules required to obtain a maximally oriented pattern
R1: Orient b — c into b → c whenever there is an arrow a → b such that a and c are nonadjacent.

Rules required to obtain a maximally oriented pattern
R2: Orient a — b into a → b whenever there is a chain a → c → b.

Rules required to obtain a maximally oriented pattern
R3: Orient a — b into a → b whenever there are two chains a — c → b and a — d → b such that c and d are nonadjacent.

Rules required to obtain a maximally oriented pattern
R4: Orient a — b into a → b whenever there are two chains a — c → d and c → d → b such that c and b are nonadjacent.
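
A sketch of how such rules are applied: repeatedly scan the remaining undirected edges and orient any edge matched by a rule, until nothing changes. For brevity this covers R1 and R2 only; R3 and R4 follow the same pattern:

```python
def apply_rules(undirected, arrows):
    """Sketch of IC step 3: apply R1 and R2 to a fixpoint. `undirected` is a
    set of frozenset edges; `arrows` is a set of (tail, head) pairs."""
    def adjacent(x, y):
        return (frozenset((x, y)) in undirected
                or (x, y) in arrows or (y, x) in arrows)

    changed = True
    while changed:
        changed = False
        for edge in list(undirected):
            if edge not in undirected:
                continue                       # oriented earlier in this pass
            for b, c in (tuple(edge), tuple(edge)[::-1]):
                # R1: some arrow a -> b exists with a and c nonadjacent.
                r1 = any(a != c and not adjacent(a, c)
                         for (a, head) in arrows if head == b)
                # R2: a directed chain b -> k -> c exists.
                r2 = any((b, k) in arrows and (k, c) in arrows
                         for k in {h for (_, h) in arrows})
                if r1 or r2:
                    undirected.discard(edge)   # orient b - c as b -> c
                    arrows.add((b, c))
                    changed = True
                    break
    return undirected, arrows
```

Because each pass only adds arrows and the edge set is finite, the loop terminates; the result is the maximally oriented pattern reachable with the implemented rules.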

IC* Algorithm
Input: P, a sampled distribution.
Output: core(P), a marked pattern.

Marked Pattern: Four types of edges
- a marked arrow a –*→ b: a directed path from a to b exists in the underlying model (genuine causation);
- an unmarked arrow a → b: either a directed path from a to b, or a latent common cause of a and b;
- a bidirected edge a ↔ b: a latent common cause of a and b (spurious association);
- an undirected edge a — b: the relationship is undetermined (a → b, b → a, or a latent common cause).

IC* Algorithm: Step 1
For each pair of variables a and b, search for a set S_ab such that a and b are independent in P, conditioned on S_ab. If there is no such S_ab, place an undirected link between the two variables: a — b.

IC* Algorithm: Step 2
For each pair of nonadjacent variables a and b with a common neighbor c, check whether c ∈ S_ab:
- If it is, then continue;
- If it is not, then add arrowheads pointing at c (i.e. a → c ← b).
In the partially directed graph that results, add (recursively) as many arrowheads as possible, and mark as many edges as possible, according to the following two rules:

IC* Algorithm: Rule 1
R1: For each pair of nonadjacent nodes a and b with a common neighbor c, if the link between a and c has an arrowhead into c and the link between c and b has no arrowhead into c, then add an arrowhead on the link between c and b pointing at b, and mark that link to obtain c –*→ b.

IC* Algorithm: Rule 2
R2: If a and b are adjacent and there is a directed path (composed strictly of marked links) from a to b, then add an arrowhead pointing toward b on the link between a and b.
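
A sketch of how these two propagation rules might be applied to a fixpoint. The edge representation (a per-edge set of arrowhead endpoints plus a marked flag) is an assumption made for illustration, not Pearl's notation:

```python
def icstar_propagate(nodes, edges):
    """Sketch of the IC* propagation rules. `edges` maps frozenset({a, b}) to
    {'heads': set of endpoints carrying an arrowhead, 'marked': bool}."""
    def adjacent(x, y):
        return frozenset((x, y)) in edges

    changed = True
    while changed:
        changed = False
        # Rule 1: a, b nonadjacent with common neighbor c; a-c has an
        # arrowhead into c, c-b has none  =>  c -*-> b (head at b, marked).
        for c in nodes:
            nbrs = [v for v in nodes if v != c and adjacent(v, c)]
            for a in nbrs:
                for b in nbrs:
                    if a == b or adjacent(a, b):
                        continue
                    ac = edges[frozenset((a, c))]
                    cb = edges[frozenset((c, b))]
                    if (c in ac["heads"] and c not in cb["heads"]
                            and not (b in cb["heads"] and cb["marked"])):
                        cb["heads"].add(b)
                        cb["marked"] = True
                        changed = True
        # Rule 2: a, b adjacent with a directed path of marked links from a
        # to b  =>  add an arrowhead pointing toward b on the a-b link.
        for e, info in edges.items():
            x, y = tuple(e)
            for s, t in ((x, y), (y, x)):
                if t not in info["heads"] and marked_path(edges, s, t):
                    info["heads"].add(t)
                    changed = True
    return edges

def marked_path(edges, src, dst):
    """True if a directed path of marked links leads from src to dst."""
    stack, seen = [src], set()
    while stack:
        v = stack.pop()
        if v == dst:
            return True
        if v in seen:
            continue
        seen.add(v)
        for e, info in edges.items():
            if v in e and info["marked"]:
                w = next(u for u in e if u != v)
                if w in info["heads"]:        # the marked link points v -> w
                    stack.append(w)
    return False
```

Since each iteration only adds arrowheads or marks and both are bounded by the (finite) edge set, the propagation terminates.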