Inferring Causal Graphs

Inferring Causal Graphs. Computing 882, Simon Fraser University, Spring 2002.

Applications of Bayes Nets (I) The Windows Office "Paper Clip" assistant. Bill Gates: "The competitive advantage of Microsoft lies in our expertise in Bayes Nets." The UBC Intelligent Tutoring System (ASI X-change).

Applications of Bayes Nets (II) University drop-outs: the search program Tetrad II says that higher SAT scores would lead to lower drop-out rates; Carnegie Mellon uses this to reduce its drop-out rate. Tetrad II recalibrates a mass spectrometer on an Earth satellite. Tetrad II predicts the relation between corn exports and exchange rates.

Bayes Nets: Basic Definitions Defn: events A and B are independent iff P(A and B) = P(A) × P(B). Exercise: prove that A and B are independent iff P(A|B) = P(A), assuming P(B) > 0. Thus independence implies irrelevance: knowing B does not change the probability of A.
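
A one-line solution sketch for the exercise (an addition, assuming P(B) > 0): since P(A|B) = P(A and B) / P(B), the equation P(A|B) = P(A) holds iff P(A and B) = P(A) × P(B), which is exactly the definition of independence.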

Independence Among Variables Let X, Y, Z be random variables. X is independent of Y iff P(X=x | Y=y) = P(X=x) for all x, y s.t. P(Y=y) > 0. X is independent of Y given Z iff P(X=x | Y=y, Z=z) = P(X=x | Z=z) for all x, y, z s.t. P(Y=y and Z=z) > 0. Notation: (X ⊥ Y | Z). Intuitively: given information Z, Y is irrelevant to X.
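
As a concrete illustration (an addition, not from the slides), here is a minimal Python sketch that tests (X ⊥ Y | Z) numerically from a full joint distribution over three binary variables; the joint table is hypothetical:

```python
from itertools import product

# Hypothetical joint distribution over binary X, Y, Z,
# stored as {(x, y, z): probability}; the numbers are made up.
joint = dict(zip(product([0, 1], repeat=3),
                 [.06, .14, .04, .16, .12, .18, .08, .22]))

def prob(x=None, y=None, z=None):
    """Joint/marginal probability; a None argument is summed out."""
    return sum(p for (a, b, c), p in joint.items()
               if x in (None, a) and y in (None, b) and z in (None, c))

def x_indep_y_given_z(tol=1e-9):
    """Check P(X=x | Y=y, Z=z) = P(X=x | Z=z) for all x, y, z with P(y, z) > 0."""
    for x, y, z in product([0, 1], repeat=3):
        if prob(y=y, z=z) > 0:
            lhs = prob(x=x, y=y, z=z) / prob(y=y, z=z)
            rhs = prob(x=x, z=z) / prob(z=z)
            if abs(lhs - rhs) > tol:
                return False
    return True

print(x_indep_y_given_z())  # False for this particular table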

Axioms for Informational Relevance Pearl (2000), p. 11. It is possible to read the ⊥ symbol as "is irrelevant to". Then we can consider a number of axioms for ⊥ as axiomatizations of relevance, for example: Symmetry: if (X ⊥ Y | Z) then (Y ⊥ X | Z). Decomposition: if (X ⊥ YW | Z) then (X ⊥ Y | Z).

Markovian Parents In constructing a Bayes net, we look for "direct causes": variables that "immediately determine" the value of another variable. Such direct causes "screen off" other variables. Formally: let an ordering of variables X1, …, Xn be given, and consider Xj. Let PA be any subset of {X1, …, Xj−1}. Suppose that P(Xj | PA) = P(Xj | X1, …, Xj−1) and that no proper subset of PA has this property. Then PA forms the set of Markovian parents of Xj. Mention Reichenbach 1956 for screening off. Note the connection with Markov decision processes.

Markovian Parents and Bayes Nets Given an ordering of the variables, we can construct a causal graph by drawing arrows from Markovian parents to their children. Note that graphs are suitable for drawing the distinction between "direct" and "intermediate" causes. Exercise: for the variables in figure 1.2, construct a Bayes net in the given ordering. Exercise: construct a Bayes net along the ordering (X5, X1, X3, X2, X4).
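
A rough sketch of this construction (an addition, not from the slides): given a variable ordering and an assumed conditional-independence oracle `ci`, find each variable's Markovian parents by searching the smallest candidate sets first, then draw parent → child edges.

```python
from itertools import combinations

def markovian_parents(order, ci):
    """ci(x, rest, pa): assumed oracle answering whether x is independent of
    the variable set `rest` given the set `pa` (trivially True when `rest`
    is empty). Returns the parent -> child edges of the constructed net."""
    edges = []
    for j, xj in enumerate(order):
        preds = order[:j]
        # Trying smaller candidate sets first ensures that no proper subset
        # of the chosen PA also screens off the remaining predecessors.
        for k in range(len(preds) + 1):
            pa = next((set(c) for c in combinations(preds, k)
                       if ci(xj, set(preds) - set(c), set(c))), None)
            if pa is not None:
                edges += [(p, xj) for p in pa]
                break
    return edges
```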

Independence in Bayes Nets Note how useful irrelevance information is – think of a Prolog-style logical database. A typical problem: Given some information Z, and a query about X, is Y relevant to X? For Bayes nets, the d-separation criterion is a powerful answer.

d-separation In principle, information can flow along any path between two variables X and Y. Provisos: a path is blocked by any collider, and conditioning on a node reverses its status: conditioning on a non-collider makes it block, while conditioning on a collider or one of its descendants unblocks it. Examples: in figure 1.2, X2 and X3 are d-separated by X1. (Check that this holds for the other ordering as well.) But X2 and X3 are not d-separated by {X1, X5}. Also look at figure 1.3.
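
The criterion can be tested mechanically. Below is a sketch (an addition, not from the slides) using the standard reduction: restrict the DAG to the ancestors of X, Y, and Z, moralize it, delete Z, and check connectivity. The sprinkler encoding is an assumption based on figure 1.2.

```python
def d_separated(dag, x, y, z):
    """True iff x and y are d-separated given the set z in the DAG,
    where `dag` maps each node to the set of its parents."""
    # 1. Keep only x, y, z and their ancestors.
    relevant, frontier = set(), {x, y} | set(z)
    while frontier:
        n = frontier.pop()
        if n not in relevant:
            relevant.add(n)
            frontier |= dag[n]
    # 2. Moralize: link each parent to its child and co-parents, drop directions.
    adj = {n: set() for n in relevant}
    for child in relevant:
        for p in dag[child]:
            adj[p].add(child)
            adj[child].add(p)
            adj[p] |= dag[child] - {p}
    # 3. Delete z; x and y are d-separated iff they are now disconnected.
    seen, stack = set(z), [x]
    while stack:
        n = stack.pop()
        if n == y:
            return False
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n])
    return True

# Figure 1.2 (the sprinkler network), encoded as node -> parents:
sprinkler = {'X1': set(), 'X2': {'X1'}, 'X3': {'X1'},
             'X4': {'X2', 'X3'}, 'X5': {'X4'}}
print(d_separated(sprinkler, 'X2', 'X3', {'X1'}))        # True
print(d_separated(sprinkler, 'X2', 'X3', {'X1', 'X5'}))  # False: X5 unblocks collider X4
```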

d-separation characterizes independence If X and Y are d-separated by Z in a DAG G, then (X ⊥ Y | Z) holds in every probability distribution compatible with G. If X and Y are not d-separated by Z in G, then (X ⊥ Y | Z) fails in at least one probability distribution compatible with G.

Observational Equivalence Suppose we can observe the probabilities of various occurrences (rain vs. umbrellas, smoking vs. lung cancer, etc.). How does the probability distribution constrain the graph? Two causal graphs G1, G2 are compatible with the same probability distributions iff G1 has the same adjacencies as G2 and the same v-structures (basically, colliders X → Z ← Y whose endpoints are not adjacent).
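
A sketch of this criterion in code (an addition, not from the slides; `dag` again maps each node to its set of parents):

```python
def skeleton(dag):
    """Undirected adjacencies of a DAG given as node -> parents."""
    return {frozenset((p, c)) for c in dag for p in dag[c]}

def v_structures(dag):
    """Colliders a -> c <- b whose endpoints a, b are not adjacent."""
    skel = skeleton(dag)
    return {(frozenset((a, b)), c) for c in dag
            for a in dag[c] for b in dag[c]
            if a != b and frozenset((a, b)) not in skel}

def observationally_equivalent(g1, g2):
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

# The chain A -> B -> C and the fork A <- B -> C are equivalent;
# the collider A -> B <- C is not equivalent to either.
chain    = {'A': set(), 'B': {'A'}, 'C': {'B'}}
fork     = {'B': set(), 'A': {'B'}, 'C': {'B'}}
collider = {'A': set(), 'C': set(), 'B': {'A', 'C'}}
print(observationally_equivalent(chain, fork))      # True
print(observationally_equivalent(chain, collider))  # False
```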

Observational Equivalence: Examples (I) In the sprinkler network, we cannot tell whether X1 → X2 or vice versa, but we can tell that X2 → X4 and X4 → X5. General note: in machine learning, you cannot always tell what the correct hypothesis is even if you have all possible data; you need more assumptions or other kinds of data.

Observational Equivalence: Examples (II) Vancouver Sun, March 29, 2002: "Adolescents … are more likely to turn to violence in their early twenties if they watch more than an hour of television a day… The team tracked more than 700 children and took into account the 'chicken and egg' question: does watching television cause aggression, or do people prone to aggression watch more television?" [Science, Dr. Johnson, Columbia U.]

Two Models of Aggressive Behaviour [Two causal diagrams over the three variables disposition to aggression, TV watching, and violent behaviour, differing in how their edges are oriented.] Are these two graphs observationally distinguishable?

Minimal Graphs A graph G is minimal for a probability distribution P iff G is compatible with P and no proper subgraph of G is compatible with P. Example: a graph is not minimal if it links A to B, C, or D although (A ⊥ {B, C, D}). [Diagram: a graph over the nodes A, B, C, D.]

Note on minimality Intuitively, minimality requires that you add an edge between A and B only if there is some dependence between A and B. In statistical tests, dependence is observable but independence is not. So minimality amounts to “assume independence until dependence is observed”. That is exactly the strategy for minimizing mind changes! (“assume reaction is impossible until observed”).

Stable Distributions A distribution P is stable iff there is a graph G such that (X ⊥ Y | Z) holds in P iff X and Y are d-separated given Z in G. Intuitively, stability rules out "exact counterbalance": two forces both having a causal effect but cancelling each other out exactly in every circumstance.

Inferring Causal Structure: The IC Algorithm Assume a stable probability distribution P. Goal: find a minimal graph for P with as many edges directed as possible. General idea: first find the pairs of variables that are "directly causally related" and connect them; then add arrowheads as far as possible.

Inferring Causal Structure: The IC Algorithm
1. For each pair of variables X and Y, look for a "screening off" set S(X,Y) such that (X ⊥ Y | S(X,Y)) holds. If there is no such set, add an undirected edge between X and Y.
2. For each non-adjacent pair X, Y with a common neighbour Z, check whether Z is part of the screening-off set S(X,Y). If not, make Z a common consequence of X and Y: orient X → Z ← Y.
3. Orient the remaining edges without creating cycles or new v-structures.
(Do the edges in the sprinkler example.)
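
A rough sketch of steps 1 and 2 (an addition, assuming a conditional-independence oracle `ci(x, y, s)` over the variable list `vs`); step 3 is covered by the orientation rules on the next slide.

```python
from itertools import combinations

def ic_sketch(vs, ci):
    """Steps 1-2 of IC: build the skeleton from an independence oracle,
    then orient colliders. Returns (undirected adjacencies, arrows)."""
    sep, adj = {}, set()
    # Step 1: X and Y are adjacent iff no subset of the remaining
    # variables screens them off.
    for x, y in combinations(vs, 2):
        others = [v for v in vs if v not in (x, y)]
        s = next((set(c) for k in range(len(others) + 1)
                  for c in combinations(others, k) if ci(x, y, set(c))), None)
        if s is None:
            adj.add(frozenset((x, y)))
        else:
            sep[frozenset((x, y))] = s
    # Step 2: for each non-adjacent pair with a common neighbour z that is
    # not in S(x, y), orient x -> z <- y.
    arrows = set()
    for x, y in combinations(vs, 2):
        if frozenset((x, y)) in adj:
            continue
        for z in vs:
            if (z not in (x, y)
                    and frozenset((x, z)) in adj and frozenset((y, z)) in adj
                    and z not in sep[frozenset((x, y))]):
                arrows |= {(x, z), (y, z)}
    return adj, arrows
```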

Rules for Orientation
1. Given a → b and b – c, add b → c if a and c are not linked (no new collider).
2. Given a → c → b and a – b, add a → b (no cycle).
3. Given a – c → d and c → d → b and a – b, add a → b if c and b are not linked (no cycle + no new collider).
4. Given a – c → b and a – d → b and a – b, add a → b if c and d are not linked (no cycle + no new collider).
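
As an illustration of how such rules are applied (an addition, not from the slides), here is a sketch of repeatedly firing rule 1 until nothing changes; the remaining rules would be handled in the same loop.

```python
def apply_rule_1(undirected, arrows):
    """Rule 1: given a -> b and b - c with a, c not linked, orient b -> c.
    `undirected` holds frozenset edges, `arrows` holds (tail, head) pairs."""
    undirected, arrows = set(undirected), set(arrows)
    changed = True
    while changed:
        changed = False
        for a, b in list(arrows):
            for edge in list(undirected):
                if b not in edge:
                    continue
                (c,) = edge - {b}          # the other endpoint of b - c
                linked = (frozenset((a, c)) in undirected
                          or (a, c) in arrows or (c, a) in arrows)
                if c != a and not linked:
                    undirected.discard(edge)
                    arrows.add((b, c))
                    changed = True
    return undirected, arrows
```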