Presentation transcript:

3/19

Conditional Independence Assertions

We write X || Y | Z to say that the set of variables X is conditionally independent of the set of variables Y given evidence on the set of variables Z (where X, Y, Z are subsets of the set of all random variables in the domain model).

We saw that Bayes rule computations can exploit conditional independence assertions. Specifically, X || Y | Z implies
– P(X, Y | Z) = P(X|Z) * P(Y|Z)
– P(X | Y, Z) = P(X|Z)
– P(Y | X, Z) = P(Y|Z)

If A || B | C, then
P(A,B,C) = P(A|B,C) P(B,C) = P(A|B,C) P(B|C) P(C) = P(A|C) P(B|C) P(C)
so we can get by with 1+2+2 = 5 numbers instead of 8.

Why not write down all conditional independence assertions that hold in a domain?
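To make the 5-versus-8 count concrete, here is a minimal Python sketch (not from the slides; the variable names and probability values are illustrative). It specifies only the five numbers P(C), P(B|C), P(A|C), reconstructs the 8-entry joint as P(A|C) P(B|C) P(C), and verifies that A || B | C holds in the result.

```python
import itertools

# Factored specification: 1 + 2 + 2 = 5 numbers (illustrative values).
p_c = {True: 0.3}                       # P(C=true); P(C=false) = 1 - 0.3
p_b_given_c = {True: 0.8, False: 0.1}   # P(B=true | C=c)
p_a_given_c = {True: 0.6, False: 0.2}   # P(A=true | C=c)

def bernoulli(table, value, cond=None):
    """P(value) from a table keyed by the conditioning value (True for priors)."""
    p_true = table[cond] if cond is not None else table[True]
    return p_true if value else 1.0 - p_true

# Reconstruct the full joint P(A,B,C) = P(A|C) P(B|C) P(C): 8 entries.
joint = {}
for a, b, c in itertools.product([True, False], repeat=3):
    joint[(a, b, c)] = (bernoulli(p_a_given_c, a, c)
                        * bernoulli(p_b_given_c, b, c)
                        * bernoulli(p_c, c))

# Check the assertion A || B | C: P(A,B|C) should equal P(A|C) * P(B|C).
for c in [True, False]:
    p_c_val = sum(p for (a, b, cc), p in joint.items() if cc == c)
    for a, b in itertools.product([True, False], repeat=2):
        p_ab_c = joint[(a, b, c)] / p_c_val
        p_a_c = sum(joint[(a, bb, c)] for bb in [True, False]) / p_c_val
        p_b_c = sum(joint[(aa, b, c)] for aa in [True, False]) / p_c_val
        assert abs(p_ab_c - p_a_c * p_b_c) < 1e-12
print("A || B | C holds in the reconstructed joint")
```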

Cond. Indep. Assertions (Contd)

Idea: Why not write down all conditional independence assertions (CIAs) (X || Y | Z) that hold in a domain?

Problem: There can be exponentially many conditional independence assertions that hold in a domain (recall that X, Y and Z are all subsets of the domain variables).

Brilliant idea: Maybe we should implicitly specify the CIAs by writing down the "local dependencies" between variables using a graphical model.
– A Bayes network is a way of doing just this. The Bayes net is a directed acyclic graph whose nodes are random variables, and the immediate dependencies between variables are represented by directed arcs.
– The topology of a Bayes network shows the inter-variable dependencies. Given the topology, there is a way of checking whether any conditional independence assertion holds in the network (the Bayes Ball algorithm and the d-separation idea).
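The slide only names d-separation and the Bayes Ball algorithm; as an illustrative sketch (the graph encoding and function name are mine, not from the slides), the test "does X || Y | Z hold in this topology?" can be carried out with the standard moralization recipe: restrict the DAG to the ancestral graph of X ∪ Y ∪ Z, moralize it, delete Z, and ask whether X can still reach Y.

```python
from collections import deque

def d_separated(parents, X, Y, Z):
    """True if X || Y | Z in the DAG given as {node: set_of_parents}."""
    X, Y, Z = set(X), set(Y), set(Z)
    # 1. Ancestral graph: keep only X, Y, Z and their ancestors.
    keep, frontier = set(), deque(X | Y | Z)
    while frontier:
        n = frontier.popleft()
        if n not in keep:
            keep.add(n)
            frontier.extend(parents.get(n, ()))
    # 2. Moralize: undirected parent-child edges plus "marry" co-parents.
    edges = set()
    for child in keep:
        ps = [p for p in parents.get(child, ()) if p in keep]
        edges.update(frozenset((p, child)) for p in ps)
        edges.update(frozenset((p, q)) for p in ps for q in ps if p != q)
    # 3. Delete Z and test whether X can still reach Y.
    adj = {n: set() for n in keep if n not in Z}
    for e in edges:
        u, v = tuple(e)
        if u not in Z and v not in Z:
            adj[u].add(v); adj[v].add(u)
    seen, frontier = set(X - Z), deque(X - Z)
    while frontier:
        n = frontier.popleft()
        for m in adj.get(n, ()):
            if m not in seen:
                seen.add(m); frontier.append(m)
    return not (seen & Y)

# Burglary network from the lecture: B -> A <- E, A -> J, A -> M.
parents = {"B": set(), "E": set(), "A": {"B", "E"}, "J": {"A"}, "M": {"A"}}
print(d_separated(parents, {"J"}, {"M"}, {"A"}))   # True: J || M | A
print(d_separated(parents, {"B"}, {"E"}, {"A"}))   # False: explaining away
```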

CIA implicit in Bayes Nets

So, what conditional independence assumptions are implicit in Bayes nets?
– Local Markov assumption: a node N is independent of its non-descendants (including ancestors) given its immediate parents. (So if P are the immediate parents of N, and A is any subset of its ancestors and other non-descendants, then {N} || A | P.) Equivalently, a node N is independent of all other nodes given its Markov blanket (its parents, its children, and its children's other parents).
– Given this assumption, many other conditional independencies follow. For a full answer, we need to appeal to the d-separation condition and/or Bayes Ball reachability.
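A small helper for the Markov blanket definition above (parents, children, and children's other parents), using the same parent-set encoding as the earlier sketch; the function name is hypothetical, not from the slides.

```python
def markov_blanket(parents, node):
    """Parents, children, and children's other parents of `node`."""
    children = {c for c, ps in parents.items() if node in ps}
    co_parents = {p for c in children for p in parents[c]}
    return (set(parents[node]) | children | co_parents) - {node}

# Burglary network again: B -> A <- E, A -> J, A -> M.
parents = {"B": set(), "E": set(), "A": {"B", "E"}, "J": {"A"}, "M": {"A"}}
print(markov_blanket(parents, "A"))   # {'B', 'E', 'J', 'M'}
print(markov_blanket(parents, "B"))   # {'A', 'E'}: E is a co-parent via A
```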

Topological Semantics

– Independence from non-descendants holds given just the parents.
– Independence from every other node holds given the Markov blanket (parents, children, children's other parents).
– These two conditions are equivalent, and many other conditional independence assertions follow from them.

Local Semantics

The local semantics (each node independent of its non-descendants given its parents) gives the global semantics, i.e. the full joint: put the variables in reverse topological order and apply the chain rule. For example,

P(J, M, A, ~B, E)
= P(J | M, A, ~B, E) * P(M | A, ~B, E) * P(A | ~B, E) * P(~B | E) * P(E)   [chain rule]
= P(J | A) * P(M | A) * P(A | ~B, E) * P(~B) * P(E)   [by the conditional independence inherent in the Bayes net]
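A minimal sketch of evaluating that factored product in Python. The CPT numbers below are the commonly used textbook values for the burglary network rather than numbers shown on these slides, so treat them as placeholders.

```python
# CPTs for the burglary network, keyed on parent values (placeholder numbers).
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=true | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J=true | A)
P_M = {True: 0.70, False: 0.01}                      # P(M=true | A)

def bern(p_true, value):
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) as a product of local CPT entries."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))

# The slide's example: P(J, M, A, ~B, E) = P(J|A) P(M|A) P(A|~B,E) P(~B) P(E)
print(joint(b=False, e=True, a=True, j=True, m=True))
```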

[Figure: the burglary network constructed with variables added in a non-causal order, with nodes such as Alarm, Burglary and Earthquake and annotations such as "P(A|J,M) = P(A)?"]

How many probabilities are needed? 13 for the new ordering versus 10 for the old (causal) one. Is this the worst ordering? Introduce variables in the causal order: easy when you know the causality of the domain, hard otherwise..
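To see where the counts 10 and 13 come from: with boolean variables, each node's CPT needs 2^(number of parents) numbers, so the total depends on the parent structure the chosen ordering induces. The two structures below are my reconstruction of the usual causal and non-causal orderings for this example, not taken verbatim from the slide.

```python
def num_params(parents):
    """Numbers needed when every variable is boolean: one
    P(X=true | parent assignment) per row of each CPT."""
    return sum(2 ** len(ps) for ps in parents.values())

# Causal order B, E, A, J, M (the "old" network).
causal = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}

# Non-causal order M, J, A, B, E: each newly added node picks up more parents.
non_causal = {"M": [], "J": ["M"], "A": ["J", "M"], "B": ["A"], "E": ["A", "B"]}

print(num_params(causal))      # 1 + 1 + 4 + 2 + 2 = 10
print(num_params(non_causal))  # 1 + 2 + 4 + 2 + 4 = 13
```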

Digression: Is assessing/learning numbers the only hard model-learning problem?

We are making it sound as if assessing the probabilities is a big deal. In doing so, we are taking model acquisition/learning costs into account. How come we didn't care about these issues in logical reasoning? Is it because acquiring logical knowledge is easy?

Actually, if we are writing programs for worlds that we (the humans) already live in, it is easy for us to add the logical knowledge into the program; it is a pain to give the probabilities. On the other hand, if the agent is fully autonomous and is bootstrapping itself, then learning logical knowledge is actually harder than learning probabilities.
– For example, we will see that given the Bayes network topology (the "logic"), learning its CPTs is much easier than learning both the topology and the CPTs.

Ideas for reducing the number of probabilities to be specified

Problem 1: The joint distribution requires 2^n numbers to specify, and those numbers are harder to assess.
Solution: Use Bayes nets to reduce the number of probabilities and specify them as CPTs.

Problem 2: But the CPTs will be as big as the full joint if the network is dense.
Solution: Introduce intermediate variables to induce sparsity into the network.

Problem 3: But CPTs can still be quite hard to specify if there are too many parents (or if the variables are continuous).
Solution: Parameterize the CPT (use noisy-OR etc. for discrete variables; Gaussians etc. for continuous variables).

Making the network sparse by introducing intermediate variables

Consider a network of boolean variables where n parent nodes are connected to m child nodes (with each parent influencing each child).
– You will need n + m*2^n conditional probabilities.

Suppose you realize that what is really influencing the child nodes is some single aggregate function of the parents' values (e.g. the sum of the parents).
– We can introduce a single intermediate node called "sum" which has links from all n parent nodes and separately influences each of the m child nodes.

Now you will wind up needing only n + 2^n + 2m conditional probabilities to specify this new network!
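A quick numeric check of these two counts, following the slide's own accounting, for an assumed n = 10 parents and m = 5 children:

```python
def dense_count(n, m):
    # n parent priors + one 2^n-row CPT per child.
    return n + m * 2 ** n

def with_sum_node(n, m):
    # n parent priors + the aggregate node's 2^n entries + 2 per child,
    # following the accounting used on the slide.
    return n + 2 ** n + 2 * m

n, m = 10, 5
print(dense_count(n, m))     # 10 + 5*1024    = 5130
print(with_sum_node(n, m))   # 10 + 1024 + 10 = 1044
```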

Learning such hidden variables from data poses challenges..

Compact/Parameterized distributions are pretty much the only way to go when continuous variables are involved!
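For example, a continuous child with continuous parents is often given a linear Gaussian conditional density, P(X | u) = N(w·u + b, sigma^2), so a whole table is replaced by a handful of parameters. A hedged sketch (all names and numbers are illustrative):

```python
import math

def linear_gaussian_pdf(x, parent_values, weights, bias, sigma):
    """Density of X given continuous parents: Normal(w . u + b, sigma^2)."""
    mean = sum(w * u for w, u in zip(weights, parent_values)) + bias
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Two continuous parents: only 2 weights + bias + sigma = 4 numbers are needed.
print(linear_gaussian_pdf(x=3.0, parent_values=[1.0, 2.0],
                          weights=[0.5, 1.0], bias=0.2, sigma=0.8))
```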

Noisy-OR

We only consider the "failure to cause" probability of the causes that hold: if parents U_{j+1}, ..., U_k are the ones that are true, then

P(~X | U_1, ..., U_k) = prod_{i=j+1}^{k} r_i

where r_i is the probability that X fails to hold even though the i-th parent holds. Think of a firing squad with up to k gunners trying to shoot you: you will live only if everyone who shot missed..

How about noisy-AND? (hint: A & B => ~(~A V ~B))
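A sketch of expanding the per-parent failure probabilities r_i into a full noisy-OR CPT (no leak term, matching the slide's formulation; the r_i values are illustrative, in the spirit of the textbook fever example). Note that only k numbers are specified instead of 2^k.

```python
import itertools

def noisy_or_cpt(failure_probs):
    """Map each parent assignment to P(X=true); failure_probs[i] = r_i,
    the probability that X fails even though parent i holds."""
    k = len(failure_probs)
    cpt = {}
    for assignment in itertools.product([False, True], repeat=k):
        p_not_x = 1.0
        for holds, r in zip(assignment, failure_probs):
            if holds:                  # only causes that hold can fail to fire
                p_not_x *= r
        cpt[assignment] = 1.0 - p_not_x
    return cpt

cpt = noisy_or_cpt([0.6, 0.2, 0.1])    # three causes, three numbers
print(cpt[(False, False, False)])      # 0.0: no cause present, X cannot hold
print(cpt[(True, True, True)])         # 1 - 0.6*0.2*0.1 = 0.988
```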

Constructing Belief Networks: Summary

Decide on what sorts of queries you are interested in answering.
– This in turn dictates what factors to model in the network.
Decide on a vocabulary of the variables and their domains for the problem.
– Introduce "hidden" variables into the network as needed to make the network sparse.
Decide on an order of introduction of variables into the network.
– Introducing variables in the causal direction leads to fewer connections (a sparse structure) AND easier-to-assess probabilities.
Try to use canonical distributions to specify the CPTs.
– Noisy-OR
– Parameterized discrete/continuous distributions, such as Poisson, Normal (Gaussian), etc.

Case Study: Pathfinder System

Domain: lymph node diseases
– Deals with 60 diseases and 100 disease findings.

Versions:
– Pathfinder I: a rule-based system with logical reasoning.
– Pathfinder II: tried a variety of approaches for uncertainty; simple Bayes reasoning outperformed the rest.
– Pathfinder III: simple Bayes reasoning, but with reassessed probabilities.
– Pathfinder IV: a Bayesian network was used to handle a variety of conditional dependencies.
Deciding the vocabulary: 8 hours. Devising the topology of the network: 35 hours. Assessing the (14,000) probabilities: 40 hours.
– Physician experts liked assessing causal probabilities.

Evaluation on 53 "referral" cases:
– Pathfinder III: 7.9/10
– Pathfinder IV: 8.9/10 [saves one additional life in every 1000 cases!]
– A more recent comparison shows that Pathfinder now outperforms the experts who helped design it!!