Introduction to Artificial Intelligence: Bayesian Networks
CS171, Fall 2016, Prof. Alexander Ihler
Reading: R&N Ch. 14

Why Bayesian Networks?
Knowledge representation & reasoning (inference):
- Propositional logic. Knowledge base: propositional logic sentences. Reasoning: decide KB |= theory, find a model, or count models.
- Probabilistic reasoning. Knowledge base: the full joint probability over all random variables. Reasoning: compute marginal / conditional probabilities and find the most probable assignments.
Why a Bayesian network?
- Manipulating the full joint probability distribution directly is very hard!
- We exploit the conditional independence properties of our distribution; a Bayesian network captures this conditional independence in a graphical representation (a probabilistic graphical model).
- The graph is a tool for reasoning and computation (probabilistic reasoning based on the graph).

Conditional independence
Recall the chain rule of probability: p(x,y,z) = p(x) p(y|x) p(z|x,y).
Some models are conditionally independent, e.g., p(x,y,z) = p(x) p(y|x) p(z|x) (z is independent of y given x).
Some models have even more independence, e.g., p(x,y,z) = p(x) p(y) p(z) (all variables independent).
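
As a concrete illustration (a minimal sketch with made-up numbers), the middle factorization p(x,y,z) = p(x) p(y|x) p(z|x) needs only 1 + 2 + 2 = 5 numbers for binary variables, versus 1 + 2 + 4 = 7 for the full chain rule, yet it still defines a complete joint distribution:

```python
# A minimal sketch (hypothetical numbers) of the factorization
# p(x,y,z) = p(x) p(y|x) p(z|x) for binary variables.
p_x = 0.3                                   # P(x=1)
p_y_given_x = {0: 0.6, 1: 0.2}              # P(y=1 | x)
p_z_given_x = {0: 0.1, 1: 0.7}              # P(z=1 | x)  (z independent of y given x)

def bernoulli(p1, v):
    """Probability that a binary variable with P(1)=p1 takes value v."""
    return p1 if v == 1 else 1.0 - p1

def joint(x, y, z):
    """p(x,y,z) = p(x) p(y|x) p(z|x) under the conditional-independence model."""
    return (bernoulli(p_x, x) *
            bernoulli(p_y_given_x[x], y) *
            bernoulli(p_z_given_x[x], z))

# The factored joint still sums to 1 over all 8 assignments.
print(sum(joint(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)))  # 1.0
```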

Bayesian networks
A Bayesian network is a directed graphical model:
- Nodes are associated with variables.
- The graph "draws" the independence in the conditional probability expansion: a node's parents in the graph are the variables on the right-hand side of its conditional, e.g., the factorization p(x) p(y|x) p(z|x) corresponds to the graph x -> y, x -> z.
- The graph must be acyclic, so it corresponds to an ordering over the variables (the chain rule ordering).
[Figure: example graphs over x, y, z and over a, b, c, d.]

Example (from Russell & Norvig)
Consider the following 5 binary variables:
- B = a burglary occurs at your house
- E = an earthquake occurs at your house
- A = the alarm goes off
- J = John calls to report the alarm
- M = Mary calls to report the alarm
What is P(B | M, J), for example?
We can use the full joint distribution to answer this question, but it requires 2^5 = 32 probabilities. Can we use prior domain knowledge to come up with a Bayesian network that requires fewer probabilities?

Constructing a Bayesian network
- Order the variables in terms of causality (may be a partial order), e.g., {E, B} before {A} before {J, M}.
- Now apply the chain rule and simplify based on the independence assumptions: P(E, B, A, J, M) = P(E) P(B) P(A | E, B) P(J | A) P(M | A).
These assumptions are reflected in the graph structure of the Bayesian network: E and B are parents of A, and A is the parent of J and M.

Constructing a Bayesian network
Given the structure, define the probabilities (the conditional probability tables). Where do these come from? Expert knowledge, estimates from data, or some combination. Number of parameters: 1 + 1 + 4 + 2 + 2 = 10.

  P(B=1) = 0.001        P(E=1) = 0.002

  E  B | P(A=1 | E, B)
  0  0 | 0.001
  1  0 | 0.29
  0  1 | 0.94
  1  1 | 0.95

  A | P(J=1 | A)        A | P(M=1 | A)
  0 | 0.05              0 | 0.01
  1 | 0.90              1 | 0.70

Constructing a Bayesian network
Joint distribution: every entry P(E, B, A, J, M) is obtained by multiplying the corresponding factors.
[Table: the 32 entries of the joint distribution, e.g., P(e=0, b=0, a=0, j=0, m=0) = .93674.]
Full joint distribution: 2^5 = 32 probabilities. Structured distribution: specify only 10 parameters.
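
To make the table concrete, here is a minimal Python sketch that rebuilds the joint from the 10 CPT parameters above; the first printed value is the .93674 entry (all variables false), and the 32 entries sum to 1.

```python
# A minimal sketch: build the full joint P(E,B,A,J,M) from the 10 CPT
# parameters of the burglary network and check that it sums to 1.
from itertools import product

P_B1 = 0.001                                                      # P(B=1)
P_E1 = 0.002                                                      # P(E=1)
P_A1 = {(0, 0): 0.001, (1, 0): 0.29, (0, 1): 0.94, (1, 1): 0.95}  # P(A=1 | E, B)
P_J1 = {0: 0.05, 1: 0.90}                                         # P(J=1 | A)
P_M1 = {0: 0.01, 1: 0.70}                                         # P(M=1 | A)

def bern(p1, v):
    """Probability that a binary variable with P(1)=p1 takes value v."""
    return p1 if v == 1 else 1.0 - p1

def joint(e, b, a, j, m):
    """P(E,B,A,J,M) = P(E) P(B) P(A|E,B) P(J|A) P(M|A)."""
    return (bern(P_E1, e) * bern(P_B1, b) * bern(P_A1[(e, b)], a) *
            bern(P_J1[a], j) * bern(P_M1[a], m))

print(round(joint(0, 0, 0, 0, 0), 5))                      # 0.93674
print(sum(joint(*v) for v in product((0, 1), repeat=5)))   # 1.0 (up to float error)
```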

Alarm network [Beinlich et al., 1989]
The "alarm" network: 37 variables, 509 parameters (rather than 2^37 ≈ 10^11!).
[Figure: the full ALARM monitoring network, with nodes such as MINVOLSET, PULMEMBOLUS, INTUBATION, VENTLUNG, ..., HRBP, BP.]

Network structure and ordering
The network structure depends on the conditioning order. Suppose we choose the ordering M, J, A, B, E.
[Figure: adding nodes one at a time in this order (Mary, John, Alarm, Burglary, Earthquake) and asking, at each step, whether the new node is independent of the earlier ones.]

Network structure and ordering
The network structure depends on the conditioning order. With the "non-causal" ordering M, J, A, B, E:
- Deciding independence is harder.
- Selecting probabilities is harder.
- The representation is less efficient: 1 + 2 + 4 + 2 + 4 = 13 probabilities (versus 10 for the causal ordering).
[Figure: the resulting graph over Mary, John, Alarm, Burglary, Earthquake.]

Network structure and ordering
The network structure depends on the conditioning order. With a "non-causal" ordering such as M, J, A, B, E, deciding independence is harder, selecting probabilities is harder, and the representation is less efficient. Some orderings may not reveal any independence at all!
[Figure: the resulting graph over Earthquake, Burglary, Alarm, John, Mary.]

Reasoning in Bayesian networks
Suppose we observe J: observing J makes A more likely, and A being more likely makes B more likely.
Suppose we observe A: this makes M more likely.
Observe both A and J? J doesn't add anything to M; observing A makes J and M independent.
How can we read independence directly from the graph?
[Figure: the burglary network over Earthquake, Burglary, Alarm, John, Mary.]

Reasoning in Bayesian networks
How are J and M related given A? P(M) = 0.0117, P(M|A) = 0.7, P(M|A,J) = 0.7: conditionally independent.
Proof: by the factorization, P(J, M | A) = P(J | A) P(M | A), so P(M | A, J) = P(M | A). (We actually know this by construction!)
[Figure: the burglary network.]

Reasoning in Bayesian networks
How are J and B related given A? P(B) = 0.001, P(B|A) = 0.3735, P(B|A,J) = 0.3735: conditionally independent.
Proof: given A, the chain B -> A -> J is blocked, so J carries no further information about B.
[Figure: the burglary network.]

Reasoning in Bayesian networks
How are E and B related? P(B) = 0.001 and P(B|E) = 0.001: (marginally) independent.
What about given A? P(B|A) = 0.3735 but P(B|A,E) = 0.0032: not conditionally independent! The "causes" of A become coupled by observing its value; this is sometimes called "explaining away."
[Figure: the burglary network.]
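
These numbers can be reproduced by inference by enumeration. A minimal sketch (reusing the CPTs from the earlier slides; small differences in the last digit are just rounding):

```python
# Inference by enumeration on the burglary network, reproducing the
# "explaining away" numbers on this slide.  joint() is the same factored
# product used in the earlier sketch.
from itertools import product

P_A1 = {(0, 0): 0.001, (1, 0): 0.29, (0, 1): 0.94, (1, 1): 0.95}  # P(A=1 | E, B)

def bern(p1, v):
    return p1 if v == 1 else 1.0 - p1

def joint(e, b, a, j, m):
    return (bern(0.002, e) * bern(0.001, b) * bern(P_A1[(e, b)], a) *
            bern(0.90 if a else 0.05, j) * bern(0.70 if a else 0.01, m))

def prob(query, evidence):
    """P(query | evidence) by summing the joint over consistent worlds."""
    def total(fixed):
        return sum(joint(*vals)
                   for vals in product((0, 1), repeat=5)
                   if all(dict(zip("EBAJM", vals))[k] == v
                          for k, v in fixed.items()))
    return total({**evidence, **query}) / total(evidence)

print(round(prob({"B": 1}, {}), 4))                 # 0.001   (marginal P(B))
print(round(prob({"B": 1}, {"A": 1}), 4))           # 0.3736  (slide: 0.3735)
print(round(prob({"B": 1}, {"A": 1, "E": 1}), 4))   # 0.0033  (slide: 0.0032)
```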

D-Separation
To prove that sets X and Y are independent given Z, check all undirected paths from X to Y. A path is "inactive" if it passes through:
(1) a "chain" (X -> V -> Y) with the middle variable V observed,
(2) a "split" (X <- V -> Y) with the common parent V observed, or
(3) a "vee" (X -> V <- Y) where V and everything below it is unobserved.
If all paths are inactive, X and Y are conditionally independent given Z!
[Figure: the three path types through an intermediate variable V.]

Example
[Figure: d-separation examples on the ALARM network (MINVOLSET, PULMEMBOLUS, INTUBATION, ..., HRBP, BP).]

Testing D-Separation
Given a Bayes net G, are sets X and Y independent given Z?
- Define the set S = Z plus all ancestors of Z.
- Search (DFS or BFS) from all nodes in X; after following an edge (n, n'):
  - if the edge was n -> n' and n' is in S, follow any incoming edges of n';
  - if n' is not in Z, follow any outgoing edges of n'.
- If the search arrives at a node in Y, they are not independent (a fuller sketch of this test follows below).
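
The bullets above are a compressed version of the standard "reachable" procedure (Koller & Friedman, Alg. 3.1). A hedged sketch of that test, using a dict of parent lists as my own choice of graph encoding:

```python
# A reachability-based d-separation test.  `parents` maps each node to the
# list of its parents in the DAG.
from collections import deque

def ancestors(parents, nodes):
    """All ancestors of `nodes`, including the nodes themselves."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(parents, X, Y, Z):
    """True if X and Y are d-separated given Z in the DAG `parents`."""
    children = {n: [] for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)
    A = ancestors(parents, Z)                    # Z and its ancestors
    queue = deque((x, "up") for x in X)          # start moving "up" from X
    visited, reached = set(), set()
    while queue:
        n, d = queue.popleft()
        if (n, d) in visited:
            continue
        visited.add((n, d))
        if n not in Z:
            reached.add(n)
        if d == "up" and n not in Z:             # arrived from a child
            queue.extend((p, "up") for p in parents[n])
            queue.extend((c, "down") for c in children[n])
        elif d == "down":                        # arrived from a parent
            if n not in Z:
                queue.extend((c, "down") for c in children[n])
            if n in A:                           # active v-structure
                queue.extend((p, "up") for p in parents[n])
    return not (reached & set(Y))

# Burglary network: J and M are d-separated given A, but E and B are not.
g = {"E": [], "B": [], "A": ["E", "B"], "J": ["A"], "M": ["A"]}
print(d_separated(g, {"J"}, {"M"}, {"A"}))   # True
print(d_separated(g, {"E"}, {"B"}, {"A"}))   # False
```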

Markov blanket
A node's Markov blanket is its parents, its children, and its children's other parents. A node is conditionally independent of all other nodes in the network given its Markov blanket (shown in gray on the slide's figure).
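
A small sketch (a hypothetical helper, using the same dict-of-parents encoding as above) that reads the Markov blanket directly off the graph:

```python
# Markov blanket = parents + children + children's other parents.
def markov_blanket(parents, node):
    children = [n for n, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for c in children:
        blanket |= set(parents[c])      # the children's other parents
    blanket.discard(node)
    return blanket

g = {"E": [], "B": [], "A": ["E", "B"], "J": ["A"], "M": ["A"]}
print(markov_blanket(g, "A"))   # {'E', 'B', 'J', 'M'}
print(markov_blanket(g, "E"))   # {'A', 'B'}  (B is A's other parent)
```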

Graphs and Independence
The graph structure allows us to infer independence properties of p(.): are X and Y d-separated given Z?
Adding edges: fewer independencies can be inferred, but the graph remains valid for representing p(.); a complete graph can represent any distribution p(.).
"G is an I-map for p(.)" means that any independence identified using the graph G is also present in p; note that there may be additional independencies in p that are not obvious from G. "p is Markov with respect to G" means the same thing.
Note that if we cannot reason from G that an independence must hold, it means there is some distribution p(.) of the given form for which that independence does not hold (though perhaps not with these particular values).
[Figure: the burglary network CPTs, shown alongside a version with an added edge so that John depends on both E and A, i.e., P(J | E, A).]

Genetic linkage analysis
[Figure: a pedigree of 6 individuals; the haplotype is known for individuals {2, 3}, the genotype is known for individual {6}, and the rest are unknown.]

Pedigree model: 6 people, 3 markers
[Figure: the corresponding Bayesian network, with maternal/paternal locus variables Lijm / Lijf, observed variables Xij, and selector variables Sij for each marker i and person j.]

Naïve Bayes Model
P(C | X1, …, Xn) = α P(C) Π_i P(Xi | C)
Features Xi are conditionally independent given the class variable C. Widely used in machine learning, e.g., spam email classification, where the Xi are counts of words in emails. The probabilities P(C) and P(Xi | C) can easily be estimated from labeled data.
[Figure: the naïve Bayes graph, with C as the single parent of X1, X2, X3, …, Xn.]

Naïve Bayes Model (2)
P(C | X1, …, Xn) = α P(C) Π_i P(Xi | C)
Learning the naïve Bayes model: the probabilities P(C) and P(Xi | C) can easily be estimated from labeled data.
- P(C = cj) ≈ #(examples with class label cj) / #(examples)
- P(Xi = xik | C = cj) ≈ #(examples with Xi value xik and class label cj) / #(examples with class label cj)
It is usually easiest to work with logs: log P(C | X1, …, Xn) = log α + log P(C) + Σ_i log P(Xi | C).
DANGER: suppose there are ZERO examples with Xi value xik and class label cj. Then an unseen example with Xi value xik will NEVER be predicted to have class label cj!
- Practical solutions: pseudocounts, e.g., add 1 to every count #( ).
- Theoretical solutions: Bayesian inference, Beta-distribution priors, etc.
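
A minimal sketch of these estimates (toy data and hypothetical feature names), using add-1 pseudocounts and log-space prediction as described above:

```python
# Naive Bayes learning with add-1 pseudocounts and log-space prediction.
import math
from collections import Counter, defaultdict

# Toy labeled data: each example is (features dict, class label).
data = [({"offer": 1, "meeting": 0}, "spam"),
        ({"offer": 1, "meeting": 0}, "spam"),
        ({"offer": 0, "meeting": 1}, "ham"),
        ({"offer": 0, "meeting": 0}, "ham")]

classes = Counter(c for _, c in data)
counts = defaultdict(Counter)              # counts[(feature, class)][value]
for x, c in data:
    for f, v in x.items():
        counts[(f, c)][v] += 1

def p_class(c):
    return classes[c] / sum(classes.values())

def p_feature(f, v, c, k=1, n_values=2):
    """P(Xi = v | C = c) with add-k pseudocounts (binary features assumed)."""
    return (counts[(f, c)][v] + k) / (classes[c] + k * n_values)

def predict(x):
    """argmax_c  log P(c) + sum_i log P(x_i | c)   (the alpha term cancels)."""
    scores = {c: math.log(p_class(c)) +
                 sum(math.log(p_feature(f, v, c)) for f, v in x.items())
              for c in classes}
    return max(scores, key=scores.get)

print(predict({"offer": 1, "meeting": 0}))   # 'spam'
print(predict({"offer": 0, "meeting": 1}))   # 'ham'
```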

Hidden Markov Models
Two key assumptions:
- The hidden state sequence is Markov.
- Each observation o_t is conditionally independent of everything else given the state x_t.
Widely used in speech recognition, protein sequence models, and more.
The Bayesian network is a tree, so inference is linear in n; we exploit the graph structure for efficient computation (as in CSPs).
[Figure: hidden states x0, x1, x2, x3, x4 in a chain, each emitting an observation o0, …, o4.]
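
To illustrate the linear-time inference claim, here is a hedged sketch of the forward (filtering) recursion on a toy two-state HMM; the state names and numbers are made up for illustration:

```python
# Forward (filtering) recursion for an HMM: one pass over the sequence,
# so the cost is linear in its length.
prior = {"rain": 0.5, "sun": 0.5}                     # P(x0)
trans = {"rain": {"rain": 0.7, "sun": 0.3},           # P(x_t | x_{t-1})
         "sun":  {"rain": 0.3, "sun": 0.7}}
emit = {"rain": {"umbrella": 0.9, "none": 0.1},       # P(o_t | x_t)
        "sun":  {"umbrella": 0.2, "none": 0.8}}

def forward(observations):
    """Return P(x_t | o_1..o_t) for each t, starting from the prior belief."""
    belief, beliefs = dict(prior), []
    for o in observations:
        # Predict: sum over the previous state, then weight by the emission.
        predicted = {x: sum(belief[xp] * trans[xp][x] for xp in belief)
                     for x in belief}
        unnorm = {x: predicted[x] * emit[x][o] for x in predicted}
        z = sum(unnorm.values())
        belief = {x: p / z for x, p in unnorm.items()}
        beliefs.append(belief)
    return beliefs

for t, b in enumerate(forward(["umbrella", "umbrella", "none"])):
    print(t, {x: round(p, 3) for x, p in b.items()})
```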

Summary
Bayesian networks:
- Directed, acyclic graphs.
- Encode the chain rule plus conditional independence.
- Efficient representation.
Building a Bayesian network:
- Select an ordering.
- Evaluate independence relations.
- Assign probabilities.
Reasoning about independence in the graph:
- D-separation.
- Markov blanket.

You should know…
- The basic concepts and vocabulary of Bayesian networks: nodes represent random variables; directed arcs represent (informally) direct influences; conditional probability tables give P(Xi | Parents(Xi)).
- Given a Bayesian network: write down the full joint distribution it represents.
- Given a full joint distribution in factored form: draw the Bayesian network that represents it.
- Given a variable ordering and some background assertions of conditional independence among the variables: write down the factored form of the full joint distribution, as simplified by the conditional independence assertions.

Summary
- Bayesian networks represent a joint distribution using a graph.
- The graph encodes a set of conditional independence assumptions.
- Answering queries (inference, or reasoning) in a Bayesian network amounts to efficient computation of appropriate conditional probabilities.
- Probabilistic inference is intractable in the general case, but can be carried out in linear time for certain classes of Bayesian networks.