Qian Liu, CSE 391, Spring 2005, University of Pennsylvania

Presentation transcript:

Belief Networks. Qian Liu, CSE 391, Spring 2005, University of Pennsylvania.

Outline: Motivation; BN = DAG + CPTs; Inference; Learning; Applications of BNs.

From applications: what kinds of problems can Belief Networks solve? Classifiers (e.g. classifying email or web pages), medical diagnosis, the troubleshooting system in MS Windows, the bouncy paperclip guy in MS Word, speech recognition, gene finding. Also known as: Bayesian Network, Causal Network, Directed Graphical Model, ...

From representation: how do we describe the joint distribution of N random variables, P(X1, X2, ..., XN)? If we write the joint distribution out as a table, how many entries are there in the table?

  X1, ..., XN     P
  0, 0, ..., 0    0.008
  0, 0, ..., 1    0.001
  ...

The number of entries is 2^N (for binary variables): far too many. Is there a more clever way of representing the joint distribution? Yes: the Belief Network.

Outline: Motivation; BN = DAG + CPTs; Inference; Learning; Applications of BNs.

What is a BN: BN = DAG + CPTs. A Belief Network consists of a Directed Acyclic Graph (DAG) and a set of Conditional Probability Tables (CPTs). DAG: the nodes are random variables, and the directed edges represent causal relations. CPTs: each random variable Xi has a CPT, which specifies P(Xi | Parents(Xi)). The BN specifies a joint distribution on the variables: P(X1, ..., XN) = prod_i P(Xi | Parents(Xi)).

Alarm example. Random variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls. Causal relationships among the variables: a burglary can set the alarm off; an earthquake can set the alarm off; the alarm can cause John to call; the alarm can cause Mary to call. Causal relations reflect domain knowledge.

Alarm example (cont.): [figure not reproduced in the transcript]

Joint distribution. The BN specifies a joint distribution on the variables. Alarm example: P(B, E, A, J, M) = P(B) P(E) P(A | B, E) P(J | A) P(M | A). The slide also gives a shorthand notation for assignments of particular values (formula not reproduced in the transcript).
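To make the factorization concrete, here is a minimal Python sketch of the alarm network as a set of CPTs plus a function that evaluates the joint for one assignment. The CPT numbers are not taken from the slide (its figure is not reproduced); they are the standard textbook values from Russell and Norvig, used purely for illustration.

```python
# Alarm network: Burglary (B), Earthquake (E), Alarm (A), JohnCalls (J), MaryCalls (M).
# CPTs store P(variable = 1 | parent values); the numbers are the usual
# Russell & Norvig values, NOT the ones on the original slide.
P_B = 0.001
P_E = 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1 | B, E)
P_J = {1: 0.90, 0: 0.05}   # P(J=1 | A)
P_M = {1: 0.70, 0: 0.01}   # P(M=1 | A)

def bernoulli(p_true, value):
    """Return P(X = value) given P(X = 1) = p_true."""
    return p_true if value == 1 else 1.0 - p_true

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) = P(b) P(e) P(a|b,e) P(j|a) P(m|a)."""
    return (bernoulli(P_B, b) * bernoulli(P_E, e) *
            bernoulli(P_A[(b, e)], a) *
            bernoulli(P_J[a], j) * bernoulli(P_M[a], m))

# Example: alarm sounds and both neighbors call, with no burglary and no earthquake.
print(joint(b=0, e=0, a=1, j=1, m=1))   # about 0.00063
```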

Another example: [network figure and its CPTs not reproduced in the transcript]. Joint probability: the convention is to write each variable in a "cause → effects" order. Belief Networks are generative models, which can generate data in a "cause → effects" order.

Compactness. A Belief Network offers a simple and compact way of representing the joint distribution of many random variables. The number of entries in a full joint distribution table grows as 2^N (for N binary variables). But for a BN, if each variable has at most k parents, the total number of entries in all the CPTs is only about N * 2^k. In real practice k << N, so we save a lot of space. For example, with N = 30 binary variables and at most k = 3 parents each, the full table has 2^30 (about a billion) entries, while the CPTs together have at most 30 * 2^3 = 240.

Outline: Motivation; BN = DAG + CPTs; Inference; Learning; Applications of BNs.

Inference. The task of inference is to compute the posterior probability of a set of query variables X given a set of evidence variables E = e, denoted P(X | E = e), given that the Belief Network is known. Since the joint distribution is known, P(X | E = e) can be computed naively (and inefficiently) by the product rule and marginalization: P(X | E = e) = P(X, e) / P(e), where both terms are obtained by summing the joint over the unobserved variables.
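A minimal sketch of this naive "enumerate the joint" inference on the alarm network, again using the illustrative textbook CPT numbers rather than the slide's own figure.

```python
from itertools import product

# Alarm network with illustrative (Russell & Norvig style) CPT values.
P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}
P_J = {1: 0.90, 0: 0.05}
P_M = {1: 0.70, 0: 0.01}

def bern(p, v):
    return p if v == 1 else 1.0 - p

def joint(assign):
    b, e, a, j, m = (assign[v] for v in "BEAJM")
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a) *
            bern(P_J[a], j) * bern(P_M[a], m))

def query(var, evidence):
    """Naive inference by enumeration: P(var | evidence).
    Sums the joint over all unobserved variables, then normalizes."""
    dist = {}
    for value in (0, 1):
        hidden = [v for v in "BEAJM" if v != var and v not in evidence]
        total = 0.0
        for values in product((0, 1), repeat=len(hidden)):
            assign = dict(evidence, **dict(zip(hidden, values)), **{var: value})
            total += joint(assign)
        dist[value] = total
    norm = sum(dist.values())
    return {v: p / norm for v, p in dist.items()}

# P(Burglary | JohnCalls=1, MaryCalls=1): about 0.284 for B=1 with these numbers.
print(query("B", {"J": 1, "M": 1}))
```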

Conditional independence. A and B are (unconditionally) independent: (1) P(A, B) = P(A) P(B); (2) P(A | B) = P(A); (3) P(B | A) = P(B). A and B are conditionally independent, given evidence E: (1) P(A, B | E) = P(A | E) P(B | E); (2) P(A | B, E) = P(A | E); (3) P(B | A, E) = P(B | E). In each case, (1), (2), and (3) are equivalent.

Conditional independence. Chain rule: P(B,E,A,J,M) = P(B) P(E|B) P(A|B,E) P(J|B,E,A) P(M|B,E,A,J). BN: P(B,E,A,J,M) = P(B) P(E) P(A|B,E) P(J|A) P(M|A), because B is independent of E, J is C.I. of B, E given A, and M is C.I. of B, E, J given A. Belief Networks exploit conditional independence among variables so as to represent the joint distribution compactly.
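The same comparison written as display math (amsmath), with the independence assumption that justifies each simplified factor marked beneath it:

```latex
\begin{aligned}
\text{Chain rule:}\quad
P(B,E,A,J,M) &= P(B)\,P(E\mid B)\,P(A\mid B,E)\,P(J\mid B,E,A)\,P(M\mid B,E,A,J) \\[4pt]
\text{Belief network:}\quad
P(B,E,A,J,M) &= P(B)\,\underbrace{P(E)}_{E \perp B}\,P(A\mid B,E)\,
\underbrace{P(J\mid A)}_{J \perp B,E \,\mid\, A}\,
\underbrace{P(M\mid A)}_{M \perp B,E,J \,\mid\, A}
\end{aligned}
```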

Conditional independence. A Belief Network encodes conditional independence in the graph structure: (1) a variable is C.I. of its non-descendants, given all its parents; (2) a variable is C.I. of all the other variables, given its parents, children, and children's parents, that is, given its Markov blanket.

Conditional independence: alarm example. B is independent of E (or: B is C.I. of E, given nothing), by rule (1). J is C.I. of B, E, M, given A, by rule (1). M is C.I. of B, E, J, given A, by rule (1). Another example (network figure not reproduced in the transcript): U is C.I. of X, given Y, V, Z, by rule (2).

Examples of inference: alarm example. We know the CPTs P(B), P(E), P(A|B,E), P(J|A), P(M|A) and the conditional independences. What is P(A)? Marginalization: P(A) = sum over b, e of P(A, b, e). Chain rule: = sum over b, e of P(b) P(e | b) P(A | b, e). B and E are independent: = sum over b, e of P(b) P(e) P(A | b, e).

Examples of inference: alarm example. We know the CPTs P(B), P(E), P(A|B,E), P(J|A), P(M|A) and the conditional independences. What is P(J, M)? Marginalization: P(J, M) = sum over a of P(J, M, a). Chain rule: = sum over a of P(a) P(J | a) P(M | J, a). J is C.I. of M, given A: = sum over a of P(a) P(J | a) P(M | a).
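Plugging in numbers for these two derivations, again with the illustrative textbook CPT values (the slide's own figure is not reproduced):

```python
# Illustrative CPT values (Russell & Norvig style), not the slide's own figure.
P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1 | b, e)
P_J = {1: 0.90, 0: 0.05}   # P(J=1 | a)
P_M = {1: 0.70, 0: 0.01}   # P(M=1 | a)

def bern(p, v):
    return p if v == 1 else 1.0 - p

# P(A=1) = sum_{b,e} P(b) P(e) P(A=1 | b, e)
p_a1 = sum(bern(P_B, b) * bern(P_E, e) * P_A[(b, e)]
           for b in (0, 1) for e in (0, 1))
print(p_a1)                      # about 0.00252

# P(J=1, M=1) = sum_a P(a) P(J=1 | a) P(M=1 | a)
p_a = {1: p_a1, 0: 1.0 - p_a1}
p_jm = sum(p_a[a] * P_J[a] * P_M[a] for a in (0, 1))
print(p_jm)                      # about 0.00208
```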

Outline: Motivation; BN = DAG + CPTs; Inference; Learning; Applications of BNs.

Learning. The task of learning is to learn the Belief Network that best describes the data we observe. Let's assume the DAG is known; then the learning problem simplifies to learning the best CPTs from the data, according to some "goodness" criterion. Note: there are many kinds of learning (learning different things, with different "goodness" criteria); we are only going to discuss the easiest kind.

Training data. All variables are binary. We observe T examples, assumed to be independently and identically distributed (i.i.d.). Task: how do we learn the best CPTs from the T examples?

  t    X1, X2, ..., Xn
  1    0, 0, 0, ..., 0
  2    0, 1, 0, ..., 0
  3    ...
  4    0, 1, 1, ..., 0
  5    1, 1, 1, ..., 1
  ...  ...
  T    0, 1, 0, ..., 1

Think of learning as: someone used a DAG and a set of CPTs to generate the training data above. You are given the DAG and the training data, and you are asked to guess which CPTs were most likely used to generate that data.

Given the CPTs. The probability of the t-th example is the product of the matching CPT entries; e.g., for the alarm example, if the t-th example is an assignment (b, e, a, j, m), then its probability is P(b) P(e) P(a | b, e) P(j | a) P(m | a). By the i.i.d. assumption, the probability of all the data is the product of the probabilities of the T examples, and the log-likelihood of the data is the corresponding sum of logs. The log-likelihood of the data is a function of the CPTs. Which CPTs are the best?
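Written out as display math (the slide's own formulas are images that did not survive extraction), these quantities are presumably:

```latex
\begin{aligned}
P(\text{example } t) &= \prod_{i=1}^{N} P\!\left(x_i^{(t)} \mid \mathrm{pa}_i^{(t)}\right), \\
P(\text{data}) &= \prod_{t=1}^{T} \prod_{i=1}^{N} P\!\left(x_i^{(t)} \mid \mathrm{pa}_i^{(t)}\right)
  \quad\text{(i.i.d.)}, \\
L(\text{CPTs}) &= \log P(\text{data})
  = \sum_{t=1}^{T} \sum_{i=1}^{N} \log P\!\left(x_i^{(t)} \mid \mathrm{pa}_i^{(t)}\right),
\end{aligned}
```

where x_i^(t) is the value of variable X_i in the t-th example and pa_i^(t) are the values of its parents in that example (my notation, not the slide's).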

Maximum-likelihood learning. The log-likelihood of the data is a function of the CPTs, so the goodness criterion for the CPTs is the log-likelihood of the data. The best CPTs are the ones that maximize the log-likelihood: CPTs* = argmax over CPTs of L(CPTs).

Maximum-likelihood learning: mathematical formulation. Maximize the log-likelihood subject to the constraints that the probabilities in each CPT row sum to 1. This is constrained optimization with equality constraints, solved with Lagrange multipliers (which you have probably seen in a calculus class). You can solve it yourself; it is not hard at all, and it is a very common technique in machine learning.
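A sketch of the derivation for one CPT, in my own notation (the slide's formulas are not reproduced). Write theta_{x|u} = P(X_i = x | pa_i = u) and let N(x, u) be the number of training examples with X_i = x and parent values u:

```latex
\begin{aligned}
\max_{\theta}\;& \sum_{x,u} N(x,u)\,\log \theta_{x\mid u}
  \quad\text{subject to}\quad \sum_{x} \theta_{x\mid u} = 1 \;\text{ for each } u,\\
\Lambda &= \sum_{x,u} N(x,u)\,\log \theta_{x\mid u}
  + \sum_{u} \lambda_u\Big(1 - \sum_{x} \theta_{x\mid u}\Big),\\
\frac{\partial \Lambda}{\partial \theta_{x\mid u}} &= \frac{N(x,u)}{\theta_{x\mid u}} - \lambda_u = 0
  \;\;\Longrightarrow\;\;
  \theta_{x\mid u} = \frac{N(x,u)}{\lambda_u} = \frac{N(x,u)}{\sum_{x'} N(x',u)}.
\end{aligned}
```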

ML solution. Nicely, the constrained optimization problem has a closed-form solution, and the solution is very intuitive: each CPT entry is the corresponding empirical frequency in the training data, i.e. P(Xi = x | parents = u) is the number of training examples with Xi = x and parent values u, divided by the number of examples with parent values u.

ML learning example. Three binary variables X, Y, Z; T = 1000 examples in the training data (t = 1, ..., 1000), summarized by the following counts:

  X, Y, Z   count
  0, 0, 0   230
  0, 0, 1   100
  0, 1, 0    70
  0, 1, 1    50
  1, 0, 0   110
  1, 0, 1   150
  1, 1, 0   160
  1, 1, 1   130

[The DAG over X, Y, Z and the resulting CPT values appear in a figure not reproduced in the transcript.]
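To show the counting solution in action, here is a minimal sketch that computes maximum-likelihood CPTs from these counts. The network structure is not recoverable from the transcript, so the chain X -> Y -> Z below is purely an assumption for illustration, not the slide's DAG.

```python
# Counts from the slide: key = (x, y, z), value = number of examples.
counts = {(0, 0, 0): 230, (0, 0, 1): 100, (0, 1, 0): 70, (0, 1, 1): 50,
          (1, 0, 0): 110, (1, 0, 1): 150, (1, 1, 0): 160, (1, 1, 1): 130}
T = sum(counts.values())   # 1000

# Assumed structure (NOT from the slide): X -> Y -> Z.
# The ML estimates are just empirical frequencies.

def marg(pred):
    """Total count of examples whose (x, y, z) values satisfy the predicate."""
    return sum(c for (x, y, z), c in counts.items() if pred(x, y, z))

p_x1 = marg(lambda x, y, z: x == 1) / T
p_y1_given_x = {x0: marg(lambda x, y, z: x == x0 and y == 1) /
                    marg(lambda x, y, z: x == x0) for x0 in (0, 1)}
p_z1_given_y = {y0: marg(lambda x, y, z: y == y0 and z == 1) /
                    marg(lambda x, y, z: y == y0) for y0 in (0, 1)}

print(p_x1)            # 0.55
print(p_y1_given_x)    # approximately {0: 0.267, 1: 0.527}
print(p_z1_given_y)    # approximately {0: 0.424, 1: 0.439}
```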

Outline: Motivation; BN = DAG + CPTs; Inference; Learning; Applications of BNs.

Naïve Bayes Classifier. Represent an object by its attributes A1, ..., An, together with a class label C. Joint probability: P(C, A1, ..., An) = P(C) P(A1 | C) ... P(An | C), i.e. the attributes are conditionally independent given the class. Learn the CPTs P(C) and P(Ai | C) from training data; classify a new object by the most probable class given its attribute values.

Naïve Bayes Classifier: inference. By the Bayes rule, P(C | a1, ..., an) = P(C, a1, ..., an) / P(a1, ..., an); by marginalization, P(a1, ..., an) = sum over c of P(c, a1, ..., an); and by conditional independence, P(c, a1, ..., an) = P(c) P(a1 | c) ... P(an | c). So P(C | a1, ..., an) is proportional to P(C) times the product of the P(ai | C).
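A minimal sketch of the whole pipeline (ML learning by counting, then classification by the Bayes rule) on made-up binary data; the data and variable names are illustrative assumptions, not from the slides, and no smoothing is applied for brevity.

```python
from collections import Counter

# Toy training data: (attribute tuple, class label). Purely illustrative.
train = [((1, 0, 1), 1), ((1, 1, 1), 1), ((0, 0, 1), 0),
         ((0, 1, 0), 0), ((1, 0, 0), 1), ((0, 0, 0), 0)]
n_attrs = 3

# ML learning: estimate P(C) and P(A_i = 1 | C) by counting.
class_counts = Counter(c for _, c in train)
p_c = {c: n / len(train) for c, n in class_counts.items()}
p_a1_given_c = {c: [sum(a[i] for a, cc in train if cc == c) / class_counts[c]
                    for i in range(n_attrs)]
                for c in class_counts}

def classify(attrs):
    """Return the most probable class for an attribute vector (naive Bayes)."""
    def score(c):
        s = p_c[c]
        for i, v in enumerate(attrs):
            p1 = p_a1_given_c[c][i]
            s *= p1 if v == 1 else 1.0 - p1
        return s
    return max(p_c, key=score)

print(classify((1, 1, 0)))   # predicts class 1 on this toy data
```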

Medical diagnosis: the QMR-DT model (Shwe et al., 1991). Learning: the prior probability of each disease, and the conditional probability of each finding given its parents (the diseases). Inference: given the findings of some patient, which disease(s) most probably caused these findings?

Hidden Markov Model: a sequence / time-series model with hidden states Q and observations Y. Speech recognition: the observations are the utterance/waveform, the states are words. Gene finding: the observations are the genomic sequence, the states are gene/no-gene and the different components of a gene.
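Viewed as a Belief Network unrolled over time, the HMM's joint distribution factorizes in the standard way (my notation; the slide shows the model only as a figure):

```latex
P(q_1,\dots,q_T,\; y_1,\dots,y_T)
  = P(q_1)\,\prod_{t=2}^{T} P(q_t \mid q_{t-1})\,\prod_{t=1}^{T} P(y_t \mid q_t)
```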

Applying BNs to a real-world problem involves the following steps: domain experts (or computer scientists, if the problem is not very hard) specify the causal relations among the random variables, from which we draw the DAG; collect training data from the real world; learn the maximum-likelihood CPTs from the training data; infer the queries we are interested in.

Summary. BN = DAG + CPTs: a compact representation of the joint probability. Inference: conditional independence plus the rules of probability. Learning: the maximum-likelihood solution.