1 Belief Networks
Qian Liu, CSE 391, Spring 2005, University of Pennsylvania

2 Outline: Motivation, BN = DAG + CPTs, Inference, Learning, Applications of BNs


4 From application
What kind of application problems can be solved by Belief Networks?
- Classifiers: e.g., webpage classification
- Medical diagnosis
- Troubleshooting system in MS Windows
- Bouncy paperclip guy in MS Word
- Speech recognition
- Gene finding
Also known as: Bayesian Network, Causal Network, Directed Graphical Model, ...

5-9 From representation
Problem: how do we describe the joint distribution of N (binary) random variables, P(X1, X2, ..., XN)?
If we write the joint distribution as a table, how many entries does the table have?

X1, ..., XN      P
0, 0, ..., 0     0.008
0, 0, ..., 1     0.001
...

# entries = 2^N. Too many!
Is there a more clever way of representing the joint distribution? YES: the Belief Network.

10 Outline: Motivation, BN = DAG + CPTs, Inference, Learning, Applications of BNs

11 What is BN: BN = DAG + CPTs
A Belief Network consists of a Directed Acyclic Graph (DAG) and Conditional Probability Tables (CPTs).
DAG: nodes are random variables; directed edges represent causal relations.
CPTs: each random variable Xi has a CPT, which specifies P(Xi | Parents(Xi)).
The BN specifies a joint distribution on the variables: P(X1, ..., XN) = Π_i P(Xi | Parents(Xi)).
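
To make the definition concrete, here is a minimal sketch in Python of a BN as a DAG plus CPTs, using the alarm network's structure; the numeric CPT values are illustrative placeholders, not taken from the slides.

```python
# Minimal sketch: a BN is a parent list plus a CPT per variable (all variables binary).
# Structure follows the alarm example; the probabilities are illustrative placeholders.
parents = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
cpt = {  # each table maps (parent values) -> P(X = 1 | parents)
    "B": {(): 0.001},
    "E": {(): 0.002},
    "A": {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001},
    "J": {(1,): 0.90, (0,): 0.05},
    "M": {(1,): 0.70, (0,): 0.01},
}

def joint(assignment):
    """P(x1, ..., xN) = product over i of P(xi | parents(xi))."""
    p = 1.0
    for var, pa in parents.items():
        p1 = cpt[var][tuple(assignment[v] for v in pa)]   # P(var = 1 | parent values)
        p *= p1 if assignment[var] == 1 else 1.0 - p1
    return p

print(joint({"B": 0, "E": 0, "A": 1, "J": 1, "M": 0}))
```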

12 Alarm example
Random variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls.
Causal relationships among the variables:
- A burglary can set the alarm off
- An earthquake can set the alarm off
- The alarm can cause Mary to call
- The alarm can cause John to call
Causal relations reflect domain knowledge.

13 Alarm example (cont.)
[Figure: the alarm network. Burglary and Earthquake are parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls; each node carries its CPT.]

14 Joint distribution
The BN specifies a joint distribution on the variables.
Alarm example: P(B, E, A, J, M) = P(B) P(E) P(A | B, E) P(J | A) P(M | A).
Shorthand notation: lower-case letters denote particular values of the variables, e.g. P(b, e, a, j, m) = P(b) P(e) P(a | b, e) P(j | a) P(m | a).

15 Another example
[Figure: a second example network, with its CPTs and the joint probability it defines.]
Convention for writing the joint probability: write the variables in a "cause → effects" order.
Belief Networks are generative models, which can generate data in a "cause → effects" order.

16 Compactness
A Belief Network offers a simple and compact way of representing the joint distribution of many random variables.
The number of entries in a full joint distribution table is ~ 2^N (for N binary variables).
But for a BN, if each variable has at most k parents, the total number of entries in all the CPTs is ~ N * 2^k.
In real practice, k << N. We save a lot of space!
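
A quick back-of-the-envelope check of the savings; the numbers N = 20 and k = 3 are only an illustration:

```latex
2^{N} = 2^{20} = 1{,}048{,}576
\qquad \text{vs.} \qquad
N \cdot 2^{k} = 20 \cdot 2^{3} = 160
```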

17 Outline: Motivation, BN = DAG + CPTs, Inference, Learning, Applications of BNs

18 Inference
The task of inference is to compute the posterior probability of a set of query variables Q given a set of evidence variables E, denoted P(Q | E), given that the Belief Network is known.
Since the joint distribution is known, P(Q | E) can be computed naively (and inefficiently) by the product rule and marginalization.
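
Written out as a worked equation, with H denoting the remaining (hidden, non-query, non-evidence) variables, the naive computation is:

```latex
P(Q \mid E) \;=\; \frac{P(Q, E)}{P(E)}
\;=\; \frac{\sum_{H} P(Q, E, H)}{\sum_{Q', H} P(Q', E, H)}
```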

19 Conditional independence
A, B are (unconditionally) independent (I.):
(1) P(A, B) = P(A) P(B)
(2) P(A | B) = P(A)
(3) P(B | A) = P(B)
A, B are conditionally independent (C.I.), given evidence E:
(1) P(A, B | E) = P(A | E) P(B | E)
(2) P(A | B, E) = P(A | E)
(3) P(B | A, E) = P(B | E)
In each case, (1), (2), (3) are equivalent.


21 Conditional independence
Chain rule:
P(B, E, A, J, M) = P(B) * P(E | B) * P(A | B, E) * P(J | B, E, A) * P(M | B, E, A, J)
BN:
P(B, E, A, J, M) = P(B)
                 * P(E)          (B is independent of E)
                 * P(A | B, E)
                 * P(J | A)      (J is C.I. of B, E, given A)
                 * P(M | A)      (M is C.I. of B, E, J, given A)
Belief Networks exploit conditional independence among variables so as to represent the joint distribution compactly.

22 Conditional independence
A Belief Network encodes conditional independence in its graph structure:
(1) A variable is C.I. of its non-descendants, given all its parents.
(2) A variable is C.I. of all the other variables, given its parents, children, and children's parents, that is, given its Markov blanket.

23 Conditional independence
Alarm example:
- B is independent of E; or, B is C.I. of E, given nothing. (by property 1)
- J is C.I. of B, E, M, given A. (by property 1)
- M is C.I. of B, E, J, given A. (by property 1)
Another example: U is C.I. of X, given Y, V, Z. (by property 2)

24 Examples of inference
Alarm example. We know:
- the CPTs: P(B), P(E), P(A | B, E), P(J | A), P(M | A)
- the conditional independences encoded in the graph
P(A) = ?
P(A) = Σ_{b,e} P(A, b, e)                    (marginalization)
     = Σ_{b,e} P(A | b, e) P(b | e) P(e)     (chain rule)
     = Σ_{b,e} P(A | b, e) P(b) P(e)         (B, E are independent)

25 Examples of inference
Alarm example. We know:
- the CPTs: P(B), P(E), P(A | B, E), P(J | A), P(M | A)
- the conditional independences encoded in the graph
P(J, M) = ?
P(J, M) = Σ_a P(J, M, a)                     (marginalization)
        = Σ_a P(J | M, a) P(M | a) P(a)      (chain rule)
        = Σ_a P(J | a) P(M | a) P(a)         (J is C.I. of M, given A)
where P(A) is computed as on the previous slide.
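
A small self-contained Python sketch of these two computations; the CPT values are illustrative placeholders, while the network structure matches the slides.

```python
# Enumeration in the alarm network: P(A = 1) and P(J = 1, M = 1).
# CPT values are illustrative placeholders; the structure matches the slides.
P_B1, P_E1 = 0.001, 0.002                                          # P(B=1), P(E=1)
P_A1 = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}   # P(A=1 | B, E)
P_J1 = {1: 0.90, 0: 0.05}                                          # P(J=1 | A)
P_M1 = {1: 0.70, 0: 0.01}                                          # P(M=1 | A)

def p(val, p1):
    """P(X = val) for a binary X with P(X = 1) = p1."""
    return p1 if val == 1 else 1.0 - p1

# P(A=1) = sum over b, e of P(A=1 | b, e) P(b) P(e)
pA1 = sum(P_A1[(b, e)] * p(b, P_B1) * p(e, P_E1) for b in (0, 1) for e in (0, 1))

# P(J=1, M=1) = sum over a of P(J=1 | a) P(M=1 | a) P(a)
pJ1M1 = sum(P_J1[a] * P_M1[a] * p(a, pA1) for a in (0, 1))

print(pA1, pJ1M1)
```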

26 Outline: Motivation, BN = DAG + CPTs, Inference, Learning, Applications of BNs

27 Learning
The task of learning is to find the Belief Network that best describes the data we observe.
Let's assume the DAG is known; the learning problem then simplifies to learning the best CPTs from data, according to some "goodness" criterion.
Note: there are many kinds of learning (learning different things, using different "goodness" criteria). We are only going to discuss the easiest kind of learning.

28 Training data
All variables are binary. We observe T examples.
Assume the examples are drawn identically and independently (i.i.d.) from some underlying distribution.
Task: how do we learn the best CPTs from the T examples?

Example t    X1, X2, ..., Xn
1            0, 0, 0, ..., 0
2            0, 1, 0, ..., 0
...
T            0, 1, 0, ..., 1

29 Think of learning as...
Someone used a DAG and a set of CPTs to generate the training data (the table on the previous slide). You are given the DAG and the training data, and you are asked to guess which CPTs this person most likely used to generate the data.

30 Given CPTs
Probability of the t-th example, e.g. for the alarm example:
if the t-th example is (b_t, e_t, a_t, j_t, m_t), then
P(example t) = P(b_t) P(e_t) P(a_t | b_t, e_t) P(j_t | a_t) P(m_t | a_t).

31-33 Given CPTs
Probability of the t-th example: P(example t) = Π_i P(x_i^t | pa_i^t), where x_i^t is the value of X_i in example t and pa_i^t are the values of its parents.
Probability of all the data (i.i.d.): P(data) = Π_{t=1..T} P(example t).
Log-likelihood of the data: L(CPTs) = log P(data) = Σ_{t=1..T} log P(example t) = Σ_{t=1..T} Σ_i log P(x_i^t | pa_i^t).
The log-likelihood of the data is a function of the CPTs. Which CPTs are the best?

34 Maximum-likelihood learning
The log-likelihood of the data is a function of the CPTs: L(CPTs).
So, the goodness criterion for the CPTs is the log-likelihood of the data.
The best CPTs are the ones that maximize the log-likelihood:
CPTs* = argmax_{CPTs} L(CPTs)

35 Maximum-likelihood learning
Mathematical formulation: maximize L(CPTs), subject to the constraints that probabilities sum up to 1, i.e. Σ_x P(X_i = x | pa_i) = 1 for every variable X_i and every parent configuration pa_i.
This is constrained optimization with equality constraints: use a Lagrange multiplier (which you have probably seen in your calculus class).
You can solve it yourself; it is not hard at all. It is a very common technique in machine learning.
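
As a sketch of how the Lagrange-multiplier step works out for one variable X_i and one fixed parent configuration pa (writing θ_x for P(X_i = x | pa) and N_x for the number of training examples with X_i = x and that parent configuration):

```latex
\mathcal{L} = \sum_{x} N_{x} \log \theta_{x} + \lambda \Big(1 - \sum_{x} \theta_{x}\Big),
\qquad
\frac{\partial \mathcal{L}}{\partial \theta_{x}} = \frac{N_{x}}{\theta_{x}} - \lambda = 0
\;\Rightarrow\;
\theta_{x} = \frac{N_{x}}{\lambda} = \frac{N_{x}}{\sum_{x'} N_{x'}}
```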

36-38 ML solution
Nicely, we have a closed-form solution to the constrained optimization problem:
P_ML(X_i = x | Parents(X_i) = pa) = count(X_i = x, Parents(X_i) = pa) / count(Parents(X_i) = pa),
where the counts are taken over the T training examples.
And, the solution is very intuitive: each CPT entry is just the corresponding empirical frequency in the training data.
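
A minimal sketch of this counting solution in Python; the variable names, data layout, and the tiny example dataset are all illustrative, not from the slides.

```python
# Sketch: maximum-likelihood CPTs by counting (binary variables).
# `parents` maps each variable to its parent tuple; `data` is a list of dicts,
# one per training example. Both are illustrative stand-ins for the slide's setup.
from collections import Counter

def ml_cpts(parents, data):
    """Return P(X = 1 | parent values) for every variable, estimated by counts."""
    cpts = {}
    for var, pa in parents.items():
        joint, marg = Counter(), Counter()
        for ex in data:
            key = tuple(ex[p] for p in pa)
            marg[key] += 1
            if ex[var] == 1:
                joint[key] += 1
        cpts[var] = {key: joint[key] / marg[key] for key in marg}
    return cpts

# Tiny usage example with a hypothetical chain X -> Y:
parents = {"X": (), "Y": ("X",)}
data = [{"X": 0, "Y": 0}, {"X": 1, "Y": 1}, {"X": 1, "Y": 0}, {"X": 1, "Y": 1}]
print(ml_cpts(parents, data))   # {'X': {(): 0.75}, 'Y': {(0,): 0.0, (1,): 0.666...}}
```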

39-40 ML learning example
Three binary variables X, Y, Z.
T = 1000 training examples (example 1 = (0, 0, 0), example 2 = (0, 1, 0), ...), summarized by the count table:

X, Y, Z    count
0, 0, 0    230
0, 0, 1    100
0, 1, 0     70
0, 1, 1     50
1, 0, 0    110
1, 0, 1    150
1, 1, 0    160
1, 1, 1    130

41 Outline: Motivation, BN = DAG + CPTs, Inference, Learning, Applications of BNs

42 Naïve Bayes Classifier
Represent an object with attributes A1, ..., An and a class label C.
Joint probability: P(C, A1, ..., An) = P(C) Π_i P(Ai | C).
Learn the CPTs P(C) and P(Ai | C) from training data.
Classify a new object by computing P(C | A1, ..., An).

43 Naïve Bayes Classifier
Inference:
P(C | A1, ..., An) = P(C, A1, ..., An) / P(A1, ..., An)                      (Bayes rule)
                   = P(C) Π_i P(Ai | C) / Σ_c P(c) Π_i P(Ai | c)             (marginalization; the Ai are C.I. given C)
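
A minimal sketch of the classification step in Python; the class names, priors, and per-class attribute probabilities are illustrative placeholders.

```python
# Naive Bayes classification: pick the class maximizing P(C) * prod_i P(a_i | C).
# Priors and likelihoods are illustrative placeholders for two binary attributes.
prior = {"spam": 0.3, "ham": 0.7}            # P(C)
likelihood = {                               # P(A_i = 1 | C), one entry per attribute
    "spam": [0.8, 0.6],
    "ham": [0.1, 0.3],
}

def classify(attrs):
    """Return the most probable class and the posterior over classes."""
    scores = {}
    for c in prior:
        s = prior[c]
        for p1, a in zip(likelihood[c], attrs):
            s *= p1 if a == 1 else 1.0 - p1  # P(a | C)
        scores[c] = s
    z = sum(scores.values())                 # P(A_1, ..., A_n)
    return max(scores, key=scores.get), {c: s / z for c, s in scores.items()}

print(classify([1, 0]))
```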

44 Medical Diagnosis
The QMR-DT model (Shwe et al. 1991).
Learning:
- the prior probability of each disease
- the conditional probability of each finding given its parents
Inference: given the findings for some patient, which disease(s) most probably caused these findings?

45 Hidden Markov Model
A sequence / time-series model with states Q and observations Y.
Speech recognition: observations are the utterance/waveform; states are words.
Gene finding: observations are the genomic sequence; states are gene/no-gene and the different components of a gene.
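
Viewed as a Belief Network, the HMM's joint distribution factors along the chain (standard HMM factorization, with T the sequence length):

```latex
P(q_{1:T}, y_{1:T}) \;=\; P(q_1)\,P(y_1 \mid q_1) \prod_{t=2}^{T} P(q_t \mid q_{t-1})\,P(y_t \mid q_t)
```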

46 Applying BNs to a real-world problem
This involves the following steps:
1. Domain experts (or computer scientists, if the problem is not very hard) specify causal relations among the random variables; then we can draw the DAG.
2. Collect training data from the real world.
3. Learn the maximum-likelihood CPTs from the training data.
4. Infer the queries we are interested in.

47 Summary
BN = DAG + CPTs: a compact representation of the joint probability.
Inference: conditional independence + probability rules.
Learning: the maximum-likelihood solution.

