Bayesian Networks
Speaker: 虞台文, Intelligent Multimedia Research Lab, Graduate Institute of Computer Science and Engineering, Tatung University
Contents
- Introduction
- Probability Theory (skipped)
- Inference
- Clique Tree Propagation
- Building the Clique Tree
- Inference by Propagation
Introduction
What Are Bayesian Networks?
Bayesian Networks are directed acyclic graphs (DAGs) with an associated set of probability tables. The nodes are random variables. Certain independence relations can be induced by the topology of the graph.
Why Use a Bayesian Network?
- Deal with uncertainty in inference via probability (Bayes' rule).
- Handle incomplete data sets, e.g., in classification and regression.
- Model domain knowledge, e.g., causal relationships.
Example: use a DAG to model the causality.
(DAG nodes: Train Strike, Martin Oversleep, Norman Oversleep, Martin Late, Norman Late, Boss Failure-in-Love, Project Delay, Office Dirty, Boss Angry.)
Example: attach prior probabilities to all root nodes.
  Martin oversleep:     P(T) = 0.01, P(F) = 0.99
  Train strike:         P(T) = 0.1,  P(F) = 0.9
  Norman oversleep:     P(T) = 0.2,  P(F) = 0.8
  Boss failure-in-love: P(T) = 0.01, P(F) = 0.99
Example: attach conditional probability tables to non-root nodes (each column sums to 1).
  P(Martin Late | Train strike, Martin oversleep): column entries on the slide include 0.95, 0.8, 0.7 for Martin Late = T and 0.05, 0.2, 0.3 for Martin Late = F.
  P(Norman untidy | Norman oversleep):
    Norman untidy = T: 0.6 (oversleep = T), 0.2 (oversleep = F)
    Norman untidy = F: 0.4 (oversleep = T), 0.8 (oversleep = F)
Example: attach conditional probability tables to non-root nodes (each column sums to 1).
  P(Boss Angry | Boss failure-in-love, Project Delay, Office Dirty), with Boss Angry taking the values {very, mid, little, no}; column entries on the slide include: very 0.98 / 0.85 / 0.6 / 0.5 / 0.3 / 0.2 / 0.01, mid 0.02 / 0.15 / 0.25, little 0.1 / 0.7 / 0.07, no 0.9.
Question: what is the difference between probability and fuzzy measurements?
Medical Knowledge Example
Definition of Bayesian Networks
A Bayesian network is a directed acyclic graph with the following properties:
- Each node represents a random variable.
- Each node representing a variable A with parent nodes representing variables B1, B2, ..., Bn is assigned a conditional probability table (CPT) specifying P(A | B1, B2, ..., Bn).
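As a concrete illustration of the CPT idea, here is a minimal sketch (not from the slides) of one table from the earlier example stored as a Python dictionary; the variable and function names are assumptions made for illustration.

```python
# Minimal sketch: a CPT as a dict mapping parent assignments to distributions
# over the node's own values (illustrative layout, not the lecture's code).
cpt_norman_untidy = {
    # (norman_oversleep,) -> P(norman_untidy | norman_oversleep)
    ("T",): {"T": 0.6, "F": 0.4},
    ("F",): {"T": 0.2, "F": 0.8},
}

def p(cpt, value, parent_values):
    """Look up P(node = value | parents = parent_values)."""
    return cpt[tuple(parent_values)][value]

print(p(cpt_norman_untidy, "T", ["F"]))  # 0.2
```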
Problems
- How to perform inference?
- How to learn the probabilities from data?
- How to learn the structure from data?
- What applications may we have?
Bad news: all of them are NP-hard.
Inference
Example
Network: Train Strike → Martin Late, Train Strike → Norman Late.
  P(Train Strike): T 0.1, F 0.9
  P(Martin Late | Train Strike): T: 0.6 (strike), 0.5 (no strike); F: 0.4, 0.5
  P(Norman Late | Train Strike): T: 0.8 (strike), 0.1 (no strike); F: 0.2, 0.9
Questions:
  P(“Martin Late”, “Norman Late”, “Train Strike”) = ?   (joint distribution)
  P(“Martin Late”) = ?                                  (marginal distribution)
  P(“Martin Late” | “Norman Late”) = ?                  (conditional distribution)
Example (joint distribution)
Let A = Martin Late, B = Norman Late, C = Train Strike. Then P(A, B, C) = P(C) P(A | C) P(B | C), e.g.
  P(a, b, c) for (A, B, C):
    (T, T, T) 0.048   (F, T, T) 0.032   (T, F, T) 0.012   (F, F, T) 0.008
    (T, T, F) 0.045   (F, T, F) 0.045   (T, F, F) 0.405   (F, F, F) 0.405
Example (marginal distribution)
P(“Martin Late”, “Norman Late”) is obtained by summing out Train Strike:
  (A, B): (T, T) 0.093   (F, T) 0.077   (T, F) 0.417   (F, F) 0.413
Example (marginal distribution)
P(“Martin Late”) is obtained by further summing out Norman Late:
  A = T: 0.51   A = F: 0.49
Example (conditional distribution)
P(“Norman Late”): B = T: 0.17, B = F: 0.83.
P(“Martin Late” | “Norman Late”) = P(A, B = T) / P(B = T), e.g. P(A = T | B = T) = 0.093 / 0.17 ≈ 0.547.
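The numbers in the three questions above can be checked by brute-force enumeration. The following sketch is mine, not the lecture's demo; the labels A, B, C follow the slides.

```python
from itertools import product

# CPTs from the slides: C = Train Strike, A = Martin Late, B = Norman Late.
P_C = {True: 0.1, False: 0.9}
P_A_given_C = {True: {True: 0.6, False: 0.4}, False: {True: 0.5, False: 0.5}}
P_B_given_C = {True: {True: 0.8, False: 0.2}, False: {True: 0.1, False: 0.9}}

# Joint distribution P(A, B, C) = P(C) * P(A | C) * P(B | C).
joint = {(a, b, c): P_C[c] * P_A_given_C[c][a] * P_B_given_C[c][b]
         for a, b, c in product([True, False], repeat=3)}

# Marginals by summing out variables.
p_ab = {(a, b): sum(joint[(a, b, c)] for c in [True, False])
        for a, b in product([True, False], repeat=2)}
p_a = {a: sum(p_ab[(a, b)] for b in [True, False]) for a in [True, False]}
p_b = {b: sum(p_ab[(a, b)] for a in [True, False]) for b in [True, False]}

print(joint[(True, True, True)])       # ≈ 0.048
print(p_ab[(True, True)])              # ≈ 0.093
print(p_a[True])                       # ≈ 0.51
print(p_ab[(True, True)] / p_b[True])  # P(A = T | B = T) ≈ 0.547
```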
Inference Methods
Exact algorithms:
- Probability propagation
- Variable elimination
- Cutset conditioning
- Dynamic programming
Approximation algorithms:
- Variational methods
- Sampling (Monte Carlo) methods
- Loopy belief propagation
- Bounded cutset conditioning
- Parametric approximation methods
Independence Assertions
Bayesian networks have built-in independence assertions. An independence assertion is a statement of the form “X and Y are independent given Z”; we say that X and Y are d-separated by Z. That is,
  P(X | Y, Z) = P(X | Z),   or equivalently   P(X, Y | Z) = P(X | Z) P(Y | Z).
The given variables Z are called evidence.
d-Separation
(Figure: an example network with nodes Y1, Y2, Y3, Y4, Z, X1, X2, X3, W1, W2.)
Types of Connections (through Z in the example network)
- Serial connections:     Yi – Z – Xj
- Converging connections: Y1/2 – Z – Y3/4, e.g., Y3 – Z – Y4
- Diverging connections:  Xi – Z – Xj
d-Separation
(Figure: the three connection patterns between X and Y through a node Z — serial X → Z → Y, converging X → Z ← Y, and diverging X ← Z → Y.)
Joint Distribution
By the chain rule:
  P(X1, ..., Xn) = ∏_i P(Xi | X1, ..., X_{i-1})
By the independence assertions:
  P(X1, ..., Xn) = ∏_i P(Xi | πXi),   where πXi denotes the parents of Xi.
With this factorization we can compute all probabilities. (Example network with nodes X1, ..., X11.)
JPT: joint probability table; CPT: conditional probability table.
Consider binary random variables: storing the JPT of all random variables takes 2^n − 1 table entries. How many entries does storing all CPTs take?
Joint Distribution
To store the JPT of all 11 binary variables: 2^11 − 1 = 2047 table entries.
To store the CPTs: each node Xi needs 2^|πXi| entries (1 for a root, 2 with one parent, 4 with two parents, 8 with three parents), so the total is far smaller.
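For concreteness, a tiny sketch of this bookkeeping (the parent sets below are hypothetical, chosen only to illustrate the counting; the slides' X1–X11 structure is not fully recoverable here):

```python
# Counting table entries for binary variables: full joint vs. per-node CPTs.
parents = {  # hypothetical parent sets
    "X1": [], "X2": [], "X3": ["X1"], "X4": ["X1", "X2"],
    "X5": ["X2"], "X6": ["X3", "X4"], "X7": ["X4", "X5"],
}
n = len(parents)

jpt_entries = 2 ** n - 1                                    # full joint table
cpt_entries = sum(2 ** len(ps) for ps in parents.values())  # P(V = T | pa) per parent combination

print(jpt_entries, cpt_entries)  # 127 vs. 18
```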
More on d-Separation
A path from X to Y is d-connecting with respect to evidence nodes E if every interior node N on the path has the property that either
- it is linear (serial) or diverging and is not a member of E; or
- it is converging, and either N or one of its descendants is in E.
More on d-Separation
Exercise: identify the d-connecting and non-d-connecting paths from X to Y in the example network.
More on d-Separation
Two nodes are d-separated if there is no d-connecting path between them.
Exercise: remove the minimum number of edges such that X and Y are d-separated.
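The d-connecting rule above can be checked mechanically. The sketch below enumerates all simple undirected paths and applies the rule to each interior node; it is my own brute-force illustration (fine only for small networks), and the example edges are hypothetical rather than the network drawn on the slide.

```python
# Brute-force d-separation test following the slide's d-connecting-path rule.
def descendants(dag, v):
    out, stack = set(), [v]
    while stack:
        for c in dag.get(stack.pop(), []):
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def d_separated(dag, x, y, evidence):
    nodes = set(dag) | {c for cs in dag.values() for c in cs}
    undirected = {n: set() for n in nodes}
    for u, cs in dag.items():
        for c in cs:
            undirected[u].add(c)
            undirected[c].add(u)

    def connecting(path):
        for i in range(1, len(path) - 1):
            prev, n, nxt = path[i - 1], path[i], path[i + 1]
            converging = n in dag.get(prev, []) and n in dag.get(nxt, [])
            if converging:
                if n not in evidence and not (descendants(dag, n) & evidence):
                    return False   # converging node with nothing observed blocks the path
            elif n in evidence:
                return False       # serial/diverging node that is in E blocks the path
        return True

    def paths(cur, target, seen):
        if cur == target:
            yield list(seen)
            return
        for nb in undirected[cur]:
            if nb not in seen:
                yield from paths(nb, target, seen + [nb])

    return not any(connecting(p) for p in paths(x, y, [x]))

dag = {"A": ["C"], "B": ["C"], "C": ["D"]}   # A -> C <- B, C -> D
print(d_separated(dag, "A", "B", set()))     # True: the converging node C is unobserved
print(d_separated(dag, "A", "B", {"D"}))     # False: a descendant of C is observed
```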
More on d-Separation
Two sets of nodes, say X = {X1, …, Xm} and Y = {Y1, …, Yn}, are d-separated w.r.t. evidence nodes E if every pair Xi, Yj is d-separated w.r.t. E. In this case, we have
  P(X, Y | E) = P(X | E) P(Y | E).
Clique Tree Propagation
References
The method was developed by Lauritzen and Spiegelhalter and refined by Jensen et al.
- Lauritzen, S. L., and Spiegelhalter, D. J., Local computations with probabilities on graphical structures and their application to expert systems, J. Roy. Stat. Soc. B, 50, 1988.
- Jensen, F. V., Lauritzen, S. L., and Olesen, K. G., Bayesian updating in causal probabilistic networks by local computations, Comp. Stat. Quart., 4, 1990.
- Shenoy, P., and Shafer, G., Axioms for probability and belief-function propagation, in Uncertainty in Artificial Intelligence, Vol. 4 (R. D. Shachter, T. Levitt, J. F. Lemmer, and L. N. Kanal, Eds.), Elsevier, North-Holland, Amsterdam, 1990.
Clique Tree Propagation (CTP)
Given a Bayesian network, build a secondary structure called the clique tree (an undirected tree). Inference is performed by propagating belief potentials among the tree nodes. It is an exact algorithm.
Notations
  Random variables, uninstantiated: uppercase A, B, C
  Random variables, instantiated:   lowercase a, b, c
  Random vectors, uninstantiated:   boldface uppercase X, Y, Z
  Random vectors, instantiated:     boldface lowercase x, y, z
Definition: Family of a Node
The family of a node V, denoted FV, is defined as FV = {V} ∪ πV, i.e., V together with its parents.
(Examples refer to the network with nodes A, B, C, D, E, F, G, H.)
Potentials and Distributions
We will model the probability tables as potential functions. All of these tables map a set of random variables to a real value.
  P(a) — a function of a (prior probability):           a = on: 0.5, a = off: 0.5
  P(b | a) — a function of a and b (conditional):       b = on: 0.7 (a = on), 0.2 (a = off); b = off: 0.3, 0.8
  P(f | d, e) — a function of d, e and f (conditional): entries on the slide include 0.95, 0.8, 0.7 for f = on and 0.05, 0.2, 0.3 for f = off
Potentials
Potentials are used to implement matrices or tables. Two operations are needed:
1. Marginalization: for X ⊆ Y,  φX = Σ_{Y \ X} φY.
2. Multiplication: φZ(z) = φX(x) · φY(y), where Z = X ∪ Y and x, y are the instantiations consistent with z.
Marginalization
Example: φABC → φAB → φA (summing out C, then B), using the joint table from the earlier example.
  φABC entries: 0.048, 0.032, 0.012, 0.008, 0.045, 0.405, ...
  φAB: (T, T) 0.093, (F, T) 0.077, (T, F) 0.417, (F, F) 0.413
  φA:  T 0.51, F 0.49
Multiplication
Example: φABC(a, b, c) = φAB(a, b) · φBC(b, c), where x and y are consistent with z. The result does not necessarily sum to one.
  φAB: (T, T) 0.093, (F, T) 0.077, (T, F) 0.417, (F, F) 0.413
  φBC: B = T: 0.08, 0.09; B = F: 0.02, 0.91
  φABC: products such as 0.093 × 0.08, 0.077 × 0.08, 0.417 × 0.02, 0.413 × 0.02, 0.093 × 0.09, 0.077 × 0.09, 0.417 × 0.91, 0.413 × 0.91
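The two operations can be packaged into a small table class. This is a sketch under my own naming and layout (binary variables, tuple-keyed tables); it reproduces the φAB → φA numbers above.

```python
from itertools import product

class Potential:
    """A table over named binary variables (values True/False)."""
    def __init__(self, variables, table=None):
        self.variables = tuple(variables)
        self.table = table or {assign: 1.0 for assign in
                               product([True, False], repeat=len(self.variables))}

    def marginalize(self, keep):
        keep = tuple(v for v in self.variables if v in keep)
        out = Potential(keep, {a: 0.0 for a in product([True, False], repeat=len(keep))})
        for assign, val in self.table.items():
            key = tuple(assign[self.variables.index(v)] for v in keep)
            out.table[key] += val              # sum out the dropped variables
        return out

    def multiply(self, other):
        variables = self.variables + tuple(v for v in other.variables
                                           if v not in self.variables)
        out = Potential(variables, {})
        for assign in product([True, False], repeat=len(variables)):
            env = dict(zip(variables, assign))
            a = tuple(env[v] for v in self.variables)   # sub-assignments consistent with assign
            b = tuple(env[v] for v in other.variables)
            out.table[assign] = self.table[a] * other.table[b]
        return out

# phi_AB from the marginalization example above.
phi_ab = Potential(("A", "B"), {(True, True): 0.093, (False, True): 0.077,
                                (True, False): 0.417, (False, False): 0.413})
phi_a = phi_ab.marginalize({"A"})
print(phi_a.table)   # ≈ {(True,): 0.51, (False,): 0.49}
```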
The Secondary Structure
Given a Bayesian network over a set of variables U = {V1, …, Vn}, its secondary structure contains a graphical and a numerical component.
- Graphical component: an undirected clique tree that satisfies the join tree property.
- Numerical component: belief potentials on its nodes and edges.
The Clique Tree T
The clique tree T for a belief network over a set of variables U = {V1, …, Vn} satisfies the following properties:
- Each node in T is a cluster (clique), i.e., a nonempty set of variables.
- The clusters satisfy the join tree property: given two clusters X and Y in T, all clusters on the path between X and Y contain X ∩ Y.
- For each variable V ∈ U, the family FV is included in at least one cluster.
- Sepsets: each edge in T is labeled with the intersection of the adjacent clusters.
(Example clique tree: clusters ABD, ADE, ACE, CEG, DEF, EGH with sepsets AD, AE, CE, DE, EG.)
How do we build a clique tree?
The Numerical Component
Clusters and sepsets are attached with belief potentials.
- Local consistency: for each cluster X and neighboring sepset S,  Σ_{X \ S} φX = φS.
- Global consistency: P(U) = (∏_clusters φX) / (∏_sepsets φS).
How do we assign the belief functions?
The Numerical Component
Clusters and sepsets are attached with belief potentials. The key step is to satisfy these constraints by letting
  φS = Σ_{X \ S} φX  for every sepset S,   and   (∏_clusters φX) / (∏_sepsets φS) = P(U).
If so, each cluster (and sepset) potential equals the joint marginal over its variables.
Building the Clique Tree
The Steps
Belief Network → Moral Graph → Triangulated Graph → Clique Set → Join Tree
Moral Graph (Belief Network → Moral Graph)
- Convert the directed graph to an undirected one.
- For each node, connect each pair of its parent nodes.
(Figure: the example network over A–H and its moral graph.)
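A sketch of the two moralization steps in code. The parent lists below are my reading of the A–H example (they are consistent with the elimination table shown later), so treat them as an assumption.

```python
from itertools import combinations

def moralize(parents):
    """Marry the parents of every node, then drop edge directions."""
    edges = set()
    for child, ps in parents.items():
        for p in ps:                        # undirected version of each arc
            edges.add(frozenset((p, child)))
        for p1, p2 in combinations(ps, 2):  # connect each pair of parents
            edges.add(frozenset((p1, p2)))
    return edges

example = {"A": [], "B": ["A"], "C": ["A"], "D": ["B"], "E": ["C"],
           "F": ["D", "E"], "G": ["C"], "H": ["E", "G"]}
print(sorted(tuple(sorted(e)) for e in moralize(example)))
# adds the moral edges D-E and E-G on top of the undirected original arcs
```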
Triangulation (Moral Graph → Triangulated Graph)
Add chords so that every cycle of length greater than 3 has a chord. There are many ways to triangulate; this step is, in fact, done together with the next step (clique selection).
(Figure: the moral graph of the example and one of its triangulations.)
Select Clique Set (Triangulated Graph → Clique Set)
Copy GM to GM′. While GM′ is not empty:
1. Select a node V from GM′ according to a criterion (next slide).
2. Node V and its neighbors form a cluster; connect all the nodes in the cluster.
3. For each edge added to GM′, add the same edge to GM.
4. Remove V from GM′.
Select Clique Set — criterion for choosing V:
- The weight of a node V is the number of values of V; the weight of a cluster is the product of the weights of its constituent nodes.
- Choose the node whose elimination causes the fewest edges to be added, breaking ties by choosing the node that induces the cluster with the smallest weight.
Select Clique Set — elimination on the example network:
  Eliminated Vertex | Induced Cluster | Edges Added
  H                 | EGH             | none
  G                 | CEG             | none
  F                 | DEF             | none
  C                 | ACE             | {A, E}
  B                 | ABD             | {A, D}
  D                 | ADE             | none
  E                 | AE              | none
  A                 | A               | none
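The elimination mechanics behind this table can be replayed in a few lines. The sketch below takes the elimination order from the table as given (in practice the order is chosen with the criterion above) and reports each induced cluster and the fill-in edges; the moral-graph adjacency is my reading of the A–H example.

```python
from itertools import combinations

def eliminate_in_order(adj, order):
    """Replay node elimination: report each induced cluster and its fill-in edges."""
    adj = {v: set(nb) for v, nb in adj.items()}     # working copy of GM'
    for v in order:
        cluster = {v} | adj[v]
        added = [(a, b) for a, b in combinations(sorted(adj[v]), 2)
                 if b not in adj[a]]
        for a, b in added:                          # connect the induced cluster
            adj[a].add(b); adj[b].add(a)
        for u in adj[v]:                            # remove v from GM'
            adj[u].discard(v)
        del adj[v]
        yield v, cluster, added

# Moral graph of the A-H example (original arcs plus the married edges D-E, E-G).
moral = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "E", "G"},
         "D": {"B", "E", "F"}, "E": {"C", "D", "F", "G", "H"},
         "F": {"D", "E"}, "G": {"C", "E", "H"}, "H": {"E", "G"}}

for v, cluster, added in eliminate_in_order(moral, "HGFCBDEA"):
    print(v, sorted(cluster), added)
# Reproduces the table: H -> EGH, G -> CEG, F -> DEF, C -> ACE (+ edge A-E),
# B -> ABD (+ edge A-D), D -> ADE, E -> AE, A -> A.
```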
Building an Optimal Join Tree (Clique Set → Join Tree)
We need to find a minimal number of edges to connect these cliques, i.e., to build a tree. Given n cliques, n − 1 edges are required. There are many ways; how do we achieve optimality?
Building an Optimal Join Tree
1. Begin with a set of n trees, each consisting of a single clique, and an empty set S. For each distinct pair of cliques X and Y: create a candidate sepset SXY = X ∩ Y, with backpointers to X and Y, and insert SXY into S.
2. Repeat until n − 1 sepsets have been inserted into the forest:
   - Select a sepset SXY from S, according to the criterion on the next slide, and delete it from S.
   - Insert SXY between cliques X and Y only if X and Y are in different trees of the forest.
Building an Optimal Join Tree — criterion for selecting sepsets:
- The mass of SXY is the number of variables in X ∩ Y.
- The cost of SXY is the weight of X plus the weight of Y, where the weight of a node V is the number of values of V and the weight of a set of nodes is the product of its constituent node weights.
- Choose the sepset with the largest mass, breaking ties by choosing the sepset with the smallest cost.
Building an Optimal Join Tree — result of the graphical transformation for the example:
clusters ABD, ADE, ACE, CEG, DEF, EGH connected through sepsets AD, AE, CE, DE, EG.
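A sketch of the sepset-selection loop, assuming binary variables and the clique set from the elimination table; the Kruskal-style tree bookkeeping and names are mine.

```python
from itertools import combinations

def build_join_tree(cliques, node_weight):
    """Rank candidate sepsets by (largest mass, smallest cost) and add them
    between cliques that are still in different trees."""
    def weight(cluster):
        w = 1
        for v in cluster:
            w *= node_weight[v]
        return w

    candidates = [(x, y, x & y) for x, y in combinations(cliques, 2)]
    candidates.sort(key=lambda t: (-len(t[2]), weight(t[0]) + weight(t[1])))

    tree_of = {c: i for i, c in enumerate(cliques)}   # which tree each clique is in
    edges = []
    for x, y, sep in candidates:
        if len(edges) == len(cliques) - 1:
            break
        if tree_of[x] != tree_of[y]:                  # only join different trees
            edges.append((x, sep, y))
            old, new = tree_of[y], tree_of[x]
            for c in cliques:
                if tree_of[c] == old:
                    tree_of[c] = new
    return edges

cliques = [frozenset(s) for s in ("ABD", "ADE", "ACE", "CEG", "DEF", "EGH")]
for x, sep, y in build_join_tree(cliques, {v: 2 for v in "ABCDEFGH"}):
    print("".join(sorted(x)), "-", "".join(sorted(sep)), "-", "".join(sorted(y)))
# recovers the sepsets AD, AE, CE, DE, EG of the join tree above
```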
Inference by Propagation
Inference
PPTC: Probability Propagation in Tree of Cliques.
- Inference without evidence
- Inference with evidence
Inference without Evidence
(Demo on the Martin/Norman example network.)
Procedure for PPTC without Evidence
Belief Network → (graphical transformation: build the graphical component) → Join Tree Structure → (initialization: build the numerical component) → Inconsistent Join Tree → (propagation) → Consistent Join Tree → (marginalization)
Initialization
- For each cluster and sepset X, set each φX(x) to 1.
- For each variable V: assign to V a cluster X that contains FV; call X the parent cluster of FV. Multiply φX(x) by P(V | πV).
Initialization example (cluster ACE):
  P(c | a): c = on: 0.7 (a = on), 0.2 (a = off); c = off: 0.3, 0.8
  P(e | c): e = on: 0.3 (c = on), 0.6 (c = off); e = off: 0.7, 0.4
Starting from φACE(a, c, e) = 1 and multiplying in P(c | a) and P(e | c):
  (on, on, on)  0.7 × 0.3 = 0.21   (on, on, off)  0.7 × 0.7 = 0.49
  (on, off, on) 0.3 × 0.6 = 0.18   (on, off, off) 0.3 × 0.4 = 0.12
  (off, on, on) 0.2 × 0.3 = 0.06   (off, on, off) 0.2 × 0.7 = 0.14
  (off, off, on) 0.8 × 0.6 = 0.48  (off, off, off) 0.8 × 0.4 = 0.32
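A quick numeric check of the φACE values above (the dictionary names and the (value, parent) key order are mine):

```python
# Start from 1 and multiply in the CPTs whose families were assigned to ACE.
P_c_given_a = {("on", "on"): 0.7, ("on", "off"): 0.2,
               ("off", "on"): 0.3, ("off", "off"): 0.8}   # (c, a) -> P(c | a)
P_e_given_c = {("on", "on"): 0.3, ("on", "off"): 0.6,
               ("off", "on"): 0.7, ("off", "off"): 0.4}   # (e, c) -> P(e | c)

phi_ace = {}
for a in ("on", "off"):
    for c in ("on", "off"):
        for e in ("on", "off"):
            phi_ace[(a, c, e)] = 1.0                      # initialise to 1
            phi_ace[(a, c, e)] *= P_c_given_a[(c, a)]     # multiply in P(c | a)
            phi_ace[(a, c, e)] *= P_e_given_c[(e, c)]     # multiply in P(e | c)

print(phi_ace[("on", "on", "on")])     # 0.7 * 0.3 ≈ 0.21
print(phi_ace[("off", "off", "off")])  # 0.8 * 0.4 ≈ 0.32
```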
After initialization, by the independence assertions,
  ∏_{i=1..N} φXi = ∏_{j=1..Q} P(Vj | πVj) = P(U)   (all sepset potentials equal 1 after initialization),
where N is the number of clusters and Q is the number of variables.
After initialization, global consistency is satisfied, but local consistency is not.
Global Propagation
Global propagation is used to achieve local consistency. Let's consider a single message pass first, from cluster X to cluster Y through sepset R.
Message passing:
- Projection on the sepset:  save σR = φR, then set φR ← Σ_{X \ R} φX.
- Absorption on the receiving cluster:  φY ← φY · (φR / σR).
The Effect of a Single Message Pass
After X passes a message to Y, the sepset R is consistent with X (φR = Σ_{X \ R} φX), while the quantity (∏ cluster potentials) / (∏ sepset potentials) is unchanged, since φY is multiplied by exactly φR / σR.
Global Propagation
1. Choose an arbitrary cluster X.
2. Unmark all clusters. Call Ingoing-Propagation(X).
3. Unmark all clusters. Call Outgoing-Propagation(X).
Global Propagation
Ingoing-Propagation(X):
- Mark X.
- Call Ingoing-Propagation recursively on X's unmarked neighboring clusters, if any.
- Pass a message from X to the cluster that invoked Ingoing-Propagation(X).
Outgoing-Propagation(X):
- Mark X.
- Pass a message from X to each of its unmarked neighboring clusters, if any.
- Call Outgoing-Propagation recursively on X's unmarked neighboring clusters, if any.
After global propagation, the clique tree is both globally and locally consistent.
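The whole propagation step can be sketched compactly: one function for a single message pass (projection then absorption) and one for the ingoing/outgoing sweep. This is my own illustration on a toy two-cluster tree built from the Train Strike example; the data layout (dict potentials keyed by tuples of boolean values) is an assumption, not the lecture's code.

```python
from itertools import product

def all_assignments(variables):
    return [dict(zip(variables, vals))
            for vals in product([True, False], repeat=len(variables))]

def marginalize(phi, variables, keep):
    out = {}
    for assign in all_assignments(variables):
        key = tuple(assign[v] for v in keep)
        out[key] = out.get(key, 0.0) + phi[tuple(assign[v] for v in variables)]
    return out

def pass_message(tree, src, dst):
    """Projection on the sepset, then absorption on the receiving cluster."""
    sep = tuple(sorted(set(tree[src]["vars"]) & set(tree[dst]["vars"])))
    old = tree["sepsets"].get((src, dst),
                              {k: 1.0 for k in product([True, False], repeat=len(sep))})
    new = marginalize(tree[src]["phi"], tree[src]["vars"], sep)
    tree["sepsets"][(src, dst)] = tree["sepsets"][(dst, src)] = new
    for assign in all_assignments(tree[dst]["vars"]):
        key_dst = tuple(assign[v] for v in tree[dst]["vars"])
        key_sep = tuple(assign[v] for v in sep)
        ratio = 0.0 if old[key_sep] == 0 else new[key_sep] / old[key_sep]
        tree[dst]["phi"][key_dst] *= ratio          # absorb phi_R / sigma_R

def propagate(tree, neighbors, root):
    """Ingoing (collect) then outgoing (distribute) passes, as on the slide."""
    def collect(x, parent):
        for n in neighbors[x]:
            if n != parent:
                collect(n, x)
        if parent is not None:
            pass_message(tree, x, parent)
    def distribute(x, parent):
        for n in neighbors[x]:
            if n != parent:
                pass_message(tree, x, n)
                distribute(n, x)
    collect(root, None)
    distribute(root, None)

# Toy clique tree AC --C-- BC for the Train Strike example (A = Martin Late,
# B = Norman Late, C = Train Strike), already initialized with the CPTs.
tree = {
    "AC": {"vars": ("A", "C"),
           "phi": {(True, True): 0.06, (False, True): 0.04,   # P(C) * P(A | C)
                   (True, False): 0.45, (False, False): 0.45}},
    "BC": {"vars": ("B", "C"),
           "phi": {(True, True): 0.8, (False, True): 0.2,     # P(B | C)
                   (True, False): 0.1, (False, False): 0.9}},
    "sepsets": {},
}
neighbors = {"AC": ["BC"], "BC": ["AC"]}
propagate(tree, neighbors, "AC")
print(marginalize(tree["BC"]["phi"], ("B", "C"), ("B",)))
# ≈ {(True,): 0.17, (False,): 0.83}, i.e. P(Norman Late) from the earlier example
```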
Marginalization
From the consistent join tree, marginals are read off the cluster potentials directly, e.g. from φABD (entries such as .225, .025, .125, .180, .020, .150):
  P(a) = Σ_{b,d} φABD(a, b, d):  a = on: .500, a = off: .500
  P(d) = Σ_{a,b} φABD(a, b, d):  d = on: .680, d = off: .320
Review: Procedure for PPTC without Evidence
Belief Network → (graphical transformation) → Join Tree Structure → (initialization) → Inconsistent Join Tree → (propagation) → Consistent Join Tree → (marginalization)
Inference with Evidence
(Demo on the Martin/Norman example network.)
Observations
Observations are the simplest form of evidence. An observation is a statement of the form V = v. A collection of observations may be denoted by E = e, an instantiation of the set of variables E. Observations are referred to as hard evidence.
Likelihoods
Given E = e, the likelihood of V, denoted ΛV, is defined as:
  ΛV(v) = 1  if V ∉ E (V is not observed),
  ΛV(v) = 1  if V ∈ E and v is consistent with e,
  ΛV(v) = 0  otherwise.
Likelihoods — example: a table listing ΛV(v = on) and ΛV(v = off) for each variable A–H; unobserved variables have likelihood 1 for both values, and observed variables have 1 for the observed value and 0 for the other.
Procedure for PPTC with Evidence
Belief Network → (graphical transformation) → Join Tree Structure → (initialization and observation entry) → Inconsistent Join Tree → (propagation) → Consistent Join Tree → (marginalization and normalization)
Initialization with Observations
- For each cluster and sepset X, set each φX(x) to 1.
- For each variable V: assign to V a cluster X that contains FV (the parent cluster of FV), and multiply φX(x) by P(V | πV).
- Set each likelihood element ΛV(v) to 1.
Observation Entry
Encode the observation V = v as a likelihood ΛVnew with ΛVnew(v) = 1 and ΛVnew(v′) = 0 for v′ ≠ v. Then identify a cluster X that contains V and update:
  φX ← φX · ΛVnew,    ΛV ← ΛVnew.
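A small sketch of observation entry on a cluster potential (names and layout are mine):

```python
# Encode V = v as a 0/1 likelihood and multiply it into a cluster containing V.
def enter_observation(phi, variables, observed_var, observed_value):
    likelihood = {val: 1.0 if val == observed_value else 0.0
                  for val in (True, False)}
    idx = variables.index(observed_var)
    for key in phi:
        phi[key] *= likelihood[key[idx]]   # zero out rows inconsistent with V = v
    return likelihood

phi_bc = {(True, True): 0.8, (False, True): 0.2,
          (True, False): 0.1, (False, False): 0.9}
enter_observation(phi_bc, ("B", "C"), "B", True)   # observe B = True
print(phi_bc)   # entries with B = False are now 0
```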
Marginalization
After global propagation with the evidence entered, each cluster potential equals P(X, e), so marginalizing a cluster containing V gives P(V, e) = Σ_{X \ {V}} φX.
Normalization
After global propagation, P(V | e) is obtained by normalizing: P(V | e) = P(V, e) / Σ_v P(V = v, e).
Handling Dynamic Observations
Suppose the join tree is now consistent for evidence e1. How do we restore consistency if the observation is changed to e2?
Observation States
Between e1 and e2 there are three observation states for a variable V:
- No change.
- Update: V goes from unobserved to observed.
- Retraction: V goes from observed to unobserved, or from V = v1 to V = v2 with v1 ≠ v2.
Handling Dynamic Observations
The same flow (graphical transformation → initialization → observation entry → propagation → marginalization → normalization) is re-run either as a global update or as a global retraction; the question is when each is required.