1
**CS188: Computational Models of Human Behavior**

Introduction to graphical models. Slide credits: Kevin Murphy, Mark Paskin, Zoubin Ghahramani, and Jeff Bilmes

2
**Reasoning under uncertainty**

In many settings, we need to understand what is going on in a system when we have imperfect or incomplete information. For example, we might deploy a burglar alarm to detect intruders, but the sensor could also be triggered by other events, e.g., an earthquake. Probabilities quantify our uncertainty regarding the occurrence of such events.

3
**Probability spaces**

A probability space represents our uncertainty regarding an experiment. It has two parts: the sample space Ω, which is the set of possible outcomes, and the probability measure P, which is a real-valued function on the subsets of Ω. A set of outcomes A ⊆ Ω is called an event; P(A) represents how likely it is that the experiment’s actual outcome is a member of A.

4
**An example**

If our experiment is to deploy a burglar alarm and see if it works, then there are four possible outcomes: Ω = {(alarm, intruder), (no alarm, intruder), (alarm, no intruder), (no alarm, no intruder)}. Our choice of P has to obey three simple rules …

5
**The three axioms of probability theory**

- P(A) ≥ 0 for all events A
- P(Ω) = 1
- P(A ∪ B) = P(A) + P(B) for disjoint events A and B

6
**Some consequences of the axioms**
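For instance, the axioms imply the following standard consequences:

$$P(\emptyset) = 0, \qquad P(A^c) = 1 - P(A), \qquad P(A \cup B) = P(A) + P(B) - P(A \cap B).$$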

7
**Example: let’s assign a probability to each outcome ω**

These probabilities must be non-negative and sum to one:

|          | intruder | no intruder |
|----------|----------|-------------|
| alarm    | 0.002    | 0.003       |
| no alarm | 0.001    | 0.994       |

8
**Conditional Probability**
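The defining formula: P(A|B) = P(AB) / P(B), provided P(B) > 0; it is the probability of A once we restrict attention to outcomes in B.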

9
**Marginal probability**

The marginal probability is the unconditional probability P(A) of the event A; that is, the probability of A regardless of whether event B did or did not occur. For example, if there are two possible outcomes corresponding to events B and B′, then P(A) = P(AB) + P(AB′). This is called marginalization.
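For example, using the table above, marginalizing out the intruder variable gives

$$P(\text{alarm}) = P(\text{alarm}, \text{intruder}) + P(\text{alarm}, \text{no intruder}) = 0.002 + 0.003 = 0.005.$$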

10
**Example**

If P is defined by

|          | intruder | no intruder |
|----------|----------|-------------|
| alarm    | 0.002    | 0.003       |
| no alarm | 0.001    | 0.994       |

then

P(intruder | alarm) = P({(intruder, alarm)}) / P({(intruder, alarm), (no intruder, alarm)}) = 0.002 / (0.002 + 0.003) = 0.4

11
**The product rule**

The probability that A and B both happen is the probability that A happens times the probability that B happens given that A has occurred: P(AB) = P(A) P(B|A).

12
**The chain rule: applying the product rule repeatedly**

P(A1, A2, …, Ak) = P(A1) P(A2|A1) P(A3|A2, A1) ⋯ P(Ak|Ak−1, …, A1),

where P(A3|A2, A1) is shorthand for P(A3|A2A1), i.e., conditioning on the joint occurrence of A2 and A1.

13
**Bayes’ rule: use the product rule both ways on P(AB)**

P(AB) = P(A) P(B|A) and P(AB) = P(B) P(A|B).

Equating the two and dividing by P(B) yields Bayes’ rule: P(A|B) = P(A) P(B|A) / P(B).

14
**Random variables and densities**

15
Inference

Inference is one of the central problems of computational probability theory, and many problems can be formulated in these terms. For example, the probability that there is an intruder given that the alarm went off is p_{I|A}(true, true). Inference requires manipulating densities.
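As a concrete sketch, this query can be answered naïvely from the joint table of the earlier slides (the Python representation and function name here are ours, not from the slides):

```python
# Joint distribution over (intruder, alarm) from the earlier table.
joint = {
    (True, True): 0.002,    # intruder, alarm
    (True, False): 0.001,   # intruder, no alarm
    (False, True): 0.003,   # no intruder, alarm
    (False, False): 0.994,  # no intruder, no alarm
}

def conditional(joint, intruder, alarm):
    """P(Intruder = intruder | Alarm = alarm), by marginalizing the joint."""
    p_alarm = sum(p for (i, a), p in joint.items() if a == alarm)
    return joint[(intruder, alarm)] / p_alarm

print(conditional(joint, True, True))  # 0.002 / 0.005 = 0.4
```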

16
**Probabilistic graphical models**

Probabilistic graphical models combine graph theory and probability theory:

- The graph structure specifies which parts of the system are directly dependent.
- Local functions at each node specify how the different parts interact.
- Bayesian networks = probabilistic graphical models based on a directed acyclic graph.
- Markov networks = probabilistic graphical models based on an undirected graph.

17
Some broad questions

18
**Bayesian networks**

Nodes are random variables; edges represent dependence (no directed cycles allowed). By the chain rule, and then using the conditional independences encoded by the graph, the joint factorizes as

P(X1:N) = P(X1) P(X2|X1) P(X3|X1, X2) ⋯ = ∏i P(Xi | X1:i−1) = ∏i P(Xi | Xπi),

where πi denotes the parents of node i in the graph. (Figure: an example DAG over nodes x1, …, x7.)

19
**Example: water sprinkler Bayes net**

P(C, S, R, W) = P(C) P(S|C) P(R|C, S) P(W|C, S, R)   (chain rule)
= P(C) P(S|C) P(R|C) P(W|C, S, R)   (since R ⊥ S | C)
= P(C) P(S|C) P(R|C) P(W|S, R)   (since W ⊥ C | R, S)
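A minimal sketch of this factorization in Python; the CPT values and variable names below are illustrative placeholders, not from the slides:

```python
from itertools import product

# Water-sprinkler net: C -> S, C -> R, S -> W, R -> W.
P_C = {True: 0.5, False: 0.5}                       # P(C)
P_S = {True: 0.1, False: 0.5}                       # P(S=True | C)
P_R = {True: 0.8, False: 0.2}                       # P(R=True | C)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.01}   # P(W=True | S, R)

def joint(c, s, r, w):
    """P(C,S,R,W) = P(C) P(S|C) P(R|C) P(W|S,R)."""
    p = P_C[c]
    p *= P_S[c] if s else 1 - P_S[c]
    p *= P_R[c] if r else 1 - P_R[c]
    p *= P_W[(s, r)] if w else 1 - P_W[(s, r)]
    return p

# Sanity check: the 16 joint entries sum to 1.
assert abs(sum(joint(*v) for v in product([True, False], repeat=4)) - 1) < 1e-12
```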

20
Inference

21
Naïve inference

22
**Problem with naïve representation of the joint probability**

Problems with working directly with the joint probability:

- Representation: a big table of numbers is hard to understand.
- Inference: computing a marginal P(Xi) takes O(2^N) time.
- Learning: there are O(2^N) parameters to estimate.

Graphical models address these problems by providing a structured representation for the joint. Graphs encode conditional independence properties and represent families of probability distributions that satisfy those properties.

23
**Bayesian networks provide a compact representation of the joint probability**
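To make the compactness concrete (a standard counting argument, not taken verbatim from the slides): a full joint over N binary variables needs 2^N − 1 free parameters, while a Bayes net in which each node has at most k parents needs at most N · 2^k parameters, one conditional probability table per node.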

24
**Conditional probabilities**

25
**Another example: medical diagnosis (classification)**

26
**Approach: build a Bayes net and use Bayes’ rule to get the class probability**

27
**A very simple Bayes’ net: Naïve Bayes**
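The defining naïve Bayes assumption is that the features are conditionally independent given the class C, so the joint factorizes as

$$P(C, x_1, \ldots, x_n) = P(C) \prod_{i=1}^{n} P(x_i \mid C).$$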

28
**Naïve Bayes classifier for medical diagnosis**
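A minimal sketch of such a classifier; the diseases, symptoms, and probabilities below are hypothetical, chosen only to illustrate the computation:

```python
# Hypothetical priors and per-symptom likelihoods P(symptom=True | disease).
prior = {"flu": 0.1, "cold": 0.2, "healthy": 0.7}
likelihood = {
    "flu":     {"fever": 0.90, "cough": 0.80},
    "cold":    {"fever": 0.30, "cough": 0.90},
    "healthy": {"fever": 0.05, "cough": 0.10},
}

def posterior(symptoms):
    """P(disease | symptoms) via Bayes' rule under the naive Bayes assumption."""
    scores = {}
    for d, p in prior.items():
        for s, present in symptoms.items():
            p *= likelihood[d][s] if present else 1 - likelihood[d][s]
        scores[d] = p
    z = sum(scores.values())             # normalizing constant P(symptoms)
    return {d: p / z for d, p in scores.items()}

print(posterior({"fever": True, "cough": True}))
```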

29
**Another commonly used Bayes’ net: Hidden Markov Model (HMM)**
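For reference, an HMM with hidden states X1:T and observations Y1:T factorizes as

$$P(X_{1:T}, Y_{1:T}) = P(X_1)\prod_{t=2}^{T} P(X_t \mid X_{t-1}) \prod_{t=1}^{T} P(Y_t \mid X_t).$$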

30
**Conditional independence properties of Bayesian networks: chains**
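For a chain X → Y → Z, the standard property is X ⊥ Z | Y: observing the middle node blocks the path.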

31
**Conditional independence properties of Bayesian networks: common cause**
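For a common cause X ← Y → Z, likewise X ⊥ Z | Y: conditioning on the shared parent renders its children independent.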

32
**Conditional independence properties of Bayesian networks: explaining away**
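For a v-structure X → Y ← Z, the situation reverses: X ⊥ Z marginally, but X and Z become dependent once Y (or one of its descendants) is observed; this is the “explaining away” effect.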

33
**Global Markov properties of DAGs**
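Roughly, the global Markov property says X_A ⊥ X_B | X_C whenever every path between A and B is blocked given C (d-separation).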

34
Bayes ball algorithm

35
Example

36
**Undirected graphical models**

37
Parameterization

38
Clique potentials

39
**Interpretation of clique potentials**

40
Examples

41
**Joint distribution of an undirected graphical model**

Complexity scales exponentially as 2^n for n binary random variables if we use a naïve approach to computing the partition function.
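The joint of an undirected model is p(x) = (1/Z) ∏_C ψ_C(x_C), where the partition function Z sums the unnormalized product over all assignments. A minimal sketch of that naïve computation, assuming a small pairwise model with illustrative potentials (the graph and numbers are ours, not from the slides):

```python
from itertools import product

# A 3-node chain with edges (0,1) and (1,2).
edges = [(0, 1), (1, 2)]

def psi(xi, xj):
    # Pairwise potential favoring agreement between neighbors.
    return 2.0 if xi == xj else 1.0

def unnormalized(x):
    p = 1.0
    for i, j in edges:
        p *= psi(x[i], x[j])
    return p

# Z sums the unnormalized product over all 2^n assignments -- exponential in n.
Z = sum(unnormalized(x) for x in product([0, 1], repeat=3))
print(Z)  # 18.0 here; p(x) = unnormalized(x) / Z
```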

42
**Max clique vs. sub-clique**

43
Log-linear models
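In the standard log-linear parameterization, the potentials are exponentials of weighted feature functions:

$$p(x) = \frac{1}{Z(\theta)} \exp\Big(\sum_i \theta_i f_i(x)\Big).$$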

44
Log-linear models

45
Log-linear models

46
Summary

47
Summary

48
**From directed to undirected graphs**

49
**From directed to undirected graphs**

50
**Example of moralization**
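Moralization, in brief: for each node, connect (“marry”) all pairs of its parents with an undirected edge, then drop the directions of all edges; the result is an undirected graph whose cliques can hold the original conditional probability tables.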

51
**Comparing directed and undirected models**

52
**Expressive power**

(Figures: two example graphs, one over nodes w, x, y, z and one over nodes x, y, z.)

53
**Coming back to inference**

54
**Coming back to inference**

55
**Belief propagation in trees**

56
**Belief propagation in trees**

57
**Belief propagation in trees**

58
**Belief propagation in trees**

59
**Belief propagation in trees**

60
**Belief propagation in trees**

61
**Belief propagation in trees**

62
**Belief propagation in trees**
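As a concrete sketch of the message-passing computation on the simplest tree, a three-node chain, with illustrative potentials (not from the slides):

```python
import numpy as np

# Sum-product belief propagation on a 3-node chain x0 - x1 - x2.
phi = [np.array([0.6, 0.4]),
       np.array([0.5, 0.5]),
       np.array([0.3, 0.7])]      # node potentials phi_i(x_i)
psi = np.array([[2.0, 1.0],
                [1.0, 2.0]])      # psi[x_i, x_j], favors neighbor agreement

# Forward pass: m_f[i] is the message arriving at node i from the left.
m_f = [np.ones(2)]
for i in range(2):
    m_f.append(psi.T @ (phi[i] * m_f[i]))

# Backward pass: m_b[i] is the message arriving at node i from the right.
m_b = [np.ones(2), np.ones(2), np.ones(2)]
for i in (1, 0):
    m_b[i] = psi @ (phi[i + 1] * m_b[i + 1])

# Belief at each node: local potential times all incoming messages, normalized.
for i in range(3):
    b = phi[i] * m_f[i] * m_b[i]
    print(i, b / b.sum())
```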

63
Learning

64
Parameter Estimation

65
Parameter Estimation

66
**Maximum-likelihood Estimation (MLE)**

67
Example: 1-D Gaussian
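A minimal sketch of the closed-form MLE on simulated data (the generating parameters are arbitrary):

```python
import numpy as np

# Simulate i.i.d. samples from a 1-D Gaussian.
x = np.random.default_rng(0).normal(loc=2.0, scale=1.5, size=1000)

mu_hat = x.mean()                        # MLE of the mean
sigma2_hat = ((x - mu_hat) ** 2).mean()  # MLE of the variance (divides by N, so biased)
print(mu_hat, sigma2_hat)
```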

68
MLE for Bayes’ Net

69
MLE for Bayes’ Net

70
**MLE for Bayes’ Net with Discrete Nodes**
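For fully observed discrete nodes, the MLE reduces to normalized counts: the estimate for each CPT entry is

$$\hat{\theta}_{x \mid \mathrm{pa}} = \frac{N(x, \mathrm{pa})}{\sum_{x'} N(x', \mathrm{pa})},$$

where N(x, pa) counts how often node value x co-occurs with parent configuration pa in the data.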

71
**Parameter Estimation with Hidden Nodes**

(Figure: a graphical model with hidden nodes Z1, …, Z6.)

72
Why is learning harder?

73
**Where do hidden variables come from?**

74
**Parameter Estimation with Hidden Nodes**


75
EM
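In outline: the E-step computes the expected sufficient statistics of the hidden variables given the current parameters, and the M-step re-estimates the parameters as if those expected statistics were observed counts; each iteration is guaranteed not to decrease the likelihood.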

76
**Different Learning Conditions**

| Structure \ Observability | Full | Partial |
|---|---|---|
| Known | Closed form | EM |
| Unknown | Local search | Structural EM |
