CS188: Computational Models of Human Behavior


1 CS188: Computational Models of Human Behavior
Introduction to graphical models
Slide credits: Kevin Murphy, Mark Pashkin, Zoubin Ghahramani, and Jeff Bilmes

2 Reasoning under uncertainty
In many settings, we need to understand what is going on in a system when we have imperfect or incomplete information. For example, we might deploy a burglar alarm to detect intruders, but the sensor could be triggered by other events, e.g., an earthquake. Probabilities quantify the uncertainty regarding the occurrence of events.

3 Probability spaces
A probability space represents our uncertainty regarding an experiment. It has two parts: a sample space Ω, which is the set of outcomes, and a probability measure P, which is a real-valued function of the subsets of Ω. A set of outcomes A ⊆ Ω is called an event. P(A) represents how likely it is that the experiment's actual outcome is a member of A.

4 An example
If our experiment is to deploy a burglar alarm and see if it works, then there could be four outcomes:
Ω = {(alarm, intruder), (no alarm, intruder), (alarm, no intruder), (no alarm, no intruder)}
Our choice of P has to obey these simple rules …

5 The three axioms of probability theory
P(A) ≥ 0 for all events A
P(Ω) = 1
P(A ∪ B) = P(A) + P(B) for disjoint events A and B

6 Some consequences of the axioms

7 Example
Let's assign a probability to each outcome ω. These probabilities must be non-negative and sum to one.

              intruder    no intruder
alarm         0.002       0.003
no alarm      0.001       0.994
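As a minimal sketch, the assignment in the table can be checked against the three axioms in a few lines of Python. The names `P_OUTCOME` and `prob` are illustrative choices, not part of the slides.

```python
# Outcome probabilities from the slide's table, keyed by (alarm state, intruder state).
P_OUTCOME = {
    ("alarm", "intruder"): 0.002,
    ("alarm", "no intruder"): 0.003,
    ("no alarm", "intruder"): 0.001,
    ("no alarm", "no intruder"): 0.994,
}


def prob(event):
    """Return P(A) for an event A given as a set of outcomes."""
    return sum(P_OUTCOME[outcome] for outcome in event)


omega = set(P_OUTCOME)                              # the sample space
alarm = {o for o in omega if o[0] == "alarm"}       # event "the alarm sounds"
no_alarm = omega - alarm                            # its complement (disjoint from alarm)

# Axiom 1: every probability is non-negative.
axiom1 = all(p >= 0 for p in P_OUTCOME.values())
# Axiom 2: the whole sample space has probability one (up to rounding).
axiom2 = abs(prob(omega) - 1.0) < 1e-12
# Axiom 3: additivity for the two disjoint events.
axiom3 = abs(prob(alarm | no_alarm) - (prob(alarm) + prob(no_alarm))) < 1e-12

print(axiom1, axiom2, axiom3)
```

Representing events as Python sets keeps the code close to the set-based definitions used in the slides.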

8 Conditional Probability

9 Marginal probability
The marginal probability is the unconditional probability P(A) of the event A, that is, the probability of A regardless of whether event B did or did not occur. For example, if there are two possible outcomes corresponding to events B and B', then
P(A) = P(A ∩ B) + P(A ∩ B')
This is called marginalization.
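A small sketch of marginalization on the burglar-alarm numbers from the earlier example: summing the joint probabilities over the intruder variable gives the marginal probability of the alarm. The dictionary layout and the helper name `marginal` are assumptions made for illustration.

```python
# Joint probabilities P(alarm state, intruder state) from the burglar-alarm example.
JOINT = {
    ("alarm", "intruder"): 0.002,
    ("alarm", "no intruder"): 0.003,
    ("no alarm", "intruder"): 0.001,
    ("no alarm", "no intruder"): 0.994,
}


def marginal(position, value):
    """Sum the joint over every outcome whose component at `position` equals `value`."""
    return sum(p for outcome, p in JOINT.items() if outcome[position] == value)


# P(alarm) = P(alarm, intruder) + P(alarm, no intruder)
p_alarm = marginal(0, "alarm")
# P(intruder) = P(alarm, intruder) + P(no alarm, intruder)
p_intruder = marginal(1, "intruder")

print(round(p_alarm, 6), round(p_intruder, 6))
```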

10 Example
If P is defined by

              intruder    no intruder
alarm         0.002       0.003
no alarm      0.001       0.994

then
P({(alarm, intruder)} | {(alarm, intruder), (alarm, no intruder)}) = 0.002 / (0.002 + 0.003) = 0.4
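The conditional probability in this example can be reproduced with a short sketch that uses the standard definition P(A | B) = P(A ∩ B) / P(B). The function names `prob` and `cond_prob` are hypothetical helpers, not something the slides define.

```python
# Outcome probabilities from the example table, keyed by (alarm state, intruder state).
P_OUTCOME = {
    ("alarm", "intruder"): 0.002,
    ("alarm", "no intruder"): 0.003,
    ("no alarm", "intruder"): 0.001,
    ("no alarm", "no intruder"): 0.994,
}


def prob(event):
    """P(A) for an event A given as a set of outcomes."""
    return sum(P_OUTCOME[outcome] for outcome in event)


def cond_prob(a, b):
    """P(A | B) = P(A intersect B) / P(B); assumes P(B) > 0."""
    return prob(a & b) / prob(b)


intruder = {o for o in P_OUTCOME if o[1] == "intruder"}
alarm = {o for o in P_OUTCOME if o[0] == "alarm"}

# Probability of an intruder given that the alarm went off.
p_intruder_given_alarm = cond_prob(intruder, alarm)
print(round(p_intruder_given_alarm, 6))
```

Even though intruders are rare overall, conditioning on the alarm raises the probability of an intruder from 0.003 to 0.4.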

11 The product rule
The probability that A and B both happen is the probability that A happens times the probability that B happens given that A has occurred:
P(A ∩ B) = P(A) P(B|A)

12 The chain rule
Applying the product rule repeatedly:
P(A1, A2, …, Ak) = P(A1) P(A2|A1) P(A3|A2, A1) … P(Ak|Ak-1, …, A1)
where P(A3|A2, A1) = P(A3|A2 ∩ A1)

13 Bayes' rule
Use the product rule both ways with P(A ∩ B):
P(A ∩ B) = P(A) P(B|A)
P(A ∩ B) = P(B) P(A|B)
Equating the two right-hand sides and dividing by P(B) gives
P(A|B) = P(A) P(B|A) / P(B)

14 Random variables and densities

15 Inference
One of the central problems of computational probability theory. Many problems can be formulated in these terms. Example: the probability that there is an intruder given that the alarm went off is p_{I|A}(true, true). Inference requires manipulating densities.

16 Probabilistic graphical models
Combination of graph theory and probability theory
Graph structure specifies which parts of the system are directly dependent
Local functions at each node specify how different parts interact
Bayesian networks = probabilistic graphical models based on a directed acyclic graph
Markov networks = probabilistic graphical models based on an undirected graph

17 Some broad questions

18 Bayesian Networks
Nodes are random variables
Edges represent dependence (no directed cycles allowed)
P(X1:N) = P(X1) P(X2|X1) P(X3|X1,X2) … = ∏i P(Xi|X1:i-1) = ∏i P(Xi|Xπi)
where Xπi denotes the parents of Xi in the graph
[Figure: example directed acyclic graph over nodes x1 to x7]

19 Example: water sprinkler Bayes net
P(C,S,R,W) = P(C) P(S|C) P(R|C,S) P(W|C,S,R)   (chain rule)
           = P(C) P(S|C) P(R|C) P(W|C,S,R)     (since R ⊥ S | C)
           = P(C) P(S|C) P(R|C) P(W|S,R)       (since W ⊥ C | R,S)
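The factorization above can be sketched in Python as a product of local conditional tables. The slide gives no numbers, so every probability value below is an illustrative assumption, and `bern`, `joint`, and `prob` are hypothetical helper names. Queries are answered by brute-force enumeration of all 16 assignments.

```python
from itertools import product

# Illustrative conditional probability tables (assumed values, not from the slides).
P_C = 0.5                                   # P(C = true)
P_S_GIVEN_C = {True: 0.1, False: 0.5}       # P(S = true | C)
P_R_GIVEN_C = {True: 0.8, False: 0.2}       # P(R = true | C)
P_W_GIVEN_SR = {                            # P(W = true | S, R)
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.9,
    (False, False): 0.0,
}


def bern(p_true, value):
    """Probability that a binary variable takes `value` when P(true) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(c, s, r, w):
    """P(C,S,R,W) = P(C) P(S|C) P(R|C) P(W|S,R)."""
    return (bern(P_C, c) * bern(P_S_GIVEN_C[c], s)
            * bern(P_R_GIVEN_C[c], r) * bern(P_W_GIVEN_SR[(s, r)], w))


def prob(**fixed):
    """Sum the joint over all assignments consistent with the fixed values."""
    total = 0.0
    for values in product([True, False], repeat=4):
        assignment = dict(zip(("c", "s", "r", "w"), values))
        if all(assignment[name] == value for name, value in fixed.items()):
            total += joint(**assignment)
    return total


p_wet = prob(w=True)
p_sprinkler_given_wet = prob(s=True, w=True) / p_wet
# Numerical check of R ⊥ S | C: conditioning on S should not change P(R | C).
p_rain_given_sc = prob(r=True, s=True, c=True) / prob(s=True, c=True)

print(round(p_wet, 4), round(p_sprinkler_given_wet, 4), round(p_rain_given_sc, 4))
```

With these assumed numbers, P(W = true) is 0.6471 and P(S = true | W = true) is about 0.4298; different tables would of course give different values.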

20 Inference

21 Naïve inference

22 Problems with the naïve representation of the joint probability
Problems with working directly with the joint probability:
Representation: a big table of numbers is hard to understand
Inference: computing a marginal P(Xi) takes O(2^N) time
Learning: there are O(2^N) parameters to estimate
Graphical models solve the above problems by providing a structured representation for the joint. Graphs encode conditional independence properties and represent families of probability distributions that satisfy these properties.
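To make the parameter counts concrete, here is a rough sketch comparing a full joint table with a Bayesian network over binary variables. It assumes one free parameter per parent configuration of each node; the function names and the example graphs are illustrative.

```python
def full_table_params(n):
    """Free parameters of an unrestricted joint over n binary variables."""
    return 2 ** n - 1


def bayes_net_params(parents):
    """Free parameters of a binary Bayes net: one per parent configuration of each node."""
    return sum(2 ** len(pa) for pa in parents.values())


# Water sprinkler network: C -> S, C -> R, S -> W, R -> W.
sprinkler = {"C": [], "S": ["C"], "R": ["C"], "W": ["S", "R"]}

# A chain X1 -> X2 -> ... -> X20, where each node has at most one parent.
chain = {f"X{i}": ([f"X{i - 1}"] if i > 1 else []) for i in range(1, 21)}

sprinkler_counts = (full_table_params(len(sprinkler)), bayes_net_params(sprinkler))
chain_counts = (full_table_params(len(chain)), bayes_net_params(chain))

print(sprinkler_counts)
print(chain_counts)
```

The gap grows quickly with N: the full table doubles with every added variable, while the network's count grows only with the number of nodes and the size of each node's parent set.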

23 Bayesian networks provide a compact representation of the joint probability

24 Conditional probabilities

25 Another example: medical diagnosis (classification)

26 Approach: build a Bayes’ net and use Bayes’ rule to get class probability

27 A very simple Bayes’ net: Naïve Bayes

28 Naïve Bayes classifier for medical diagnosis

29 Another commonly used Bayes’ net: Hidden Markov Model (HMM)

30 Conditional independence properties of Bayesian networks: chains

31 Conditional independence properties of Bayesian networks: common cause

32 Conditional independence properties of Bayesian networks: explaining away

33 Global Markov properties of DAGs

34 Bayes ball algorithm

35 Example

36 Undirected graphical models

37 Parameterization

38 Clique potentials

39 Interpretation of clique potentials

40 Examples

41 Joint distribution of an undirected graphical model
Complexity scales exponentially, as 2^n for n binary random variables, if we use a naïve approach to computing the partition function
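A minimal sketch of the naïve approach, assuming a chain-structured pairwise model with one shared potential table `PSI` (both the structure and the values are made up for illustration). The partition function is computed by summing the product of potentials over all 2^n joint assignments.

```python
from itertools import product

# Hypothetical pairwise potential psi(x_i, x_{i+1}) that favors equal neighbors.
PSI = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}


def partition_function(n):
    """Brute-force Z for a chain of n binary variables; returns (Z, number of terms summed)."""
    z = 0.0
    terms = 0
    for x in product([0, 1], repeat=n):     # all 2^n joint assignments
        weight = 1.0
        for i in range(n - 1):              # one potential per edge of the chain
            weight *= PSI[(x[i], x[i + 1])]
        z += weight
        terms += 1
    return z, terms


z3, terms3 = partition_function(3)
z10, terms10 = partition_function(10)
print(z3, terms3)
print(z10, terms10)
```

The number of terms doubles with each added variable, which is why structured algorithms that exploit the graph are preferred over direct enumeration.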

42 Max clique vs. sub-clique

43 Log-linear models

44 Log-linear models

45 Log-linear models

46 Summary

47 Summary

48 From directed to undirected graphs

49 From directed to undirected graphs

50 Example of moralization

51 Comparing directed and undirected models

52 Expressive power
[Figure: two example graphs, one over nodes w, x, y, z and one over nodes x, y, z]

53 Coming back to inference

54 Coming back to inference

55 Belief propagation in trees

56 Belief propagation in trees

57 Belief propagation in trees

58 Belief propagation in trees

59 Belief propagation in trees

60 Belief propagation in trees

61 Belief propagation in trees

62 Belief propagation in trees

63 Learning

64 Parameter Estimation

65 Parameter Estimation

66 Maximum-likelihood Estimation (MLE)

67 Example: 1-D Gaussian

68 MLE for Bayes’ Net

69 MLE for Bayes’ Net

70 MLE for Bayes’ Net with Discrete Nodes

71 Parameter Estimation with Hidden Nodes

72 Why is learning harder?

73 Where do hidden variables come from?

74 Parameter Estimation with Hidden Nodes

75 EM

76 Different Learning Conditions
Structure     Full observability    Partial observability
Known         Closed form           EM
Unknown       Local search          Structural EM

