
1
INTRODUCTION TO GRAPHICAL MODELS
Slide credits: Kevin Murphy, Mark Pashkin, Zoubin Ghahramani and Jeff Bilmes
CS188: Computational Models of Human Behavior

2
Reasoning under uncertainty
In many settings, we need to understand what is going on in a system when we have imperfect or incomplete information
For example, we might deploy a burglar alarm to detect intruders
– But the sensor could be triggered by other events, e.g., an earthquake
Probabilities quantify the uncertainties regarding the occurrence of events

3
Probability spaces
A probability space represents our uncertainty regarding an experiment
It has two parts:
– the sample space Ω, which is the set of outcomes
– the probability measure P, which is a real function on the subsets of Ω
A set of outcomes A ⊆ Ω is called an event. P(A) represents how likely it is that the experiment's actual outcome is a member of A

4
An example
If our experiment is to deploy a burglar alarm and see if it works, then there could be four outcomes:
Ω = {(alarm, intruder), (no alarm, intruder), (alarm, no intruder), (no alarm, no intruder)}
Our choice of P has to obey these simple rules …

5
The three axioms of probability theory
1. P(A) ≥ 0 for all events A
2. P(Ω) = 1
3. P(A ∪ B) = P(A) + P(B) for disjoint events A and B

6
Some consequences of the axioms

7
Example
Let's assign a probability to each outcome ω
These probabilities must be non-negative and sum to one
(table of outcome probabilities: rows alarm / no alarm, columns intruder / no intruder; values not recoverable)
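The assignment described above can be sketched in code. The slide's actual table values are not recoverable, so the numbers below are hypothetical; the axiom checks and the event-probability computation follow the definitions on the previous slides.

```python
# Hypothetical outcome probabilities for the burglar-alarm experiment
# (the slide's table values are not shown here, so these are made up).
P = {
    ("alarm", "intruder"):       0.001,
    ("no alarm", "intruder"):    0.0001,
    ("alarm", "no intruder"):    0.01,
    ("no alarm", "no intruder"): 0.9889,
}

# Axiom checks: non-negative, and the whole sample space has probability 1.
assert all(p >= 0 for p in P.values())
assert abs(sum(P.values()) - 1.0) < 1e-12

# The probability of an event is the sum over the outcomes it contains.
def prob(event):
    return sum(P[w] for w in event)

alarm_goes_off = {("alarm", "intruder"), ("alarm", "no intruder")}
print(prob(alarm_goes_off))  # 0.001 + 0.01 = 0.011
```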

8
Conditional Probability

9
Marginal probability
Marginal probability is the unconditional probability P(A) of the event A; that is, the probability of A regardless of whether event B did or did not occur
For example, if there are two possible outcomes corresponding to events B and B′, this means that
– P(A) = P(A ∩ B) + P(A ∩ B′)
This is called marginalization
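The sum rule above can be demonstrated on a small joint table. The numbers are hypothetical (the slides' own table is not recoverable); the point is that the marginal of one variable is obtained by summing the joint over the other.

```python
# Marginalization sketch: P(A) = P(A and B) + P(A and B'), here with a
# hypothetical joint over (alarm?, intruder?).
joint = {
    (True,  True):  0.001,   # alarm, intruder
    (True,  False): 0.01,    # alarm, no intruder
    (False, True):  0.0001,  # no alarm, intruder
    (False, False): 0.9889,  # no alarm, no intruder
}

def marginal_alarm(a):
    # Sum out the intruder variable to get the unconditional P(alarm = a).
    return sum(joint[(a, b)] for b in (True, False))

print(marginal_alarm(True))  # 0.001 + 0.01 = 0.011
```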

10
Example
If P is defined by a table of outcome probabilities (rows alarm / no alarm, columns intruder / no intruder; values not recoverable), then
P({(intruder, alarm)} | {(intruder, alarm), (no intruder, alarm)})
is the conditional probability of an intruder given that the alarm went off

11
The product rule
The probability that A and B both happen is the probability that A happens, times the probability that B happens given that A has occurred:
P(A ∩ B) = P(A) P(B|A)

12
The chain rule
Applying the product rule repeatedly:
P(A1, A2, …, Ak) = P(A1) P(A2|A1) P(A3|A2, A1) … P(Ak|Ak−1, …, A1)
where P(A3|A2, A1) = P(A3|A2 ∩ A1)

13
Bayes' rule
Use the product rule both ways with P(A ∩ B):
– P(A ∩ B) = P(A) P(B|A)
– P(A ∩ B) = P(B) P(A|B)
Equating the two gives Bayes' rule: P(A|B) = P(A) P(B|A) / P(B)
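Bayes' rule can be applied directly to the burglar-alarm setting. The prior and sensor reliabilities below are hypothetical; the computation shows how a weak prior keeps the posterior modest even for a fairly reliable alarm.

```python
# Bayes' rule: P(A|B) = P(A) P(B|A) / P(B), with hypothetical numbers.
p_i = 0.001          # prior probability of an intruder (assumed)
p_a_given_i = 0.95   # alarm fires if there is an intruder (assumed)
p_a_given_ni = 0.01  # false-alarm rate (assumed)

# Marginal P(alarm) via the sum rule over the two intruder cases.
p_a = p_i * p_a_given_i + (1 - p_i) * p_a_given_ni

# Posterior via Bayes' rule.
p_i_given_a = p_i * p_a_given_i / p_a
print(round(p_i_given_a, 3))  # about 0.087: most alarms are still false
```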

14
Random variables and densities

15
Inference
One of the central problems of computational probability theory
Many problems can be formulated in these terms. For example:
– The probability that there is an intruder given that the alarm went off is p_I|A(true, true)
Inference requires manipulating densities

16
Probabilistic graphical models
Combination of graph theory and probability theory
– The graph structure specifies which parts of the system are directly dependent
– Local functions at each node specify how the different parts interact
Bayesian networks = probabilistic graphical models based on directed acyclic graphs
Markov networks = probabilistic graphical models based on undirected graphs

17
Some broad questions

18
Bayesian Networks
Nodes are random variables
Edges represent dependence (no directed cycles allowed)
P(X1:N) = P(X1) P(X2|X1) P(X3|X1, X2) … = ∏i P(Xi|X1:i−1) = ∏i P(Xi|Xπi), where πi denotes the parents of node i
(figure: example DAG over x1, …, x7)

19
Example
Water sprinkler Bayes net
P(C,S,R,W) = P(C) P(S|C) P(R|C,S) P(W|C,S,R)   chain rule
           = P(C) P(S|C) P(R|C) P(W|C,S,R)     since R ⊥ S | C
           = P(C) P(S|C) P(R|C) P(W|S,R)       since W ⊥ C | S,R
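The factorized joint above can be evaluated directly. The CPT numbers below are illustrative values commonly used for this example, not taken from the slides; the sketch computes the marginal probability that the grass is wet by summing the factorized joint over the other variables.

```python
from itertools import product

# Water-sprinkler net: Cloudy -> {Sprinkler, Rain} -> WetGrass.
# Joint factorizes as P(C,S,R,W) = P(C) P(S|C) P(R|C) P(W|S,R).
# All CPT values are illustrative assumptions.
p_c = {True: 0.5, False: 0.5}                       # P(C = c)
p_s = {True: {True: 0.1, False: 0.9},               # P(S = s | C = c), keyed [c][s]
       False: {True: 0.5, False: 0.5}}
p_r = {True: {True: 0.8, False: 0.2},               # P(R = r | C = c), keyed [c][r]
       False: {True: 0.2, False: 0.8}}
p_w = {(True, True): 0.99, (True, False): 0.90,     # P(W = True | S = s, R = r)
       (False, True): 0.90, (False, False): 0.0}

def joint(c, s, r, w):
    pw = p_w[(s, r)] if w else 1 - p_w[(s, r)]
    return p_c[c] * p_s[c][s] * p_r[c][r] * pw

# Marginal P(W = True) by summing the factorized joint over C, S, R.
p_wet = sum(joint(c, s, r, True) for c, s, r in product([True, False], repeat=3))
print(round(p_wet, 4))  # 0.6471 with these CPTs
```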

20
Inference

21
Naïve inference

22
Problems with a naïve representation of the joint probability
Problems with working directly with the joint probability:
– Representation: a big table of numbers is hard to understand
– Inference: computing a marginal P(Xi) takes O(2^N) time
– Learning: there are O(2^N) parameters to estimate
Graphical models solve the above problems by providing a structured representation for the joint
Graphs encode conditional independence properties and represent families of probability distributions that satisfy those properties

23
Bayesian networks provide a compact representation of the joint probability

24
Conditional probabilities

25
Another example: medical diagnosis (classification)

26
Approach: build a Bayes net and use Bayes' rule to get the class probability

27
A very simple Bayes net: Naïve Bayes

28
Naïve Bayes classifier for medical diagnosis
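A Naïve Bayes diagnosis model assumes the symptoms are conditionally independent given the class, so the class posterior is a product of per-symptom factors. The sketch below uses a hypothetical disease with three binary symptoms; all probabilities are made up for illustration.

```python
# Naïve Bayes sketch for a toy diagnosis task: class D (disease yes/no) and
# conditionally independent binary symptoms X1..X3. All numbers are assumptions.
p_d = 0.01                        # prior P(D = true)
p_x_given_d = [0.9, 0.7, 0.8]     # P(Xi = true | D = true)
p_x_given_nd = [0.1, 0.2, 0.1]    # P(Xi = true | D = false)

def posterior(symptoms):
    """P(D = true | x1..x3) via Bayes' rule with the naive factorization."""
    like_d, like_nd = p_d, 1 - p_d
    for x, pd, pnd in zip(symptoms, p_x_given_d, p_x_given_nd):
        like_d *= pd if x else 1 - pd
        like_nd *= pnd if x else 1 - pnd
    return like_d / (like_d + like_nd)

print(round(posterior([True, True, True]), 3))   # three symptoms: fairly likely
print(round(posterior([False, False, False]), 5))  # no symptoms: very unlikely
```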

29
Another commonly used Bayes net: Hidden Markov Model (HMM)
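An HMM chains a hidden state sequence with per-step emissions, and the likelihood of an observation sequence is computed by the forward algorithm. The parameters below are hypothetical (the slide's figure is not reproduced here); the sketch shows the standard forward recursion for a two-state model with binary observations.

```python
# Minimal HMM forward pass (all parameters are illustrative assumptions).
A = [[0.7, 0.3], [0.4, 0.6]]   # transition P(z_t = j | z_{t-1} = i)
B = [[0.9, 0.1], [0.2, 0.8]]   # emission P(x_t = k | z_t = i), x binary
pi = [0.5, 0.5]                # initial state distribution

def forward(obs):
    """Likelihood P(x_1..x_T) by summing over hidden state paths."""
    alpha = [pi[z] * B[z][obs[0]] for z in range(2)]
    for x in obs[1:]:
        alpha = [sum(alpha[zp] * A[zp][z] for zp in range(2)) * B[z][x]
                 for z in range(2)]
    return sum(alpha)

print(forward([0, 1, 0]))  # 0.099375 with these parameters
```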

30
Conditional independence properties of Bayesian networks: chains

31
Conditional independence properties of Bayesian networks: common cause

32
Conditional independence properties of Bayesian networks: explaining away

33
Global Markov properties of DAGs

34
Bayes ball algorithm

35
Example

36
Undirected graphical models

37
Parameterization

38
Clique potentials

39
Interpretation of clique potentials

40
Examples

41
Joint distribution of an undirected graphical model
Complexity scales exponentially as 2^N for N binary random variables if we use a naïve approach to computing the partition function
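The exponential cost mentioned above can be made concrete: naïvely computing the partition function Z means summing the unnormalized product of clique potentials over all 2^n joint configurations. The chain model and potential values below are illustrative assumptions.

```python
from itertools import product

# Brute-force partition function of a tiny pairwise MRF on a chain of n
# binary variables. The potentials are illustrative; the point is the
# O(2^n) cost of the naive sum, exactly the blow-up the slide refers to.
n = 10

def potential(xi, xj):
    # Pairwise clique potential favouring equal neighbours (assumed values).
    return 2.0 if xi == xj else 1.0

def unnormalized(x):
    p = 1.0
    for i in range(n - 1):
        p *= potential(x[i], x[i + 1])
    return p

# 2^10 = 1024 terms here; doubling n doubles the exponent, not the work.
Z = sum(unnormalized(x) for x in product([0, 1], repeat=n))
print(Z)  # 2 * 3^(n-1) = 39366 for this chain
```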

42
Max clique vs. sub-clique

43
Log-linear models

44

45

46
Summary

47

48
From directed to undirected graphs

49

50
Example of moralization

51
Comparing directed and undirected models

52
Expressive power
(figure: two example graphs over x, y, z, w comparing directed and undirected models)

53
Coming back to inference

54

55
Belief propagation in trees

56

57

58

59

60

61

62

63
Learning

64
Parameter Estimation

65

66
Maximum-likelihood Estimation (MLE)

67
Example: 1-D Gaussian
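For a 1-D Gaussian, the maximum-likelihood estimates have closed forms: the sample mean and the sample variance with a 1/N (not 1/(N−1)) normalizer. A sketch on synthetic data, with made-up true parameters:

```python
import math
import random

# MLE for a 1-D Gaussian: mu_hat = sample mean, var_hat = (1/N) sum (x - mu_hat)^2.
# Synthetic data from an assumed N(5, 2^2) for illustration.
random.seed(0)
data = [random.gauss(5.0, 2.0) for _ in range(100_000)]

mu_hat = sum(data) / len(data)
var_hat = sum((x - mu_hat) ** 2 for x in data) / len(data)  # note /N, not /(N-1)

# The estimates should recover the generating parameters closely.
print(round(mu_hat, 2), round(math.sqrt(var_hat), 2))
```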

68
MLE for Bayes Net

69

70
MLE for Bayes Net with Discrete Nodes
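With fully observed discrete data, MLE for a Bayes net decomposes into independent counting problems: each CPT entry is estimated as count(child value, parent values) / count(parent values). A sketch on a toy two-node net (hypothetical data):

```python
from collections import Counter

# MLE by counting for a toy discrete net C -> S, with hypothetical
# fully observed samples of (c, s).
samples = [
    (True, False), (True, False), (True, True),
    (False, True), (False, True), (False, False), (False, True),
]

pair_counts = Counter(samples)             # count(c, s)
parent_counts = Counter(c for c, _ in samples)  # count(c)

# Estimated P(S = true | C = c) = count(c, true) / count(c)
for cv in (True, False):
    print(cv, pair_counts[(cv, True)] / parent_counts[cv])
```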

71
Parameter Estimation with Hidden Nodes
(figure: model with hidden nodes Z1, …, Z6)

72
Why is learning harder?

73
Where do hidden variables come from?

74
Parameter Estimation with Hidden Nodes

75
EM
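EM alternates an E-step (compute posterior responsibilities of the hidden variables under current parameters) and an M-step (re-estimate parameters from the expected sufficient statistics). A minimal sketch for a two-component 1-D Gaussian mixture with fixed unit variances and equal weights; all data and initial values are assumptions:

```python
import math
import random

# EM sketch for a two-component 1-D Gaussian mixture (unit variances,
# equal mixing weights held fixed); only the means are learned.
random.seed(1)
data = ([random.gauss(-2.0, 1.0) for _ in range(500)] +
        [random.gauss(+2.0, 1.0) for _ in range(500)])

def normal_pdf(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

mu = [-0.5, 0.5]  # crude initialization (assumed)
for _ in range(50):
    # E-step: responsibility of component 0 for each point.
    r0 = [normal_pdf(x, mu[0]) / (normal_pdf(x, mu[0]) + normal_pdf(x, mu[1]))
          for x in data]
    # M-step: responsibility-weighted means.
    mu[0] = sum(r * x for r, x in zip(r0, data)) / sum(r0)
    mu[1] = sum((1 - r) * x for r, x in zip(r0, data)) / sum(1 - r for r in r0)

print([round(m, 1) for m in mu])  # should recover roughly [-2, 2]
```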

76
Different Learning Conditions

Structure \ Observability | Full         | Partial
Known                     | Closed form  | EM
Unknown                   | Local search | Structural EM
