
1 CS 188: Artificial Intelligence Fall 2009 Lecture 18: Decision Diagrams 10/29/2009 Dan Klein – UC Berkeley

2 Announcements
- Mid-semester evaluations: the link is in your email
- Assignments: W3 out later this week, P4 out next week
- Contest!

3 Decision Networks
- MEU: choose the action which maximizes the expected utility given the evidence
- Can directly operationalize this with decision networks
  - Bayes nets with nodes for utility and actions
  - Lets us calculate the expected utility for each action
- New node types:
  - Chance nodes (just like BNs)
  - Action nodes (rectangles, cannot have parents, act as observed evidence)
  - Utility node (diamond, depends on action and chance nodes)
(Network: Weather → Forecast, with Umbrella and Weather feeding U)
[DEMO: Ghostbusters]

4 Decision Networks
- Action selection:
  - Instantiate all evidence
  - Set action node(s) each possible way
  - Calculate the posterior for all parents of the utility node, given the evidence
  - Calculate the expected utility for each action
  - Choose the maximizing action (see the sketch below)
(Network: Weather → Forecast, with Umbrella and Weather feeding U)
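As a concrete illustration, here is a minimal Python sketch of that loop for the umbrella network on the next slide. The table values are from the slides; the helper names (`expected_utility`, `best_action`) are my own.

```python
# A minimal sketch of decision-network action selection for the umbrella
# example. Table values are from the slides; helper names are illustrative.

P_W = {"sun": 0.7, "rain": 0.3}                      # posterior over Weather (no evidence here)

U = {("leave", "sun"): 100, ("leave", "rain"): 0,    # utility table U(A, W)
     ("take", "sun"): 20, ("take", "rain"): 70}

def expected_utility(action, posterior):
    """EU(a | e) = sum_w P(w | e) * U(a, w)."""
    return sum(p * U[(action, w)] for w, p in posterior.items())

def best_action(posterior, actions=("leave", "take")):
    """Set the action node each way, compute EU, keep the maximizer."""
    return max(actions, key=lambda a: expected_utility(a, posterior))

print(best_action(P_W))  # -> 'leave' (EU 70 vs. 35)
```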

5 Example: Decision Networks
(Network: Weather → U ← Umbrella)

W P(W):  sun 0.7, rain 0.3
U(A,W):  (leave, sun) 100, (leave, rain) 0, (take, sun) 20, (take, rain) 70

EU(Umbrella = leave) = 0.7·100 + 0.3·0 = 70
EU(Umbrella = take) = 0.7·20 + 0.3·70 = 35
Optimal decision = leave

6 Decisions as Outcome Trees
- Almost exactly like expectimax / MDPs
- What's changed?
(Tree: from the root {}, choose take or leave; Weather then resolves to sun or rain, giving leaves U(t,s), U(t,r), U(l,s), U(l,r))

7 Evidence in Decision Networks
- Find P(W | F=bad)
- Select for evidence
- First we join P(W) and P(F=bad | W), then we normalize:

W P(W):         sun 0.7,  rain 0.3
F P(F | sun):   good 0.8, bad 0.2
F P(F | rain):  good 0.1, bad 0.9

W P(F=bad | W): sun 0.2,  rain 0.9
W P(W, F=bad):  sun 0.14, rain 0.27
W P(W | F=bad): sun 0.34, rain 0.66

(Network: Weather → Forecast, with Umbrella and Weather feeding U)
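A quick sketch of that join-and-normalize step in Python; the numbers are the slide's, the variable names are mine:

```python
# Join P(W) with P(F=bad | W), then normalize to get P(W | F=bad).

P_W = {"sun": 0.7, "rain": 0.3}
P_bad_given_W = {"sun": 0.2, "rain": 0.9}               # P(F=bad | W)

# Join: P(W, F=bad) = P(W) * P(F=bad | W)
joint = {w: P_W[w] * P_bad_given_W[w] for w in P_W}     # {'sun': 0.14, 'rain': 0.27}

# Normalize: P(W | F=bad) = P(W, F=bad) / P(F=bad)
z = sum(joint.values())                                 # P(F=bad) = 0.41
posterior = {w: p / z for w, p in joint.items()}
print(posterior)                                        # sun ~0.34, rain ~0.66
```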

8 Example: Decision Networks
(Network: Weather → Forecast=bad, with Umbrella and Weather feeding U)

U(A,W):        (leave, sun) 100, (leave, rain) 0, (take, sun) 20, (take, rain) 70
P(W | F=bad):  sun 0.34, rain 0.66

EU(Umbrella = leave | F=bad) = 0.34·100 + 0.66·0 = 34
EU(Umbrella = take | F=bad) = 0.34·20 + 0.66·70 = 53
Optimal decision = take

9 Decisions as Outcome Trees
(Tree: from the root {b}, choose take or leave; W | {b} then resolves to sun or rain, giving leaves U(t,s), U(t,r), U(l,s), U(l,r))

10 Conditioning on Action Nodes
- An action node can be a parent of a chance node
  - The chance node conditions on the outcome of the action
- Action nodes are like observed variables in a Bayes' net, except we max over their values
(Network: S and A feeding S' via T(s,a,s'), and U via R(s,a,s'))

11 Value of Information
- Idea: compute the value of acquiring evidence
  - Can be done directly from the decision network
- Example: buying oil drilling rights
  - Two blocks, A and B; exactly one has oil, worth k
  - You can drill in one location
  - Prior probabilities 0.5 each, mutually exclusive
  - Drilling in either A or B has MEU = k/2
- Question: what's the value of information?
  - Value of knowing which of A or B has oil
  - Value is the expected gain in MEU from the new info
  - Survey may say "oil in A" or "oil in B", prob 0.5 each
  - If we know OilLoc, MEU is k (either way)
  - Gain in MEU from knowing OilLoc?
  - VPI(OilLoc) = k/2
  - Fair price of the information: k/2

U(D,O): (a,a) k, (a,b) 0, (b,a) 0, (b,b) k
P(O):   a 1/2, b 1/2
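The arithmetic is easy to check mechanically; a tiny sketch (with k set to 1.0 just to make it runnable):

```python
# Checking the oil-rights numbers; everything scales linearly in k.

k = 1.0
P_O = {"a": 0.5, "b": 0.5}                        # prior over where the oil is
U = {("a", "a"): k, ("a", "b"): 0.0,              # U(DrillLoc, OilLoc)
     ("b", "a"): 0.0, ("b", "b"): k}

# Act now: pick the best drill site under the prior.
meu_now = max(sum(P_O[o] * U[(d, o)] for o in P_O) for d in ("a", "b"))

# Learn OilLoc first, then drill in the best spot for each outcome.
meu_with_info = sum(P_O[o] * max(U[(d, o)] for d in ("a", "b")) for o in P_O)

print(meu_now, meu_with_info, meu_with_info - meu_now)  # 0.5 1.0 0.5 -> VPI = k/2
```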

12 Value of Information
- Assume we have evidence E=e. Value if we act now:

    MEU(e) = max_a Σ_s P(s | e) U(s, a)

- Assume we see that E' = e'. Value if we act then:

    MEU(e, e') = max_a Σ_s P(s | e, e') U(s, a)

- BUT E' is a random variable whose value is unknown, so we don't know what e' will be
- Expected value if E' is revealed and then we act:

    MEU(e, E') = Σ_{e'} P(e' | e) MEU(e, e')

- Value of information: how much MEU goes up by revealing E' first:

    VPI(E' | e) = MEU(e, E') − MEU(e)

13 VPI Example: Weather
(Network: Weather → Forecast, with Umbrella and Weather feeding U)

U(A,W): (leave, sun) 100, (leave, rain) 0, (take, sun) 20, (take, rain) 70

Forecast distribution F P(F): good 0.59, bad 0.41
- MEU with no evidence: 70 (leave)
- MEU if forecast is bad: 53 (take)
- MEU if forecast is good: ≈95 (leave)
[Demo]
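Putting the pieces together, a sketch of the full VPI(Forecast) computation; the tables are from the earlier slides, the code structure is my own:

```python
# VPI of the forecast: expected MEU after seeing F, minus MEU with no evidence.

P_W = {"sun": 0.7, "rain": 0.3}
P_F_given_W = {"sun": {"good": 0.8, "bad": 0.2},
               "rain": {"good": 0.1, "bad": 0.9}}
U = {("leave", "sun"): 100, ("leave", "rain"): 0,
     ("take", "sun"): 20, ("take", "rain"): 70}
ACTIONS = ("leave", "take")

def meu(posterior):
    """max_a sum_w P(w | e) U(a, w)."""
    return max(sum(p * U[(a, w)] for w, p in posterior.items()) for a in ACTIONS)

# P(F) and P(W | F=f) by join + normalize.
P_F = {f: sum(P_W[w] * P_F_given_W[w][f] for w in P_W) for f in ("good", "bad")}
post = {f: {w: P_W[w] * P_F_given_W[w][f] / P_F[f] for w in P_W} for f in P_F}

vpi = sum(P_F[f] * meu(post[f]) for f in P_F) - meu(P_W)
print(P_F)  # {'good': 0.59, 'bad': 0.41}
print(vpi)  # ~7.7 (about 7.8 with the rounded posteriors shown on the slides)
```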

14 VPI Properties
- Nonnegative
- Nonadditive: consider, e.g., obtaining E_j twice
- Order-independent
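Written out (a standard formulation; the slide only names the properties):

```latex
\begin{align*}
\mathrm{VPI}(E' \mid e) &\ge 0 && \text{(nonnegative)}\\
\mathrm{VPI}(E_j, E_k \mid e) &\ne \mathrm{VPI}(E_j \mid e) + \mathrm{VPI}(E_k \mid e) && \text{(nonadditive, in general)}\\
\mathrm{VPI}(E_j, E_k \mid e) &= \mathrm{VPI}(E_j \mid e) + \mathbb{E}_{e_j}\!\left[\mathrm{VPI}(E_k \mid e, e_j)\right] && \text{(order-independent: either order gives the same total)}
\end{align*}
```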

15 Quick VPI Questions
- The soup of the day is either clam chowder or split pea, but you wouldn't order either one. What's the value of knowing which it is?
- There are two kinds of plastic forks at a picnic. It must be that one is slightly better. What's the value of knowing which?
- You have $10 to bet double-or-nothing and there is a 75% chance that Berkeley will beat Stanford. What's the value of knowing the outcome in advance?
- You must bet on Cal, either way. What's the value now?

16 Reasoning over Time
- Often, we want to reason about a sequence of observations
  - Speech recognition
  - Robot localization
  - User attention
  - Medical monitoring
- Need to introduce time into our models
- Basic approach: hidden Markov models (HMMs)
- More general: dynamic Bayes' nets

17 Markov Models
- A Markov model is a chain-structured BN
  - Each node is identically distributed (stationarity)
  - The value of X at a given time is called the state
- As a BN: X1 → X2 → X3 → X4 → …
- Parameters: called transition probabilities or dynamics; they specify how the state evolves over time (also, initial state probabilities)
[DEMO: Ghostbusters]

18 Conditional Independence
- Basic conditional independence:
  - Past and future are independent given the present
  - Each time step only depends on the previous
  - This is called the (first order) Markov property
- Note that the chain is just a (growing) BN
  - We can always use generic BN reasoning on it if we truncate the chain at a fixed length
(Chain: X1 → X2 → X3 → X4)

19 Example: Markov Chain
- Weather:
  - States: X = {rain, sun}
  - Transitions: (diagram with rain and sun states and probabilities 0.9 and 0.1; this is a CPT, not a BN!)
  - Initial distribution: 1.0 sun
- What's the probability distribution after one step? (see the sketch below)
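A quick check of that one-step question. The symmetric transition table (stay with prob. 0.9, switch with prob. 0.1) is my reading of the diagram, so treat it as an assumption:

```python
# One belief-update step for the weather chain (transition table assumed).

T = {"sun": {"sun": 0.9, "rain": 0.1},
     "rain": {"sun": 0.1, "rain": 0.9}}    # T[x][x'] = P(X_{t+1} = x' | X_t = x)

belief = {"sun": 1.0, "rain": 0.0}         # initial distribution: 1.0 sun

# P(X_2 = x') = sum_x P(X_1 = x) P(X_2 = x' | X_1 = x)
one_step = {x2: sum(belief[x1] * T[x1][x2] for x1 in belief) for x2 in T}
print(one_step)                            # {'sun': 0.9, 'rain': 0.1}
```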

20 Mini-Forward Algorithm
- Question: probability of being in state x at time t?
- Slow answer:
  - Enumerate all sequences of length t which end in x
  - Add up their probabilities

21 Mini-Forward Algorithm
- Better way: cached incremental belief updates
  - An instance of variable elimination!
(Figure: forward simulation, sun/rain beliefs propagated from one time step to the next)
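A minimal sketch of the update, reusing the transition table assumed above:

```python
# Mini-forward algorithm: cache the current belief and push it through the
# transition model once per time step.

def mini_forward(belief, T, steps):
    """Repeated update P(X_{t+1} = x') = sum_x P(x' | x) P(X_t = x)."""
    for _ in range(steps):
        belief = {x2: sum(belief[x1] * T[x1][x2] for x1 in belief) for x2 in T}
    return belief

T = {"sun": {"sun": 0.9, "rain": 0.1},
     "rain": {"sun": 0.1, "rain": 0.9}}
print(mini_forward({"sun": 1.0, "rain": 0.0}, T, 100))  # -> essentially 50/50
```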

22 Example
- From an initial observation of sun: P(X1), P(X2), P(X3), …, P(X∞)
- From an initial observation of rain: P(X1), P(X2), P(X3), …, P(X∞)
(Figure: belief bar charts over time for each starting observation)

23 Stationary Distributions
- If we simulate the chain long enough:
  - What happens? Uncertainty accumulates
  - Eventually, we have no idea what the state is!
- Stationary distributions:
  - For most chains, the distribution we end up in is independent of the initial distribution
  - Called the stationary distribution of the chain
- Usually, we can only predict a short time out
[DEMO: Ghostbusters]
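Formally (a standard characterization that the slide leaves implicit), the stationary distribution π is a fixed point of the one-step update:

```latex
\pi(x') \;=\; \sum_{x} P(x' \mid x)\,\pi(x), \qquad \sum_{x'} \pi(x') = 1
```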

24 Web Link Analysis
- PageRank over a web graph
  - Each web page is a state
  - Initial distribution: uniform over pages
  - Transitions:
    - With prob. c, jump to a uniformly random page (dotted lines)
    - With prob. 1-c, follow a random outlink (solid lines)
- Stationary distribution
  - Will spend more time on highly reachable pages
  - E.g. there are many ways to get to the Acrobat Reader download page!
  - Somewhat robust to link spam
  - Google 1.0 returned the set of pages containing all your keywords, in decreasing rank; now all search engines use link analysis along with many other factors (see the sketch below)
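A toy power-iteration sketch of exactly this chain; the four-page graph and c = 0.15 are made up for illustration, not from the lecture:

```python
# Tiny PageRank: random jump with prob. c, random outlink with prob. 1-c.

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}  # hypothetical graph
pages = list(links)
c = 0.15

rank = {p: 1.0 / len(pages) for p in pages}    # initial distribution: uniform
for _ in range(100):                           # iterate toward the stationary dist.
    new = {p: c / len(pages) for p in pages}   # prob. c: uniform random jump
    for p in pages:
        for q in links[p]:                     # prob. 1-c: follow a random outlink
            new[q] += (1 - c) * rank[p] / len(links[p])
    rank = new

print(rank)  # 'c' ranks highest: it is the most reachable page
```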

25 Value of (Perfect) Information
- Current evidence E=e; utility depends on S=s
- Potential new evidence E': suppose we knew E' = e'
- BUT E' is a random variable whose value is currently unknown, so:
  - We must compute the expected gain over all possible values e'
  - (VPI = value of perfect information)

26 VPI Example: Ghostbusters
- Reminder: the ghost is hidden, the sensors are noisy
- T: top square is red; B: bottom square is red; G: ghost is in the top
- Sensor model:
  P(+t | +g) = 0.8   P(+t | ¬g) = 0.4
  P(+b | +g) = 0.4   P(+b | ¬g) = 0.8

Joint distribution P(T,B,G):
  +t +b +g  0.16      ¬t +b +g  0.04
  +t +b ¬g  0.16      ¬t +b ¬g  0.24
  +t ¬b +g  0.24      ¬t ¬b +g  0.06
  +t ¬b ¬g  0.04      ¬t ¬b ¬g  0.06
[Demo]

27 VPI Example: Ghostbusters
(Same joint distribution P(T,B,G) as above; the utility of a bust is 2, of no bust is 0)
- Q1: What's the value of knowing T if I know nothing?
  Q1': E_{P(T)}[MEU(T) − MEU(∅)]
- Q2: What's the value of knowing B if I already know that T is true (red)?
  Q2': E_{P(B|t)}[MEU(t, B) − MEU(t)]
- How low can the value of information ever be? (see the sketch below)
[Demo]
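A sketch of Q1 under one reading of the demo: the action is which square to bust, worth 2 if the ghost is there and 0 otherwise. The action space is my assumption; the joint table is the slide's.

```python
# VPI of observing T, with "bust top" vs. "bust bottom" as the actions
# (an assumed action space; joint table from the slide).

P = {("+t", "+b", "+g"): 0.16, ("+t", "+b", "-g"): 0.16,
     ("+t", "-b", "+g"): 0.24, ("+t", "-b", "-g"): 0.04,
     ("-t", "+b", "+g"): 0.04, ("-t", "+b", "-g"): 0.24,
     ("-t", "-b", "+g"): 0.06, ("-t", "-b", "-g"): 0.06}

def meu(evidence):
    """Best of bust-top / bust-bottom, given evidence literals like '+t'."""
    match = {k: p for k, p in P.items() if all(e in k for e in evidence)}
    z = sum(match.values())
    p_top = sum(p for k, p in match.items() if "+g" in k) / z  # P(ghost top | e)
    return max(2 * p_top, 2 * (1 - p_top))

p_t = sum(p for k, p in P.items() if "+t" in k)                # P(+t) = 0.6
vpi_T = p_t * meu({"+t"}) + (1 - p_t) * meu({"-t"}) - meu(set())
print(vpi_T)  # 0.4 under these assumptions: knowing T helps
```

And since VPI is nonnegative (slide 14), the answer to the last question is zero: information can be worthless, but it cannot hurt an agent that is free to ignore it.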

