Decision Making Under Uncertainty
CMSC 671 – Fall 2010
R&N, Chapters 16.1-16.3, 16.5-16.6, 17.1-17.3
Material from Lise Getoor, Jean-Claude Latombe, and Daphne Koller

Decision Making Under Uncertainty
Many environments have multiple possible outcomes. Some of these outcomes may be good; others may be bad. Some may be very likely; others unlikely. What's a poor agent to do??

Non-Deterministic vs. Probabilistic Uncertainty
Non-deterministic model: possible outcomes {a, b, c} → choose the decision that is best for the worst case (~ adversarial search)
Probabilistic model: possible outcomes {a (p_a), b (p_b), c (p_c)} → choose the decision that maximizes expected utility

Expected Utility
Random variable X with n values x_1, …, x_n and distribution (p_1, …, p_n). E.g., X is the state reached after doing an action A under uncertainty.
Function U of X. E.g., U is the utility of a state.
The expected utility of A is EU[A] = Σ_{i=1..n} P(x_i | A) U(x_i)
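To make the definition concrete, here is a minimal sketch of computing an expected utility in Python. The distribution and utilities below are made-up illustrative numbers (chosen so the result, 62, matches the one-state/one-action example on the next slide), not values given on this slide.

```python
def expected_utility(outcomes):
    """Expected utility of an action, given (probability, utility) pairs
    for each outcome state x_i: EU[A] = sum_i P(x_i | A) * U(x_i)."""
    return sum(p * u for p, u in outcomes)

# Hypothetical action with three outcome states
action_a = [(0.2, 100), (0.7, 50), (0.1, 70)]
print(expected_utility(action_a))  # 62.0
```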

One State/One Action Example
(Figure: taking action A1 in state s0 leads to s1, s2, or s3, each with a probability and a utility.)
U(A1, S0) = Σ_i P(s_i | A1, s0) U(s_i) = 62

One State/Two Actions Example
(Figure: from s0, actions A1 and A2 are both available; A2 can also reach a new state s4.)
U(A1, S0) = 62
U(A2, S0) = 74
U(S0) = max_a {U(a, S0)} = 74

Introducing Action Costs
U(A1, S0) = 62 – 5 = 57
U(A2, S0) = 74 – 25 = 49
U(S0) = max_a {U(a, S0)} = 57

MEU Principle
A rational agent should choose the action that maximizes the agent's expected utility. This is the basis of the field of decision theory. The MEU principle provides a normative criterion for rational choice of action.

Not quite…
Must have a complete model of: actions, utilities, states.
Even if you have a complete model, decision making is computationally intractable.
In fact, a truly rational agent takes into account the utility of reasoning as well (bounded rationality).
Nevertheless, great progress has been made in this area recently, and we are able to solve much more complex decision-theoretic problems than ever before.

Axioms of Utility Theory
Orderability: exactly one of (A ≻ B), (B ≻ A), (A ~ B) holds
Transitivity: (A ≻ B) ∧ (B ≻ C) ⇒ (A ≻ C)
Continuity: A ≻ B ≻ C ⇒ ∃p [p, A; 1–p, C] ~ B
Substitutability: A ~ B ⇒ [p, A; 1–p, C] ~ [p, B; 1–p, C]
Monotonicity: A ≻ B ⇒ (p ≥ q ⇔ [p, A; 1–p, B] ≽ [q, A; 1–q, B])
Decomposability: [p, A; 1–p, [q, B; 1–q, C]] ~ [p, A; (1–p)q, B; (1–p)(1–q), C]

Money Versus Utility
Money ≠ utility: more money is better, but not always in a linear relationship to the amount of money.
EMV = expected monetary value; let S_EMV(L) denote receiving the EMV of lottery L for sure.
Risk-averse: U(L) < U(S_EMV(L))
Risk-seeking: U(L) > U(S_EMV(L))
Risk-neutral: U(L) = U(S_EMV(L))
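As a hedged illustration (not an example from the slides), a concave utility function such as U(x) = √x makes an agent risk-averse: the expected utility of a lottery falls below the utility of its expected monetary value.

```python
import math

def utility(x):
    """A concave (risk-averse) utility of money; an illustrative choice."""
    return math.sqrt(x)

# Hypothetical lottery: 50% chance of $0, 50% chance of $100
lottery = [(0.5, 0.0), (0.5, 100.0)]
emv = sum(p * x for p, x in lottery)                  # 50.0
eu_lottery = sum(p * utility(x) for p, x in lottery)  # 5.0
u_sure_emv = utility(emv)                             # ~7.07

print(eu_lottery < u_sure_emv)  # True: U(L) < U(S_EMV(L)), i.e., risk-averse
```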

Value Function
Provides a ranking of alternatives, but not a meaningful metric scale. Also known as an "ordinal utility function." Sometimes only relative judgments (value functions) are necessary; at other times, absolute judgments (utility functions) are required.

Multiattribute Utility Theory
A given state may have multiple utilities: because of multiple evaluation criteria, or because of multiple agents (interested parties) with different utility functions. We will talk about this more later in the semester, when we discuss multi-agent systems and game theory.

Decision Networks
Extend BNs to handle actions and utilities
Also called influence diagrams
Use BN inference methods to solve
Perform Value of Information calculations

Decision Networks cont.
Chance nodes: random variables, as in BNs
Decision nodes: actions that a decision maker can take
Utility/value nodes: the utility of an outcome state

R&N example (figure from the textbook)

Umbrella Network
(Figure: decision node take/don't take umbrella; chance nodes weather, forecast, and have umbrella; utility node happiness.)
P(rain) = 0.4
Forecast model P(f | w): P(sunny | rain) = 0.3, P(rainy | rain) = 0.7, P(sunny | no rain) = 0.8, P(rainy | no rain) = 0.2
Umbrella model: P(have | take) = 1.0, P(~have | ~take) = 1.0
Utilities: U(have, rain) = –25, U(have, ~rain) = 0, U(~have, rain) = –100, U(~have, ~rain) = 100

Evaluating Decision Networks
Set the evidence variables for the current state
For each possible value of the decision node:
  Set the decision node to that value
  Calculate the posterior probability of the parent nodes of the utility node, using BN inference
  Calculate the resulting utility for the action
Return the action with the highest utility
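A minimal sketch of this loop in Python, assuming a hypothetical `posterior(evidence)` helper that runs BN inference and returns a distribution over the utility node's parents; the helper names, signatures, and the "decision" key are illustrative assumptions, not an API from the slides.

```python
def best_action(decision_values, evidence, posterior, utility):
    """Evaluate a decision network by enumerating the decision node's values.

    decision_values: possible settings of the decision node
    evidence:        dict of observed chance variables
    posterior:       function(evidence_dict) -> {outcome: probability},
                     standing in for BN inference over the utility node's parents
    utility:         function(outcome, decision) -> utility value
    """
    best, best_eu = None, float("-inf")
    for d in decision_values:
        # Set the decision node to this value and infer the utility node's parents
        dist = posterior({**evidence, "decision": d})
        eu = sum(p * utility(outcome, d) for outcome, p in dist.items())
        if eu > best_eu:
            best, best_eu = d, eu
    return best, best_eu
```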

Decision Making: Umbrella Network
Should I take my umbrella??
(Same network as above: P(rain) = 0.4; forecast CPT P(f | w) as before; P(have | take) = 1.0, P(~have | ~take) = 1.0; U(have, rain) = –25, U(have, ~rain) = 0, U(~have, rain) = –100, U(~have, ~rain) = 100.)
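Ignoring the forecast for the moment, the decision reduces to comparing two expected utilities; a small sketch using the numbers on the slide:

```python
p_rain = 0.4
U = {("have", "rain"): -25, ("have", "no rain"): 0,
     ("not have", "rain"): -100, ("not have", "no rain"): 100}

def eu(decision):
    # take -> have umbrella with certainty; don't take -> not have
    have = "have" if decision == "take" else "not have"
    return p_rain * U[(have, "rain")] + (1 - p_rain) * U[(have, "no rain")]

print(eu("take"))       # -10.0
print(eu("don't take")) #  20.0  -> without a forecast, don't take the umbrella
```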

Value of Information (VOI)
Suppose an agent's current knowledge is E. The value of the current best action α is:
EU(α | E) = max_a Σ_i P(Result_i(a) | E, Do(a)) U(Result_i(a))
The value of the new best action α' (after new evidence E' is obtained):
EU(α' | E, E') = max_a Σ_i P(Result_i(a) | E, E', Do(a)) U(Result_i(a))
The value of information for E' is therefore:
VOI(E') = Σ_k P(E' = e_k | E) EU(α_{e_k} | E, E' = e_k) – EU(α | E)

Value of Information: Umbrella Network
What is the value of knowing the weather forecast?
(Same network and numbers as above.)
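The value of knowing the forecast can be computed by averaging the best achievable expected utility over the possible forecasts; a sketch using only the numbers on the slide (the code itself is illustrative):

```python
p_rain = 0.4
p_f_given_w = {("sunny", "rain"): 0.3, ("rainy", "rain"): 0.7,
               ("sunny", "no rain"): 0.8, ("rainy", "no rain"): 0.2}
U = {("have", "rain"): -25, ("have", "no rain"): 0,
     ("not have", "rain"): -100, ("not have", "no rain"): 100}

def best_eu(p_rain_posterior):
    """Best expected utility over take / don't take, given P(rain)."""
    eu_take = p_rain_posterior * U[("have", "rain")] + (1 - p_rain_posterior) * U[("have", "no rain")]
    eu_not = p_rain_posterior * U[("not have", "rain")] + (1 - p_rain_posterior) * U[("not have", "no rain")]
    return max(eu_take, eu_not)

eu_without_forecast = best_eu(p_rain)            # 20.0 (don't take)
eu_with_forecast = 0.0
for f in ("sunny", "rainy"):
    p_f = p_f_given_w[(f, "rain")] * p_rain + p_f_given_w[(f, "no rain")] * (1 - p_rain)
    p_rain_given_f = p_f_given_w[(f, "rain")] * p_rain / p_f   # Bayes' rule
    eu_with_forecast += p_f * best_eu(p_rain_given_f)

print(eu_with_forecast - eu_without_forecast)  # VOI of the forecast (≈ 9.0)
```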

Sequential Decision Making
Finite horizon
Infinite horizon

Simple Robot Navigation Problem
In each state, the possible actions are U (up), D (down), R (right), and L (left).

Probabilistic Transition Model
In each state, the possible actions are U, D, R, and L. The effect of U is as follows (transition model):
With probability 0.8, the robot moves up one square (if the robot is already in the top row, then it does not move)

Probabilistic Transition Model
In each state, the possible actions are U, D, R, and L. The effect of U is as follows (transition model):
With probability 0.8, the robot moves up one square (if the robot is already in the top row, then it does not move)
With probability 0.1, the robot moves right one square (if the robot is already in the rightmost column, then it does not move)

Probabilistic Transition Model
In each state, the possible actions are U, D, R, and L. The effect of U is as follows (transition model):
With probability 0.8, the robot moves up one square (if the robot is already in the top row, then it does not move)
With probability 0.1, the robot moves right one square (if the robot is already in the rightmost column, then it does not move)
With probability 0.1, the robot moves left one square (if the robot is already in the leftmost column, then it does not move)
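A minimal sketch of this transition model for the U action in Python, assuming a generic width × height grid with (column, row) coordinates and no interior obstacles (both assumptions; this slide does not give the grid's exact layout):

```python
def transitions_up(state, width, height):
    """Transition model for action U: returns {next_state: probability}.

    With prob. 0.8 move up, 0.1 right, 0.1 left; bumping into the grid
    boundary leaves the robot where it is.
    """
    col, row = state
    result = {}
    for prob, (dc, dr) in [(0.8, (0, 1)), (0.1, (1, 0)), (0.1, (-1, 0))]:
        nxt = (col + dc, row + dr)
        if not (1 <= nxt[0] <= width and 1 <= nxt[1] <= height):
            nxt = state  # blocked by the boundary: stay put
        result[nxt] = result.get(nxt, 0.0) + prob
    return result

print(transitions_up((3, 2), width=4, height=3))
# {(3, 3): 0.8, (4, 2): 0.1, (2, 2): 0.1}
```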

Markov Property
The transition properties depend only on the current state, not on the previous history (how that state was reached).

Sequence of Actions
Planned sequence of actions: (U, R), starting from [3,2].

Sequence of Actions
Planned sequence of actions: (U, R). U is executed: from [3,2] the robot ends up in [3,3], [4,2], or [3,2].

Histories
Planned sequence of actions: (U, R). U has been executed; R is executed next. There are 9 possible sequences of states, called histories, and 6 possible final states for the robot: [4,3], [3,3], [4,2], [3,2], [4,1], [3,1].

Probability of Reaching the Goal
P([4,3] | (U,R).[3,2]) = P([4,3] | R.[3,3]) × P([3,3] | U.[3,2]) + P([4,3] | R.[4,2]) × P([4,2] | U.[3,2])
(Note the importance of the Markov property in this derivation.)
P([3,3] | U.[3,2]) = 0.8, P([4,2] | U.[3,2]) = 0.1
P([4,3] | R.[3,3]) = 0.8, P([4,3] | R.[4,2]) = 0.1
P([4,3] | (U,R).[3,2]) = 0.8 × 0.8 + 0.1 × 0.1 = 0.65
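The same two-step calculation, written as a small sketch that chains the one-step probabilities given on the slide:

```python
# One-step probabilities from the slide
p_after_U = {(3, 3): 0.8, (4, 2): 0.1}        # P(state | U, [3,2]) for states that can reach the goal
p_goal_after_R = {(3, 3): 0.8, (4, 2): 0.1}   # P([4,3] | R, state)

# Markov property: sum over intermediate states of P(reach state) * P(state -> goal)
p_goal = sum(p * p_goal_after_R[s] for s, p in p_after_U.items())
print(p_goal)  # ≈ 0.65
```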

Utility Function
[4,3] provides a power supply. [4,2] is a sand area from which the robot cannot escape.

Utility Function
[4,3] provides a power supply. [4,2] is a sand area from which the robot cannot escape. The robot needs to recharge its batteries.

Utility Function
[4,3] provides a power supply. [4,2] is a sand area from which the robot cannot escape. The robot needs to recharge its batteries. [4,3] and [4,2] are terminal states.

Utility of a History
[4,3] provides a power supply. [4,2] is a sand area from which the robot cannot escape. The robot needs to recharge its batteries. [4,3] and [4,2] are terminal states. The utility of a history is defined by the utility of the last state (+1 or –1) minus n/25, where n is the number of moves.
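A tiny sketch of this history utility in Python (states are (column, row) pairs; the per-move cost of 1/25 and the ±1 terminal utilities are the numbers from the slide):

```python
def history_utility(history):
    """Utility of a history = utility of the last state (+1 / -1) minus n/25,
    where n is the number of moves."""
    terminal_utility = {(4, 3): +1.0, (4, 2): -1.0}
    n_moves = len(history) - 1
    return terminal_utility[history[-1]] - n_moves / 25.0

print(history_utility([(3, 2), (3, 3), (4, 3)]))  # 1 - 2/25 = 0.92
```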

Utility of an Action Sequence
Consider the action sequence (U, R) from [3,2].

Utility of an Action Sequence
Consider the action sequence (U, R) from [3,2]. A run produces one of 7 possible histories, each with some probability.

Utility of an Action Sequence
Consider the action sequence (U, R) from [3,2]. A run produces one among 7 possible histories, each with some probability. The utility of the sequence is the expected utility of the histories: U = Σ_h P(h) U_h

Optimal Action Sequence
Consider the action sequence (U, R) from [3,2]. A run produces one among 7 possible histories, each with some probability. The utility of the sequence is the expected utility of the histories. The optimal sequence is the one with maximal utility.

Optimal Action Sequence
Consider the action sequence (U, R) from [3,2]. A run produces one among 7 possible histories, each with some probability. The utility of the sequence is the expected utility of the histories. The optimal sequence is the one with maximal utility. But is the optimal action sequence what we want to compute? Only if the sequence is executed blindly!

Reactive Agent Algorithm
(Assumes an accessible/observable state.)
Repeat:
  s ← sensed state
  If s is terminal then exit
  a ← choose action (given s)
  Perform a

Policy (Reactive/Closed-Loop Strategy)
A policy π is a complete mapping from states to actions.

Reactive Agent Algorithm
Repeat:
  s ← sensed state
  If s is terminal then exit
  a ← π(s)
  Perform a
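A minimal sketch of this loop in Python, with the environment's sensing and acting functions and the terminal test passed in as parameters (all of these names are illustrative assumptions, not an API from the slides):

```python
def run_policy(policy, sense, perform, is_terminal):
    """Execute a policy reactively: sense, check for terminal, act, repeat.

    policy:      dict (or function) mapping states to actions
    sense:       () -> current state
    perform:     (action) -> None, executes the action in the environment
    is_terminal: (state) -> bool
    """
    while True:
        s = sense()
        if is_terminal(s):
            return s
        a = policy[s] if isinstance(policy, dict) else policy(s)
        perform(a)
```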

Optimal Policy
A policy π is a complete mapping from states to actions. The optimal policy π* is the one that always yields a history (ending at a terminal state) with maximal expected utility. This makes sense because of the Markov property. Note that [3,2] is a "dangerous" state that the optimal policy tries to avoid.

Optimal Policy
A policy π is a complete mapping from states to actions. The optimal policy π* is the one that always yields a history with maximal expected utility. This problem is called a Markov Decision Problem (MDP). How do we compute π*?

Additive Utility
History H = (s_0, s_1, …, s_n). The utility of H is additive iff:
U(s_0, s_1, …, s_n) = R(0) + U(s_1, …, s_n) = Σ_i R(i)
where R(i) is the reward received in state s_i.

Additive Utility
History H = (s_0, s_1, …, s_n). The utility of H is additive iff:
U(s_0, s_1, …, s_n) = R(0) + U(s_1, …, s_n) = Σ_i R(i)
Robot navigation example:
  R(n) = +1 if s_n = [4,3]
  R(n) = –1 if s_n = [4,2]
  R(i) = –1/25 for i = 0, …, n–1

Principle of Max Expected Utility
History H = (s_0, s_1, …, s_n); utility of H: U(s_0, s_1, …, s_n) = Σ_i R(i)
First-step analysis:
  U(i) = R(i) + max_a Σ_k P(k | a.i) U(k)
  π*(i) = argmax_a Σ_k P(k | a.i) U(k)

Defining State Utility
Problem: When making a decision, we only know the reward so far and the possible actions. We've defined utility retroactively (i.e., the utility of a history is obvious once we finish it). What is the utility of a particular state in the middle of decision making? We need to compute the expected utility of the possible future histories.

Value Iteration
Initialize the utility of each non-terminal state s_i to U_0(i) = 0.
For t = 0, 1, 2, …, do:
  U_{t+1}(i) ← R(i) + max_a Σ_k P(k | a.i) U_t(k)

Value Iteration
Initialize the utility of each non-terminal state s_i to U_0(i) = 0.
For t = 0, 1, 2, …, do:
  U_{t+1}(i) ← R(i) + max_a Σ_k P(k | a.i) U_t(k)
(Figure: U_t([3,1]) plotted against t.)
Note the importance of terminal states and the connectivity of the state-transition graph.
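A compact sketch of this update over a generic MDP in Python. The MDP is passed in as plain dictionaries; the tiny two-state example at the bottom is made up purely to show the interface, it is not the robot grid from the slides.

```python
def value_iteration(states, terminals, actions, P, R, n_iters=100):
    """Value iteration: U_{t+1}(i) = R(i) + max_a sum_k P(k | a, i) * U_t(k).

    states:    iterable of states
    terminals: set of terminal states (their utility stays fixed at R(s))
    actions:   dict state -> list of available actions
    P:         dict (state, action) -> {next_state: probability}
    R:         dict state -> reward
    """
    U = {s: (R[s] if s in terminals else 0.0) for s in states}
    for _ in range(n_iters):
        new_U = dict(U)
        for s in states:
            if s in terminals:
                continue
            new_U[s] = R[s] + max(
                sum(p * U[k] for k, p in P[(s, a)].items())
                for a in actions[s]
            )
        U = new_U
    return U

# Made-up two-state example, just to show the call
U = value_iteration(states=["s", "goal"], terminals={"goal"},
                    actions={"s": ["go"]},
                    P={("s", "go"): {"goal": 0.9, "s": 0.1}},
                    R={"s": -0.04, "goal": 1.0})
print(U)
```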

Policy Iteration
Pick a policy π at random.

Policy Iteration
Pick a policy π at random. Repeat:
  Compute the utility of each state for π:
    U_{t+1}(i) ← R(i) + Σ_k P(k | π(i).i) U_t(k)

Policy Iteration
Pick a policy π at random. Repeat:
  Compute the utility of each state for π:
    U_{t+1}(i) ← R(i) + Σ_k P(k | π(i).i) U_t(k)
  Compute the policy π' given these utilities:
    π'(i) = argmax_a Σ_k P(k | a.i) U(k)

Policy Iteration
Pick a policy π at random. Repeat:
  Compute the utility of each state for π:
    U_{t+1}(i) ← R(i) + Σ_k P(k | π(i).i) U_t(k)
  Compute the policy π' given these utilities:
    π'(i) = argmax_a Σ_k P(k | a.i) U(k)
  If π' = π, then return π.
Alternatively, the evaluation step can solve the set of linear equations:
  U(i) = R(i) + Σ_k P(k | π(i).i) U(k)  (often a sparse system)
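A sketch of policy iteration with the same dictionary-based MDP interface as the value iteration sketch above; again an illustrative implementation, here evaluating the policy by iterating its Bellman equation rather than solving the linear system.

```python
def policy_iteration(states, terminals, actions, P, R, eval_iters=50):
    """Alternate policy evaluation and greedy policy improvement."""
    policy = {s: actions[s][0] for s in states if s not in terminals}  # arbitrary start
    while True:
        # Policy evaluation: U(i) = R(i) + sum_k P(k | pi(i), i) U(k)
        U = {s: (R[s] if s in terminals else 0.0) for s in states}
        for _ in range(eval_iters):
            U = {s: R[s] if s in terminals else
                 R[s] + sum(p * U[k] for k, p in P[(s, policy[s])].items())
                 for s in states}
        # Policy improvement: pi'(i) = argmax_a sum_k P(k | a, i) U(k)
        new_policy = {
            s: max(actions[s], key=lambda a: sum(p * U[k] for k, p in P[(s, a)].items()))
            for s in states if s not in terminals
        }
        if new_policy == policy:
            return policy, U
        policy = new_policy
```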

Infinite Horizon
What if the robot lives forever? In many problems, e.g., the robot navigation example, histories are potentially unbounded and the same state can be reached many times. One trick: use discounting to make an infinite-horizon problem mathematically tractable.
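For reference, the standard discounted form of the Bellman equation (with discount factor 0 < γ < 1; this is the textbook formulation rather than something written on this slide) is:

```latex
U(s) \;=\; R(s) \;+\; \gamma \,\max_{a} \sum_{s'} P(s' \mid s, a)\, U(s')
```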