1 Endgame Logistics
Final Project Presentations: Tuesday, March 19, 3-5, KEC2057. PowerPoint suggested (email to me before class); you can use your own laptop if necessary (e.g., for a demo). 10 minutes of presentation per project, not including questions.
Final Project Reports: due Friday, March 22, 12 noon.

2 [Slide: agent-environment diagram. The agent exchanges Percepts and Actions with the World. Dimensions along which decision-making problems vary: percepts perfect vs. noisy; fully observable vs. partially observable; actions instantaneous vs. durative; deterministic vs. stochastic; agent the sole source of change vs. other sources; concurrent actions vs. single action; Objective: goal satisfaction vs. general reward; world model known vs. unknown; state numeric vs. discrete.]

3 [Slide: the same dimensions diagram, labeled STRIPS Planning; this version reads "concurrent actions (but …) vs. single action" and "known world model vs. unknown vs. partial model".]

4 [Slide: the same dimensions diagram, labeled MDP Planning.]

5 [Slide: the same dimensions diagram, labeled Reinforcement Learning.]

6 [Slide: the same dimensions diagram, labeled Simulation-Based Planning; the model line reads "known world model vs. unknown vs. simulator".]

7 [Slide: the dimensions diagram repeated, unlabeled.]

8 Numeric States
In many cases states are naturally described in terms of numeric quantities. Classical control theory typically studies MDPs with real-valued continuous state spaces, usually under the assumption of linear dynamical systems; this is quite limited for most applications of interest in AI, which often mix discrete and numeric state. Typically we deal with numeric state via feature encodings of the state space (a sketch follows below). Simulation-based methods are agnostic about whether the state is numeric or discrete.
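To make feature encodings concrete, here is a minimal sketch of tile coding, one common way to turn a continuous state into a sparse binary feature vector that discrete-state methods can consume. This is not from the slides; the grid resolution, number of tilings, and offset scheme are illustrative assumptions.

```python
import numpy as np

def tile_coding_features(state, low, high, bins=8, n_tilings=4):
    """Map a continuous state vector to a sparse binary feature vector.

    Each tiling is a uniform grid over [low, high], offset slightly from
    the others; exactly one feature per tiling is active.
    """
    state, low, high = map(np.asarray, (state, low, high))
    dim = state.size
    features = np.zeros(n_tilings * bins ** dim)
    for t in range(n_tilings):
        # Shift each tiling by a fraction of one grid cell.
        offset = (t / n_tilings) * (high - low) / bins
        idx = np.floor((state - low + offset) / (high - low) * bins)
        idx = np.clip(idx, 0, bins - 1).astype(int)
        flat = np.ravel_multi_index(idx, (bins,) * dim)
        features[t * bins ** dim + flat] = 1.0
    return features

# Example: a 2-D continuous state in [0,1] x [0,1]
phi = tile_coding_features([0.3, 0.7], low=[0, 0], high=[1, 1])
print(phi.shape, phi.sum())  # (256,) 4.0 -- one active tile per tiling
```

Because each tiling contributes exactly one active feature, a linear value function over these features acts like overlapping piecewise-constant approximations at several offsets, which is why this encoding pairs well with the RL methods from class.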

9 [Slide: the dimensions diagram repeated, unlabeled.]

10 Partial Observability
In reality we observe only percepts of the world, not the actual state. Partially Observable MDPs (POMDPs) extend MDPs to handle partial observability: start with an MDP and add an observation distribution P(o | s), the probability of observation o given state s. The agent then sees a sequence of observations rather than a sequence of states, and must reason about what it believes the state to be (a sketch of this belief update follows). POMDP planning is much harder than MDP planning, and scalability is poor. In practice we can often apply RL using features of the observations.
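To make the observation model concrete, here is a minimal sketch of the standard Bayes-filter update of a POMDP belief state (a probability distribution over states). This is not from the slides; the array shapes and model layout are assumptions for illustration.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes-filter update of a POMDP belief state.

    b: current belief over states, shape (S,)
    T: transition model, shape (A, S, S), T[a, s, s'] = P(s' | s, a)
    O: observation model, shape (S, K), O[s', o] = P(o | s')
    Returns the new belief P(s' | b, a, o).
    """
    predicted = b @ T[a]           # predict: P(s' | b, a) = sum_s b(s) P(s' | s, a)
    unnorm = O[:, o] * predicted   # correct: multiply in P(o | s')
    return unnorm / unnorm.sum()   # normalize by P(o | b, a)
```

With small discrete models this update is exact; for large state spaces one typically falls back on approximations such as particle filters or, as the slide suggests, RL over features of the observations.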

11 [Slide: the dimensions diagram repeated, unlabeled.]

12 Other Sources of Change
In many cases the environment changes even if no actions are selected by the agent. Sometimes this is due to exogenous events, e.g. 911 calls coming in at random, and sometimes it is due to other agents. Adversarial agents try to decrease our reward; cooperative agents may be trying to increase our reward, or may have objectives of their own. Decision making in the context of other agents is studied in the area of game theory (one standard formulation appears below).
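As one concrete formulation (standard game-theory material, not on the slide): in a two-player zero-sum Markov game, Shapley-style minimax value iteration replaces the MDP's max over actions with a maximin over the agent's mixed strategy against the opponent's action:

```latex
V(s) \;=\; \max_{\pi \in \Delta(A_1)} \; \min_{a_2 \in A_2} \;
  \sum_{a_1 \in A_1} \pi(a_1)
  \Bigl[ R(s, a_1, a_2) \;+\; \gamma \sum_{s'} P(s' \mid s, a_1, a_2)\, V(s') \Bigr]
```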

13 [Slide: the dimensions diagram repeated, unlabeled.]

14 Durative Actions
Generally different actions have different durations, and often the durations are stochastic. Semi-Markov MDPs (SMDPs) are an extension of MDPs that accounts for actions with probabilistic durations: the transition distribution becomes P(s', t | s, a), the probability of ending up in state s' t time steps after taking action a in state s. Planning and learning algorithms are very similar to those for standard MDPs; the equations are just a bit more complex in order to account for time, as shown below.
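As a concrete instance of "a bit more complex" (standard SMDP material, not spelled out on the slide), the Bellman optimality equation simply discounts by the random duration t:

```latex
V^{*}(s) \;=\; \max_{a} \Bigl[\, R(s, a) \;+\; \sum_{s',\, t} \gamma^{t}\, P(s', t \mid s, a)\, V^{*}(s') \Bigr]
```

Here R(s, a) denotes the expected (suitably discounted) reward accumulated while a executes; setting t = 1 everywhere recovers the ordinary MDP backup.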

15 [Slide: the dimensions diagram repeated, unlabeled.]

16 [Slide: repeat of slide 14 (Durative Actions).]

17 [Slide: the dimensions diagram repeated, unlabeled.]

18 Concurrent Durative Actions
In many problems we need to form plans that direct the actions of a team of agents. This typically requires planning over the space of concurrent activities, where different activities can have different durations. We can treat such problems as one huge MDP (or SMDP) whose action space is the cross-product of the individual agents' action spaces, but standard MDP algorithms will break at that scale (see the sketch below). There are multi-agent or concurrent-action extensions of most of the formalisms we studied in class.
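A quick back-of-the-envelope illustration (the agent and action counts are hypothetical) of why the cross-product action space defeats standard algorithms: the number of joint actions a Bellman backup must enumerate grows exponentially with the team size.

```python
from itertools import product

# Hypothetical team: 5 agents, each with 10 primitive actions.
agent_actions = [range(10)] * 5

# Joint (cross-product) actions that a standard MDP solver would have
# to maximize over at every state during each Bellman backup:
joint_actions = list(product(*agent_actions))
print(len(joint_actions))  # 100000 = 10**5
```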

19 [Slide: the dimensions diagram repeated, unlabeled.]

20 [Slide: the dimensions diagram repeated, unlabeled.]

21 [Slide: the dimensions diagram, labeled AI Planning; the numeric vs. discrete line is omitted.]

22 [Slide: the dimensions diagram repeated, unlabeled, without the numeric vs. discrete line.]

23 [Slide: repeat of slide 21, labeled AI Planning.]

24 [Slide: repeat of slide 21, labeled AI Planning.]