Markov Decision Processes for Path Planning in Unpredictable Environment
Timothy Boger and Mike Korostelev

Overview
Problem Application
MDPs
A* Search & Manhattan Heuristic
Current Implementation

Competition Overview
“Each team is faced with the mission of providing medical supplies to a victim trapped in a building following an earthquake.”
Semi-known environment
Unpredictable environment

Application Platform and Motivation
Indoor Aerial Robotics Competition (IARC)
IARC 2012: Urban Search and Rescue
Drexel Autonomous Systems Lab
Have access to unknown shortcuts
Shortest time to reach finish line
Focus on vision and navigation algorithms, particularly maze solving and collision avoidance
Fully autonomous systems

Autonomous Unmanned Aerial Vehicles

Autonomous Unmanned Aerial Vehicles: High Level and Low Level Control
Ultrasonic Sensors
Inertial Measurement

High Level Control
Maze Solving
Path Planning
Decision Making
Environment Initialization
Override Controls

Maze Solving and Problem Statement
Addition of shortcuts and obstacles
Knowledge of the affected area
Costly detours or beneficial detours
Need to evaluate whether an area should be avoided or whether the benefit of entering it will outweigh the cost.

Proposed: Markov Decision Processes
How to formulate the problem?
Movement cost is no longer static
A damaging event changed the area dynamics
Semi-known environment
Follow the established path, or deviate from it to explore safer ways to the goal

Markov Decision Processes
An MDP is a reinforcement learning problem that satisfies the Markov property.

Markov Decision Processes
[Figure: 5×5 grid world; the agent in state s_t takes action a_t, receives reward r_{t+1}, and moves to state s_{t+1}]

Markov Decision Processes
Agent and environment interact at discrete time steps t = 0, 1, 2, …
Agent observes the state at step t: s_t ∈ S,
produces an action at step t: a_t ∈ A(s_t),
gets the resulting reward: r_{t+1} ∈ ℝ,
and moves to the resulting state: s_{t+1}.
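As a minimal sketch of this interaction loop (the Environment and Agent interfaces below are illustrative assumptions, not part of the competition code):

    // Hypothetical interfaces for the agent-environment interaction of an MDP.
    interface Environment {
        int currentState();              // s_t
        boolean isTerminal(int state);
        double step(int action);         // apply a_t, return r_{t+1}, advance to s_{t+1}
    }

    interface Agent {
        int chooseAction(int state);     // a_t, chosen from A(s_t)
    }

    class InteractionLoop {
        // Runs one episode and accumulates the (undiscounted) reward received.
        static double runEpisode(Environment env, Agent agent) {
            double total = 0.0;
            while (!env.isTerminal(env.currentState())) {
                int s = env.currentState();      // observe state s_t
                int a = agent.chooseAction(s);   // produce action a_t
                total += env.step(a);            // receive reward r_{t+1}, move to s_{t+1}
            }
            return total;
        }
    }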

Markov Decision Processes

Markov Decision Processes: Policy
A policy is a mapping from states to actions.
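A deterministic policy can be stored directly as a lookup table; a tiny illustrative sketch (the integer state and action encodings are assumptions):

    import java.util.HashMap;
    import java.util.Map;

    // pi: S -> A, one action per state.
    class TabularPolicy {
        private final Map<Integer, Integer> actionForState = new HashMap<>();

        void set(int state, int action) { actionForState.put(state, action); }

        int act(int state) { return actionForState.get(state); }  // a = pi(s)
    }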

Markov Decision Processes: Rewards
Suppose a sequence of rewards r_{t+1}, r_{t+2}, …, r_T. How do we maximize them?
For an episodic task the return is R_t = r_{t+1} + r_{t+2} + … + r_T, where T is the time until the terminal state is reached.
When a task requires a large number of state transitions, consider a discount factor ɣ, where 0 ≤ ɣ ≤ 1:
R_t = r_{t+1} + ɣ r_{t+2} + ɣ² r_{t+3} + … = Σ_{k=0}^{∞} ɣ^k r_{t+k+1}
Nearsighted 0 ← ɣ → 1 Farsighted
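A small illustrative routine for computing the discounted return from a recorded reward sequence (class and method names are assumptions):

    class Returns {
        // R_t = r_{t+1} + gamma*r_{t+2} + gamma^2*r_{t+3} + ...
        static double discountedReturn(double[] rewards, double gamma) {
            double ret = 0.0;
            double weight = 1.0;
            for (double r : rewards) {
                ret += weight * r;
                weight *= gamma;
            }
            return ret;
        }
    }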

Markov Decision Processes
Markov property – memoryless: ideally, a state should summarize past sensations so as to retain all essential information.
When a reinforcement learning task has the Markov property, it is a Markov Decision Process.
Defining an MDP requires its one-step dynamics – the transition probabilities P(s′ | s, a).
[Figure: grid-world transition model with probabilities 0.8 (intended direction) and 0.1, 0.1 (slips to either side)]
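A sketch of one-step dynamics matching those numbers for a grid world, assuming the intended move succeeds with probability 0.8 and the agent slips to each perpendicular direction with probability 0.1 (class and method names are illustrative):

    class GridTransitionModel {
        // Actions encoded as 0 = up, 1 = right, 2 = down, 3 = left.
        // Returns P(actual direction | intended action).
        static double[] directionProbabilities(int intendedAction) {
            double[] p = new double[4];
            p[intendedAction] = 0.8;           // intended direction
            p[(intendedAction + 1) % 4] = 0.1; // slip 90 degrees clockwise
            p[(intendedAction + 3) % 4] = 0.1; // slip 90 degrees counter-clockwise
            return p;
        }
    }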

Markov Decision Processes: Value
Why move to some states and not others?
Value of a state: the expected return starting from that state.
State-value function for policy π (depends on the policy).
Value of taking an action for policy π: the action-value function.
[Figure: grid-world trajectory with successive rewards r_t, r_{t+1}, r_{t+2}, r_{t+3}]
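The slide's formulas appear as images; in the standard Sutton & Barto notation, the state-value and action-value functions for a policy π are:

    V^{\pi}(s)    = E_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s \right]
    Q^{\pi}(s, a) = E_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s,\ a_t = a \right]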

Markov Decision Processes: Bellman Equations
This is a set of equations (in fact, linear), one for each state.
The value function for π is its unique solution.

Markov Decision Processes: Bellman Equations
The Bellman equation expresses a relationship between the value of a state s and the values of its successor states s′.
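The equation itself appears as an image on the slide; written out in the same notation, the Bellman equation for V^π is:

    V^{\pi}(s) = \sum_{a} \pi(s, a) \sum_{s'} \mathcal{P}^{a}_{ss'} \left[ \mathcal{R}^{a}_{ss'} + \gamma\, V^{\pi}(s') \right]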

Markov Decision Processes: Optimization
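The optimization step this slide refers to is the Bellman optimality backup; a minimal value-iteration sketch over a generic tabular model (the array-based P and R representation is an assumption for illustration, not the competition code):

    class ValueIteration {
        // P[s][a][s2]: transition probability, R[s][a][s2]: immediate reward.
        // Repeatedly applies the Bellman optimality backup until the largest
        // change in any state value falls below the tolerance.
        static double[] solve(double[][][] P, double[][][] R, double gamma, double tolerance) {
            int nStates = P.length;
            double[] V = new double[nStates];
            double delta;
            do {
                delta = 0.0;
                for (int s = 0; s < nStates; s++) {
                    double best = Double.NEGATIVE_INFINITY;
                    for (int a = 0; a < P[s].length; a++) {
                        double q = 0.0;
                        for (int s2 = 0; s2 < nStates; s2++) {
                            q += P[s][a][s2] * (R[s][a][s2] + gamma * V[s2]);
                        }
                        best = Math.max(best, q);
                    }
                    delta = Math.max(delta, Math.abs(best - V[s]));
                    V[s] = best;
                }
            } while (delta > tolerance);
            return V;
        }
    }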

Markov Decision Processes
[Figure: 5×5 grid world showing state s_t and action a_t]

Current Implementation and Challenges
A* search algorithm, Manhattan heuristic
Affected region and probability of shortcuts and obstacles not taken into account
Semi-known environment not taken into account

A* Search Algorithm and the Manhattan Heuristic
[Figure: Manhattan street map – Origin: 3rd Ave. & 26th St.; Destination: 31st St. & Park Ave.; routes labeled 7 blocks]

A* Search Algorithm and the Manhattan Heuristic

A* Search Algorithm and the Manhattan Heuristic
Need to score the path to decide which way to search: F = G + H
G = the movement cost to move from the starting point A to a given square on the grid, following the path generated to get there.
H = the estimated movement cost to move from that given square on the grid to the final destination, point B.
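A minimal sketch of how each grid square can be scored (the Node class here is an illustrative assumption):

    class Node {
        int x, y;
        double g;  // cost of the path generated so far from the start to this square (G)
        double h;  // estimated cost from this square to the destination (H)

        double f() { return g + h; }  // A* always expands the open node with the lowest F
    }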

A* Search Algorithm and the Manhattan Heuristic
Horizontal movement cost: 10
Diagonal movement cost: 14
H – no diagonals allowed
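A sketch of the heuristic with those costs, assuming the cost of 10 applies to straight (horizontal or vertical) moves and 14 to diagonal moves, while H itself counts only straight moves:

    class ManhattanHeuristic {
        static final int STRAIGHT_COST = 10;  // horizontal or vertical step
        static final int DIAGONAL_COST = 14;  // diagonal step (about 10 * sqrt(2))

        // H ignores diagonals: blocks moved horizontally plus blocks moved vertically.
        static int estimate(int x, int y, int goalX, int goalY) {
            return STRAIGHT_COST * (Math.abs(goalX - x) + Math.abs(goalY - y));
        }
    }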

A* Search Algorithm and the Manhattan Heuristic

A* Search Algorithm and the Manhattan Heuristic

A* Search Algorithm and the Manhattan Heuristic
When the destination location is unknown, use Dijkstra's algorithm, meaning H = 0.

Current Implementation and Challenges
A* search algorithm, Manhattan heuristic
Affected region and probability of shortcuts and obstacles not taken into account
Semi-known environment not taken into account
Java Android maze solver