GSMDPs for Multi-Robot Sequential Decision-Making
By: Messias, Spaan, Lima. Presented by: Mike Plasker, DMES – Ocean Engineering

Introduction
Robotic planning under uncertainty
MDP solutions
Limited real-world application

Assumptions for Multi-Robot Teams
Communication (inexpensive, free, or costly)
Synchronous and steady state transitions
Discretization of the environment

A Different Approach
States and actions discrete (like an MDP)
Continuous measure of time
State transitions regarded as random 'events'

Advantages
Non-Markovian effects of discretization are minimized
Fully reactive to changes
Communication only required for 'events'

GSMDPs
Generic temporal probability distributions over events (see the event sketch below)
Can model concurrent (persistently enabled) events
Solvable by discrete-time MDP algorithms after obtaining an equivalent (semi-)Markovian model
Avoids the negative effects of synchronous alternatives
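As a concrete reading of "generic temporal probability distributions over events", the sketch below (in Python, purely illustrative; the names and structure are assumptions, not the authors' code) treats each event as a record that carries its own triggering-time distribution and the state change it causes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

State = Tuple  # a joint state: one value per state factor

@dataclass
class Event:
    """One GSMDP event: a generic (not necessarily exponential) timing law plus
    the stochastic state change it produces when it triggers. Illustrative only."""
    name: str
    enabled: Callable[[State], bool]                   # is the event enabled in state s?
    sample_trigger_time: Callable[[], float]           # draw a triggering time from its distribution
    transition: Callable[[State], Dict[State, float]]  # P(s' | s) when the event fires
```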

Why GSMDPs for Robotics
Cooperative robotics requires:
Operation in inherently continuous environments
Uncertainty in actions (and observations)
Joint decision making for optimization
Reactivity

Definitions
A multiagent GSMDP is a tuple ⟨d, S, X, A, T, F, R, C, h⟩ (sketched as a data structure below), where:
d = number of agents
S = state space (the joint space of the state factors)
X = state factors
A = set of joint actions
T = transition function
F = time model (temporal distributions of events)
R = instantaneous reward function
C = cumulative reward rate
h = planning horizon (over continuous time)
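A minimal container mirroring the tuple above; the field types and the exact signatures of T, F, R and C are assumptions made for exposition, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

@dataclass
class MultiagentGSMDP:
    d: int                                    # number of agents
    X: Sequence[Sequence]                     # state factors; S is their joint (Cartesian) space
    A: Sequence[Tuple]                        # joint actions, one component per agent
    T: Callable[[Tuple, str, Tuple], float]   # T(s, event, s') -> transition probability
    F: Callable[[str, float], float]          # F(event, t) -> probability the event has fired by time t
    R: Callable[[Tuple, Tuple], float]        # instantaneous reward R(s, a)
    C: Callable[[Tuple, Tuple], float]        # cumulative reward rate C(s, a) while a is executing
    h: float                                  # planning horizon in continuous time
```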

Definitions
Event (in a GSMDP): an abstraction over state transitions that share the same properties
Persistently enabled events: events that remain enabled from step t to step t+1 but are not triggered at step t (clock bookkeeping sketched below)
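To make "persistently enabled" concrete: an event that stays enabled across a decision step keeps the time it has already been waiting, so its triggering-time distribution has to be conditioned on that elapsed time. The clock bookkeeping below is a hypothetical sketch of that idea, not code from the paper.

```python
class EventClocks:
    """Track, per event, how long it has been enabled without triggering."""

    def __init__(self):
        self.elapsed = {}  # event name -> time since the event became enabled

    def step(self, enabled_events, triggered_events, dt):
        # Events that triggered or became disabled lose their clock and restart fresh.
        for name in list(self.elapsed):
            if name in triggered_events or name not in enabled_events:
                del self.elapsed[name]
        # Persistently enabled events accumulate elapsed time across the step.
        for name in enabled_events:
            if name not in triggered_events:
                self.elapsed[name] = self.elapsed.get(name, 0.0) + dt
        return dict(self.elapsed)
```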

Common Approach
Synchronous action selection
Pre-defined time step
Trade-off between performance and reaction time

GSMDPs
Persistently enabled events are modeled by allowing their temporal distributions to depend on the time at which they were enabled
Explicit modeling of the non-Markovian effects of discretization
Communication efficiency

Modeling Events
Group state transitions into events to minimize the number of temporal distributions and transition functions to estimate (e.g., 'battery low')
Transition function: estimated from the relative frequency of each transition within the event (see the estimation sketch below)
Time model: obtained by timing the transition data, then approximated as a phase-type distribution
Phase-type approximation replaces events with acyclic Markov chains
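The sketch below follows the slide's recipe under an assumed log format, where each observed occurrence of an event is a (state, next state, duration) record: relative frequencies give the transition function, and the timed data give the moments a phase-type distribution can later be fitted to.

```python
from collections import Counter, defaultdict
import statistics

def estimate_event_model(occurrences):
    """occurrences: iterable of (s, s_next, dt) records for a single event (assumed format)."""
    counts = defaultdict(Counter)
    durations = []
    for s, s_next, dt in occurrences:
        counts[s][s_next] += 1
        durations.append(dt)

    # Transition function: relative frequency of each observed outcome per source state.
    transition = {
        s: {s2: n / sum(outcomes.values()) for s2, n in outcomes.items()}
        for s, outcomes in counts.items()
    }

    # Time model: first two moments of the triggering times, to which a
    # phase-type distribution is then fitted (see the Erlang sketch below).
    mean = statistics.fmean(durations)
    variance = statistics.pvariance(durations, mu=mean)
    return transition, (mean, variance)
```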

Events (cont.)
A direct phase-type fit is not always possible
Decompose events with a minimum duration into deterministically timed transitions
The remainder can then be better approximated by a phase-type distribution (Erlang moment matching sketched below)
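One simple way to realize the phase-type approximation is moment matching with an Erlang distribution, i.e. k identical exponential phases in series, which is already an acyclic Markov chain; a fixed minimum duration can first be split off as a deterministic delay. This is a generic sketch of that idea, not necessarily the fitting procedure used by the authors.

```python
def erlang_moment_match(mean, variance):
    """Match an Erlang(k, rate) distribution to the first two moments of an
    event's triggering time: mean = k / rate, variance = k / rate**2."""
    k = max(1, round(mean ** 2 / variance))  # number of exponential phases
    rate = k / mean                          # rate of each phase
    return k, rate

def split_minimum_duration(durations):
    """Separate the minimum (deterministic) delay from the variable remainder,
    which is then easier to approximate with a phase-type distribution."""
    d_min = min(durations)
    return d_min, [d - d_min for d in durations]
```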

Solving a GSMDP
Can be viewed as an equivalent discrete-time MDP
Almost all MDP solution algorithms then apply (value iteration sketched below)
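Once the phase-type expansion has made every event's remaining time Markovian (each phase becomes an extra state), the model can be handed to any standard solver. Below is plain value iteration in NumPy over assumed arrays P of shape (|A|, |S|, |S|) and R of shape (|S|, |A|); it is a generic MDP solver, not the specific algorithm used in the paper.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """P[a, s, s2] = transition probability, R[s, a] = expected immediate reward."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Q[s, a] = R[s, a] + gamma * sum over s2 of P[a, s, s2] * V[s2]
        Q = R + gamma * np.einsum("asz,z->sa", P, V)
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=1)  # optimal values and a greedy policy
        V = V_new
```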

Experiment
Robotic soccer
Score a goal (reward 150)
Pass around an obstacle (reward 60)

Results
Comparison of the synchronous MDP (fixed time step T = 4 s) against the GSMDP model (results chart not reproduced in the transcript)

Results
No idle time
Reduced communication
Improved scoring efficiency
System failures (zero goals) independent of the model

Example Video

Future Work
Extend to partially observable domains
Apply bilateral phase-type distributions to increase the class of non-Markovian events that can be modeled

Questions?

Messias, J.; Spaan, M.; Lima, P. GSMDPs for Multi-Robot Sequential Decision-Making. AAAI Conference on Artificial Intelligence, North America, Jun. 2013. Date accessed: 06 Apr. 2014.