Learning to Navigate Through Crowded Environments. Peter Henry 1, Christian Vollmer 2, Brian Ferris 1, Dieter Fox 1. Tuesday, May 4, 2010. 1 University of Washington, Seattle, USA; 2 Ilmenau University of Technology, Germany.

The Goal: Enable robot navigation within crowded environments.

Motivation: Robots should move naturally and predictably within crowded environments. Moving amongst people in a socially transparent way yields more efficient and safer motion. Humans trade off various factors: to move with the flow, to avoid high-density areas, to walk on the left/right side, and to reach the goal.

Challenge: Humans naturally balance between various factors. It is relatively easy to list the factors, but people cannot specify how they make the tradeoff. Previous work typically uses heuristics with hand-tuned parameters: shortest path with collision avoidance [Burgard et al., AI 1999], tracking and following a single person [Kirby et al., HRI 2007], following people moving in the same direction [Mueller et al., CogSys 2008].

Contribution: Learn how humans trade off these factors. We present a framework for learning to navigate as humans do within crowded environments, extending Maximum Entropy Inverse Reinforcement Learning [Ziebart et al., AAAI 2008] to incorporate a limited locally observable area and dynamic crowd flow features.

Markov Decision Processes: States, actions, rewards/costs, (transition probabilities), (discount factor). [Figure: example MDP with states S0, S1, S2, S3 and a Goal state.]

Navigating in a Crowd as an MDP: States s_i (in the crowd scenario: a grid cell plus orientation). Actions a_ij from s_i to s_j (in the crowd scenario: move to an adjacent cell). Cost: an unknown linear combination of action features, with cost weights θ to be learned. A path is denoted τ, with features f_τ.
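As a concrete illustration, here is a minimal Python sketch of this linear cost model. The function names and feature values are hypothetical, not the authors' code; it only shows that a path's cost is the learned weights θ dotted with the accumulated path features f_τ.

```python
import numpy as np

def action_cost(theta, action_features):
    """Cost of a single action: linear combination of its features (theta . f)."""
    return float(np.dot(theta, action_features))

def path_cost(theta, path_feature_list):
    """Cost of a path tau: theta . f_tau, with f_tau the summed action features."""
    f_tau = np.sum(path_feature_list, axis=0)   # accumulated feature counts
    return float(np.dot(theta, f_tau))

# Illustrative example: 3 features (distance, flow alignment, density), 2-step path.
theta = np.array([1.0, -0.5, 2.0])              # weights to be learned
path_features = [np.array([1.0, 0.8, 0.1]),
                 np.array([1.4, 0.2, 0.6])]
print(path_cost(theta, path_features))
```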

Inverse Reinforcement Learning (IRL): Given the MDP structure and a set of example paths, find the reward function that results in the same behavior (also called "inverse optimal control"). IRL has previously been applied with success to lane changing [Abbeel, ICML 2004], parking lot navigation [Abbeel, IROS 2008], driving route choice and prediction [Ziebart, AAAI 2008], and pedestrian route prediction [Ziebart, IROS 2009].

Maximum Entropy IRL: Exponential distribution over paths, P(τ | θ) = exp(−θ · f_τ) / Z(θ). Learning: maximize the log-likelihood of the demonstrated paths τ̃ under this distribution. Gradient: match observed and expected feature counts, ∇L(θ) = E_{τ ∼ P(· | θ)}[f_τ] − f̃, so at the optimum the expected feature counts equal the demonstrated feature counts f̃.
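A minimal sketch of the resulting learning rule, assuming a helper expected_feature_counts(theta) that returns expected feature counts under the exponential path distribution (e.g. computed with the forward/backward procedure of Ziebart et al. 2008). The helper and names are assumptions for illustration.

```python
def maxent_irl_step(theta, observed_f, expected_feature_counts, lr=0.01):
    """One gradient-ascent step on the MaxEnt IRL log-likelihood.

    observed_f: empirical feature counts f~ of the demonstrated paths.
    expected_feature_counts(theta): expected feature counts under
        P(tau | theta) ~ exp(-theta . f_tau)  -- assumed helper.
    """
    grad = expected_feature_counts(theta) - observed_f   # gradient of the log-likelihood
    theta = theta + lr * grad                            # ascend the likelihood
    # At convergence the expected feature counts match the observed ones.
    return theta
```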

Locally Observable Features: It is unrealistic to assume the agent has global knowledge of the crowd (contrast: the Continuum Crowds simulator explicitly finds a global solution for the entire crowd). We do assume knowledge of the map itself. Training: only provide flow features within a small radius around the current position, assuming these are the features available to the "expert". A single demonstration path thus becomes many small demonstrations of locally motivated paths.

Locally Observable Dynamic Features: Crowd flow changes as the agent moves. Locally observable dynamic feature training proceeds as follows (see the sketch below): 1. Update the flow features within the local horizon. 2. Compute the feature gradient within the grid. 3. Perform a stochastic update of the weights. 4. Take the next step of the observed path.
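A sketch of how steps 1-4 fit together per timestep, with observe_local_flow and horizon_gradient as assumed, hypothetical helpers (the first returns the currently visible local flow/density features, the second the gradient of the local log-likelihood for the observed step).

```python
def train_locally_observable(theta, demo_path, observe_local_flow,
                             horizon_gradient, lr=0.01):
    """Sketch of the locally observable dynamic training loop (steps 1-4).

    demo_path: sequence of states along one observed "human" trace.
    """
    for t in range(len(demo_path) - 1):
        s_t, s_next = demo_path[t], demo_path[t + 1]
        local_features = observe_local_flow(s_t)                      # 1. update local flow features
        grad = horizon_gradient(theta, s_t, local_features, s_next)   # 2. gradient within the horizon
        theta = theta + lr * grad                                     # 3. stochastic weight update
        # 4. take the next step of the observed path (the loop advances t)
    return theta
```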

Locally Observable Dynamic IRL: The path probability decomposes into many short paths over the current features in the locally observable horizon. Decomposing over timesteps, P(τ | θ) = ∏_t P(τ_t | θ, f_t^H), where τ_t is the short path segment within the local horizon H at time t and f_t^H are the features for actions within the horizon at time t.

Locally Observable Dynamic Gradient: Uses the current estimate of the features at time t and computes the gradient only within the local horizon H, comparing the observed features within H against the expected features for actions within H.
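A hedged sketch of that horizon-restricted comparison, assuming a soft action distribution action_prob(a, theta) over the horizon is available (hypothetical helper). This is, loosely, the quantity the horizon_gradient helper assumed in the previous sketch would compute.

```python
def local_horizon_gradient(theta, observed_f_H, actions_in_H,
                           action_features, action_prob):
    """Gradient restricted to the local horizon H at time t (illustrative sketch).

    observed_f_H: feature counts of the observed (human) step(s) inside H.
    actions_in_H: actions reachable within the horizon at time t.
    action_features(a): current feature estimate f_t(a) for action a.
    action_prob(a, theta): probability of action a under the current soft
        policy induced by theta over the horizon -- assumed helper.
    """
    expected_f_H = sum(action_prob(a, theta) * action_features(a)
                       for a in actions_in_H)
    # Log-likelihood gradient over the horizon: expected minus observed
    # feature counts, which match at convergence.
    return expected_f_H - observed_f_H
```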

Map and Features: Each grid cell encompasses 8 oriented states, which allows flow features to be expressed relative to the agent's orientation. Features: distance, crowd flow speed and direction, and crowd density (many others are possible). These were chosen as being reasonable to obtain from current sensors.
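For illustration, a possible per-action feature computation along these lines; the exact feature definitions and names below are assumptions, not the paper's.

```python
import numpy as np

def action_features(cell_xy, heading, next_xy, flow_vec, density):
    """Sketch of per-action features (hypothetical choices).

    cell_xy, next_xy: current and adjacent grid cells.
    heading: unit vector for one of the 8 orientations of the current state.
    flow_vec: locally observed mean crowd velocity in the current cell.
    density: locally observed crowd density in the current cell.
    """
    step = np.asarray(next_xy, float) - np.asarray(cell_xy, float)
    distance = np.linalg.norm(step)                   # travel distance
    flow_speed = np.linalg.norm(flow_vec)             # crowd flow speed
    # Flow direction relative to the agent's orientation (cosine alignment).
    flow_alignment = (np.dot(flow_vec, heading) / flow_speed
                      if flow_speed > 1e-9 else 0.0)
    return np.array([distance, flow_speed, flow_alignment, density])

# Example: moving east while the crowd flows east at 1.2 m/s, density 0.4.
print(action_features((0, 0), np.array([1.0, 0.0]), (1, 0),
                      np.array([1.2, 0.0]), 0.4))
```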

Crowd Simulator [Continuum Crowds, Treuille et al., SIGGRAPH 2006]

Simulator Environment

Experimental Setup: We used ROS [Willow Garage] to integrate the crowd simulator with the IRL learner and planner: 1. Extract individual crowd traces and observable features. 2. Learn feature weights with our IRL algorithm. 3. Use the learned weights for a simulated robot in test scenarios. Planning is A* search, with re-planning every grid cell using updated features (see the sketch below). The robot is represented to the crowd simulator as just another person, so the crowd reacts to it realistically.
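A rough sketch of this test-time loop, with a_star and local_action_features as assumed helpers (the actual system runs on ROS and the Continuum Crowds simulator). Edge costs are the learned weights dotted with the latest locally observed features, and the plan is recomputed after every grid cell.

```python
import numpy as np

def run_learned_planner(start, goal, theta, a_star, local_action_features):
    """Sketch of the test-time loop (hypothetical helpers).

    a_star(start, goal, edge_cost): standard A* over grid states, assumed available.
    local_action_features(s, s_next): features of the transition computed from
        the currently observed local crowd flow and density.
    """
    edge_cost = lambda a, b: float(np.dot(theta, local_action_features(a, b)))
    s = start
    while s != goal:
        path = a_star(s, goal, edge_cost)   # plan with current local features
        if path is None or len(path) < 2:
            break
        s = path[1]                         # advance one grid cell, then re-plan
        yield s
```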

Quantitative Results: Measure similarity to the "human" path. Shortest Path (baseline): ignores the crowd. Learned Path: the path from our learned planner. Mean / Maximum Difference: over all path cells, the difference to the closest "human" path cell. The results table reports the Mean Difference and Maximum Difference for the Shortest Path and the Learned Path, together with the percentage improvement; the difference is significant at the p = 0.05 level.
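A small sketch of how such a difference metric can be computed for grid paths (illustrative only; cell coordinates and units are assumptions).

```python
import numpy as np

def path_difference(path, human_path):
    """Mean and maximum distance from each cell of `path` to the closest
    cell of the "human" path (sketch of the evaluation metric)."""
    path = np.asarray(path, float)
    human = np.asarray(human_path, float)
    # For every cell on the evaluated path, distance to its nearest human-path cell.
    d = np.linalg.norm(path[:, None, :] - human[None, :, :], axis=2).min(axis=1)
    return d.mean(), d.max()

# Example with two short grid paths.
print(path_difference([(0, 0), (1, 0), (2, 1)], [(0, 0), (1, 1), (2, 2)]))
```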

Mall Scenario (Video)

Lane Formation (Video)

Future Work: Train on real crowd data (overhead video plus tracking? wearable sensors to mimic robot sensor input?). Implement on an actual robot (is the method effective for raw sensor data? which features are the most useful?). Pedestrian prediction: compare against and incorporate other recent work [Ziebart, IROS 2009].

Conclusion: We have presented a framework for learning to imitate human navigation behavior from example traces. We learn weights that produce paths matching the observed behavior from whatever features are made available. Our inverse reinforcement learning algorithm handles locally observable dynamic features, and the resulting paths are more similar to observed human paths.