
Learning for Physically Diverse Robot Teams (Robot Teams, Chapter 7) CS8803 Autonomous Multi-Robot Systems, 10/3/02

Motivations Robots are cool. Robot teams are cooler. Robots are hard to program/control. Robot teams are even harder.

Motivations Robotic soccer - hard!

Motivations Diagnose and rebuild the transmission of this 1969 Jaguar E-Type - Really Hard!

Motivations Answer: Robot Learning!

Motivations Challenges: –Very large state spaces –Uncertain credit assignments –Limited training time –Uncertainty in sensing and shared info –Non-deterministic actions –Difficulty in defining appropriate abstractions for learned info –Difficulty of merging info from different robot experiences

Motivations Benefits –Increased robustness –Reduced Complexity –Increased ease of adding new assets to team

Motivations 4 types of learning in robotic systems: –Learning numerical functions for calibrations or parameter adjustments –Learning about the world –Learning to coordinate behaviors –Learning new behaviors

Learning New Cooperative Behaviors Inherently cooperative tasks are difficult to learn! –Utility of a robot's action depends on the actions of other robots –Soccer is a good example –Cooperative Multi-robot Observation of Multiple Moving Targets (CMOMMT) is a scalable testbed

Learning New Cooperative Behaviors CMOMMT Application: –S: a 2-D bounded, enclosed spatial region –V: a team of m robot vehicles, each with a 360° field of view and limited sensing range –O(t): a set of n targets in region S at time t –B(t): an m × n matrix with B_ij(t) = 1 if robot i is observing target j at time t, 0 otherwise –Sensor coverage is much less than the region area

Learning New Cooperative Behaviors Goal: develop algorithm A-CMOMMT –Maximize average number of targets observed at any given time.
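
One hedged way to formalize this objective (the indicator g and the mission horizon T below are notation assumed for illustration, not taken from the slides) is the time-averaged fraction of targets under observation:

```latex
% Illustrative objective (assumption): average fraction of targets observed
% over a mission of T time steps, using the observation matrix B(t) above.
A \;=\; \frac{1}{T}\sum_{t=1}^{T}\;\frac{1}{n}\sum_{j=1}^{n} g\big(B(t),j\big),
\qquad
g\big(B(t),j\big) \;=\;
\begin{cases}
1 & \text{if } \exists\, i \text{ such that } B_{ij}(t)=1,\\
0 & \text{otherwise.}
\end{cases}
```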

Learning New Cooperative Behaviors Human-Generated Solution –Local force vectors: targets attract, teammates repel –Magnitude dependent on distance from the robot –Weight reduced if target already being observed –Direction given by summing the vectors
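
A minimal Python sketch of this force-vector scheme, assuming simple linear attraction/repulsion profiles and a coverage-based down-weighting; the exact functions used in the published A-CMOMMT controller may differ:

```python
import numpy as np

def force_vector(robot_pos, targets, teammates, observed_counts,
                 sense_range=5.0, repel_range=3.0):
    """Sum attractive forces from targets and repulsive forces from teammates.

    observed_counts[j] = number of other robots already observing target j;
    a target's attraction is down-weighted if it is already covered.
    The ranges and force profiles here are illustrative placeholders.
    """
    total = np.zeros(2)
    for j, t in enumerate(targets):
        d = t - robot_pos
        dist = np.linalg.norm(d)
        if 0 < dist <= sense_range:
            weight = 1.0 / (1.0 + observed_counts[j])   # weaker pull on covered targets
            total += weight * (d / dist) * (1.0 - dist / sense_range)
    for m in teammates:
        d = robot_pos - m
        dist = np.linalg.norm(d)
        if 0 < dist <= repel_range:
            total += (d / dist) * (1.0 - dist / repel_range)
    return total  # heading for the next control step is the direction of this sum
```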

Learning New Cooperative Behaviors Results:

Learning New Cooperative Behaviors Distributed, Pessimistic Lazy Q-Learning –No a priori model –Reinforcement learning –Instance-based learning –Assumes lower bound on utility

Learning New Cooperative Behaviors Q-Learning –Initialize Q(s,a) = 0 for each state/action pair –Observe state s –Repeat: select an action and execute it; receive reward r; observe new state s'; update the table entry for Q(s,a)
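
A compact tabular Q-learning sketch of the loop above; the environment interface (reset/actions/step), learning rate, discount, and epsilon-greedy exploration are generic assumptions rather than details from the slides:

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration.

    `env` is assumed to expose reset() -> state, actions(state) -> list of actions,
    and step(action) -> (next_state, reward, done); this interface is illustrative.
    """
    Q = defaultdict(float)                      # Q[(state, action)] starts at 0
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            acts = env.actions(s)
            if random.random() < epsilon:
                a = random.choice(acts)                      # explore
            else:
                a = max(acts, key=lambda x: Q[(s, x)])       # exploit
            s2, r, done = env.step(a)
            best_next = max(Q[(s2, x)] for x in env.actions(s2)) if not done else 0.0
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q
```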

Learning New Cooperative Behaviors Lazy Learning (instance-based learning) –Delays use of gathered info until necessary –Randomly built look-up table of (state, action) pairs –(Slide diagram: world, state, action, situation matcher, evaluation function, reinforcement function)

Learning New Cooperative Behaviors Pessimistic Algorithm –Rates the utility of an action based on a lower bound: predict the state following each possible action in the current state; compute a lower bound on the utility of each new state; choose the action corresponding to the highest lower bound
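
A hedged sketch of this pessimistic selection rule; `predict_next` and `utility_lower_bound` are placeholders for the learner's state-prediction model and instance-based lower-bound estimate:

```python
def pessimistic_action(state, actions, predict_next, utility_lower_bound):
    """Choose the action whose predicted successor state has the best
    worst-case (lower-bound) utility.

    predict_next(state, action)  -> predicted next state
    utility_lower_bound(state)   -> conservative utility estimate
    Both callables are assumptions standing in for the learner's model
    and instance-based value estimate.
    """
    best_action, best_bound = None, float("-inf")
    for a in actions:
        s_next = predict_next(state, a)
        bound = utility_lower_bound(s_next)
        if bound > best_bound:
            best_action, best_bound = a, bound
    return best_action
```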

Learning New Cooperative Behaviors Results –Much better than random –Not as good as human-generated –Significant results

Learning New Cooperative Behaviors Q-Learning w/ VQQL and GLA –State space huge –Want a generalized algorithm –2 phases: learn the quantizer, then learn the Q function

Learning New Cooperative Behaviors Generalized Lloyd Algorithm –Clustering technique that converts the continuous state space to a discrete one –Takes a set T of M states –Returns a set C of N states (the codebook) –Stopping criterion: (D_m - D_{m+1}) / D_m < ε, where D_m is the average distortion after iteration m
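
A minimal Lloyd-style quantizer-design sketch in this spirit, using nearest-centroid assignment, centroid updates, and the relative-distortion stopping test above; the initialization and distance measure are illustrative choices:

```python
import numpy as np

def generalized_lloyd(T, N, eps=1e-3, rng=np.random.default_rng(0)):
    """Design an N-level vector quantizer (codebook C) from a set T of M states.

    Alternates nearest-centroid assignment and centroid update until the
    relative drop in average distortion satisfies (D_m - D_{m+1}) / D_m < eps.
    """
    T = np.asarray(T, dtype=float)
    C = T[rng.choice(len(T), size=N, replace=False)]        # initial codebook
    prev_D = np.inf
    while True:
        # assign each state to its nearest code vector
        dists = np.linalg.norm(T[:, None, :] - C[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        D = (dists[np.arange(len(T)), labels] ** 2).mean()  # average distortion
        if prev_D < np.inf and (prev_D - D) / prev_D < eps:
            return C, labels
        # update each code vector to the centroid of its cluster
        for k in range(N):
            members = T[labels == k]
            if len(members):
                C[k] = members.mean(axis=0)
        prev_D = D
```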

Learning New Cooperative Behaviors Vector Quantization for Q-Learning –Obtain a set T of example states –Design a vector quantizer C from T with GLA –Learn the Q function: choose an action following an exploration strategy; receive an experience tuple (s, a, s', r); quantize it, obtaining (C(s), a, C(s'), r); update the Q table
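
A hedged sketch of the resulting VQQL loop, reusing a codebook C such as the one produced above; the environment interface and hyperparameters are the same illustrative assumptions as in the earlier Q-learning sketch:

```python
import random
import numpy as np
from collections import defaultdict

def quantize(state, C):
    """Index of the code vector in C nearest to a continuous state."""
    return int(np.linalg.norm(np.asarray(C) - np.asarray(state), axis=1).argmin())

def vqql(env, C, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning over the discrete states produced by quantizer C.

    env.reset() -> continuous state, env.step(a) -> (next_state, reward, done),
    env.action_space -> finite list of actions; this interface is an assumption.
    """
    Q = defaultdict(float)
    for _ in range(episodes):
        s = quantize(env.reset(), C)
        done = False
        while not done:
            if random.random() < epsilon:
                a = random.choice(env.action_space)
            else:
                a = max(env.action_space, key=lambda x: Q[(s, x)])
            s2_raw, r, done = env.step(a)
            s2 = quantize(s2_raw, C)   # quantize the states of the experience tuple
            best_next = 0.0 if done else max(Q[(s2, x)] for x in env.action_space)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q
```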

Learning New Cooperative Behaviors 2 experiments –Local reward function –Collaborative reward function

Learning New Cooperative Behaviors Results –Competitive –Can handle higher-dimensional state spaces

Learning for Parameter Adjustment Need robots to perform life-long tasks –Environmental changes –Variations in robot capabilities –Heterogeneity: overlap in capabilities, changes in heterogeneity

Learning for Parameter Adjustment Problem definition –R: set of n robots –T: set of m tasks –A_i: set of actions robot i can perform –H: A_i -> T, a set of functions returning the task completed by each action –q(a_ij): quality metric for robot i performing action a_ij –U_i: set of actions robot i performs in the current mission

Learning for Parameter Adjustment –Given R, T, A_i, and H, determine the sets of actions U_i that optimize the performance metric
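
One hedged way to write that optimization (the specific aggregate, total action quality subject to every task being completed, is an illustrative assumption; the slides leave the performance metric abstract):

```latex
% Illustrative formalization (assumption): choose each robot's action set U_i to
% maximize total quality while ensuring every task in T is completed.
\max_{U_1, \dots, U_n} \; \sum_{i=1}^{n} \sum_{a_{ij} \in U_i} q(a_{ij})
\quad \text{subject to} \quad
\bigcup_{i=1}^{n} \{\, H(a_{ij}) : a_{ij} \in U_i \,\} = T
```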

Learning for Parameter Adjustment ALLIANCE overview –Completely distributed –Behaviors grouped into sets, activated as a set and controlled by high-level motivational behaviors –Impatience and acquiescence thresholds –Broadcast communication
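
A rough sketch of how a motivational behavior might gate one behavior set; the update below follows the general ALLIANCE pattern (motivation grows with impatience and is reset by sensory feedback, teammates' broadcasts, or acquiescence), but the specific terms and rates are assumptions, not the published equations:

```python
def update_motivation(m_prev, impatience_rate, task_applicable,
                      teammate_handling_task, acquiesced, threshold):
    """One simplified step of a motivational behavior for a single behavior set.

    Motivation grows at `impatience_rate` each step and is reset to zero when
    sensory feedback says the task is not applicable, a teammate's broadcast
    says it is already handling the task, or this robot has acquiesced.
    (ALLIANCE's actual impatience/acquiescence terms are richer than this
    boolean gating; this is only an illustrative assumption.)
    The behavior set activates once motivation crosses `threshold`.
    """
    m = m_prev + impatience_rate
    if (not task_applicable) or teammate_handling_task or acquiesced:
        m = 0.0
    return m, m >= threshold
```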

Learning for Parameter Adjustment L-ALLIANCE overview –Extension of ALLIANCE that automatically updates the motivational behaviors –2 problems to solve: (1) how to give robots the ability to obtain knowledge about the quality of team members' performance; (2) how to use that knowledge to select a task to pursue

Learning for Parameter Adjustment –Performance monitors: one for every behavior set; monitors how self and others are performing

Learning for Parameter Adjustment –Control phases: Active learning phase (random choices; maximally patient; catalog monitored performance and update control parameters) and Adaptive learning phase (must make an effort to accomplish the mission; acquiesce and become impatient quickly; still catalog monitored performance and update control parameters)

Learning for Parameter Adjustment –Action Selection Strategy: at each iteration, robot r_i divides the remaining tasks into two categories –Tasks that r_i expects to perform better than all others and that are not currently being done –All other tasks r_i can do –Robot r_i repeats the following until no tasks are left: select tasks from the first category, longest first, until none remain; then select tasks from the second category, shortest first
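
A small sketch of this selection strategy; the task representation, the learned expected-time estimates, and the `is_best`/`in_progress` flags are illustrative assumptions standing in for L-ALLIANCE's monitored performance data:

```python
def select_next_task(robot_id, tasks, expected_time, is_best, in_progress):
    """Pick the next task for robot r_i following the two-category strategy.

    expected_time[(robot_id, task)] : r_i's learned estimate of its completion time
    is_best[task]      : True if r_i expects to perform `task` better than all teammates
    in_progress[task]  : True if some robot is already working on `task`
    """
    category1 = [t for t in tasks if is_best[t] and not in_progress[t]]
    category2 = [t for t in tasks if t not in category1]
    if category1:
        # take the longest task r_i is best at first
        return max(category1, key=lambda t: expected_time[(robot_id, t)])
    if category2:
        # otherwise take the shortest remaining task r_i can do
        return min(category2, key=lambda t: expected_time[(robot_id, t)])
    return None
```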

Learning for Parameter Adjustment Results - Box Pushing –Experiment 1: 2 identical robots, 1 fails

Learning for Parameter Adjustment –Experiment 2: 2 different robots with different capabilities –L-ALLIANCE capable of keeping the team working toward the goal despite changes to team composition and to robot abilities

Conclusions Lots of challenges left Rewards tantalizing Learning approaches not yet superior to human-generated solutions

Questions?