Learning Prospective Robot Behavior
Shichao Ou
Laboratory for Perceptual Robotics, Department of Computer Science, University of Massachusetts Amherst


Learning Prospective Robot Behavior
Shichao Ou and Roderic Grupen
Laboratory for Perceptual Robotics, University of Massachusetts Amherst

A Developmental Approach
Infant learning
– Proceeds in stages, paced by maturation processes
– Parents provide constrained learning contexts that protect the infant
– Easy → Complex: mobiles for newborns; brightly colored, easy-to-pick-up objects; building blocks; association of words with objects

Application in Robotics
Framework for robot developmental learning
– Role of the teacher: set up learning contexts that make the target concept conspicuous
– Role of the robot: acquire concepts, generalize to new contexts by autonomous exploration, provide feedback
Control basis
– Robot actions are created using combinations of sensor (σ) and motor (τ) resources
– Stages of learning are established by time-varying constraints on resources: Easy → Complex
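As a concrete illustration of staging by resource constraints, the sketch below filters a catalog of actions by the sensor and effector resources permitted at each stage. All names here (stages, resources, actions) are invented for illustration and are not from the control-basis implementation.

```python
# Hypothetical sketch: staging learning by constraining which sensor and
# effector resources an agent may combine at each stage. The action names
# echo the reaching example on the next slide; everything else is invented.

STAGES = [
    {"name": "stage 1", "sensors": {"vision"}, "effectors": {"head"}},
    {"name": "stage 2", "sensors": {"vision", "touch"}, "effectors": {"head", "arm"}},
]

ACTIONS = [
    {"name": "SearchTrack", "sensors": {"vision"}, "effectors": {"head"}},
    {"name": "ReachGrab", "sensors": {"vision", "touch"}, "effectors": {"head", "arm"}},
]

def allowed_actions(stage, action_catalog):
    """Return only the actions whose resource requirements fit the stage."""
    return [
        a for a in action_catalog
        if a["sensors"] <= stage["sensors"] and a["effectors"] <= stage["effectors"]
    ]

for stage in STAGES:
    names = [a["name"] for a in allowed_actions(stage, ACTIONS)]
    print(stage["name"], names)
```

Relaxing the constraint sets over time grows the action repertoire, which is the Easy → Complex progression the slide describes.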

Example: Learning to Reach for Objects
– Stage 1: SearchTrack. Focus attention using a single brightly colored object (σ); limit the DOF (τ) to the head only.
– Stage 2: ReachGrab. Limit the DOF (τ) to one arm only.
– Stage 3: Handedness; scale-sensitive behavior.
Hart et al., 2008

Prospective Learning
Infants adapt to new situations by prospectively looking ahead to predict failure, and then learning a repair strategy.

Robot Prospective Learning with Human Guidance
[Figure: three panels showing a state–action chain s0 → s1 → … → si → … → sj → … → sn with actions a0 … an−1; a sub-task chain si1 … sij … sin spliced in at si; and a predicate g(f) ∈ {0, 1} gating the transition at si. Final label: Challenge]

A 2D Navigation Domain
Problem setup
– 30×30 map
– 6 doors, randomly closed
– 6 buttons
– 1 start and 1 goal location
– 3-bit door sensor on the robot

Flat Learning Results
Flat Q-learning
– 5-dimensional state: (x, y, door-bit 1, door-bit 2, door-bit 3)
– 4 actions: up, down, left, right
– Reward: +1 for reaching the goal, −1 for every step taken
– Learning parameters: α = 0.1, γ = 1.0, ε = 0.1
Learned solutions after 30,000 episodes
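The flat baseline above can be sketched as follows. The learning parameters (α = 0.1, γ = 1.0, ε = 0.1) follow the slide; the scaled-down 5×5 doorless grid and the exact reward shaping are assumptions made to keep the example small and runnable.

```python
# Minimal sketch of the flat Q-learning baseline: tabular Q, epsilon-greedy
# exploration, deterministic grid dynamics. The 5x5 map and reward values
# (+1 at the goal, -1 per step) are illustrative assumptions.
import random

SIZE = 5
START, GOAL = (0, 0), (4, 4)
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def step(state, action):
    """Deterministic grid dynamics: move, clipped to the map boundary."""
    dx, dy = ACTIONS[action]
    nxt = (min(max(state[0] + dx, 0), SIZE - 1),
           min(max(state[1] + dy, 0), SIZE - 1))
    reward = 1.0 if nxt == GOAL else -1.0
    return nxt, reward, nxt == GOAL

def train(episodes=3000, alpha=0.1, gamma=1.0, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = {}  # tabular action-value function: (state, action) -> value
    for _ in range(episodes):
        s, done, steps = START, False, 0
        while not done and steps < 200:
            if rng.random() < eps:
                a = rng.choice(list(ACTIONS))
            else:
                a = max(ACTIONS, key=lambda act: q.get((s, act), 0.0))
            s2, r, done = step(s, a)
            best_next = 0.0 if done else max(q.get((s2, b), 0.0) for b in ACTIONS)
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (r + gamma * best_next - old)
            s, steps = s2, steps + 1
    return q

def greedy_steps(q, limit=50):
    """Follow the greedy policy; return step count, or None if it loops."""
    s, n = START, 0
    while s != GOAL and n < limit:
        a = max(ACTIONS, key=lambda act: q.get((s, act), 0.0))
        s, _, _ = step(s, a)
        n += 1
    return n if s == GOAL else None
```

Adding the three door bits to the state multiplies the table size, which is why the full 30×30 domain needs the tens of thousands of episodes the slide reports.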

Prospective Learning
Stage 1
– All doors open
– Constrain resources to use only the (x, y) sensors
– Allow the agent to learn a policy from start to goal
[Figure: the learned policy as a state chain s0 → s1 → … → sn annotated with the actions taken (Right, Down, Up)]

Prospective Learning
Stage 2
– Close 1 door
– The robot learns the cause of the failure
– The robot backtracks and finds an earlier indicator of this cause
– Create a sub-task at that indicator
– Learn a new policy that achieves the sub-task
– Resume the original policy
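The stage-2 repair loop described above can be sketched as follows; the trajectory encoding, the indicator predicate, and the sub-task actions are all hypothetical placeholders for the system's learned quantities.

```python
# Hypothetical sketch of the stage-2 repair: backtrack from a failure to the
# earliest state whose observation already indicated it, then splice a
# sub-task into the plan at that point before resuming the original policy.
# Names and encodings here are illustrative, not from the original system.

def find_indicator(trajectory, failure_index, indicates_failure):
    """Walk backward from the failure; return the earliest contiguous state
    whose observation already signals the coming failure."""
    earliest = failure_index
    for i in range(failure_index - 1, -1, -1):
        if indicates_failure(trajectory[i]):
            earliest = i
        else:
            break
    return earliest

def splice_subtask(plan, indicator_index, subtask):
    """Insert the repair sub-task, keeping the rest of the original plan."""
    return plan[:indicator_index] + subtask + plan[indicator_index:]

# Example: the robot moves right along a corridor; the door-sensor bit turns
# on at step 1, and the robot fails at step 3 when it hits the closed door.
trajectory = [((0, 0), 0), ((1, 0), 1), ((2, 0), 1), ((3, 0), 1)]
idx = find_indicator(trajectory, 3, lambda obs: obs[1] == 1)
plan = ["right", "right", "right"]
repaired = splice_subtask(plan, idx, ["go_to_button", "press_button"])
```

Because the sub-task is a separate policy, it can be learned over a different state space (e.g. button positions) than the original navigation policy.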

Prospective Learning Results
Learned solutions in fewer than 2,000 episodes

Humanoid Robot Manipulation Domain
Benefits of prospective learning
– Adapts to new contexts while maintaining the majority of the existing policy
– Automatically generates sub-goals
– A sub-task can be learned in a completely different state space
– Supports interactive learning

Conclusion
– A developmental view of robot learning
– A framework that enables interactive, incremental learning in stages
– An extension to the control-basis learning framework using the idea of prospective learning