
Learning Robot Motion Control from Demonstration and Human Advice
Brenna D. Argall
Dr. Brett Browning
Prof. Manuela Veloso
School of Computer Science, Carnegie Mellon University
AAAI Spring Symposium, 23 March

Motion Control for Mobile Robots
Challenges:
- Noisy sensors
- Non-deterministic actions
- Complex motion trajectories
- Development requirements: lots of tuning, lots of expertise
- Want complex behaviors

Learning from Demonstration
Benefits:
- Representation
- Demonstration
- Successful robot applications
Common errors:
- Correspondence
- Undemonstrated state
- Suboptimal teacher

Our Approach
- Multiple feedback types
- Focused application of the feedback
- Human teacher evaluation
- Multiple feedback rounds (practice runs)

Update from Execution Experience
More teacher demonstrations [Calinon & Billard] [Chernova & Veloso] [Grollman & Jenkins]
- Populates undemonstrated areas; clarifies ambiguous areas.
- Does not address poor correspondence or a suboptimal demonstrator.
- Requires revisiting the state.
State reward.
Correcting an execution [Chernova & Veloso] [Nicolescu & Mataric]
- Addresses poor correspondence and a suboptimal demonstrator.
- Does not require revisiting the state (hopefully).
- Applied only to discrete-valued, infrequently sampled action domains.
Preview: Advice-Operators
- Applied to continuous-valued, frequently sampled action domains.

Feedback Types

                     Critique                             Correction
Continuity?          Binary                               Continuous-valued
Incorporation?       Adjust policy use of existing data   Generate new data, rederive policy
Information amount?  Low                                  High
Granularity?         Fine                                 Fine

Fine: high frequency, strict data association.

Example LfD Policy Derivation
The demonstration dataset maps observations to actions; to select an action, the policy matches the query point to the nearest dataset point.
[Figure: observation-action space. Legend: Dataset Point, 1-NN Policy, Target Behavior, Query Point.]
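To make the 1-NN derivation concrete, below is a minimal sketch of nearest-neighbor action selection over a demonstration dataset. It is an illustrative reconstruction, not the talk's actual implementation; the array layouts are assumptions.

```python
import numpy as np

def nn_policy(query_obs, dataset_obs, dataset_acts):
    """1-NN policy sketch: return the action of the demonstrated
    observation nearest to the query point.

    dataset_obs  -- (N, d) array of demonstrated observations
    dataset_acts -- (N, k) array of actions taken at those observations
    """
    dists = np.linalg.norm(dataset_obs - query_obs, axis=1)
    return dataset_acts[np.argmin(dists)]
```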

Feedback: Critique
Pro: Credits dataset points
Con: No directionality w.r.t. the query
Con: No indication of the preferred action
Con: Restricted to dataset actions
[Figure: observation-action space with pre-feedback policy. Legend: Dataset Point, 1-NN Policy, Target Behavior, Query Point, Pre-feedback Policy.]

Advice-Operators
- Defined jointly between the teacher and student
- Perform mathematical computations on student executions
- Produce new, synthesized data
Example: the operator "Increase translational speed"
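As a concrete illustration, an advice-operator can be realized as a function over the (observation, action) points of a selected execution segment. The sketch below implements the "increase translational speed" example; the action layout [translational_speed, rotational_speed] and the 1.2 scaling factor are assumptions of this sketch, not details from the talk.

```python
import numpy as np

def increase_translational_speed(segment_obs, segment_acts, factor=1.2):
    """Advice-operator sketch: synthesize new data by scaling the
    translational-speed component of every action in the segment.

    Assumes each action is [translational_speed, rotational_speed].
    """
    new_acts = np.array(segment_acts, dtype=float)
    new_acts[:, 0] *= factor  # "increase speed"
    return segment_obs, new_acts
```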

Feedback: Correction
Suppose the execution receives a correction: the corrected point is added to the dataset.
Critique (for comparison):
- Pro: Credits dataset points
- Con: No directionality w.r.t. the query
- Con: No indication of the preferred action
- Con: Restricted to dataset actions
Correction:
- Pro: Indication of the preferred action
- Pro: Not restricted to dataset actions
[Figure legend: Dataset Point, 1-NN Policy, Target Behavior, Query Point, Pre-feedback Policy.]
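A minimal sketch of how corrective feedback might grow the dataset, under the same assumptions as the operator above: the operator is applied to the executed segment, and the synthesized points are appended before the policy is rederived.

```python
import numpy as np

def incorporate_correction(dataset_obs, dataset_acts,
                           segment_obs, segment_acts, operator):
    """Apply an advice-operator to an executed segment and append the
    synthesized (observation, action) points to the dataset; the policy
    (e.g. nn_policy above) is then rederived over the grown dataset."""
    new_obs, new_acts = operator(segment_obs, segment_acts)
    dataset_obs = np.vstack([dataset_obs, new_obs])
    dataset_acts = np.vstack([dataset_acts, new_acts])
    return dataset_obs, dataset_acts
```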

More Complex Advice-Operators
Example: "Slow down and turn faster"
[Figure: the operator applied over observation and action dimensions.]
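Under the same assumed action layout, a more complex operator such as "slow down and turn faster" could chain the component adjustments; the scaling factors here are illustrative, not from the talk.

```python
def slow_down_and_turn_faster(segment_obs, segment_acts,
                              slow=0.8, turn=1.25):
    """Hypothetical composed advice-operator: reduce translational
    speed and increase rotational speed over the selected segment."""
    new_acts = segment_acts.copy()
    new_acts[:, 0] *= slow  # slow down
    new_acts[:, 1] *= turn  # turn faster
    return segment_obs, new_acts
```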

Algorithm: Binary Critiquing (BC)
Argall et al., "Learning from Demonstration with the Critique of a Human Teacher," HRI.
Critique feedback.
Regression type: 1-Nearest Neighbor (must be able to credit the predicting dataset points).
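One hedged way to meet the crediting requirement is to keep a per-point penalty that negative critiques increase, down-ranking those points in later 1-NN queries. The multiplicative penalty below is an assumption of this sketch, not the published BC update rule.

```python
import numpy as np

def credited_nn(query_obs, dataset_obs, penalties, scale=1.0):
    """1-NN selection that down-ranks points flagged by critiques:
    returns the index of the best dataset point for the query.

    penalties -- (N,) array; higher values make a point less likely
    to be selected.
    """
    dists = np.linalg.norm(dataset_obs - query_obs, axis=1)
    return np.argmin(dists * (1.0 + scale * penalties))

def apply_negative_critique(penalties, credited_indices):
    """Binary critique of a poor segment: penalize the dataset points
    that predicted the actions in that segment."""
    penalties[credited_indices] += 1.0
    return penalties
```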

BC: Empirical Validation
Task: motion interception, in simulation.
[Figure: robot trajectory and ball trajectory in a workspace spanning (0.0, 0.0) to (5.0, 5.0).]
Task formalization: observation, action, policy.

Segment Selection
[Figure: inefficient execution (pre-feedback) vs. efficient execution (post-feedback), with the selected segment highlighted.]
It is fairly straightforward to define metrics that evaluate overall performance, but crediting the contributing dataset points is not. Using a human to select execution segments is similar to solving reward back-propagation.

Algorithm: Advice-Operator Policy Improvement (A-OPI)
Argall et al., "Learning Robot Motion Control with Demonstration and Advice-Operators," IROS.
Corrective feedback.
Regression type: Locally Weighted Learning (no restrictions: any type possible).
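For concreteness, a kernel-weighted average of demonstrated actions is one simple instance of Locally Weighted Learning; the Gaussian kernel and the bandwidth h below are assumptions of this sketch, not necessarily the regressor used in A-OPI.

```python
import numpy as np

def lwl_policy(query_obs, dataset_obs, dataset_acts, h=0.5):
    """Locally Weighted Learning sketch: predict the action as a
    Gaussian-kernel-weighted average of demonstrated actions, with
    weights decaying in observation-space distance (bandwidth h)."""
    dists = np.linalg.norm(dataset_obs - query_obs, axis=1)
    w = np.exp(-(dists ** 2) / (2.0 * h ** 2))
    return (w[:, None] * dataset_acts).sum(axis=0) / w.sum()
```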

A-OPI: Empirical Implementation
Spatial positioning task with a Segway RMP robot.

A-OPI: Improvement with Corrections
- Smaller datasets
- More precise executions

Conclusions
Development of multiple feedback types to address LfD error sources:
- Binary critiques.
- Corrective advice, via advice-operators.
Implementation of algorithms that incorporate each feedback type:
- Binary Critiquing (BC).
- Advice-Operator Policy Improvement (A-OPI).
Empirical validation shows performance improvement with feedback:
- Simulated motion interception task (BC).
- Segway RMP spatial positioning task (A-OPI).
The techniques are appropriate for low-level motion control on a mobile robot.

Thank you! Questions?