Learning Parameterized Maneuvers for Autonomous Helicopter Flight
Jie Tang, Arjun Singh, Nimbus Goehausen, Pieter Abbeel, UC Berkeley

Overview
Target Trajectory + Dynamics Model → Optimal Control → Controller

Problem
Robotics tasks involve complex trajectories
– e.g., the stall turn
Challenging, nonlinear dynamics

Overview
Demonstrations → Target Trajectory
Target Trajectory + Dynamics Model → Optimal Control → Controller

Learning Target Trajectory From Demonstration
Problem: Demonstrations are suboptimal
– Use multiple demonstrations
– Current state of the art in helicopter aerobatics (Coates, Abbeel, and Ng, ICML 2008)
– Our work: learn parameterized maneuver classes
Problem: Demonstrations will differ from the desired target trajectory
[Plot: height profiles of the demonstrations]

Example Data

Learning Trajectory
HMM-like generative model
– Dynamics model used as the HMM transition model
– Synthetic observations enforce the parameterization
– Demos are observations of the hidden trajectory
Problem: how do we align the observations to the hidden trajectory?
[Plot: Demo 1 and Demo 2 height profiles vs. the hidden trajectory; 50m scale]

Learning Trajectory
Repeat:
– Dynamic time warping
– Extended Kalman filter / smoother
[Plot: Demo 1 and Demo 2 height profiles vs. the hidden trajectory; 50m scale]
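The smoothing half of this loop can be sketched with a Kalman filter plus a Rauch-Tung-Striebel backward pass. This is a simplified stand-in, not the paper's implementation: it assumes a linear-Gaussian model (the actual system uses an extended Kalman smoother around a learned helicopter dynamics model), with the time-aligned demonstrations stacked as observations of one hidden trajectory.

```python
import numpy as np

def rts_smoother(zs, A, C, Q, R, x0, P0):
    """Kalman filter forward pass + RTS backward pass.

    zs: (T, m) observations, e.g., time-aligned demonstrations stacked
        so each row observes the same hidden state.
    A, C, Q, R: linear-Gaussian model matrices (illustrative; the paper
        uses an extended Kalman smoother with a nonlinear model).
    Returns the smoothed hidden trajectory, shape (T, n).
    """
    T, n = len(zs), A.shape[0]
    xs_f = np.zeros((T, n))
    Ps_f = np.zeros((T, n, n))
    x, P = x0, P0
    for t in range(T):
        # Measurement update: fold in all demos' observations at time t.
        S = C @ P @ C.T + R
        K = P @ C.T @ np.linalg.inv(S)
        x = x + K @ (zs[t] - C @ x)
        P = P - K @ C @ P
        xs_f[t], Ps_f[t] = x, P
        # Time update: propagate through the (linearized) dynamics.
        x = A @ x
        P = A @ P @ A.T + Q
    # Backward (RTS) pass refines each estimate using future evidence.
    xs = xs_f.copy()
    for t in range(T - 2, -1, -1):
        P_pred = A @ Ps_f[t] @ A.T + Q
        G = Ps_f[t] @ A.T @ np.linalg.inv(P_pred)
        xs[t] = xs_f[t] + G @ (xs[t + 1] - A @ xs_f[t])
    return xs
```

For example, two noisy demos of a constant-height hover can be smoothed with a 1D random-walk model; the smoothed trajectory converges to the common height.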

Smoothed Dynamic Time Warping
Potential outcome of dynamic time warping: [plot: jagged alignment]
More desirable outcome: [plot: smooth alignment]
Introduce a smoothing penalty
– Extra dimension in the dynamic program
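The "extra dimension" can be sketched by augmenting the DTW table with the last move taken, and charging a penalty for consecutive stretch moves so the warping stays near-uniform. The move set and cost here are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def smoothed_dtw(demo, ref, penalty=1.0):
    """DTW alignment cost with a penalty for consecutive stretches.

    cost[i, j, k]: best cost aligning demo[:i+1] to ref[:j+1], where
    k is the last move: 0 = diagonal, 1 = hold demo (advance ref only),
    2 = hold ref (advance demo only). Tracking k is the extra dimension
    in the dynamic program; long runs of holds become expensive.
    """
    n, m = len(demo), len(ref)
    cost = np.full((n, m, 3), np.inf)
    cost[0, 0, 0] = abs(demo[0] - ref[0])
    for i in range(n):
        for j in range(m):
            for k in range(3):
                c = cost[i, j, k]
                if not np.isfinite(c):
                    continue
                if i + 1 < n and j + 1 < m:  # diagonal: never penalized
                    s = c + abs(demo[i + 1] - ref[j + 1])
                    cost[i + 1, j + 1, 0] = min(cost[i + 1, j + 1, 0], s)
                if j + 1 < m:  # hold demo frame; penalize a repeat hold
                    s = c + abs(demo[i] - ref[j + 1]) + (penalty if k == 1 else 0.0)
                    cost[i, j + 1, 1] = min(cost[i, j + 1, 1], s)
                if i + 1 < n:  # hold ref frame; penalize a repeat hold
                    s = c + abs(demo[i + 1] - ref[j]) + (penalty if k == 2 else 0.0)
                    cost[i + 1, j, 2] = min(cost[i + 1, j, 2], s)
    return cost[n - 1, m - 1].min()
```

With the penalty at zero this reduces to ordinary DTW; raising it forces the alignment to alternate rather than park on one frame.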

Weighting Demonstrations
Some demonstrations should contribute more to the target trajectory than others
– Difficult to tune these observation covariances by hand
Learn optimal observation covariances using EM
[Plot: target vs. demonstration height profiles]
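The covariance-learning step can be sketched as an EM M-step: given the current smoothed estimate of the hidden trajectory, refit each demonstration's observation covariance from its residuals. This is a simplified sketch (the full M-step also folds in the smoothed state covariances, omitted here for brevity):

```python
import numpy as np

def update_obs_covariances(hidden, demos):
    """One simplified EM M-step for per-demo observation covariances.

    hidden: (T, n) smoothed estimate of the hidden target trajectory.
    demos:  list of (T, n) time-aligned demonstrations.
    Returns one (n, n) covariance per demo: the mean outer product of
    its residuals. Demos that track the hidden trajectory poorly get
    large covariances, and hence less weight in the next smoothing pass.
    """
    covs = []
    for demo in demos:
        resid = demo - hidden
        covs.append(resid.T @ resid / len(resid))
    return covs
```

Alternating this update with the smoother is exactly the automatic weighting the slide describes: no hand-tuning of covariances is needed.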

Learned Trajectory
[Plot: learned target vs. demonstration height profiles]

Overview
Demonstrations → Target Trajectory
Frequency Sweeps and Step Responses → Dynamics Model
Target Trajectory + Dynamics Model → Optimal Control → Controller

Learning Dynamics
Standard helicopter dynamics model estimated from data
– Has relatively large errors in aggressive flight regimes
After learning the target trajectory, we obtain aligned demonstrations
– Errors in the model are consistent across executions of the same maneuver class
Many hidden variables are not modeled explicitly
– Airflow, rotor speed, actuator latency
Learn corrections to the dynamics model along each target trajectory
[Plot: model prediction errors of up to 2G]
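Because the model errors are consistent across executions of a maneuver class, a time-indexed correction can be sketched as the mean residual across the aligned demonstrations. This is a minimal stand-in for the paper's trajectory-specific corrections; the function names and additive form are illustrative assumptions:

```python
import numpy as np

def learn_corrections(pred_accels, true_accels):
    """Learn a time-indexed additive correction to the dynamics model.

    pred_accels, true_accels: lists of (T, 3) acceleration sequences,
    one pair per time-aligned demonstration of the same maneuver.
    Since model errors repeat across executions, the mean residual at
    each time step along the trajectory is a useful correction.
    """
    residuals = [true - pred for pred, true in zip(pred_accels, true_accels)]
    return np.mean(residuals, axis=0)  # (T, 3)

def corrected_model(pred_accel, correction, t):
    """Apply the learned correction at trajectory time index t."""
    return pred_accel + correction[t]
```

At planning time, the baseline model's prediction at each point along the target trajectory is shifted by the correction learned for that point.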

Overview
Demonstrations → Target Trajectory
Frequency Sweeps and Step Responses → Standard Dynamics Model + Trajectory-Specific Corrections
Target Trajectory + Dynamics Model → Optimal Control (Receding Horizon Differential Dynamic Programming) → Controller
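The receding-horizon idea named here can be sketched with a finite-horizon LQR backward pass on a linear model, replanning at every step and executing only the first control. This is a stand-in for the paper's DDP controller (DDP additionally linearizes the nonlinear corrected model around the trajectory at each iteration); all matrices below are hypothetical:

```python
import numpy as np

def lqr_backward(A, B, Q, R, Qf, H):
    """Finite-horizon LQR backward pass; returns feedback gains K_t.

    Stand-in for DDP's backward pass: DDP would relinearize a nonlinear
    model around the current trajectory, here the model is linear.
    """
    P = Qf
    Ks = []
    for _ in range(H):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        Ks.append(K)
    return Ks[::-1]

def receding_horizon_step(x, A, B, Q, R, H):
    """Replan over a short horizon H, execute only the first control."""
    Ks = lqr_backward(A, B, Q, R, Q, H)
    return -Ks[0] @ x
```

As a toy usage, regulating a discretized double integrator (a crude stand-in for one translational axis) to the origin shows the replanning loop stabilizing the state.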

Experimental Setup
Onboard: 3-axis magnetometer, accelerometer, gyroscope (“orientation”)
Offboard: cameras (“position”)
Extended Kalman Filter → RHDDP controller (20Hz)

Results: Stall Turn
Max speed: 57 mph

Results: Loops

Results: Tic-Tocs

Typical Flight Performance: Stall Turn

Quantitative Evaluation
Flight conditions: wind up to 15 mph
Similar accuracy is maintained for queries very different from our demonstrations
– e.g., can learn 60m stall turns from 40m and 80m demonstrations
Four or five demonstrations are sufficient to cover a wide range of stall turns, loops, and tic-tocs
– e.g., four stall turns at 20m, 40m, 60m, and 80m suffice to generate any stall turn between 20m and 80m

Conclusions
Presented an algorithm for learning parameterized target trajectories and accurate dynamics models from demonstrations
With only a few demonstrations, the method can generate a wide variety of novel trajectories
Validated on a variety of parameterized aerobatic helicopter maneuvers

Thank you