Learning Parameterized Maneuvers for Autonomous Helicopter Flight
Jie Tang, Arjun Singh, Nimbus Goehausen, Pieter Abbeel
UC Berkeley
Overview
[Diagram: Target Trajectory, Dynamics Model, Optimal Control, Controller]
Problem
Robotics tasks involve complex trajectories
– Stall turn
Challenging, nonlinear dynamics
Overview
[Diagram: Demonstrations, Target Trajectory, Dynamics Model, Optimal Control, Controller]
Learning a Target Trajectory from Demonstrations
[Figure: demonstrations; height]
Problem: demonstrations are suboptimal
– Use multiple demonstrations
– Current state of the art in helicopter aerobatics (Coates, Abbeel, and Ng, ICML 2008)
– Our work: learn parameterized maneuver classes
Problem: demonstrations will differ from the desired target trajectory
Example Data
Learning the Target Trajectory
HMM-like generative model
– The dynamics model is used as the HMM transition model
– Synthetic observations enforce the parameterization
– Demos are observations of the hidden trajectory
Problem: how do we align the observations to the hidden trajectory?
[Figure: Demo 1, Demo 2, hidden trajectory; height; 50m]
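This is a minimal sketch of inference in a model of this kind. It rests on simplifying assumptions that are not taken from the talk:
- The hidden trajectory is a scalar height following a Gaussian random walk, which stands in for the helicopter dynamics model.
- The demos are already time-aligned.
- The parameterization is enforced by one synthetic observation of the peak height with a tiny variance.

A linear Kalman filter plus Rauch-Tung-Striebel (RTS) smoother then fuses the demos and the synthetic observation. The names `build_observations` and `rts_smoother` are hypothetical.

```python
# Hypothetical sketch: scalar random walk z[t+1] = z[t] + w, w ~ N(0, q),
# standing in for the helicopter dynamics model.


def build_observations(demos, demo_vars, synthetic):
    """Group observations by time step as (value, variance) pairs.

    demos: equal-length sequences, assumed already aligned to the hidden time axis.
    demo_vars: one observation variance per demo.
    synthetic: (t, value, variance) triples that encode the maneuver parameter.
    """
    T = len(demos[0])
    obs = [[] for _ in range(T)]
    for demo, r in zip(demos, demo_vars):
        for t, y in enumerate(demo):
            obs[t].append((y, r))
    for t, y, r in synthetic:
        obs[t].append((y, r))
    return obs


def rts_smoother(obs, q, m0, p0):
    """Kalman filter followed by an RTS backward pass; returns smoothed means and variances."""
    T = len(obs)
    mp, pp, mf, pf = [0.0] * T, [0.0] * T, [0.0] * T, [0.0] * T
    m, p = m0, p0
    for t in range(T):
        if t > 0:
            p = p + q  # predict: identity transition, variance grows by q
        mp[t], pp[t] = m, p
        for y, r in obs[t]:  # sequential scalar measurement updates
            k = p / (p + r)
            m = m + k * (y - m)
            p = (1.0 - k) * p
        mf[t], pf[t] = m, p
    ms, ps = mf[:], pf[:]
    for t in range(T - 2, -1, -1):  # backward smoothing pass
        g = pf[t] / pp[t + 1]
        ms[t] = mf[t] + g * (ms[t + 1] - mp[t + 1])
        ps[t] = pf[t] + g * g * (ps[t + 1] - pp[t + 1])
    return ms, ps


# Two aligned demos peaking at 40 and 80; a synthetic observation requests a 60 peak at t = 2.
demos = [[0.0, 20.0, 40.0, 20.0, 0.0], [0.0, 40.0, 80.0, 40.0, 0.0]]
obs = build_observations(demos, [4.0, 4.0], [(2, 60.0, 1e-4)])
heights, variances = rts_smoother(obs, q=25.0, m0=0.0, p0=100.0)
print([round(h, 1) for h in heights])
```

Changing only the synthetic observation changes the peak of the estimated trajectory while the demos stay the same. That is the sense in which synthetic observations carry the maneuver parameter in this sketch.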
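Learning the Target Trajectory
Repeat:
– Dynamic time warping: align each demo to the current hidden trajectory
– Extended Kalman filter / smoother: re-estimate the hidden trajectory from the aligned demos
[Figure: Demo 1, Demo 2, hidden trajectory; height; 50m]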
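The alignment step can be illustrated with textbook dynamic time warping (DTW). This is a sketch under assumptions: the sequences are scalar heights and the local cost is squared error. The cost actually used in the talk is not given on the slide. The function name `dtw` is hypothetical.

```python
import math


def dtw(demo, hidden):
    """Classic DTW between two scalar sequences with a squared-error local cost.

    Returns (total_cost, path), where path lists (demo_index, hidden_index) pairs.
    """
    n, m = len(demo), len(hidden)
    # cost[i][j]: best cost aligning demo[:i] with hidden[:j]; row and column 0 are sentinels.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (demo[i - 1] - hidden[j - 1]) ** 2
            cost[i][j] = local + min(cost[i - 1][j],      # demo advances, hidden index repeats
                                     cost[i][j - 1],      # hidden advances, demo index repeats
                                     cost[i - 1][j - 1])  # both advance
    # Backtrack from the end to recover the alignment.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


# A demo that lingers at the start aligns to the hidden trajectory at zero cost.
total, path = dtw([0.0, 0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0])
print(total, path)
```

In the loop described above, each demo would be re-aligned against the latest smoothed hidden trajectory. The smoother is then re-run on the new alignment.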
Smoothed Dynamic Time Warping
Potential outcome of dynamic time warping: [figure]
More desirable outcome: [figure]
Introduce a smoothing penalty on the time warp
– Adds an extra dimension to the dynamic program
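One way to realize an extra dimension in the dynamic program is sketched below. It makes two assumptions, neither taken from the slide:
- Every demo sample i maps to a hidden index τ(i) that advances by 0, 1, or 2 per sample.
- The smoothing penalty is λ(s_i − s_(i−1))², where s_i = τ(i) − τ(i−1) is the step taken at sample i.

The DP state becomes (i, j, previous step) instead of (i, j). The step set, the quadratic penalty, and the name `smoothed_dtw` are illustrative.

```python
import math

STEPS = (0, 1, 2)  # allowed advances of the hidden-time index per demo sample (assumption)


def smoothed_dtw(demo, hidden, lam):
    """Align demo[i] -> hidden[tau[i]] with tau[0] = 0 and tau[-1] = len(hidden) - 1.

    Cost = sum_i (demo[i] - hidden[tau[i]])**2 + lam * sum_i (s_i - s_{i-1})**2,
    where s_i = tau[i] - tau[i-1] must be one of STEPS.
    """
    n, m, S = len(demo), len(hidden), len(STEPS)
    # D[i][j][k]: best cost with tau[i] = j and last step STEPS[k] (the extra dimension).
    D = [[[math.inf] * S for _ in range(m)] for _ in range(n)]
    back = [[[0] * S for _ in range(m)] for _ in range(n)]
    for k in range(S):
        # tau[0] = 0; every "previous step" is allowed here, so the first real step is unpenalized.
        D[0][0][k] = (demo[0] - hidden[0]) ** 2
    for i in range(1, n):
        for j in range(m):
            local = (demo[i] - hidden[j]) ** 2
            for k, s in enumerate(STEPS):
                if j - s < 0:
                    continue
                for kp, sp in enumerate(STEPS):
                    c = D[i - 1][j - s][kp] + lam * (s - sp) ** 2 + local
                    if c < D[i][j][k]:
                        D[i][j][k], back[i][j][k] = c, kp
    k = min(range(S), key=lambda kk: D[n - 1][m - 1][kk])
    total = D[n - 1][m - 1][k]
    if math.isinf(total):
        raise ValueError("hidden trajectory cannot be reached with the allowed steps")
    tau, j = [0] * n, m - 1
    for i in range(n - 1, 0, -1):  # backtrack through (hidden index, last step)
        tau[i] = j
        j, k = j - STEPS[k], back[i][j][k]
    tau[0] = j
    return total, tau


demo, hidden = [0.0, 0.4, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0]
print(smoothed_dtw(demo, hidden, lam=0.0))  # unsmoothed: jerky warp allowed
print(smoothed_dtw(demo, hidden, lam=1.0))  # smoothed: uniform warp preferred
```

With λ = 0 the warp stalls and then jumps to reduce the matching error slightly. With λ = 1 the uniform warp wins, because each change of step size is charged.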
Weighting Demonstrations
Some demonstrations should contribute more to the target trajectory than others
– The corresponding observation covariances are difficult to tune by hand
Learn the observation covariances by maximum likelihood using EM
[Figure: target trajectory; height]
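Below is a scalar sketch of how EM can learn one observation variance per demonstration. It assumes:
- The hidden trajectory is a random walk with known variance q, which stands in for the dynamics model.
- The demos are already aligned.
- Each demo k has a single variance r_k.

The E-step runs a Kalman filter and RTS smoother. The M-step sets r_k to the average expected squared residual, E[(y_k[t] − z[t])²] = (y_k[t] − m_t)² + P_t. The function names are hypothetical, and the talk's model is richer than this.

```python
import random


def smooth(demos, r, q, m0, p0):
    """Kalman filter + RTS smoother: z[t+1] = z[t] + w, and demo k observes z[t] with variance r[k]."""
    T = len(demos[0])
    mp, pp, mf, pf = [0.0] * T, [0.0] * T, [0.0] * T, [0.0] * T
    m, p = m0, p0
    for t in range(T):
        if t > 0:
            p += q  # random-walk prediction
        mp[t], pp[t] = m, p
        for demo, rk in zip(demos, r):  # one scalar update per demo
            k = p / (p + rk)
            m += k * (demo[t] - m)
            p *= 1.0 - k
        mf[t], pf[t] = m, p
    ms, ps = mf[:], pf[:]
    for t in range(T - 2, -1, -1):
        g = pf[t] / pp[t + 1]
        ms[t] = mf[t] + g * (ms[t + 1] - mp[t + 1])
        ps[t] = pf[t] + g * g * (ps[t + 1] - pp[t + 1])
    return ms, ps


def em_observation_variances(demos, q, m0=0.0, p0=100.0, iters=100):
    """Estimate one observation variance per demo by EM, keeping q fixed."""
    T = len(demos[0])
    r = [1.0] * len(demos)  # initial guess: all demos equally trusted
    for _ in range(iters):
        ms, ps = smooth(demos, r, q, m0, p0)  # E-step: posterior over the hidden trajectory
        # M-step: r_k = mean over t of (y_k[t] - ms[t])^2 + ps[t]
        r = [sum((d[t] - ms[t]) ** 2 + ps[t] for t in range(T)) / T for d in demos]
    return r


# Synthetic check: one careful demo (noise std 0.3) and one sloppy demo (noise std 3.0).
rng = random.Random(0)
truth = [0.0]
for _ in range(299):
    truth.append(truth[-1] + rng.gauss(0.0, 1.0))
careful = [z + rng.gauss(0.0, 0.3) for z in truth]
sloppy = [z + rng.gauss(0.0, 3.0) for z in truth]
learned = em_observation_variances([careful, sloppy], q=1.0)
print(learned)
```

A demo with a larger learned variance pulls the smoothed target trajectory less, which produces the weighting effect described on this slide.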
Learned Trajectory
[Figure: target trajectory; height]
Overview
[Diagram: Demonstrations, Frequency Sweeps and Step Responses, Target Trajectory, Dynamics Model, Optimal Control, Controller]
Learning Dynamics
Standard helicopter dynamics model estimated from data
– Has relatively large errors in aggressive flight regimes
After learning the target trajectory, we obtain aligned demonstrations
– Model errors are consistent across executions of the same maneuver class
Many hidden variables are not modeled explicitly
– Airflow, rotor speed, actuator latency
Learn corrections to the dynamics model along each target trajectory
[Figure annotation: 2G error]
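Because model errors repeat across executions of the same maneuver, a time-indexed correction can be estimated from the aligned demonstrations. This minimal sketch assumes:
- The states are scalars.
- The nominal model is a hypothetical linear function.
- The correction at each aligned time index is the average one-step prediction residual across demos.

The talk's actual estimator may differ. The names `learn_corrections`, `corrected_model`, and `nominal` are hypothetical.

```python
def learn_corrections(states, controls, model):
    """Average one-step residual of `model` at each aligned time index.

    states[k][t] and controls[k][t] belong to demo k, already aligned to the target's time axis.
    Returns corrections[t] for t = 0 .. T-2.
    """
    K, T = len(states), len(states[0])
    corrections = []
    for t in range(T - 1):
        residuals = [states[k][t + 1] - model(states[k][t], controls[k][t]) for k in range(K)]
        corrections.append(sum(residuals) / K)
    return corrections


def corrected_model(model, corrections):
    """Time-indexed model: nominal prediction plus the learned correction at index t."""
    def step(x, u, t):
        return model(x, u) + corrections[t]
    return step


def nominal(x, u):
    """Hypothetical scalar nominal model."""
    return x + 0.05 * u


# Unmodeled, time-varying disturbance that repeats across executions of the maneuver.
disturbance = [0.0, -0.5, -1.0, -0.5]
states, controls = [], []
for scale in (1.0, 1.2):  # two executions with different control inputs
    u = [10.0 * scale, 20.0 * scale, 5.0 * scale, 0.0]
    x = [0.0]
    for t in range(4):
        x.append(nominal(x[t], u[t]) + disturbance[t])
    states.append(x)
    controls.append(u)

corrections = learn_corrections(states, controls, nominal)
step = corrected_model(nominal, corrections)
print(corrections)
```

Here the disturbance depends only on the time index along the maneuver, so averaging recovers it exactly. With real flight data the residuals would also contain noise and state-dependent effects.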
Overview
[Diagram: Demonstrations; Frequency Sweeps and Step Responses; Target Trajectory; Dynamics Model (Standard Dynamics Model + Trajectory-Specific Corrections); Optimal Control (Receding Horizon Differential Dynamic Programming); Controller]
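Differential dynamic programming (DDP) works with local quadratic approximations of the cost-to-go around the current trajectory. For linear dynamics and quadratic cost these approximations are exact, and the backward pass reduces to the LQR Riccati recursion.

The sketch below shows only that special case in receding-horizon form. Its assumptions:
- The state is a scalar deviation from the target trajectory.
- The coefficients a[t] and b[t] are hypothetical and time-varying. In practice they would come from linearizing the corrected dynamics model along the target.
- The weights q and r are chosen for illustration.

It is not the authors' controller.

```python
def first_gain(a, b, q, r, t, horizon):
    """Backward Riccati pass over [t, t + horizon); returns the feedback gain for time t.

    Deviation dynamics: e[k+1] = a[k] * e[k] + b[k] * du[k]
    Cost: sum of q * e^2 + r * du^2 over the window, plus q * e^2 at the end.
    """
    P = q  # terminal value-function weight
    K = 0.0
    for k in range(t + horizon - 1, t - 1, -1):
        K = b[k] * P * a[k] / (r + b[k] ** 2 * P)
        P = q + a[k] * P * (a[k] - b[k] * K)
    return K


def run_receding_horizon(a, b, q, r, horizon, e0, steps):
    """Re-solve the finite-horizon problem at every step and apply only its first control."""
    if len(a) < steps + horizon - 1 or len(b) < steps + horizon - 1:
        raise ValueError("model coefficients must cover the last planning window")
    e, history = e0, [e0]
    for t in range(steps):
        du = -first_gain(a, b, q, r, t, horizon) * e
        e = a[t] * e + b[t] * du
        history.append(e)
    return history


# Hypothetical time-varying linearization along a target trajectory (open-loop unstable).
a = [1.0 + 0.1 * (t % 3) for t in range(80)]
b = [1.0] * 80
history = run_receding_horizon(a, b, q=1.0, r=0.1, horizon=20, e0=5.0, steps=50)
print(history[:4])
```

Re-solving at every step lets the controller use the model along the upcoming part of the maneuver. Applying only the first control keeps it reacting to the measured state.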
Experimental Setup
Onboard: 3-axis magnetometer, accelerometer, gyroscope ("Orientation")
Offboard: cameras ("Position")
Extended Kalman filter; RHDDP (receding horizon DDP) controller at 20Hz
Results: Stall Turn
Max speed: 57 mph
Results: Loops
Results: Tic-Tocs
Typical Flight Performance: Stall Turn
Quantitative Evaluation
Flight conditions: wind up to 15 mph
Similar accuracy is maintained even for queries very different from our demonstrations
– e.g., we can learn 60m stall turns from 40m and 80m demonstrations
Four or five demonstrations are sufficient to cover a wide range of stall turns, loops, and tic-tocs
– e.g., four stall turns at 20m, 40m, 60m, and 80m are sufficient to generate any stall turn between 20m and 80m
Conclusions
Presented an algorithm for learning parameterized target trajectories and accurate dynamics models from demonstrations
With few demonstrations, we can generate a wide variety of novel trajectories
Validated on a variety of parameterized aerobatic helicopter maneuvers
Thank you