Transfer and Multi-Task Learning in Reinforcement Learning
Alessandro LAZARIC
“Machine Learning with Interdependent and Non-identically Distributed Data”
SequeL, Inria Lille – Nord Europe
April 7-10, 2015
Reinforcement Learning
[Figure: agent-environment loop with a critic and a delay. Labels: <position, speed>; <handlebar, pedals>; <new position, new speed>, advancement; Value Function; Control Policy]
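The loop on this slide can be sketched in a few lines of Python. This is a minimal illustration only: the toy task (a line of cells), the names `step`, `greedy_action`, and `q_learning`, and all parameter values are assumptions made for the example, and tabular Q-learning is used here as one common way to learn a value function, not as the method of the talk.

```python
import random

# Hypothetical toy task (not from the slides): move along a line of cells.
N_CELLS = 6
ACTIONS = (-1, +1)  # step back, step forward


def step(position, action):
    """Environment: return (new_position, reward); reward is the advancement."""
    new_position = min(max(position + action, 0), N_CELLS - 1)
    return new_position, new_position - position


def greedy_action(q, position):
    """Control policy: pick the action with the highest estimated value."""
    return max(ACTIONS, key=lambda a: q[(position, a)])


def q_learning(episodes=200, steps=20, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a tabular action-value function by interacting with the environment."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}
    for _ in range(episodes):
        position = 0
        for _ in range(steps):
            # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(q, position)
            new_position, reward = step(position, action)
            # Critic: move the estimate toward reward + discounted next value.
            target = reward + gamma * max(q[(new_position, a)] for a in ACTIONS)
            q[(position, action)] += alpha * (target - q[(position, action)])
            position = new_position
    return q
```

Calling `q_learning()` returns the learned value table, and `greedy_action(q, s)` reads the control policy off it for any cell `s`.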
Transfer in Reinforcement Learning
[Figure: agent-environment loop with a critic and a delay, annotated with "transfer of knowledge"]
Transfer in RL is not trivial
Techniques developed in supervised learning cannot always be reused in RL:
  Many different “objects” can be transferred (e.g., policies, value functions, samples)
  Tasks may be similar in many different ways
  Samples are often non-iid
  “Unsupervised” samples are not well defined
  Different objectives (e.g., exploration-exploitation)
My research (present and future): transfer for exploration-exploitation
  Motivating problems
    Intelligent tutoring systems
    Recommendation systems
    Computer games
  Attempted (successful) approaches in multi-armed bandit
    Identification of finite set of models
    Transfer of samples
  Open questions
    Estimation of the bias for selective transfer
    Appropriate measure of similarity
    Exploration vs exploitation vs transfer
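To make "transfer of samples" in a multi-armed bandit concrete, here is a hedged sketch. It is not the approach from the talk: it uses the standard UCB1 index and simply pools rewards from a source task with those of the target task. The function name `ucb1_with_transfer`, the Bernoulli arms, and the example data are all assumptions made for illustration.

```python
import math
import random


def ucb1_with_transfer(target_means, source_samples, horizon, seed=0):
    """UCB1 on Bernoulli arms, warm-started with rewards from a source task.

    source_samples[i] lists rewards already observed for arm i in the source
    task (hypothetical data). They are naively pooled with target rewards.
    Returns the number of pulls of each arm in the target task.
    """
    rng = random.Random(seed)
    n_arms = len(target_means)
    counts = [len(s) for s in source_samples]       # samples per arm, source included
    sums = [float(sum(s)) for s in source_samples]  # reward totals, source included
    pulls = [0] * n_arms

    def index(i):
        # UCB1 index: empirical mean plus an exploration bonus.
        total = sum(counts)
        return sums[i] / counts[i] + math.sqrt(2.0 * math.log(total) / counts[i])

    for _ in range(horizon):
        untried = [i for i in range(n_arms) if counts[i] == 0]
        # Pull each arm without any sample once, then follow the UCB1 index.
        arm = untried[0] if untried else max(range(n_arms), key=index)
        reward = 1.0 if rng.random() < target_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        pulls[arm] += 1
    return pulls
```

For example, `ucb1_with_transfer([0.1, 0.9], [[0, 0, 1, 0], [1, 1, 0, 1]], horizon=500)` starts with four source rewards per arm. Pooling like this is only harmless when the source and target arms have similar means; otherwise the source samples bias the estimates, which is the selective-transfer question listed under the open questions.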
Thanks!! Inria Lille – Nord Europe