Why? Interactive robot learning
– Facilitate human-robot interaction
– Study learner-teacher relations
– Study learning and adaptation
– Future: enable robots to cooperate interactively and efficiently with humans, in ways that are natural to humans
– This talk: human affective facial expressions as a training signal to the robot
Outline
– Emotion influences thought and behavior
– EARL: studying the relation between emotion and adaptation in reinforcement learning
– This talk: human affect as reinforcement to a robot
– Reinforcement-based robot learning
– Experiments:
  – Affect as additional reinforcement
  – Affect as input to a social reward function
– Results and conclusions
  – Learning is positively influenced, especially in the learned social reward function case
Emotion, Thought and Behavior
Emotion
– Bodily expression (face, posture)
– Action tendencies (Frijda)
– Feelings
– Cognitive appraisal (Arnold, Lazarus, Scherer)
Affect
– Everything to do with emotion, as in affective computing, or
– an abstraction over emotion (e.g., Russell) composed of
  – Arousal (alertness)
  – Valence (pleasure)
– We use the latter definition of affect in the experiment
  – Short timescale
  – We ignore arousal
Emotion, Thought and Behavior
Emotion and affect influence thought and behavior:
– The kind of thoughts we have
  – Mood congruency
– The way we process information
  – Narrow vs. broad look (Goschke & Dreisbach)
  – A lot vs. a little processing effort (Scherer, Forgas)
– What we think about things
  – Emotion/mood as information (Clore & Gasper)
  – Emotion as belief anchor (Frijda & Mesquita)
– How we learn and adapt
  – Emotion/affect as social reinforcement
  – Emotion/affect as intrinsic reinforcement
  – Emotion as a metaparameter to control the learning process
  – Empathy
EARL
Goal: to study the relations between emotion and adaptation in the context of reinforcement learning.
– Simulated robot (but see later comments)
– Maze navigation tasks
– Webcam and emotion recognition to interpret emotions
– Reinforcement learning (RL) approach to robot learning
– Robot has its own model of emotion
– Robot head to express emotion
Potential influences experimented with:
– Evaluate models of emotion in an RL setting
– Evaluate models of emotional expression
– Test the influence of emotion/affect on RL learning parameters
– Experiment with communicated and robot emotion as reward
Human Affect as Reinforcement to Robot
Interactive robot learning
– Learning by example
  – E.g., imitation learning (see Breazeal & Scassellati)
– Learning by guidance (Thomaz & Breazeal)
  – Future-directed learning cues
  – Anticipatory reward
– Learning by feedback
  – Additional reinforcement signal (Breazeal & Velasquez; Isbell et al.; Mitsunaga et al.; Papudesi & Huber)
  – In our experiment: the affective signal as additional reinforcement
Human Affect as Reinforcement to Robot
Affective signal as additional reinforcement
– Webcam
– Emotional expression analysis
– Positive emotion (happy) = reward
– Negative emotion (sad) = punishment
– So: the emotional expression is used in learning as r_human, a social reward coming from the human observer
Note:
– We interpret happy as positively valenced and sad as negatively valenced.
– THIS IS A SIMPLIFIED SETUP THAT ENABLES US TO TEST OUR HYPOTHESIS!
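A minimal sketch of how a recognized expression could be turned into the social reward r_human. The label set, the valence values, and the scale factor are assumptions for illustration, not the actual recognizer's output or the experiment's constants.

```python
# Hypothetical mapping from a recognized facial expression to the social
# reward r_human: happy is positively valenced (reward), sad negatively
# valenced (punishment), anything else is treated as neutral.
EXPRESSION_VALENCE = {"happy": 1.0, "sad": -1.0, "neutral": 0.0}

def social_reward(expression: str, scale: float = 0.1) -> float:
    """Turn a recognized expression into the scalar reward r_human."""
    # Unknown expressions default to neutral (zero reward).
    return scale * EXPRESSION_VALENCE.get(expression, 0.0)
```

The point of the sketch is only the sign convention: the learner never sees the expression label itself, just a scalar added to its reinforcement.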
Reinforcement-based robot learning
Continuous gridworld
– World features placed on a grid
– Agent has real-valued coordinates and speed, unlimited locations
– Local perception, agent-based perspective = current state s
Task
– Find food (as usual)
Training: multilayer perceptron (MLP) networks
– Input is the agent's perceived state s
– Each action (forward, left, right) has two networks
  – The first is trained to predict the action value Q_a(s)
  – The second is trained to predict the inverse action value (the value of NOT doing the action)
– The value function has a network trained to predict Q(s)
Action selection uses the action values as predicted by the MLPs.
In terms of representing the world, the perceived state and the actions, this setup is close to real-world robotics.
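The per-action value scheme can be sketched as follows. A tiny linear approximator stands in for the MLPs, the inverse-value networks are omitted for brevity, and the weights, features, and epsilon-greedy rule are illustrative assumptions rather than the system's actual implementation.

```python
import random

ACTIONS = ("forward", "left", "right")

class LinearQ:
    """Stand-in for one action's Q_a(s) network: a linear model over state features."""
    def __init__(self, n_features: int, lr: float = 0.01):
        self.w = [0.0] * n_features
        self.lr = lr

    def predict(self, s):
        return sum(wi * si for wi, si in zip(self.w, s))

    def update(self, s, target):
        # One gradient step toward the training target for this action.
        err = target - self.predict(s)
        self.w = [wi + self.lr * err * si for wi, si in zip(self.w, s)]

def select_action(q_nets, s, epsilon=0.1, rng=random):
    """Epsilon-greedy choice over the per-action value predictions."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_nets[a].predict(s))
```

One network per action means each approximator only has to fit its own action's value surface, at the cost of sharing no features between actions.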
Reinforcement-based robot learning pathwall food 1/e 2 b
Experiments: Affect as additional reinforcement
Test the difference between the standard agent and social agents
– 200 trials to learn the path to the food
– The standard agent uses R(s) from the environment to update Q_a(s) and Q(s)
– The social agent uses r_human in addition to R(s)
Experiments: Affect as additional reinforcement
Three social settings
– Moderate social reinforcement (setting a)
  – r_human is small
  – Long period of training with r_human (trials 20-30)
– Strong social reinforcement (setting b)
  – r_human is large
  – Short period of training with r_human (trials 20-25)
– Learned social reinforcement (setting c)
  – r_human is used as above and also to train R_social(s) (an MLP)
  – The period using r_human is trials 29-45
  – After that, R_social(s) is used
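The reward computation in the three settings could be sketched as below. The trial gates follow the numbers above, but the gain factors expressing "small" vs. "large" r_human, and the exact handover to the learned R_social(s), are illustrative assumptions, not the experiment's actual values.

```python
def combined_reward(r_env, r_human, trial, setting, r_social=None):
    """Reinforcement used to update the value functions in each setting."""
    if setting == "standard":
        return r_env                              # environment reward only
    if setting == "moderate":                     # setting (a): small r_human, trials 20-30
        return r_env + (0.1 * r_human if 20 <= trial <= 30 else 0.0)
    if setting == "strong":                       # setting (b): large r_human, trials 20-25
        return r_env + (1.0 * r_human if 20 <= trial <= 25 else 0.0)
    if setting == "learned":                      # setting (c): r_human also trains R_social(s)
        if 29 <= trial <= 45:
            return r_env + r_human                # social period: direct human reward
        if trial > 45 and r_social is not None:
            return r_env + r_social               # afterwards: learned R_social(s) prediction
        return r_env
    raise ValueError(f"unknown setting: {setting}")
```

In setting (c) the human's effort is bounded: once R_social(s) is trained, the human can stop giving feedback while the agent keeps receiving a social reward signal.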
Conclusion
– A critical learning period can be used to influence robot learning using affective signals, in real time, in a non-trivial learning environment.
– This benefits learning, most specifically when the robot learns to predict the social feedback by training a reward function R_social(s).
Further work
Use affect/emotion as a metaparameter to control
– Learning rate
– Exploration vs. exploitation
Differentiate between the meanings of negative and positive emotions
– Anger: negative feedback due to an action of the agent
– Fear: negative anticipatory feedback
– Surprise: strong positive feedback due to an action of the agent
– Frustration: connect to the exploration/exploitation rate?
Affective robot-robot interaction?
Use robot-to-human signals such as hesitation
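One hedged sketch of the metaparameter idea above: map the communicated valence (taken to lie in [-1, 1]) onto the exploration rate, so that negative affect broadens exploration and positive affect favors exploitation. The linear mapping and its bounds are invented here for illustration; nothing in the talk fixes this particular form.

```python
def epsilon_from_valence(valence, eps_min=0.05, eps_max=0.5):
    """Interpolate exploration rate: valence +1 -> eps_min, valence -1 -> eps_max."""
    # Clamp valence to the assumed [-1, 1] range before interpolating.
    v = max(-1.0, min(1.0, valence))
    return eps_min + (eps_max - eps_min) * (1.0 - v) / 2.0
```

The same shape of mapping could modulate the learning rate instead, which is the other metaparameter listed above.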