
1 Action and active inference: A free-energy formulation. Abstract: This presentation questions the need for reinforcement learning and related paradigms from machine learning when trying to optimise the behaviour of an agent. We show that it is fairly simple to teach an agent complicated and adaptive behaviours under a free-energy principle. This principle suggests that agents adjust their internal states and their sampling of the environment to minimise their free energy. In this context, free energy represents a bound on the probability of being in a particular state, given the nature of the agent, or more specifically the model of the environment that the agent entails. We show that such agents learn the causal structure of the environment and sample it in an adaptive and self-supervised fashion. The result is a policy that exactly reproduces the policies optimised by reinforcement learning and dynamic programming. Critically, at no point do we need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming, namely the mountain-car problem, using only the free-energy principle. The ensuing proof of concept is important because the free-energy formulation also provides a principled account of perceptual inference in the brain and furnishes a unified framework for action and perception.

2 The free-energy principle [Figure: perception, memory and attention (the Bayesian brain) contrasted with action and value learning (optimal control); labels include causes, prediction error, sensory input, action, S, R, Q, CS, reward (US), and S-R and S-S associations]

3 Overview: The free-energy principle and action; Active inference and prediction error; Orientation and stabilization; Intentional movements; Cued movements; Goal-directed movements; Autonomous movements; Forward and inverse models

4 Exchange with the environment [Figure: an agent (m) and its environment, separated by a Markov blanket; external states, internal states, sensation and action]

5 The free-energy principle: action minimises a bound on surprise; perception optimises that bound [Figure: perceptual inference, perceptual learning and perceptual uncertainty; the conditional density and the separation of scales]
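To make the principle concrete, here is a minimal one-dimensional sketch in Python (the generative model, precisions and gains are illustrative assumptions, not the scheme used in the presentation's simulations): perception performs a gradient descent of free energy on the internal estimate, while action performs a gradient descent of the same quantity through its effect on what is sampled.

```python
# Toy free-energy minimisation in one dimension (illustrative assumptions throughout).
# Generative model: s = g(x) + noise, with a Gaussian prior on x centred on eta.
# Free energy (up to constants): F = 0.5*pi_s*(s - g(mu))**2 + 0.5*pi_x*(mu - eta)**2

g = lambda x: x                 # sensory mapping (linear for simplicity)
pi_s, pi_x = 1.0, 1.0           # sensory and prior precisions
eta = 0.0                       # prior expectation
dt, kappa = 0.01, 0.5           # integration step and action gain

x_true, mu, a = 1.0, 0.0, 0.0   # true state, internal estimate, action
for _ in range(2000):
    s = g(x_true)               # sensory sample (noise omitted for clarity)
    eps_s = s - g(mu)           # sensory prediction error
    eps_x = mu - eta            # prior prediction error
    mu += dt * (pi_s * eps_s - pi_x * eps_x)   # perception: d(mu)/dt = -dF/d(mu)
    a  -= dt * kappa * pi_s * eps_s            # action: da/dt = -dF/da (assuming ds/da > 0)
    x_true += dt * a                           # acting on the world changes what is sampled
# mu and x_true both relax towards the prior expectation: perception and action
# have reduced the same free energy
```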

6 Overview: The free-energy principle and action; Active inference and prediction error; Orientation and stabilization; Intentional movements; Cued movements; Goal-directed movements; Autonomous movements; Forward and inverse models

7 Active inference: closing the loop (synergy) [Figure: hierarchical model with top-down messages, bottom-up messages, prediction error and action]
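A toy two-level version of this message passing, assuming linear, static mappings (so not the hierarchical dynamic models inverted in the simulations below): ascending messages carry prediction errors, descending messages carry predictions, and each level's conditional expectation descends its precision-weighted errors.

```python
# Two-level predictive-coding sketch (illustrative precisions and data).
pi0, pi1, pi2 = 1.0, 1.0, 0.1   # precisions at the sensory, first and second level
s = 2.0                         # sensory sample
mu1, mu2 = 0.0, 0.0             # conditional expectations at levels 1 and 2
dt = 0.05

for _ in range(1000):
    eps0 = s - mu1              # sensory prediction error (ascending)
    eps1 = mu1 - mu2            # level-1 error: mu1 explained by the prediction from mu2
    eps2 = mu2 - 0.0            # error on the (zero-mean) prior at the top
    mu1 += dt * (pi0 * eps0 - pi1 * eps1)   # descend precision-weighted errors
    mu2 += dt * (pi1 * eps1 - pi2 * eps2)
# mu1 is drawn towards the data; mu2 follows more conservatively (lower prior precision)
```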

8 Action needs access to sensory-level prediction error [Figure: the action-perception loop]

9 From reflexes to action [Figure: prediction and action mapped onto the dorsal horn, dorsal root, ventral root and ventral horn]

10 Overview: The free-energy principle and action; Active inference and prediction error; Orientation and stabilization; Intentional movements; Cued movements; Goal-directed movements; Autonomous movements; Forward and inverse models

11 Active inference under flat priors (movement with percept) [Figure: sensory prediction and error; hidden states (location); cause (perturbing force); perturbation and action; visual stimulus and sensory channels, plotted over time]

12 Active inference under tight priors (no movement or percept) [Figure: sensory prediction and error; hidden states (location); cause (perturbing force); perturbation and action, plotted over time]

13 Retinal stabilisation or tracking induced by priors [Figure: visual stimulus; action and the perceived and true perturbation under flat versus tight priors; real and perceived displacement over time]

14 Overview: The free-energy principle and action; Active inference and prediction error; Orientation and stabilization; Intentional movements; Cued movements; Goal-directed movements; Autonomous movements; Forward and inverse models

15 Active inference under tight priors (movement and percept) [Figure: sensory prediction and error; hidden states (location); cause (prior); perturbation and action; proprioceptive input, plotted over time]

16 Self-generated movements induced by priors [Figure: real and perceived trajectories (displacement over time); action and causes (action, perceived cause (prior) and exogenous cause); behaviour is robust to perturbation and to changes in motor gain]

17 Overview: The free-energy principle and action; Active inference and prediction error; Orientation and stabilization; Intentional movements; Cued movements; Goal-directed movements; Autonomous movements; Forward and inverse models

18 Cued movements and sensorimotor integration [Figure: from reflexes to action with a jointed arm]

19 Cued reaching with noisy proprioception [Figure: reaching trajectory]

20 Bayes-optimal integration of sensory modalities [Figure: position estimates and conditional precisions under noisy proprioception and noisy vision]
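For two independent Gaussian cues, Bayes-optimal integration reduces to precision-weighted averaging; a minimal sketch follows (the numbers are illustrative, not those of the simulation).

```python
def fuse(mu_prop, pi_prop, mu_vis, pi_vis):
    """Posterior mean and precision for two independent Gaussian cues about the same position."""
    pi_post = pi_prop + pi_vis
    mu_post = (pi_prop * mu_prop + pi_vis * mu_vis) / pi_post
    return mu_post, pi_post

# e.g. noisy proprioception (low precision) fused with sharper vision (high precision)
mu, pi = fuse(mu_prop=4.2, pi_prop=1.0, mu_vis=5.0, pi_vis=4.0)
print(mu, pi)   # the posterior mean is pulled towards the more precise (visual) cue
```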

21 Overview: The free-energy principle and action; Active inference and prediction error; Orientation and stabilization; Intentional movements; Cued movements; Goal-directed movements; Autonomous movements; Forward and inverse models

22 The mountain-car problem [Figure: equations of motion; position-velocity null-clines; height and forces as functions of position; the desired location]
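For orientation, a generic form of the mountain-car dynamics is sketched below; the height function and constants are assumptions chosen for illustration and need not match the exact equations of motion on this slide.

```python
# Illustrative mountain-car dynamics: the valley floor is near x = -0.5 and the
# desired location lies on the hill to the right.

def height_grad(x):
    """Gradient of a two-regime landscape: steep parabola on the left, gentler hill on the right."""
    return 2 * x + 1 if x < 0 else (1 + 5 * x ** 2) ** (-1.5)

def step(x, v, a, dt=0.05, friction=0.25):
    """One Euler step of the force balance: acceleration = applied force - slope - friction."""
    dv = a - height_grad(x) - friction * v
    return x + dt * v, v + dt * dv

x, v = 0.0, 0.0
for _ in range(1000):
    x, v = step(x, v, a=0.0)    # with no applied force the car settles at the valley floor
print(round(x, 2))              # ~ -0.5
```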

23 Flow and density [Figure: nullclines and densities over position and velocity for the uncontrolled, controlled and expected cases]

24 Learning in a controlled environment; active inference in an uncontrolled environment

25 Using the free-energy principle and a simple gradient-ascent scheme, we have solved a benchmark problem in optimal control theory with only a handful of learning trials. At no point did we use reinforcement learning or dynamic programming. Goal-directed behaviour and trajectories
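The following toy illustrates how goal-directed behaviour can arise without any reward signal: if the agent's generative model embodies a prior belief that the car flows towards the desired location, action need only suppress the resulting sensory prediction error. This is a deliberately crude proportional scheme, not the presentation's learning scheme, and it sidesteps the learning of the controlled flow; the dynamics are those sketched above.

```python
# Reward-free, goal-directed control as suppression of prediction error (toy scheme).
def height_grad(x):
    return 2 * x + 1 if x < 0 else (1 + 5 * x ** 2) ** (-1.5)

x_goal, k_a = 1.0, 2.0
x, v, dt = -0.5, 0.0, 0.05

for _ in range(2000):
    v_expected = x_goal - x                      # prior belief: flow towards the goal
    eps = v_expected - v                         # prediction error on sensed motion
    a = max(-2.0, min(2.0, k_a * eps))           # action suppresses the error (bounded force)
    x, v = x + dt * v, v + dt * (a - height_grad(x) - 0.25 * v)
print(round(x, 2))                               # settles near the desired location x = 1
```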

26 Action under perturbation [Figure: prediction and error, hidden states, and perturbation and action over time; behaviour shown as position and velocity trajectories]

27 Simulating Parkinson's disease?

28 Overview: The free-energy principle and action; Active inference and prediction error; Orientation and stabilization; Intentional movements; Cued movements; Goal-directed movements; Autonomous movements; Forward and inverse models

29 Learning autonomous behaviour [Figure: trajectories and densities over position and velocity, before and after learning, compared with the controlled flow]

30 Autonomous behaviour under random perturbations [Figure: prediction and error, hidden states (position, velocity, learnt) and perturbation and action over time; the learnt position-velocity trajectory]

31 Overview: The free-energy principle and action; Active inference and prediction error; Orientation and stabilization; Intentional movements; Cued movements; Goal-directed movements; Autonomous movements; Forward and inverse models

32 Free-energy formulation versus forward-inverse formulation [Figure: desired and inferred states, sensory prediction error, forward model (generative model), inverse model (control policy), motor command (action), environment, efference copy and corollary discharge]

33 Summary: Free energy can be minimised by action (through changing the states that generate sensory input) or by perception (through optimising the predictions of that input). The only way action can suppress free energy is by reducing prediction error at the sensory level (speaking to a juxtaposition of motor and sensory systems). Action fulfils expectations, which can manifest as explaining away prediction error by resampling sensory input (e.g., visual tracking), or as intentional movement that fulfils expectations furnished by empirical priors. In an optimal control setting, a training environment can be constructed by minimising the cross-entropy between the ensemble density and some desired density; the ensuing behaviour can then be learnt and reproduced under active inference.
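The cross-entropy objective mentioned in the last point can be written down directly; below is a minimal sketch over a discretised state grid (the grid, widths and locations are illustrative assumptions).

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """Cross-entropy H(p, q) between two normalised histograms over the same grid."""
    p = p / p.sum()
    q = q / q.sum()
    return -np.sum(p * np.log(q + eps))

grid = np.linspace(-2, 2, 101)
desired = np.exp(-0.5 * (grid - 1.0) ** 2 / 0.05 ** 2)   # sharp density at the goal x = 1
ensemble = np.exp(-0.5 * (grid + 0.5) ** 2 / 0.3 ** 2)   # mass currently at the valley floor
print(cross_entropy(ensemble, desired))                  # large; shaping the flow reduces it
```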

