The free-energy principle: a rough guide to the brain? K Friston


The free-energy principle: a rough guide to the brain? K Friston Computational Modeling of Intelligence 11.03.04.(Fri) Summarized by Joon Shik Kim

Sufficient Statistics Quantities which are sufficient to parameterise a probability density (e.g., mean and covariance of a Gaussian density).

Surprise or self-information is the negative log-probability of an outcome, $-\ln p(\tilde{y} \mid m)$; an improbable outcome is therefore surprising ($\tilde{y}$: sensory input; $a$: action).

Kullback-Leibler Divergence Information divergence, information gain, cross or relative entropy is a non-commutative measure of the difference between two probability distributions.
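For reference (this formula is not on the original slide, but follows the standard definition): for a recognition density $q(\vartheta)$ and a reference density $p(\vartheta)$,
\[
D_{\mathrm{KL}}\bigl(q \,\|\, p\bigr) \;=\; \int q(\vartheta)\,\ln\frac{q(\vartheta)}{p(\vartheta)}\,d\vartheta \;\ge\; 0 ,
\]
with equality only when $q = p$; non-commutative means that in general $D_{\mathrm{KL}}(q\,\|\,p) \neq D_{\mathrm{KL}}(p\,\|\,q)$.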

Conditional Density or posterior density is the probability distribution of causes or model parameters, given some data; i.e., a probabilistic mapping from observed data to causes, $p(\vartheta \mid \tilde{y}) = p(\tilde{y} \mid \vartheta)\,p(\vartheta)/p(\tilde{y})$ ($p(\tilde{y})$: evidence; $\vartheta$: hypothesis or causes; $p(\vartheta)$: prior; $p(\vartheta \mid \tilde{y})$: posterior).

Generative Model or forward model is a probabilistic mapping from causes to observed consequences (data). It is usually specified in terms of the likelihood of getting some data given their causes (parameters of a model) and priors on the parameters
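A minimal way to write this factorisation, in the notation used above (not shown on the original slide):
\[
p(\tilde{y}, \vartheta \mid m) \;=\; p(\tilde{y} \mid \vartheta, m)\, p(\vartheta \mid m) ,
\]
i.e., the generative model is specified by a likelihood and a prior.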

Prior The probability distribution or density on the causes of data that encodes beliefs about those causes prior to observing the data, written $p(\vartheta)$ or $p(\vartheta \mid m)$.

Empirical Priors Priors that are induced by hierarchical models; they provide constraints on the recognition density in the usual way but depend on the data.

Bayesian Surprise A measure of salience based on the divergence between the recognition and prior densities. It measures the information in the data that can be recognised.

Entropy The average surprise of outcomes sampled from a probability distribution or density. A density with low entropy means, on average, the outcome is relatively predictable.
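In symbols (added for clarity, assuming a density $p(y)$ over outcomes $y$):
\[
H(p) \;=\; -\int p(y)\,\ln p(y)\,dy \;=\; \mathbb{E}_{p}\bigl[-\ln p(y)\bigr] ,
\]
i.e., entropy is the expected surprise.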

Ergodic A process is ergodic if its long term time-average converges to its ensemble average. Ergodic processes that evolve for a long time forget their initial states.

Free-energy An information theory measure that bounds (is greater than) the surprise on sampling some data, given a generative model.
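A standard way to write this bound, using the recognition density $q(\vartheta)$ and model $m$ from the definitions above:
\[
F \;=\; -\bigl\langle \ln p(\tilde{y}, \vartheta \mid m) \bigr\rangle_{q} + \bigl\langle \ln q(\vartheta) \bigr\rangle_{q}
  \;=\; -\ln p(\tilde{y} \mid m) + D_{\mathrm{KL}}\bigl(q(\vartheta)\,\|\,p(\vartheta \mid \tilde{y}, m)\bigr)
  \;\ge\; -\ln p(\tilde{y} \mid m) ,
\]
so minimising free-energy with respect to $q$ both tightens the bound on surprise and drives $q$ towards the posterior density.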

Generalised Coordinates of motion cover the value of a variable, its motion, acceleration, jerk, and higher orders of motion. A point in generalised coordinates corresponds to a path or trajectory over time.
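In symbols (my notation, not on the original slide): a variable $u$ in generalised coordinates is the vector $\tilde{u} = (u, u', u'', \ldots)$, whose components encode the instantaneous value, velocity, acceleration, and so on.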

Gradient Descent An optimization scheme that finds a minimum of a function by changing its arguments in proportion to the negative of the gradient of the function at the current value.
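For example, applied to the free-energy above, a deterministic gradient descent on the sufficient statistics $\mu$ of the recognition density would take the form (with $\kappa$ a rate constant; notation added here):
\[
\dot{\mu} \;=\; -\kappa\,\frac{\partial F}{\partial \mu} .
\]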

Helmholtz Machine A device or scheme that uses a generative model to furnish a recognition density; it learns hidden structure in data by optimising the parameters of its generative model.

Stochastic The successive states of stochastic processes are governed by random effects.

Free Energy

Dynamic Model of World and Recognition

Neuronal Architecture

What is the computational role of neuromodulation? Previous treatments suggest that modulatory neurotransmitters have distinct roles; for example, 'dopamine signals the error in reward prediction, serotonin controls the time scale of reward prediction, noradrenalin controls the randomness in action selection, and acetylcholine controls the speed of memory update'. This contrasts with the single role proposed above, namely encoding precision. Can the apparently diverse functions of these neurotransmitters be understood in terms of one role (encoding precision) exercised in different parts of the brain?

Can we entertain ambiguous percepts? Although not an integral part of the free-energy principle, we claim the brain uses unimodal recognition densities to represent one thing at a time. Although there is compelling evidence for bimodal 'priors' in sensorimotor learning, people usually assume the 'recognition' density collapses to a single percept when sensory information becomes available. The implicit challenge here is to find any electrophysiological or psychological evidence for multimodal recognition densities.

Does avoiding surprise suppress salient information? No; a careful analysis of visual search and attention suggests that: ‘only data observations which substantially affect the observer’s beliefs yield (Bayesian) surprise, irrespectively of how rare or informative in Shannon’s sense these observations are.’ This is consistent with active sampling of things we recognize (to reduce free-energy). However, it remains an interesting challenge to formally relate Bayesian surprise to the free-energy bound on (Shannon) surprise. A key issue here is whether saliency can be shown to depend on top-down perceptual expectations.

Which optimisation schemes does the brain use? We have assumed that the brain uses a deterministic gradient descent on free-energy to optimise action and perception. However, it might also use stochastic searches; sampling the sensorium randomly for a percept with low free-energy. Indeed, there is compelling evidence that our eye movements implement an optimal stochastic strategy. This raises interesting questions about the role of stochastic searches; from visual search to foraging, in both perception and action.