Information Theory and Learning


Information Theory and Learning. Tony Bell, Helen Wills Neuroscience Institute, University of California at Berkeley

One input, one output deterministic Infomax: match the input distribution to the non-linearity:
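A minimal sketch of the equations behind this slide (following Bell & Sejnowski, 1995; the slide's own notation may differ): for a deterministic map $y = g(x)$,

I(x;y) = H(y) - H(y \mid x), \qquad
p(y) = \frac{p(x)}{\left|\partial y/\partial x\right|}, \qquad
H(y) = H(x) + \mathbb{E}\!\left[\log\left|\frac{\partial y}{\partial x}\right|\right],

so maximising the transferred information means maximising $H(y)$, which is achieved (y uniform) when $\partial y/\partial x \propto p(x)$, i.e. when the non-linearity $g$ is the cumulative distribution function of the input.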

Gradient descent learning rule to maximise the transferred information (deterministic, sensory-only case).
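A hedged reconstruction of the rule for a single logistic unit $y = g(wx + w_0)$, $g(u) = 1/(1+e^{-u})$ (the on-slide equation is an image; this follows the standard Bell & Sejnowski derivation):

\Delta w \;\propto\; \frac{\partial}{\partial w}\log\left|\frac{\partial y}{\partial x}\right| \;=\; \frac{1}{w} + x\,(1 - 2y),
\qquad
\Delta w_0 \;\propto\; 1 - 2y.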

Examples of score functions: logistic and Laplacian. In stochastic gradient algorithms (online training), we dispense with the ensemble averages, giving the update for a single training example and a Laplacian ‘prior’.
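For concreteness, the score functions these two densities give (a reconstruction, since the slide's plots and equations are images): with $f(u) = \frac{d}{du}\log p(u)$,

\text{logistic: } p(u) = g(u)\,(1 - g(u)) \;\Rightarrow\; f(u) = 1 - 2g(u),
\qquad
\text{Laplacian: } p(u) \propto e^{-|u|} \;\Rightarrow\; f(u) = -\operatorname{sign}(u).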

Same theory for multiple dimensions: fire vectors into the unit hypercube uniformly. The key quantity is the absolute determinant of the Jacobian matrix, measuring how stretchy the mapping is, for square or overcomplete transforms. Undercomplete transformations are not invertible, and require a more complex formula.
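A sketch of the square (invertible) case, with $|J|$ the absolute Jacobian determinant (the slide's undercomplete formula is not reproduced here):

p(\mathbf{u}) = \frac{p(\mathbf{x})}{|J|},
\qquad
H(\mathbf{u}) = H(\mathbf{x}) + \mathbb{E}\big[\log|J|\big],
\qquad
|J| = \left|\det\!\left(\frac{\partial \mathbf{u}}{\partial \mathbf{x}}\right)\right|,

so maximising the output entropy again means maximising the expected log of how stretchy the mapping is.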

Same theory for multiple dimensions: fire vectors into the unit hypercube uniformly. Post-multiplying the gradient by a positive definite transform rescales it optimally (called the Natural Gradient - Amari), giving a pleasantly simple form.
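The pleasantly simple form is presumably the natural-gradient Infomax/ICA rule (Amari; Bell & Sejnowski), sketched here under that assumption, with $\mathbf{u} = W\mathbf{x}$ and $f_i(u_i) = \frac{d}{du_i}\log p_i(u_i)$:

\Delta W \;\propto\; (W^{\top})^{-1} + f(\mathbf{u})\,\mathbf{x}^{\top},
\qquad
\big[(W^{\top})^{-1} + f(\mathbf{u})\,\mathbf{x}^{\top}\big]\,W^{\top}W
\;=\; \big(I + f(\mathbf{u})\,\mathbf{u}^{\top}\big)\,W,

where $W^{\top}W$ is the positive definite rescaling.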

Decorrelation is not enough: it only makes the covariance matrix diagonal, i.e. it captures second-order statistics. The score function f brings in higher-order statistics through its Taylor expansion.

Infomax/ICA on image patches: learn co-ordinates for natural scenes. In this linear generative model, we want u = s: recover independent sources. After training, we calculate A = W⁻¹, and plot the columns. For 16x16 images, we get 256 bases.
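A minimal Python sketch of this procedure (illustrative, not the original code): natural-gradient Infomax ICA with a logistic score on whitened 16x16 patches, then A = W⁻¹. The array name `patches` is an assumption.

import numpy as np

def infomax_ica(X, n_iters=2000, lr=0.01, batch=256, seed=0):
    """Natural-gradient Infomax ICA: dW = lr * (I + f(u) u^T) W.

    X: (n_dims, n_samples) whitened data, e.g. flattened 16x16 patches.
    Uses the logistic score f(u) = 1 - 2*sigmoid(u).
    """
    rng = np.random.default_rng(seed)
    n_dims, n_samples = X.shape
    W = np.eye(n_dims)                      # start at identity (data already whitened)
    I = np.eye(n_dims)
    for _ in range(n_iters):
        cols = rng.integers(0, n_samples, size=batch)
        u = W @ X[:, cols]                  # candidate sources for this minibatch
        f = 1.0 - 2.0 / (1.0 + np.exp(-u))  # logistic score function
        W += lr * (I + (f @ u.T) / batch) @ W
    return W

# Usage sketch: `patches` is (256, n_patches) of whitened 16x16 image patches.
# W = infomax_ica(patches)
# A = np.linalg.inv(W)                      # basis functions = columns of A
# bases = A.T.reshape(-1, 16, 16)           # 256 bases, one 16x16 image each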

f from logistic density

f from Laplacian density

f from Gaussian density

But this does not actually make the neurons independent. Many joint densities p(u1,u2) are decorrelated but still radially symmetric: they factorise in polar co-ordinates, as p(r)p(θ), but not in Cartesian co-ordinates, as p(u1)p(u2), unless they’re Gaussian. This happens when cells have similar position, spatial frequency, and orientation selectivity, but different phase. Dependent filters can combine to make non-linear complex cells (oriented but phase insensitive).
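A concrete example of the point (an illustration, not from the slide): the radially symmetric, non-Gaussian density

p(u_1, u_2) \;\propto\; \exp\!\big(-\sqrt{u_1^2 + u_2^2}\,\big) \;=\; p(r)

has zero correlation between $u_1$ and $u_2$ by symmetry, factorises as $p(r)\,p(\theta)$ in polar co-ordinates, but does not factorise as $p(u_1)\,p(u_2)$.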

‘Dependent’ Component Analysis. First, the maximum likelihood framework. What we have been doing is: Infomax = Maximum Likelihood = Minimum KL Divergence. We are fitting a model to the data, or equivalently minimising the KL divergence from the data to the model. But a much more general model is the ‘energy-based’ model (Hinton): an energy that is a sum of functions on subsets of the units, with a normalising partition function.
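A hedged reconstruction of the two model families being contrasted (the slide's equations are images):

q(\mathbf{x}) = |\det W|\,\prod_i p_i(u_i), \qquad \mathbf{u} = W\mathbf{x}
\qquad \text{(what we have been doing: square ICA as maximum likelihood)},

q(\mathbf{x}) \;\propto\; \exp\!\Big(-\sum_j E_j\big(\mathbf{u}_{S_j}\big)\Big)
\qquad \text{(energy-based: a sum of functions on subsets } S_j \text{ of the units, normalised by a partition function } Z\text{)}.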

‘Dependent’ Component Analysis. For the completely general model, the learning rule has a second term which reduces to -I (the identity) in the case of ICA. Unfortunately, this term involves an intractable integral over the model q. Nonetheless, we can still work with all dependency models which are non-loopy hypergraphs: learn as before, but with a modified score function. (The slide contrasts a loopy hypergraph with a non-loopy one.)
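A hedged sketch of the shape of that rule, consistent with the second term reducing to -I (the slide's equation is an image): writing $\psi_i(\mathbf{u}) = -\partial\big(\sum_j E_j\big)/\partial u_i$ for the model's score,

\Delta W \;\propto\; \Big(\mathbb{E}_{p}\big[\boldsymbol{\psi}(\mathbf{u})\,\mathbf{u}^{\top}\big] \;-\; \mathbb{E}_{q}\big[\boldsymbol{\psi}(\mathbf{u})\,\mathbf{u}^{\top}\big]\Big)\,W,

where the first expectation is over the data and the second, intractable one is over the model q; for ordinary ICA the second expectation equals $-I$, recovering the natural-gradient rule above.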

For example, we can split the space into subspaces such that the cells are independent between subspaces and dependent within the subspaces (the slide’s diagram illustrates this for 4 cells). We now show a sequence of symmetry-breaking occurring as we move from training, on images, a model which is one big 256-dimensional hyperball, down to a model which is 64 four-dimensional hyperballs:
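As a hedged aside before the figures: one concrete form of the modified score function for such a subspace model, in the spirit of independent subspace analysis (not necessarily the exact form used here). With a radially symmetric, Laplacian-like energy on each subspace $S_k$,

E_k(\mathbf{u}_{S_k}) = r_k = \sqrt{\textstyle\sum_{j \in S_k} u_j^2},
\qquad
\psi_i(\mathbf{u}) = -\frac{u_i}{r_k} \quad (i \in S_k),

which reduces to the single-unit Laplacian score $-\operatorname{sign}(u_i)$ when each subspace contains one cell.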

Logistic density 1 subspace

Logistic density 2 subspaces

Logistic density 4 subspaces

Logistic density 8 subspaces

Logistic density 16 subspaces

Logistic density 32 subspaces

Logistic density 64 subspaces

Topographic ICA. Arrange the cells in a 2D map with a statistical model q constructed from overlapping subsets. This is a loopy hypergraph, an un-normalised model, but it still gives a nice result… The hyperedges of our hypergraph are overlapping 4x4 neighbourhoods, etc.
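A rough sketch of the kind of loopy model meant here, following Hyvarinen & Hoyer's topographic ICA (the exact form on the slide may differ): with $\pi(i,j) = 1$ when cell $j$ lies in the (e.g. 4x4) neighbourhood of cell $i$ on the 2D map and $0$ otherwise,

\log q(\mathbf{u}) \;=\; \sum_i G\!\Big(\sum_j \pi(i,j)\,u_j^2\Big) + \log|\det W| - \log Z,
\qquad G(s) = -\sqrt{s},

so the overlapping neighbourhoods are exactly the overlapping hyperedges.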

That was from Hyvarinen & Hoyer. Here’s one from Osindero & Hinton.

Conclusion. Well, we did get somewhere: We seem to have an information-theoretic explanation of some properties of area V1 of visual cortex:
- simple cells (Olshausen & Field, Bell & Sejnowski)
- complex cells (Hyvarinen & Hoyer)
- topographic maps with singularities (Hyvarinen & Hoyer)
- colour receptive fields (Doi & Lewicki)
- direction sensitivity (van Hateren & Ruderman)
But we are stuck on:
- the gradient of the partition function
- still working with rate models, not spiking neurons
- no top-down feedback
- no sensory-motor (all passive world modeling)

References. The references for all the work in these 3 talks will be forwarded separately. If you don’t have access to them, email me at tbell@berkeley.edu and I’ll send them to you.