Bayesian inference in neural networks


Bayesian inference in neural networks
Psychology 209, January 12, 2017

Perception as inference
- What objects, words, and letters are present in a scene?
- Often every individual element of evidence is inconclusive, yet taken as a whole the evidence makes correct perception all but inevitable.
- Other times, we can increase our chance of being correct, but we cannot completely ensure we will always be correct.

Key ideas – 1
- Bayes' formula provides a basis for a theory of perceptual inference.
- This theory is quite general but depends on having a lot of knowledge:
  - any number of alternative hypotheses
  - any number of elements of evidence
- Sometimes we can relate this knowledge to a 'generative model' of the process that produced the evidence.
- Even when we cannot, the concept of a generative model is a useful one for understanding perceptual inference.

Probability of cancer given a positive mammogram?
- The probability of breast cancer in the population of adult women is .001 (one woman in one thousand has breast cancer).
- The probability of a positive mammogram given breast cancer is .9.
- The probability of a positive mammogram given no breast cancer is .05.
- What is the probability of cancer given a positive mammogram?
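The arithmetic is not shown in the transcript, but it follows directly from Bayes' rule with the numbers above; a minimal sketch in Python (the variable names are mine):

```python
# Bayes' rule with the numbers from the slide above.
p_cancer = 0.001             # prior: p(cancer)
p_pos_given_cancer = 0.90    # likelihood: p(positive | cancer)
p_pos_given_healthy = 0.05   # false-positive rate: p(positive | no cancer)

# p(positive) by the law of total probability
p_pos = (p_pos_given_cancer * p_cancer
         + p_pos_given_healthy * (1 - p_cancer))

# posterior: p(cancer | positive)
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
print(f"p(cancer | positive mammogram) = {p_cancer_given_pos:.4f}")  # ~0.0177
```

Despite the highly diagnostic test, the posterior is under 2 percent, because the prior is so low.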

A generative model for letter perception
- Experiment: I present some features to you – what letter do you think these features represent?
- Generative model: a letter is chosen according to some distribution; features are chosen conditionally independently given the letter; some features are hidden from view.
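A minimal sketch of a generative model of this kind; the letters, features, and probabilities below are hypothetical placeholders rather than the ones used in class:

```python
import random

# Hypothetical generative model: pick a letter, then emit each feature
# independently given the letter; some features may be occluded (hidden).
letter_prior = {"A": 0.5, "H": 0.5}                      # p(letter)
feature_prob = {                                          # p(feature present | letter)
    "A": {"crossbar": 0.9, "diagonals": 0.9, "verticals": 0.1},
    "H": {"crossbar": 0.9, "diagonals": 0.1, "verticals": 0.9},
}

def generate(p_hidden=0.3):
    letter = random.choices(list(letter_prior), weights=letter_prior.values())[0]
    features = {}
    for f, p in feature_prob[letter].items():
        if random.random() < p_hidden:
            features[f] = None                            # hidden from view
        else:
            features[f] = random.random() < p             # present / absent
    return letter, features

print(generate())
```

Perceptual inference then runs this model "backwards": given the observed features, infer which letter most likely produced them.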

Bayesian calculations
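The equations from this slide did not survive in the transcript. Given the generative model above and the network formulation later in the deck, the computation is presumably the standard application of Bayes' rule with conditionally independent features; a hedged reconstruction:

```latex
p(h_i \mid e) \;=\;
  \frac{p(h_i)\,\prod_j p(e_j \mid h_i)}
       {\sum_{i'} p(h_{i'})\,\prod_j p(e_j \mid h_{i'})}
```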

Some observations
- Odds ratios don't change when we add or remove alternatives: p(hi)/p(hj) is unaffected by the number of other hypotheses. In our case, p(A)/p(H) should stay constant independent of the number of other possible letter alternatives.
- Common factors cancel out. For example, when p(hi) is the same for all letters, p(hi) cancels out of p(hi|e).
- Likewise, evidence supporting a pair of alternatives equally doesn't affect p(hi)/p(hj). For example, we could remove some of the features on which A and H agree, and p(A)/p(H) would be unaffected.
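These observations follow from writing the posterior odds explicitly; a brief sketch of the algebra, using the notation above:

```latex
\frac{p(h_i \mid e)}{p(h_j \mid e)}
  \;=\; \frac{p(h_i)\,\prod_k p(e_k \mid h_i)}
             {p(h_j)\,\prod_k p(e_k \mid h_j)}
```

The normalizing sum over all hypotheses cancels, so adding or removing alternatives leaves the ratio unchanged, and any feature with p(e_k|h_i) = p(e_k|h_j) drops out of it.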

Key ideas – 2
- Real neural networks can perform perceptual inferences and combine information in ways that are consistent with Bayesian approaches.
- Artificial neural networks rely on computations that allow these processes to be modeled.
- There are fairly simple linking assumptions that make it reasonably straightforward to see how one simulates the other, even though the correspondences are abstract.
- Deep learning models such as CNNs are grounded in these ideas.

Salzman & Newsome experiment
- Stimuli consisted of fields of random dots moving in one of eight different directions.
- The coherence of the dots varies across trials of the experiment.
- There is also an electrode in the monkey's brain that delivers electrical stimulation.

Basic results
- A bias is visible at low coherence when there is no microstimulation.
- The visual stimulus effect dominates at high coherence.
- Electrical stimulation dominates at low coherence.
- All of these have effects at intermediate coherence.

Bayesian calculations

Logarithmic transformation of Bayesian inference
- Write the support for hypothesis i as S_i = p(h_i) Π_j p(e_j|h_i); using logs, log S_i = log p(h_i) + Σ_j log p(e_j|h_i).
- Then applying the softmax function
    p(h_i|e) = exp(log S_i) / Σ_i' exp(log S_i')
  is equivalent to the Bayesian computation previously described.

Simple neural network that estimates posterior probabilities
- Input units for different elements of evidence: a_j = 1 if present, 0 if absent.
- Bias weights: b_i = log(p_i/c), where c is any constant.
- Connection weights: w_ij = log(p(e_j|h_i)/c), where c is any constant.
- Net input: net_i = b_i + Σ_j a_j w_ij.
- The softmax makes each unit's activation correspond to the posterior probability of its hypothesis.
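A minimal sketch of this network in Python, using the correspondences on the slide (with c = 1); the two letter hypotheses, their priors, and the feature likelihoods are hypothetical placeholders:

```python
import numpy as np

# Hypothetical letter example: two hypotheses (A, H), three feature units.
priors = np.array([0.5, 0.5])                  # p(h_i)
likelihoods = np.array([                        # p(e_j present | h_i)
    [0.9, 0.9, 0.1],                            # A: crossbar, diagonals, verticals
    [0.9, 0.1, 0.9],                            # H
])

a = np.array([1, 1, 0])                         # evidence: a_j = 1 if feature present
                                                # (absent features contribute nothing,
                                                #  as on the slide)

biases = np.log(priors)                         # b_i = log p(h_i)
weights = np.log(likelihoods)                   # w_ij = log p(e_j | h_i)

net = biases + a @ weights.T                    # net_i = b_i + sum_j a_j w_ij
posterior = np.exp(net) / np.exp(net).sum()     # softmax -> estimate of p(h_i | e)
print(posterior)
```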

Fit to Salzman & Newsome data
- Data (dashed lines) pooled from conditions in which both the visual and the electrical stimulus affected behavior.
- Top panel: data and fits when the visual and electrical stimuli were 135 degrees apart.
- Bottom panel: data and fit of the model when the visual and electrical stimuli were 90 degrees apart.
- Weights and biases were estimated using logistic regression to find the best fit; neural networks perform an advanced version of logistic regression when they learn.
- The top panel shows that the animals are not just averaging the two kinds of input, although the effect approximates averaging when the two sources of evidence point in similar directions (bottom panel).

Using just one unit where there are only two alternatives
- Softmax version: p(h_1|e) = exp(net_1) / (exp(net_1) + exp(net_2)).
- Dividing numerator and denominator by exp(net_2): p(h_1|e) = 1 / (1 + exp(-net)), where the new net = net_1 - net_2 = b + Σ_j a_j w_j, with b = log(p_1/p_2) and w_j = log(p(e_j|h_1)/p(e_j|h_2)).
- Data (points) and fits (lines) from an experiment on phoneme identification with varying auditory and visual support for 'BA' and 'DA'.
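A small numerical check of this reduction, with hypothetical net inputs; the two forms give identical probabilities:

```python
import numpy as np

net1, net2 = 1.3, -0.2                          # hypothetical net inputs of the two units

# Full softmax over the two units
p1_softmax = np.exp(net1) / (np.exp(net1) + np.exp(net2))

# Single-unit (logistic) form using only the difference net = net1 - net2
net = net1 - net2
p1_logistic = 1.0 / (1.0 + np.exp(-net))

print(p1_softmax, p1_logistic)                  # identical values
```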

Neural and conceptual descriptions

Relationships between units and neurons
- Simple case: units correspond to explicit hypotheses. You can think of several actual neurons as dedicated to each explicit hypothesis or element of evidence – this can help guide intuitions.
- More interesting case: neural activity patterns correspond to hypotheses – individual neurons participate in the activity associated with many different hypotheses.
- Today and next time we stay closer to the simple case, but we must understand this as a useful simplification, not as an assumption about the true underlying state of affairs.

Morton's logogen model of word recognition (the prior can operate here!)

Useful concepts linking neural networks and probabilistic inference
- Unit: hypothesis placeholder (a unit is not a neuron!)
- Log prior: bias
- Log likelihood, log p(e_j|h_i): connection weight w_ij
- Log p(e|h): summed synaptic input
- Logit (log of support): net input = bias + summed synaptic input
- Estimate of posterior: activation
- Maximizing under noise can approximate probability matching; the degree of noise corresponds to the scale factor 1/T in the softmax function. This allows performance to be completely random when noise dominates, or completely deterministic when there is no noise.
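A sketch of how the scale factor 1/T controls this trade-off (the net inputs and temperatures below are illustrative):

```python
import numpy as np

def softmax_with_temperature(net, T):
    """p_i proportional to exp(net_i / T): small T is nearly deterministic, large T nearly random."""
    z = np.asarray(net, dtype=float) / T
    z -= z.max()                                 # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

net = [2.0, 1.0, 0.5]                            # illustrative net inputs
for T in (0.1, 1.0, 10.0):
    print(T, softmax_with_temperature(net, T))
```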

Next time and homeworks
- In general a percept corresponds to an ensemble of hypotheses that are mutually interdependent.
- We will see next time how the entire state of a hierarchical neural network can correspond to a coherent ensemble of hypotheses.
- This allows us to begin to account for our ability to experience percepts that we don't have dedicated individual neurons for.
- Do the small homework before you do the reading for next time, so that you are on solid ground as you read the second half of the paper!
- The first simulation homework will be available Tuesday and will be due the following Tuesday.