BCS547 Neural Decoding.


1 BCS547 Neural Decoding

2 Nature of the problem In response to a stimulus with unknown orientation θ, you observe a pattern of activity A. What can you say about θ given A? Estimation theory: come up with a single-value estimate of θ from A. Bayesian approach: recover P(θ|A) (the posterior distribution over θ).

3 Population Code Tuning Curves Pattern of activity (A)

4 Estimation theory

5 [Figure: on each of 200 trials, the encoder produces a pattern of activity (A1, A2, …, A200), plotted against preferred retinal location (−100 to 100), and the decoder forms an estimate from that pattern.]

6 Estimation theory The estimate θ̂ is a random variable. To determine the quality of this estimate we can compute its mean, E[θ̂], and its variance, σ². If E[θ̂] = θ, the estimate is said to be unbiased. If σ² is as small as possible, the estimate is said to be efficient.

7 Estimation theory A common measure of decoding performance is the mean square error between the estimate and the true value, $E[(\hat{\theta}-\theta)^2]$. This error can be decomposed as $E[(\hat{\theta}-\theta)^2] = \mathrm{Var}(\hat{\theta}) + (E[\hat{\theta}]-\theta)^2$, i.e. variance plus squared bias.
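A small numerical illustration of this decomposition (not from the slides; the estimator, its bias, and the noise level are made-up values), showing that the mean square error equals the variance plus the squared bias:

```python
import numpy as np

rng = np.random.default_rng(0)

theta_true = 30.0           # true stimulus value (hypothetical units, e.g. degrees)
n_trials = 100_000

# Hypothetical estimator: the true value corrupted by noise plus a small systematic bias.
theta_hat = theta_true + 2.0 + rng.normal(0.0, 5.0, size=n_trials)

mse      = np.mean((theta_hat - theta_true) ** 2)
bias     = np.mean(theta_hat) - theta_true
variance = np.var(theta_hat)

# The decomposition MSE = variance + bias^2 holds up to sampling error.
print(mse, variance + bias ** 2)
```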

8 Efficient Estimators The smallest achievable variance for an unbiased estimator is known as the Cramér-Rao bound, σ²_CR. An efficient estimator is one for which σ² = σ²_CR. In general, σ² ≥ σ²_CR.

9 Fisher Information Fisher information is defined as $I_F(\theta) = -E\!\left[\frac{\partial^2 \ln P(A|\theta)}{\partial \theta^2}\right]$ and it is equal to $E\!\left[\left(\frac{\partial \ln P(A|\theta)}{\partial \theta}\right)^2\right]$, where P(A|θ) is the distribution of the neuronal noise.

10 Fisher Information

11 Fisher Information For one neuron with Poisson noise, $I_F(\theta) = \frac{f'(\theta)^2}{f(\theta)}$, where f(θ) is the tuning curve.
For n independent neurons: $I_F(\theta) = \sum_{i=1}^{n} \frac{f_i'(\theta)^2}{f_i(\theta)}$. Large slope is good! The more neurons, the better! Small variance is good!
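A sketch of how one might compute this quantity in practice; the Gaussian tuning-curve model, its gain, width, and baseline, and the 100 preferred stimuli are illustrative assumptions, not values from the slides:

```python
import numpy as np

def fisher_info_poisson(theta, prefs, gain=20.0, width=15.0, baseline=0.1):
    """Fisher information at theta for independent Poisson neurons with
    Gaussian tuning curves (gain, width, baseline are assumed values)."""
    bump = gain * np.exp(-(theta - prefs) ** 2 / (2 * width ** 2))
    f  = baseline + bump                       # mean firing rates f_i(theta)
    df = -bump * (theta - prefs) / width ** 2  # slopes f_i'(theta)
    return np.sum(df ** 2 / f)                 # I_F = sum_i f_i'^2 / f_i

# 100 neurons with preferred stimuli tiling -100..100 (hypothetical units).
prefs = np.linspace(-100, 100, 100)
print(fisher_info_poisson(30.0, prefs))
```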

12 Fisher Information and Tuning Curves
Fisher information is maximal where the slope of the tuning curve is maximal. This is consistent with adaptation experiments.

13 Fisher Information In 1D, Fisher information decreases with the width of the tuning curves. In 2D, Fisher information does not depend on the width of the tuning curves. In 3D and above, Fisher information increases with the width of the tuning curves. ATTENTION: this is true only for independent Gaussian noise.

14 Ideal observer The discrimination threshold of an ideal observer, δθ, is proportional to the variance given by the Cramér-Rao bound, σ²_CR. In other words, an efficient estimator is an ideal observer.

15 An ideal observer is an observer that can recover all the Fisher information in the activity (an easy link between Fisher information and behavioral performance). If all distributions are Gaussian, Fisher information is the same as Shannon information.

16 Estimation theory Examples of decoders

17 Voting Methods Optimal Linear Estimator

18 Voting Methods Optimal Linear Estimator
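A minimal sketch of an optimal linear estimator, with decoding weights fit by least squares on simulated training data; the Gaussian tuning model, Poisson noise, and training procedure are assumptions made for illustration, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
prefs = np.linspace(-100, 100, 100)           # preferred stimuli of 100 neurons

def responses(theta, width=15.0, gain=20.0):
    """Simulated Poisson responses to stimulus theta (assumed tuning model)."""
    rates = gain * np.exp(-(theta - prefs[None, :]) ** 2 / (2 * width ** 2))
    return rng.poisson(rates)

# Training set: many (activity, stimulus) pairs.
thetas_train = rng.uniform(-80, 80, size=5000)
A_train = responses(thetas_train[:, None])

# Optimal linear estimator: theta_hat = A @ w, with w fit by least squares.
w, *_ = np.linalg.lstsq(A_train, thetas_train, rcond=None)

# Decode a new trial.
theta_hat = responses(np.array([[30.0]])) @ w
print(theta_hat)
```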

19 Voting Methods Optimal Linear Estimator Center of Mass

20 Center of Mass/Population Vector
The center of mass, $\hat{\theta} = \sum_i a_i \theta_i / \sum_i a_i$ (where θ_i is the preferred stimulus of neuron i), is optimal (unbiased and efficient) iff the tuning curves are Gaussian with a zero baseline and uniformly distributed, and the noise follows a Poisson distribution. In general, the center of mass has a large bias and a large variance.
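A minimal sketch of the center-of-mass decoder on one simulated trial; the tuning curves and noise model are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
prefs = np.linspace(-100, 100, 100)   # preferred stimuli (hypothetical setup)

# One simulated trial: Poisson responses to theta = 30 with Gaussian tuning.
rates = 20.0 * np.exp(-(30.0 - prefs) ** 2 / (2 * 15.0 ** 2))
a = rng.poisson(rates)

# Center of mass: activity-weighted average of the preferred stimuli.
theta_hat = np.sum(a * prefs) / np.sum(a)
print(theta_hat)
```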

21 Voting Methods Optimal Linear Estimator Center of Mass Population Vector

22 Population Vector $\vec{P} = \sum_i a_i \vec{P}_i$, where $\vec{P}_i$ is the preferred direction of neuron i and $a_i$ its activity; the estimate of θ is the direction of $\vec{P}$.

23 Population Vector Typically, the population vector is not the optimal linear estimator.

24 Population Vector The population vector is optimal iff the tuning curves are cosine and uniformly distributed, and the noise follows a normal distribution with fixed variance. In most cases, the population vector is biased and has a large variance. The variance of the population vector estimate does not reflect Fisher information.
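A minimal sketch of the population vector on one simulated trial with cosine tuning; all parameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
prefs = np.linspace(-np.pi, np.pi, n, endpoint=False)   # preferred directions

# One simulated trial: cosine tuning to theta = 0.5 rad plus Gaussian noise
# (tuning and noise parameters are made up for illustration).
theta = 0.5
a = 10.0 + 8.0 * np.cos(theta - prefs) + rng.normal(0.0, 2.0, size=n)

# Population vector: sum of preferred-direction unit vectors weighted by activity.
P = np.array([np.sum(a * np.cos(prefs)), np.sum(a * np.sin(prefs))])
theta_hat = np.arctan2(P[1], P[0])
print(theta_hat)
```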

25 Population Vector The population vector should NEVER be used to estimate Fisher information or information content!

26 Population Vector

27 Maximum Likelihood

28 Maximum Likelihood The estimate is the value of θ that maximizes the likelihood P(A|θ). Therefore, we seek $\hat{\theta}_{ML}$ such that $\hat{\theta}_{ML} = \arg\max_\theta P(A|\theta)$.

29 Maximum Likelihood If the noise is Gaussian, independent, and of equal variance, maximizing the likelihood amounts to minimizing the squared distance between the activity and the tuning-curve template, and the estimate is given by $\hat{\theta}_{ML} = \arg\min_\theta \sum_i (a_i - f_i(\theta))^2$. Distance measure: Euclidean. This is template matching.
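A minimal sketch of maximum likelihood as template matching over a grid of candidate stimuli; the Gaussian tuning model and the noise level are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
prefs = np.linspace(-100, 100, 100)
width, gain = 15.0, 20.0

def tuning(theta):
    """Mean responses f_i(theta) for Gaussian tuning curves (assumed model)."""
    return gain * np.exp(-(theta - prefs) ** 2 / (2 * width ** 2))

# One simulated trial with independent Gaussian noise around theta = 30.
a = tuning(30.0) + rng.normal(0.0, 2.0, size=prefs.size)

# ML by template matching: pick the theta whose template is closest (Euclidean).
grid = np.linspace(-100, 100, 2001)
dists = [np.sum((a - tuning(th)) ** 2) for th in grid]
theta_ml = grid[np.argmin(dists)]
print(theta_ml)
```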

30 Maximum Likelihood

31 Gradient descent for ML
To maximize the likelihood with respect to θ (equivalently, to minimize the error E(θ), e.g. the squared distance above), one can use a gradient descent technique in which θ is updated according to $\theta \leftarrow \theta - \epsilon\, \partial E(\theta)/\partial \theta$.
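A minimal sketch of this gradient descent, applied to the squared-error function above; the tuning model, the starting guess, the learning rate, and the number of iterations are all assumed values:

```python
import numpy as np

prefs = np.linspace(-100, 100, 100)
width, gain = 15.0, 20.0

def tuning(theta):
    return gain * np.exp(-(theta - prefs) ** 2 / (2 * width ** 2))

def d_tuning(theta):
    # Derivative of the Gaussian tuning curves with respect to theta.
    return -tuning(theta) * (theta - prefs) / width ** 2

rng = np.random.default_rng(0)
a = tuning(30.0) + rng.normal(0.0, 2.0, size=prefs.size)  # one noisy trial

# Gradient descent on E(theta) = sum_i (a_i - f_i(theta))^2.
theta, lr = 20.0, 1e-3     # initial guess and learning rate (assumed values)
for _ in range(200):
    grad = -2.0 * np.sum((a - tuning(theta)) * d_tuning(theta))
    theta -= lr * grad
print(theta)
```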

32 Gaussian noise with variance proportional to the mean
If the noise is Gaussian with variance proportional to the mean, the distance being minimized changes to a variance-weighted distance, $\sum_i (a_i - f_i(\theta))^2 / \sigma_i^2$ with $\sigma_i^2 \propto f_i(\theta)$. Data points with small variance are weighted more heavily.

33 Poisson noise If the noise is Poisson then $P(A|\theta) = \prod_i e^{-f_i(\theta)} f_i(\theta)^{a_i} / a_i!$, and $\ln P(A|\theta) = \sum_i \left[ a_i \ln f_i(\theta) - f_i(\theta) \right] + \text{const}$.
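A minimal sketch of Poisson maximum likelihood by grid search over the log-likelihood; the tuning curves and their parameters are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
prefs = np.linspace(-100, 100, 100)
width, gain, baseline = 15.0, 20.0, 0.1

def tuning(theta):
    """Assumed Gaussian tuning curves with a small baseline (avoids log(0))."""
    return baseline + gain * np.exp(-(theta - prefs) ** 2 / (2 * width ** 2))

a = rng.poisson(tuning(30.0))   # one simulated Poisson trial at theta = 30

def log_likelihood(theta):
    f = tuning(theta)
    return np.sum(a * np.log(f) - f)   # Poisson log-likelihood (constants dropped)

grid = np.linspace(-100, 100, 2001)
theta_ml = grid[np.argmax([log_likelihood(th) for th in grid])]
print(theta_ml)
```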

34 ML and template matching
Maximum likelihood is a template-matching procedure, BUT the metric used is not always the Euclidean distance; it depends on the noise distribution.

35 Bayesian approach We want to recover P(θ|A). Using Bayes' theorem, we have $P(\theta|A) = \frac{P(A|\theta)\,P(\theta)}{P(A)}$: P(θ|A) is the posterior distribution over θ, P(A|θ) the likelihood of θ, P(θ) the prior distribution over θ, and P(A) the prior distribution over A.

36 Bayesian approach What is the likelihood of θ, P(A|θ)? It is the distribution of the noise: the same distribution we used for maximum likelihood.

37 Bayesian approach The prior P(θ) corresponds to any knowledge we may have about θ before we get to see any activity. Ex: Zhang et al.

38 Bayesian approach Once we have P(θ|A), we can proceed in two different ways. We can keep this distribution for Bayesian inferences (as we would do in a Bayesian network), or we can make a decision about θ. For instance, we can estimate θ as the value that maximizes P(θ|A). This is known as the maximum a posteriori estimate (MAP). For a flat prior, ML and MAP are equivalent.
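A minimal sketch of the MAP estimate on a grid, combining a Poisson log-likelihood with a prior; the Gaussian prior (mean 0, sd 40) and the tuning model are hypothetical choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
prefs = np.linspace(-100, 100, 100)
width, gain, baseline = 15.0, 20.0, 0.1

def tuning(theta):
    """Assumed Gaussian tuning curves; theta can be a column of candidates."""
    return baseline + gain * np.exp(-(theta - prefs[None, :]) ** 2 / (2 * width ** 2))

a = rng.poisson(tuning(np.array([[30.0]])))[0]   # one Poisson trial at theta = 30

grid = np.linspace(-100, 100, 2001)[:, None]     # candidate values of theta
f = tuning(grid)
log_like = np.sum(a * np.log(f) - f, axis=1)     # Poisson log-likelihood

# Hypothetical Gaussian prior over theta, centred on 0 with sd 40.
log_prior = -grid[:, 0] ** 2 / (2 * 40.0 ** 2)

log_post = log_like + log_prior
posterior = np.exp(log_post - log_post.max())
posterior /= posterior.sum()                     # P(theta|A): keep it, or decide

theta_map = grid[np.argmax(log_post), 0]         # maximum a posteriori estimate
print(theta_map)
```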

39 Using the prior: Zhang et al
For a time-varying variable, one can use the posterior computed at the previous time step as the prior for the next one: $P(X_{t+1}|A_{t+1}) \propto P(A_{t+1}|X_{t+1})\,P(X_{t+1}|A_t)$. The normalizing term is nasty to compute, but it is independent of $X_{t+1}$, so it can be ignored when estimating $X_{t+1}$.
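A rough sketch of this recursive use of the prior, where each step's posterior (slightly blurred to allow for drift) becomes the next step's prior; the encoding model, the drift, and the blurring kernel are all assumptions made for illustration, not Zhang et al.'s actual method:

```python
import numpy as np

rng = np.random.default_rng(0)
prefs = np.linspace(-100, 100, 100)
grid = np.linspace(-100, 100, 401)

def rates(x):
    """Assumed Gaussian tuning curves, scaled for a short time bin."""
    return 0.1 + 5.0 * np.exp(-(x - prefs[None, :]) ** 2 / (2 * 15.0 ** 2))

def log_like(a):
    f = rates(grid[:, None])                     # mean rates for every grid value
    return np.sum(a * np.log(f) - f, axis=1)     # Poisson log-likelihood

posterior = np.ones_like(grid) / grid.size       # flat prior at t = 0
x_true = 0.0
for t in range(20):
    x_true += rng.normal(0.0, 3.0)               # slowly drifting variable X_t
    a = rng.poisson(rates(np.array([[x_true]])))[0]
    # The previous posterior, slightly blurred, becomes the prior for this step.
    prior = np.convolve(posterior, np.ones(9) / 9, mode="same") + 1e-12
    log_post = log_like(a) + np.log(prior)
    posterior = np.exp(log_post - log_post.max())
    posterior /= posterior.sum()
    print(t, round(x_true, 1), grid[np.argmax(posterior)])
```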

40 Bayesian approach Limitations: the Bayesian approach and ML require a lot of data… Alternative: estimate P(θ|A) directly using a nonlinear estimator.

41 Bayesian approach: logistic regression
Example: Decoding finger movements in M1. On each trial, we observe 100 cells and we want to know which one of the 5 fingers is being moved. [Figure: a network with 100 input units carrying the activity A and 5 output categories, each giving P(F_k|A), e.g. P(F5|A).]
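A minimal sketch of such a decoder as multinomial logistic regression (a softmax over the 5 finger categories) trained by gradient descent on simulated data; the simulated tuning of the 100 cells, the learning rate, and the number of iterations are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_fingers, n_trials = 100, 5, 2000

# Simulated training data: each finger evokes a different mean activity
# pattern across the 100 M1 cells (purely illustrative tuning).
mean_patterns = rng.uniform(5.0, 20.0, size=(n_fingers, n_cells))
fingers = rng.integers(0, n_fingers, size=n_trials)
A = rng.poisson(mean_patterns[fingers]).astype(float)   # n_trials x n_cells

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Multinomial logistic regression trained by gradient descent on cross-entropy.
W = np.zeros((n_cells, n_fingers))
b = np.zeros(n_fingers)
Y = np.eye(n_fingers)[fingers]                           # one-hot targets
for _ in range(500):
    P = softmax(A @ W + b)                               # P(F_k | A), per trial
    err = P - Y
    W -= 1e-4 * (A.T @ err) / n_trials
    b -= 1e-4 * err.mean(axis=0)

# Decode one new trial in which finger 3 (index 2) is moved.
a_new = rng.poisson(mean_patterns[2]).astype(float)
print(softmax((a_new @ W + b)[None, :]))                 # posterior over fingers
```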

42 Bayesian approach: multinomial distributions
Example: Decoding finger movements in M1. Each finger can take 3 mutually exclusive states: no movement, flexion, extension. [Figure: the activity of the N M1 neurons feeds, through weights W, a separate softmax for each digit (1-5) and the wrist, giving the probability of no movement, flexion, and extension for each.]
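A minimal sketch of this architecture: one activity vector feeding a separate 3-way softmax for each digit and the wrist. The weights here are random placeholders, just to show the shapes involved:

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_effectors, n_states = 100, 6, 3     # 5 digits + wrist; 3 states each

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical weights mapping M1 activity to one 3-way softmax per effector.
W = rng.normal(0.0, 0.05, size=(n_effectors, n_states, n_neurons))
a = rng.poisson(10.0, size=n_neurons)            # one trial of M1 activity

# For each digit/wrist, a softmax over (no movement, flexion, extension).
probs = np.array([softmax(W[e] @ a) for e in range(n_effectors)])
print(probs)    # shape (6, 3): each row sums to 1
```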

43 Decoding time varying signals
[Figure: stimulus s(t) and neural response r(t).]

44 Decoding time varying signals

45 Decoding time varying signals

46 Decoding time varying signals
[Figure: stimulus s(t) and neural response r(t).]


