1 Introduction: statistical and machine learning based approaches to neurobiology
Shin Ishii, Nara Institute of Science and Technology

2 Mathematical Fundamentals: Maximum Likelihood and Bayesian Inferences

3 Coin tossing
Tossing a skewed coin: how often does the head come up for this coin? Suppose one head comes up in five tosses: Head, Tail, Tail, Tail, Tail (note: each trial is independent). Writing θ for the rate of head appearance in an individual trial, the probability of this sequence is θ(1 − θ)^4. Viewed as a function of θ, this is the likelihood.

4 Likelihood function
Likelihood: an evaluation of the observed data, viewed as a function of the parameter. Which parameter value explains the observed data better? The likelihood of the parameter θ is L(θ) = θ(1 − θ)^4. What is the most likely parameter, and how should it be determined? It seems natural to set θ to the observed frequency of heads. Really?
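A minimal Python sketch of this likelihood, evaluated on a grid of candidate parameters (the observed sequence Head, Tail, Tail, Tail, Tail is taken from the slide; the grid resolution is arbitrary):

import numpy as np

# Likelihood of head probability theta for the observed sequence H, T, T, T, T
def likelihood(theta):
    return theta * (1.0 - theta) ** 4

thetas = np.linspace(0.0, 1.0, 1001)
theta_ml = thetas[np.argmax(likelihood(thetas))]
print(theta_ml)   # 0.2: the observed frequency of heads, one in five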

5 Kullback-Leibler (KL) divergence
A measure of the difference between two probability distributions Q and P: KL(Q||P) = Σ_x Q(x) log[Q(x)/P(x)]. It lets us quantify the difference between the distributions as an objective, numerical value. Note: the KL divergence is not a metric (it is not symmetric).
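A small sketch of this definition for discrete distributions, using an arbitrary pair of coin distributions to show that the divergence is non-negative and asymmetric:

import numpy as np

def kl_divergence(q, p):
    # KL(q || p) for discrete distributions given as probability arrays
    q, p = np.asarray(q, float), np.asarray(p, float)
    mask = q > 0                      # terms with q(x) = 0 contribute nothing
    return np.sum(q[mask] * np.log(q[mask] / p[mask]))

print(kl_divergence([0.5, 0.5], [0.2, 0.8]))   # > 0
print(kl_divergence([0.2, 0.8], [0.5, 0.5]))   # a different value: not symmetric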

6 Minimize KL divergence
Random events are drawn from the true distribution. Using the observed data set, we want to estimate the true distribution with a trial distribution: the smaller the KL divergence from the true distribution to the trial distribution, the better the estimate.

7 Minimize KL divergence
The KL divergence between the two distributions is KL(q||p) = Σ_x q(x) log q(x) − Σ_x q(x) log p(x; θ), where q is the true distribution and p(x; θ) the trial (model) distribution. The first term is a constant, independent of the parameter θ, so to minimize the KL divergence we only have to maximize the second term, Σ_x q(x) log p(x; θ), with respect to θ.

8 Likelihood and KL divergence
The second term, Σ_x q(x) log p(x; θ), is approximated by the sample mean over the data set, (1/N) Σ_n log p(x_n; θ), which is the log likelihood divided by the number of data points. So the two criteria are the same: minimizing the KL divergence is maximizing the likelihood.

9 Maximum Likelihood (ML) estimation
The maximum likelihood (ML) estimate is the parameter value that maximizes the likelihood. What is the most likely parameter in the coin tossing? The maximization condition d/dθ [log θ + 4 log(1 − θ)] = 0 gives θ = 1/5, the same as the intuition of using the observed frequency of heads in Head, Tail, Tail, Tail, Tail.

10 Property of ML estimate
As the number of observations increases, the squared error of the estimate decreases on the order of 1/N: the ML estimate is asymptotically unbiased (R. Fisher). If an infinite number of observations could be obtained, the ML estimate would converge to the true parameter, but that is infeasible. What happens when only a limited number of observations have been obtained from the real environment?

11 Problem with ML estimation
Is the coin really skewed? See: the four consecutive tails may have occurred just by chance, and it may be detrimental to fix the parameter at the value suggested by Head, Tail, Tail, Tail, Tail; the ML estimate overfits to the first observations. Consider an extreme case: if the data consist of one head in a single toss, the ML estimate says heads come up 100% of the time, which is not reasonable. Indeed, five more tosses might give Head, Head, Head, Head, Tail. How can this overfitting be avoided?

12 Bayesian approach
Bayes theorem: Posterior ∝ Likelihood × Prior, i.e., the a posteriori information combines the information obtained from the data with the a priori information. We have no information about the possibly skewed coin, so we assume a prior distribution under which the parameter is spread around 0.5.

13 Bayesian approach
Bayes theorem: Posterior ∝ Likelihood × Prior. The observed data are one head and four tails. Hmm... it may be a skewed coin, but it is better to keep other possibilities open: this is what the likelihood function expresses.

14 Bayesian approach
Bayes theorem: Posterior ∝ Likelihood × Prior. In the posterior distribution, the parameter is distributed mainly over intermediate values, and some variance (uncertainty) remains.
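The slides do not state the functional form of the prior; a common sketch uses a Beta prior centred on 0.5 (a hypothetical choice), which is conjugate to the coin-tossing likelihood, so the posterior is again a Beta distribution:

import numpy as np
from scipy.stats import beta

a0, b0 = 5.0, 5.0                       # assumed Beta(5, 5) prior, symmetric around 0.5
heads, tails = 1, 4                     # observed data: one head, four tails

a_post, b_post = a0 + heads, b0 + tails   # conjugate update

print(a_post / (a_post + b_post))       # posterior mean ~0.40, pulled toward 0.5
print(heads / (heads + tails))          # ML estimate 0.20
print(beta(a_post, b_post).std())       # remaining uncertainty (posterior std)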

15 Property of Bayesian inference
Bayesian view: probability represents the uncertainty of random events (a subjective value). Frequentist (R. Fisher): "That can't be! The prior distribution introduces a subjective distortion into the estimation; the estimation process must be objective with respect to the obtained data." Bayesian (T. Bayes): "No problem. The uncertainty of random events (subjective probability) depends on the amount of information obtained from the data and on prior knowledge of the events."

16 Application of Bayesian approaches
Data obtained from the real world are sparse, high-dimensional, and involve unobservable variables, and Bayesian methods can handle such data. Examples: user support systems (Bayesian networks) and bioinformatics.

17 Bayesian Approaches to Reconstruction of Neural Codes

18 A neural decoding problem
How does the brain work? Sensory information is represented in sequences of spikes, yet when the same stimulus is repeatedly presented, the spike occurrence varies between trials. An indirect approach to this coding problem is to reconstruct the stimulus from the observed spike trains.

19 Bayesian application to a neural code
The stimulus provides prior knowledge and the spike train is the observation. Possible algorithms for stimulus reconstruction (estimation): from the spike train only (maximum likelihood estimation), or from the spike train together with the stimulus statistics (Bayes estimation). Note: the focus is on whether spike trains include stimulus information, NOT on whether the algorithm is the one actually used in the brain.

20 ‘Observation’ depends on ‘Prior’
(Bialek et al., Science, 1991) A time-varying stimulus drives the neural system, treated as a black box, which emits a spike train; an estimation algorithm then reconstructs the estimated stimulus as a function of time from that spike train.

21 ‘Observation’ depends on ‘Prior’
The stimulus distribution over s drives the neural system (black box), which produces responses x; the estimation algorithm turns these into an estimated stimulus distribution over s.

22 Simple example of signal estimation
Incoming signal s, noise η, and a particular observed value x. Observation = Signal + Noise, i.e., x = s + η. The task is to estimate the stimulus s from the observation x.

23 Simple example of signal estimation
If the probability of observing a particular x given the signal s depends only on the noise η, and the noise is supposed to be drawn from a Gaussian, then p(x|s) is a Gaussian centred at s. If the signal s is also supposed to be drawn from a Gaussian, then by Bayes' theorem the posterior p(s|x) ∝ p(x|s)p(s) is Gaussian as well.

24 Simple example of signal estimation
Bayes theorem: Posterior ∝ Likelihood (observation) × Prior knowledge. Maximum likelihood estimation maximizes the likelihood alone, giving the observation x itself; Bayes estimation maximizes the posterior, which shrinks the estimate toward zero by a factor that depends on the signal-to-noise ratio (SNR) of the prior variance to the noise variance.
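A numerical sketch of this Gaussian signal-plus-noise model (the prior and noise standard deviations are illustrative values, not from the slide); the Bayes estimate shrinks the observation by SNR/(1 + SNR) and achieves a smaller mean squared error than the ML estimate:

import numpy as np

rng = np.random.default_rng(0)
sigma_s, sigma_n = 1.0, 0.5                 # assumed signal (prior) and noise std
snr = sigma_s ** 2 / sigma_n ** 2

s = rng.normal(0.0, sigma_s, 100000)        # signals drawn from the Gaussian prior
x = s + rng.normal(0.0, sigma_n, s.shape)   # observation = signal + noise

s_ml = x                                    # ML: maximize the likelihood alone
s_bayes = x * snr / (1.0 + snr)             # posterior mean/mode of the Gaussian model

print(np.mean((s_ml - s) ** 2))             # ~0.25 (= noise variance)
print(np.mean((s_bayes - s) ** 2))          # smaller: ~0.20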

25 Signal estimation of a fly
(Bialek et al., Science, 1991) Calliphora erythrocephala, movement-sensitive neuron H1, driven by a Gaussian visual stimulus during visually guided flight. The time scale of behavior is about 30 ms, so at typical H1 firing rates (spikes/s) behavioral decisions must be based on only a few spikes.

26 Signal estimation of a fly
(Bialek et al., Science, 1991) Bayes theorem: Posterior ∝ Likelihood (observation) × Prior knowledge. The stimulus passes through the encoder, which produces the observed spike train, and the estimated stimulus is the one that maximizes the posterior. However, the posterior cannot be measured directly.

27 Kernel reconstruction and least squares
The estimated stimulus still cannot be calculated, because the required distributions cannot be defined explicitly. The next step is an alternative calculation: write the estimate as a sum of a kernel placed at each spike time, and choose the kernel function that minimizes the squared error between the estimate and the stimulus.
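A simplified discrete-time sketch of this step (synthetic stimulus and spike train, binned at 3 ms; the lag range and all other numbers are illustrative): the estimate is a sum of one kernel per spike, and the kernel is the least-squares solution of a linear regression from time-shifted copies of the spike train onto the stimulus.

import numpy as np

rng = np.random.default_rng(1)
dt, T = 0.003, 10.0                           # 3-ms bins over 10 s of synthetic data
n = int(T / dt)
stimulus = rng.normal(size=n)                 # stand-in for the true stimulus
spikes = (rng.random(n) < 0.1).astype(float)  # stand-in binary spike train

lags = np.arange(-33, 34)                     # kernel support of roughly +/-100 ms
X = np.column_stack([np.roll(spikes, k) for k in lags])

kernel, *_ = np.linalg.lstsq(X, stimulus, rcond=None)   # minimize ||stimulus - X @ kernel||^2
estimate = X @ kernel                         # reconstruction: kernels summed at spike times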

28 Signal estimation of a fly
(Bialek et al., Science, 1991) The resulting reconstruction compares the stimulus with the estimated stimulus obtained by summing the fitted kernel over the spike times.

29 Case of mammals: O'Keefe's place cells
Rat hippocampal CA1 cells are thought to represent the animal's position in a familiar field (Lever et al., Nature, 2002): each place cell shows high activity when the rat is located at a specific position.

30 Case of mammals: question
Can one estimate the rat's position in the field from the firing patterns of rat hippocampal place cells (Lever et al., Nature, 2002), given that each place cell shows high activity when the rat is located at a specific position? This can be done with incremental Bayes estimation (Brown et al., J. Neurosci., 1998).

31 Sequential Bayes estimation from spike train
(Brown et al., J. Neurosci., 1998) Bayes theorem: Posterior ∝ Likelihood (observation) × Prior knowledge. The rat's position can be estimated by integrating the recent place-cell activity (the observation: the spike train of a place cell) with the position estimate carried over from the history of activity (the prior: the rat's position at the previous time step).

32 Incremental Bayes estimation from spike train
(Brown et al., J. Neurosci., 1998) Observation: the spike trains of the place cells over time. Prior: the estimate of the rat's position propagated from the previous time step.

33 Incremental Bayes estimation from spike train
(Brown et al., J. Neurosci., 1998) The observation probability is a function of the firing rate of the cells, which depends on the rat's position and on the theta rhythm: the firing rate of a place cell has a position component (the receptive field) and a theta-phase component.

34 Inhomogeneous Poisson process for spike train
(Brown et al., J. Neurosci., 1998) The firing rate of a place cell depends on a preferred position (the receptive field) and a preferred phase of the theta rhythm. The instantaneous firing rate is modelled as the product of a position component (an asymmetric Gaussian around the preferred position) and a theta-phase component (a cosine of the phase relative to the preferred phase). The parameters were determined by maximum likelihood.
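A schematic version of such a rate model (not the paper's exact parameterization: the position component here is a symmetric Gaussian, the asymmetry of the receptive field is ignored, and all parameter names and values are made up for illustration):

import math
import numpy as np

def place_cell_rate(pos, theta_phase,
                    mu=np.array([0.5, 0.5]), sigma=0.15,
                    peak_rate=20.0, mod_depth=0.5, pref_phase=0.0):
    # Position component: Gaussian receptive field around the preferred position mu.
    position_term = np.exp(-np.sum((np.asarray(pos) - mu) ** 2) / (2.0 * sigma ** 2))
    # Theta component: cosine modulation around the preferred phase.
    theta_term = 1.0 + mod_depth * np.cos(theta_phase - pref_phase)
    return peak_rate * position_term * theta_term      # spikes/s

def spike_prob(k, rate, dt=0.002):
    # Inhomogeneous Poisson: probability of k spikes in a small bin of width dt.
    lam = rate * dt
    return math.exp(-lam) * lam ** k / math.factorial(k)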

35 Position estimation from spike train
(Brown et al., J. Neurosci., 1998) Assumption: the path of the rat may be approximated as a zero-mean two-dimensional Gaussian random walk, whose parameters were also estimated by ML. The final estimation procedure has two stages. Encoding stage: estimate the place-field, theta-phase, and random-walk parameters. Decoding stage: estimate the rat's position by the incremental Bayes method at each spike event, under the Gaussian random-walk assumption.

36 Bayes estimation from spike train
(Brown et al., J. Neurosci., 1998) The real rat position is compared with the EKF-style estimate and its variance. The calculation of the posterior distribution is done at discontinuous time steps: it is updated whenever a spike occurs as a new observation.
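The paper's decoder uses a Gaussian (EKF-like) approximation of the posterior; as a stand-in, a grid-based sketch of the same recursion is given below, with hypothetical place cells, a Gaussian random-walk prediction step, and a Poisson correction step at each spike event:

import numpy as np
from scipy.ndimage import gaussian_filter

# Hypothetical place cells: (centre, field width, peak rate in spikes/s)
cells = [(np.array([0.3, 0.4]), 0.12, 25.0),
         (np.array([0.7, 0.6]), 0.10, 30.0)]

xs = np.linspace(0.0, 1.0, 50)
gx, gy = np.meshgrid(xs, xs)                      # 50 x 50 grid over the field
posterior = np.full(gx.shape, 1.0 / gx.size)      # flat initial prior

def update(posterior, spike_counts, dt=0.002, walk_sigma_bins=0.5):
    # Prediction: blur the posterior with the zero-mean Gaussian random-walk prior.
    prior = gaussian_filter(posterior, sigma=walk_sigma_bins)
    # Correction: Poisson likelihood of each cell's count in the bin
    # (the k! term is constant over positions and is dropped).
    like = np.ones_like(prior)
    for (mu, sigma, peak), k in zip(cells, spike_counts):
        lam = peak * dt * np.exp(-((gx - mu[0])**2 + (gy - mu[1])**2) / (2 * sigma**2))
        like *= np.exp(-lam) * lam ** k
    post = prior * like
    return post / post.sum()

posterior = update(posterior, spike_counts=[1, 0])        # e.g. only cell 1 spiked
i = posterior.argmax()
estimate = np.array([gx.flat[i], gy.flat[i]])             # MAP position estimate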

37 Position estimation from spike train (1)
(Brown et al., J. Neurosci., 1998) The animal's true position is compared with estimates from three methods: Bayes estimation (posterior = prior × likelihood), maximum likelihood (likelihood of the observed firing pattern under the model activity), and maximum correlation (correlation between the observed firing pattern and the model activity).

38 Position estimation from spike train (2)
(Brown et al., J. Neurosci., 1998) The ML and maximum-correlation methods ignore the history of neural activity, whereas the incremental Bayes method incorporates it as a prior.

39 Information Theoretic Analysis of Spike Trains

40 Information transmission in neural systems
An environmental stimulus is encoded into a spike train (the neural response), which is then decoded. How does a spike train code information about the corresponding stimulus? How efficient is the information transmission? Which kind of coding is optimal?

41 Information transmission: generalized view
Shannon's communication system (Shannon, 1948): an information source produces a message, the transmitter encodes it into a signal, the signal passes through a channel where a noise source corrupts it, the receiver decodes the received signal, and the message reaches its destination. Only the transmitted signal and the received signal are directly observable.

42 Neural coding is a stochastic process
Neuronal responses to a given stimulus are not deterministic but stochastic, and conversely the stimulus underlying each observed response is also probabilistic.

43 Shannon's information
The smallest unit of information is the bit: 1 bit is the amount of information needed to choose between two equally likely outcomes (e.g., tossing a fair coin). Properties: the information of independent events is additive over the constituent events, and if we already know the outcome, there is no information.

44 Shannon's information
Property 1 (independent events): P(x, y) = P(x)P(y) implies I(x, y) = I(x) + I(y). Property 2 (certain events): P(x) = 1 implies I(x) = 0. Both properties are satisfied by defining the information as I(x) = −log2 P(x).

45 E.g. tossing a coin
Tossing a fair coin: observing either Head or Tail has probability 1/2, so each observation carries −log2(1/2) = 1 bit of information.
46 E.g. tossing a coin
Tossing a horribly skewed coin: observing the ordinary (high-probability) outcome carries little information, but observing the rare outcome is highly informative, since −log2 P(x) grows as P(x) shrinks.

47 E.g. tossing 5 coins
Observing the sequence Head, Tail, Tail, Tail, Tail. Case 1: five fair coins. Case 2: five skewed coins. The information carried by the same observation differs between the two cases.
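A small numeric sketch: the information of observing the sequence Head, Tail, Tail, Tail, Tail under the two cases (the head probability of 0.2 for the skewed coins is an assumed value for illustration):

import numpy as np

sequence_info = lambda p_seq: -np.log2(p_seq)     # Shannon information in bits

p_fair = 0.5 ** 5                                 # case 1: five fair coins
p_head = 0.2                                      # case 2: assumed skew toward tails
p_skew = p_head * (1.0 - p_head) ** 4

print(sequence_info(p_fair))   # 5.0 bits
print(sequence_info(p_skew))   # ~3.6 bits: the sequence is more expected, less informative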

48 Entropy
On average, how much information do we get from an observation drawn from a distribution? Entropy is the expectation of the information over all possible observations. It can be defined for the discrete case, H = −Σ_x P(x) log2 P(x), and for the continuous case, H = −∫ p(x) log2 p(x) dx (the differential entropy).
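A minimal sketch of the discrete definition:

import numpy as np

def entropy_bits(p):
    # Entropy, in bits, of a discrete distribution given as an array of probabilities.
    p = np.asarray(p, float)
    p = p[p > 0]                   # the 0 log 0 terms are taken to be 0
    return -np.sum(p * np.log2(p))

print(entropy_bits([0.5, 0.5]))    # 1.0 bit: fair coin
print(entropy_bits([0.9, 0.1]))    # ~0.47 bits: skewed coin
print(entropy_bits([1.0, 0.0]))    # 0.0 bits: certain outcome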

49 Some properties of entropy
Entropy is a scalar property of a probability distribution. It is maximal when P(X) is constant (least certainty about the event), minimal when P(X) is a delta function, and always non-negative for discrete distributions. The higher the entropy, the more you learn (on average) by observing values of the random variable, and the less you can predict those values in advance.

50 E.g. tossing a coin
The entropy of a coin toss, plotted against the head probability, reaches its maximum when Head and Tail occur with equal probability, i.e., when the outcome is most random.

51 E.g. entropy of Gaussian distributions
The entropy of a Gaussian depends only on its standard deviation ((1/2) log2(2πeσ²) bits), i.e., the entropy reflects the variability of the information source.

52 What distribution maximizes the entropy of random variables?
Discrete case (over a finite set of values): the uniform distribution. Continuous case (for a fixed variance): the Gaussian.

53 Entropy of spike trains
A spike train can be transformed into a binary word by discretizing time into small bins, with a 1 in each bin that contains a spike (MacKay and McCulloch, 1952). Computing the entropy of the set of possible spike trains tells us how informative such spike trains can be.

54 Entropy of spike trains
Brillouin (1962): count how many different binary words can occur over the whole set of bins in a window of length T; the entropy is the logarithm of the number of possible words, and with the Stirling approximation it comes out linear in the length of the time window, T.

55 Entropy rate: the unit of bits of information per second
If the chance of a spike in a bin is small (low firing rate, or a fine time discretization of a few ms), the entropy rate of the temporal (timing) code can be approximated as roughly r log2(e / (r Δt)) bits per second, where r is the firing rate and Δt is the bin width.
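A sketch of this low-rate approximation (the per-bin spike probability is p = r·Δt, and for small p the per-bin entropy is about p·log2(e/p)); the firing rate and bin widths below are illustrative numbers, not the slide's example:

import numpy as np

def timing_entropy_rate(rate_hz, dt_s):
    # Approximate entropy rate (bits/s) of a binary spike train, valid for rate*dt << 1.
    p = rate_hz * dt_s
    return p * np.log2(np.e / p) / dt_s

print(timing_entropy_rate(40.0, 0.003))   # ~180 bits/s at 3-ms resolution
print(timing_entropy_rate(40.0, 0.001))   # ~240 bits/s: finer bins, higher capacity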

56 Entropy of spike count distribution
Consider the entropy of the spike count distribution (a rate code). What should we choose for p(n)? We only know two constraints on p(n): the distribution must be normalized, and the average spike count in the window T should equal a given mean. These constraints do not determine p(n) uniquely, but we can take the p(n) that maximizes the entropy subject to them.

57 Entropy of spike count distribution
The entropy of the spike count is maximized by an exponential (geometric) distribution over counts, and the resulting maximum entropy can then be written in closed form in terms of the mean count.
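A sketch of this maximum-entropy count distribution: over the non-negative integers with a fixed mean, the maximizer is the geometric (discretized exponential) distribution, whose entropy has the closed form (n̄+1)·log2(n̄+1) − n̄·log2 n̄; the mean count below is an arbitrary example.

import numpy as np

def max_entropy_counts(mean_count, n_max=500):
    # Geometric count distribution with the given mean, truncated for computation.
    r = mean_count / (1.0 + mean_count)
    p = (1.0 - r) * r ** np.arange(n_max)
    return p / p.sum()

nbar = 4.0                                          # illustrative mean spike count in T
p = max_entropy_counts(nbar)
print(-np.sum(p * np.log2(p)))                      # ~3.61 bits (numerical)
print((nbar + 1) * np.log2(nbar + 1) - nbar * np.log2(nbar))   # ~3.61 bits (closed form)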

58 Conditional entropy and mutual information
The entropy H(r) represents the uncertainty about the response in the absence of any other information. The conditional entropy H(r|s) represents the remaining uncertainty about the response for a fixed stimulus s. The mutual information I(r,s) = H(r) − H(r|s), which has both discrete and continuous forms, is the reduction of uncertainty in r achieved by measuring s. If r and s are statistically independent, then I(r,s) = 0.
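A sketch of the discrete form, I(r,s) = Σ p(s,r) log2[p(s,r)/(p(s)p(r))], computed from a joint distribution given as a table:

import numpy as np

def mutual_information_bits(p_joint):
    # p_joint: 2-D array of joint probabilities indexed by (s, r).
    p_joint = np.asarray(p_joint, float)
    p_s = p_joint.sum(axis=1, keepdims=True)
    p_r = p_joint.sum(axis=0, keepdims=True)
    mask = p_joint > 0
    return np.sum(p_joint[mask] * np.log2(p_joint[mask] / (p_s * p_r)[mask]))

print(mutual_information_bits(np.outer([0.5, 0.5], [0.5, 0.5])))   # 0.0: independent
print(mutual_information_bits([[0.5, 0.0], [0.0, 0.5]]))           # 1.0: fully informative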

59 Reproducibility and variability in neural spike trains
(van Steveninck et al., Science, 1997) Calliphora erythrocephala, movement-sensitive neuron H1. Under dynamic stimuli (the natural condition, a random walk with a fixed diffusion constant) the firing patterns are ordered and highly reproducible across trials; under static stimuli (an artificial condition) the firing patterns are irregular, with low reproducibility and Poisson-like variability.

60 Reproducibility and variability in neural spike trains
(van Steveninck et al., Science, 1997) For the same H1 neuron, the spike count variance as a function of the mean count is low for dynamic stimuli (the natural condition) and high, with a Poisson-like mean-variance relationship, for static stimuli (the artificial condition). Does more precise spike timing convey more information about the input stimuli?

61 Quantifying information transfer
(van Steveninck et al., Science, 1997) A 30-ms window is slid over the responses: at each time t, the spike trains (100 responses) are divided into 10 contiguous 3-ms bins, giving a binary word, and the local word frequency at that time is constructed. Stepping through the data in 3-ms bins (900 trials of a 10-s stimulus), words are sampled, giving a distribution over some 1500 words.

62 Quantifying information transfer
(van Steveninck et al., Science, 1997) From H1's responses to dynamic stimuli one computes the entropy of the spike-train words, the conditional entropy of the neuronal response given the stimulus (i.e., given the time t within the repeated stimulus), and their difference, the transmitted information: the mutual information between the word W and the time t.
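A simplified sketch of this computation (non-overlapping words and synthetic data shapes, unlike the sliding 3-ms steps of the original analysis): the transmitted information is the entropy of the overall word distribution minus the time-averaged entropy of the words at each fixed time in the repeated stimulus.

import numpy as np
from collections import Counter

def word_entropy_bits(counter):
    p = np.asarray(list(counter.values()), float)
    p /= p.sum()
    return -np.sum(p * np.log2(p))

def word_information(spikes, bins_per_word=10):
    # spikes: (n_trials, n_bins) array of 0/1 counts; the same stimulus on every trial.
    n_bins = spikes.shape[1]
    total, noise = Counter(), []
    for t in range(0, n_bins - bins_per_word + 1, bins_per_word):
        words = [tuple(row[t:t + bins_per_word]) for row in spikes]
        total.update(words)                              # word distribution over all times
        noise.append(word_entropy_bits(Counter(words)))  # word entropy at this fixed time t
    return word_entropy_bits(total) - np.mean(noise)     # I(W; t) = H(W) - <H(W | t)>

# Usage with synthetic data:
# rng = np.random.default_rng()
# print(word_information((rng.random((900, 3330)) < 0.05).astype(int)))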

63 Comparison with simulated spike trains
Spike trains are simulated by a modulated Poisson process that has the correct dynamics of the firing rate of the responses to dynamic stimuli but follows the mean-variance relation of static stimuli (mean = variance). The real H1 responses to dynamic stimuli carry more than twice as much information as the simulated responses: models that accurately account for the H1 response to static stimuli can significantly underestimate the signal transfer under more natural conditions.

64 Summary
Statistical inference: maximum likelihood inference, Bayesian inference, and Bayesian approaches to neural decoding problems. Information theory: information and entropy, and information-theoretic approaches to a neural encoding system.

