1 Neural Coding Through The Ages February 1, 2002 Albert E. Parker Complex Biological Systems Department of Mathematical Sciences Center for Computational Biology Montana State University

2 Outline
- Introduction
- The Problem
- Approaches in Neural Encoding
  - Spike Count Coding (Adrian and Zotterman 1926)
  - Poisson Model (Fatt and Katz 1952)
  - Wiener/Volterra Series (1930, 1958)
- Approaches in Neural Decoding
  - Linear Methods: Linear Reconstruction (Rieke et al. 1997), Vector Method (Georgopoulos et al. 1983), Optimal Linear Estimator (Abbott and Salinas 1994)
  - Gaussian Model (de Ruyter van Steveninck and Bialek 1988)
  - Metric Space (Victor and Purpura 1996)

3 We want to understand the neural code. We seek an answer to the question: How does neural activity represent information about environmental stimuli? “The little fly sitting in the fly’s brain trying to fly the fly”

4 Looking for the dictionary to the neural code: encoding maps a stimulus X(τ) to a response Y(t); decoding maps the response back to the stimulus.

5 … but the dictionary is not deterministic! Given a stimulus X(τ), an experimenter observes many different neural responses (spike trains): Y_i(t) | X(τ), i = 1, 2, 3, 4.

6 … but the dictionary is not deterministic! Given a stimulus X(τ), an experimenter observes many different neural responses (spike trains): Y_i(t) | X(τ), i = 1, 2, 3, 4. Neural coding is stochastic!

7 Similarly, neural decoding is stochastic: given a response Y(t), many different stimuli could have elicited it: X_i(τ) | Y(t), i = 1, 2, …, 9.

8 Probability Framework. Environmental stimuli X(τ); neural responses Y(t). Encoder: P(Y(t) | X(τ)). Decoder: P(X(τ) | Y(t)).

9 Information Theory tells us that if the relationship between X and Y can be modeled as an optimal communication channel, then the joint distribution P(X(τ), Y(t)) concentrates in areas of high probability: a coding scheme needs to be stochastic on a fine scale and almost deterministic on a large scale. (Axes: environmental stimuli X, neural responses Y.)

10 How to determine a coding scheme? There are two methodologies in this search: encoding (determining P(Y|X)) and decoding (determining P(X|Y)). As we search for a coding scheme, we proceed in the spirit of John Tukey: it is better to be approximately right than exactly wrong.

11 Neural encoding ….

12 Spike Count Coding (Adrian and Zotterman 1926)
Encoding as a non-linear process: the response tuning curve (response amplitude as a function of stimulus amplitude). Examples:
- Hanging weights from a muscle (Adrian 1926); stimulus amplitude: mass in grams.
- Moving a pattern across the visual field of a blowfly (de Ruyter van Steveninck and Bialek 1988); stimulus amplitude: average velocity in a 200 ms window.
In spike count coding, the response amplitude is (spike count)/time.

13 Spike Count Coding. An experimenter repeats each stimulus many times to estimate the encoder P(Y | X), where Y is measured in spikes/time.

14 Spike Count Coding. Given the encoder P(Y | X), the stimulus distribution P(X), and the response distribution P(Y), the decoder follows from Bayes' rule: P(X | Y) = P(Y | X) P(X) / P(Y).
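The Bayes' rule step above can be sketched on a discretized encoder table. The numbers here are hypothetical: three stimulus amplitudes, spike counts 0 to 4, and a uniform prior.

```python
import numpy as np

# Hypothetical encoder: P_Y_given_X[i, k] = P(Y = k | X = x_i),
# estimated by repeating each stimulus many times and counting spikes.
P_Y_given_X = np.array([
    [0.60, 0.25, 0.10, 0.04, 0.01],   # weak stimulus: mostly few spikes
    [0.10, 0.30, 0.35, 0.20, 0.05],   # medium stimulus
    [0.02, 0.08, 0.20, 0.40, 0.30],   # strong stimulus: mostly many spikes
])
P_X = np.array([1/3, 1/3, 1/3])       # prior over the three stimuli

# Bayes' rule: P(X | Y) = P(Y | X) P(X) / P(Y)
P_XY = P_Y_given_X * P_X[:, None]     # joint P(X, Y)
P_Y = P_XY.sum(axis=0)                # marginal P(Y)
P_X_given_Y = P_XY / P_Y              # decoder; each column sums to 1

# Observing 4 spikes, the strong stimulus is the most probable cause.
print(P_X_given_Y[:, 4])
```

Each column of `P_X_given_Y` is a posterior over stimuli given one observed spike count, which is exactly the lookup-table decoder the slide describes.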

15 Spike Count Coding: the directional tuning curve (Miller, Jacobs and Theunissen 1991). The response tuning curves for the 4 interneurons in the cricket cercal sensory system; the preferred directions are orthogonal to each other.

16 Spike Count Coding
Pros
- Some neural systems do seem to encode certain stimulus attributes by (number of spikes)/time.
- Can work well if the stimulus space is small (e.g. when coding direction in the cricket cercal sensory system).
Cons
- Counting spikes/time neglects the temporal pattern of the spikes of the neural response, which potentially decreases the information conveyed; known short behavioral decision times imply that many neurons make use of just a few spikes.
- Sensory systems respond to stimulus attributes that are very complex. That is, the space of possible stimuli for some systems is a very large (infinite-dimensional) space, so it is not feasible to present all possible stimuli in an experiment.
(Abbott 2001)

17 Poisson Model (Fatt and Katz 1952)
Examples of measured firing rates:
- Electrical multi-site stimulation of chicken retina in vitro (Stett et al. 2000).
- P-type afferent responses in the electric fish to a transdermal potential stimulus (Xu, Payne and Nelson 1996).
The (normalized) histograms give the probability per unit time of firing given that x(τ) occurred: r[t | X(τ) = x(τ)].

18 Poisson Model
If we assume that the spikes are independent of each other given a stimulus X(τ), then we can model P(Y | X) as an inhomogeneous (time-dependent) Poisson process:
- If Y is the (spike count)/time, then P(Y | X(τ)) = Poisson(∫ r[t | X(τ)] dt).
- If Y(t) is a spike train, then P(Y(t) | X(τ)) is Poisson-like with rate r[t | X(τ)].
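A minimal simulation of the inhomogeneous Poisson model above, assuming a hypothetical rate function r[t | X]: in each small bin dt, a spike occurs independently with probability r(t)·dt.

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_spike_train(rate_fn, t_max, dt=1e-3):
    """Sample spike times from an inhomogeneous Poisson process.

    In each bin of width dt, a spike fires with probability r(t)*dt
    (valid when r(t)*dt << 1), independently of all other bins.
    """
    t = np.arange(0.0, t_max, dt)
    r = rate_fn(t)                      # firing rate r[t | X] in spikes/s
    spikes = rng.random(t.size) < r * dt
    return t[spikes]                    # spike times

# Hypothetical stimulus-driven rate: 20 Hz baseline modulated at 2 Hz.
rate = lambda t: 20.0 * (1.0 + np.sin(2 * np.pi * 2.0 * t))
spike_times = poisson_spike_train(rate, t_max=10.0)

# Expected count = integral of r(t) dt, roughly 200 spikes over 10 s.
print(len(spike_times))
```

The spike count over the whole window is then Poisson with mean ∫ r(t) dt, matching the first bullet on the slide.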

19 Poisson Model. When is the Poisson model a good one to try? Examine the mean and variance of the spike counts/time given each stimulus: for a Poisson process, they should be equal. (Abbott 2001)
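The mean-versus-variance check above can be sketched numerically. For Poisson-like spiking, the Fano factor (variance divided by mean of the repeated-trial spike counts) should be close to 1; the trial count and rate here are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate many repeats of the same stimulus; count spikes in a 1 s window
# using independent 1 ms bins with constant firing probability.
n_trials, rate, dt = 2000, 30.0, 1e-3
n_bins = int(1.0 / dt)
counts = (rng.random((n_trials, n_bins)) < rate * dt).sum(axis=1)

# For a Poisson process the Fano factor (variance / mean) should be ~1.
fano = counts.var() / counts.mean()
print(counts.mean(), counts.var(), fano)
```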

20 Poisson Model
Pros
- We have an explicit form of P(Y|X).
- A Poisson process is a basic, well-studied process.
Cons
- Counting spikes/time neglects the temporal pattern of the spikes of the neural response.
- Assuming that spikes are independent neglects the refractory period of a neuron. This period must be 'small' compared to the mean ISI for the Poisson model to be appropriate. To deal with this:
  - Berry and Meister (1998) proposed a Poisson model that includes the refractory period.
  - Emery Brown (2001) uses a mixture of Gamma and inverse Gaussian models.
- The space of possible stimuli for some systems is very large, so it is not possible to present all possible stimuli in experiments to estimate r[t | X(τ)].
(Stett et al. 2000)

21 Wiener / Volterra Series (1930, 1958)
The Taylor series for a function y = g(x):
y(x) = y(x_0) + y′(x_0)(x − x_0) + ½ y″(x_0)(x − x_0)² + … = f_0 + f_1 (x − x_0) + f_2 (x − x_0)² + …
The Volterra series is the analog of the Taylor series for a functional Y(t) = G[X(t)]:
Y(t) = f_0 + ∫ dτ_1 f_1(τ_1) X(t − τ_1) + ∫∫ dτ_1 dτ_2 f_2(τ_1, τ_2) X(t − τ_1) X(t − τ_2) + …
How do we compute the { f_i }? Wiener reformulated the Volterra series so that the new coefficient functions, or kernels, can be measured from experiment. There are theorems assuring that this series, with sufficiently many terms, provides a complete description for a broad class of systems.
The first Wiener kernel is proportional to the cross-correlation of stimulus and response: f_1(τ) = ⟨X(t − τ) Y(t)⟩ / S_X, where S_X is the stimulus power. This is proportional to the spike-triggered average when Y(t) is a spike train.
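The cross-correlation estimate of the first Wiener kernel can be sketched on synthetic data. Everything here is an assumption for illustration: a toy linear neuron with a hypothetical exponential kernel, driven by unit-variance white noise, so that the cross-correlation should recover the kernel directly.

```python
import numpy as np

rng = np.random.default_rng(2)

# White-noise stimulus and a toy linear response Y(t) = (f * X)(t) + noise.
n, lag = 20000, 50
X = rng.standard_normal(n)
f_true = np.exp(-np.arange(lag) / 10.0)          # hypothetical kernel
Y = np.convolve(X, f_true)[:n] + 0.1 * rng.standard_normal(n)

# For unit-variance white noise, the first Wiener kernel equals the
# stimulus-response cross-correlation:  f1(tau) = <Y(t) X(t - tau)>.
f1 = np.array([np.mean(Y[tau:] * X[:n - tau]) for tau in range(lag)])

print(np.max(np.abs(f1 - f_true)))   # small: the kernel is recovered
```

With a correlated (non-white) stimulus the estimate would additionally be divided by the stimulus power spectrum, as in the slide's formula.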

22 Wiener / Volterra Series. Constructing the neural response (here, Y is the firing rate) from the first Wiener kernel seems to be able to capture slow modulations in the firing rate. (Rieke et al. 1997: recordings from H1 of the fly; actual vs. predicted response.)

23 Wiener / Volterra Series
Pros
- Computing the first Wiener kernel is inexpensive; not much data is required.
Cons
- Although it is theoretically possible to compute many terms in the Wiener series, low-order approximations (of just f_0 and f_1, for example) do not work well in practice (i.e. coding is NOT linear).
- The Wiener series is for a continuous function Y(t). This is fine when Y(t) = r[t | X(τ)], the firing rate. But how do we construct a series to model the discrete spiking of neurons?
- The Wiener series gives a specific Y(t) | X(τ). What is P(Y | X)? In principle, one can repeat experiments many times to estimate P(Y | X); in practice, the preparation dies before enough data is collected.
(Slice of a fly brain)

24 Neural Decoding. Why decoding? Encoding looks non-linear in many systems; maybe decoding is linear and hence easier. It is also conceivably easier to estimate P(X|Y) over an ensemble of responses {Y}, since the {Y} live in a much smaller space than the {X}.

25 Linear Reconstruction Method (Rieke et al. 1997)
Consider a linear Volterra approximation, now estimating X from Y:
X(t) = ∫ K_1(τ) Y(t − τ) dτ = Σ_i K_1(t − t_i),
if we represent the discrete spike train as Y(t) = Σ_i δ(t − t_i), where the i-th spike occurs at t_i.
How to determine K_1? Minimize the mean squared error:
min over K(τ) of ⟨ ∫ |X_observed(t) − Σ_i K(t − t_i)|² dt ⟩_X
In the frequency domain, the optimal filter is the Fourier transform of the average stimulus surrounding a spike, divided by the power spectrum of the spike train.
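A rough sketch of this reconstruction, done by direct least squares rather than the frequency-domain solution on the slide: lagged copies of a binary spike train are regressed against the observed stimulus, which is the same minimization written in matrix form. The spiking model and all parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data: spikes tend to follow upward swings of a slow stimulus.
n, lags = 5000, 40
X = np.convolve(rng.standard_normal(n + 100), np.ones(20) / 20, "same")[:n]
p = 1 / (1 + np.exp(-4 * X))            # spike probability per bin
Y = (rng.random(n) < 0.3 * p).astype(float)

# Build lagged copies of the spike train (acausal: stimulus before and
# after each spike) and solve  min_K || X - sum_tau K(tau) Y(t - tau) ||^2.
A = np.column_stack([np.roll(Y, tau) for tau in range(-lags // 2, lags // 2)])
K, *_ = np.linalg.lstsq(A, X, rcond=None)
X_hat = A @ K                            # reconstructed stimulus

# The reconstruction should correlate positively with the true stimulus.
corr = np.corrcoef(X, X_hat)[0, 1]
print(corr)
```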

26 Linear Reconstruction Method. Reconstructing the stimulus with the linear filter K_1. (Rieke et al. 1997: recordings from H1 of the fly; actual vs. predicted stimulus.)

27 Linear Reconstruction Method
Pros
- It's cheap.
- The temporal pattern of the spikes is considered.
- Even though encoding is non-linear (hence the failure of the linear Wiener approximation), decoding for some neurons seems linear.
Cons
- Only one neuron is modeled.
- No explicit form of P(X|Y).

28 Other Linear Methods (for populations of neurons)
Assumptions:
1. Y_i = number of spikes from neuron i in a time window.
2. X(t) is randomly chosen and continuously varying.
Vector Method (Georgopoulos et al. 1983):
X(t) = Σ_i Y_i C_i, where C_i is the preferred stimulus for neuron i.
Optimal Linear Estimator (OLE) (Abbott and Salinas 1994):
X(t) = Σ_i Y_i D_i, where the D_i are chosen to minimize ⟨ ∫ dt |X_observed(t) − Σ_i Y_i D_i|² ⟩_{Y,X}. This gives D_i = Σ_j (Q⁻¹)_ij L_j, where L_j is the center of mass of the tuning curve for cell j and Q_ij is the correlation of Y_i and Y_j.
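The vector method can be sketched for a cricket-like arrangement of four neurons with half-cosine tuning and orthogonal preferred directions. The tuning model, noise level, and stimulus angle are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

# Four neurons with preferred directions 45/135/225/315 degrees (as in the
# cricket cercal system); C[i] is the unit preferred-direction vector C_i.
pref = np.deg2rad([45, 135, 225, 315])
C = np.column_stack([np.cos(pref), np.sin(pref)])     # 4 x 2

def responses(theta):
    # Mean rate is a half-cosine tuning curve [cos(theta - pref)]_+ plus noise.
    r = np.maximum(np.cos(theta - pref), 0.0)
    return r + 0.02 * rng.standard_normal(4)

# Vector method: X_hat = sum_i Y_i C_i
theta_true = np.deg2rad(70.0)
Y = responses(theta_true)
x_hat = Y @ C
theta_hat = np.arctan2(x_hat[1], x_hat[0])

print(np.rad2deg(theta_hat))   # close to 70 degrees
```

The OLE replaces C_i with D_i = Σ_j (Q⁻¹)_ij L_j, which corrects for correlations between neurons; with orthogonal, uncorrelated tuning as above, the two estimators coincide up to scale.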

29 Other Linear Methods (Abbott and Salinas 1994): the cricket cercal sensory system. The figure shows the difference between the stimulus reconstructed by the Vector and OLE methods and the true stimulus presented. Evidence suggests that the cricket can code direction with an accuracy of up to 5 degrees; this data suggests that these algorithms decode direction as well as the cricket does in this experiment.

30 Other Linear Methods
Pros
- Vector Method: it's cheap; it is ideal when the tuning curve is a (half) cosine; it has small error if the C_i are orthogonal.
- OLE: has the smallest average MSE of all linear methods over a population of neurons.
Cons
- Vector Method: it is not always obvious what the preferred stimulus C_i is for generic stimuli; it does not work well if the C_i are not uniformly distributed (orthogonal); it requires a lot of neurons in practice.
- Counting spikes/time neglects the temporal pattern of the spikes of the neural response Y(t).
- No explicit form of P(X|Y)!

31 Gaussian Model (de Ruyter van Steveninck and Bialek 1988). In the experiment, let X(t) be randomly chosen and continuously varying (Gaussian white noise). Approximate P(X|Y) with a Gaussian whose mean ⟨X|Y⟩ and covariance Σ_{X|Y} are computed from data. (Rieke et al. 1997: recordings from H1 of the fly.)

32 Gaussian Model
Pros
- The temporal pattern of the spikes is considered.
- We have an explicit form for P(X|Y). Why should P(X|Y) be Gaussian? The choice is justified by Jaynes' maximum entropy principle: of all models that satisfy a given set of constraints, choose the one that maximizes the entropy. For a fixed mean and covariance, the Gaussian is the maximum entropy model.
Cons
- An inordinate amount of data is required to obtain good estimates of the covariance Σ_{X|Y=y} over all observed y(t). One way to deal with the problem of not having enough data is to cluster the responses together and estimate a Gaussian model for each response cluster.
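The maximum-entropy claim can be checked in one dimension with closed-form differential entropies: among distributions with a fixed variance, the Gaussian's entropy exceeds that of, for example, a Laplace distribution with the same variance. The value of sigma is arbitrary.

```python
import math

# Differential entropies (in nats) of two distributions with variance sigma^2.
sigma = 1.7

# Gaussian: h = (1/2) ln(2 pi e sigma^2)  -- the maximum-entropy value.
h_gaussian = 0.5 * math.log(2 * math.pi * math.e * sigma**2)

# Laplace with variance sigma^2 has scale b = sigma / sqrt(2): h = 1 + ln(2b).
b = sigma / math.sqrt(2)
h_laplace = 1 + math.log(2 * b)

print(h_gaussian, h_laplace)   # the Gaussian entropy is larger
```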

33 Metric Space Approach (Victor and Purpura 1996)
We desire a rigorous decoding method that:
- estimates P(X|Y);
- takes the temporal structure of the spikes of the neural responses Y(t) into account;
- deals with the insufficient-data problem by clustering the responses.
Assumptions: the stimuli X_1, X_2, …, X_C must each be repeated multiple times; there are a total of T neural responses Y_1, …, Y_T.
(Abbott 2001)

34 Metric Space Approach
The Method:
1. Given two spike trains, Y_i and Y_j, the distance between them is defined by the metric D[q](Y_i, Y_j): the minimum cost required to transform Y_i into Y_j via a path of elementary steps:
   a. adding or deleting a spike (cost = 1)
   b. shifting a spike in time by Δt (cost = q·|Δt|)
- 1/q is a measure of the temporal precision of the metric.
- D[q=0](Y_i, Y_j) is just the difference in the number of spikes between the spike trains Y_i and Y_j; decoding based on this metric is just counting spikes.
- D[q=∞](Y_i, Y_j) corresponds to infinitely precise timing of the spikes.
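The metric D[q] can be computed by dynamic programming, in the spirit of edit distance over the two ordered spike-time lists (a sketch; the spike times below are hypothetical).

```python
import numpy as np

def vp_distance(ti, tj, q):
    """Victor-Purpura spike-train metric D[q] by dynamic programming.

    Elementary steps: insert or delete a spike (cost 1), or shift a
    spike in time by dt (cost q * |dt|).
    """
    n, m = len(ti), len(tj)
    D = np.zeros((n + 1, m + 1))
    D[:, 0] = np.arange(n + 1)          # delete all spikes of ti
    D[0, :] = np.arange(m + 1)          # insert all spikes of tj
    for a in range(1, n + 1):
        for b in range(1, m + 1):
            D[a, b] = min(
                D[a - 1, b] + 1,                                # delete
                D[a, b - 1] + 1,                                # insert
                D[a - 1, b - 1] + q * abs(ti[a - 1] - tj[b - 1]),  # shift
            )
    return D[n, m]

y1 = [0.010, 0.050, 0.120]     # spike times in seconds
y2 = [0.012, 0.055, 0.300]

# q = 0: shifts are free, so only spike counts matter (both have 3 spikes).
print(vp_distance(y1, y2, q=0.0))      # 0.0
# Large q: shifting is expensive, so the distant third spike is
# deleted and reinserted (cost 2) while the close pairs are shifted.
print(vp_distance(y1, y2, q=100.0))
```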

36 Metric Space Approach
2. Let r_1, r_2, …, r_C be response classes.

37 Metric Space Approach
2. Let r_1, r_2, …, r_C be response classes. Let N be the classification matrix, with rows indexed by the stimuli X_1, …, X_5 and columns by the response classes r_1, …, r_5, initialized to zeros:

N:     r_1  r_2  r_3  r_4  r_5
X_1      0    0    0    0    0
X_2      0    0    0    0    0
X_3      0    0    0    0    0
X_4      0    0    0    0    0
X_5      0    0    0    0    0

38 Metric Space Approach
2. Let r_1, r_2, …, r_C be response classes. Let N be the classification matrix (still all zeros).
3. Suppose that Y_1 was elicited by X_1. Assign Y_1 to response class r_3 if ( Σ_{Y elicited by X_3} D[q](Y_1, Y)^z )^{1/z} is the minimum over all X_k for k = 1, …, C.

39 Metric Space Approach
2. Let r_1, r_2, …, r_C be response classes. Let N be the classification matrix.
3. Suppose that Y_1 was elicited by X_1. Assign Y_1 to response class r_3 if ( Σ_{Y elicited by X_3} D[q](Y_1, Y)^z )^{1/z} is the minimum over all X_k for k = 1, …, C.
4. Increment N_{1,3} by 1.

N:     r_1  r_2  r_3  r_4  r_5
X_1      0    0    1    0    0
X_2      0    0    0    0    0
X_3      0    0    0    0    0
X_4      0    0    0    0    0
X_5      0    0    0    0    0

40 Metric Space Approach
2. Let r_1, r_2, …, r_C be response classes. Let N be the classification matrix.
3. Suppose that Y_i was elicited by X_α. Assign Y_i to response class r_β if ( Σ_{Y elicited by X_β} D[q](Y_i, Y)^z )^{1/z} is the minimum over all X_k for k = 1, …, C.
4. Increment N_{α,β} by 1.
5. Repeat steps 3 and 4 for each Y_i, i = 1, …, T.

N:     r_1  r_2  r_3  r_4  r_5
X_1      0    0    1    0    0
X_2      0    0    0    0    0
X_3      0    0    0    0    0
X_4      0    0    0    0    0
X_5      0    0    0    0    0

41 Metric Space Approach
2. Let r_1, r_2, …, r_C be response classes. Let N be the classification matrix.
3. Suppose that Y_i was elicited by X_α. Assign Y_i to response class r_β if ( Σ_{Y elicited by X_β} D[q](Y_i, Y)^z )^{1/z} is the minimum over all X_k for k = 1, …, C.
4. Increment N_{α,β} by 1.
5. Repeat steps 3 and 4 for each Y_i, i = 1, …, T.
After repeating the process for all Y elicited by X_1 (X_1 was presented 20 times, eliciting 20 neural responses):

N:     r_1  r_2  r_3  r_4  r_5
X_1      3   11    3    2    1
X_2      0    0    0    0    0
X_3      0    0    0    0    0
X_4      0    0    0    0    0
X_5      0    0    0    0    0

42 Metric Space Approach
2. Let r_1, r_2, …, r_C be response classes. Let N be the classification matrix.
3. Suppose that Y_i was elicited by X_α. Assign Y_i to response class r_β if ( Σ_{Y elicited by X_β} D[q](Y_i, Y)^z )^{1/z} is the minimum over all X_k for k = 1, …, C.
4. Increment N_{α,β} by 1.
5. Repeat steps 3 and 4 for each Y_i, i = 1, …, T.
Then for all Y elicited by X_2 (X_2 was presented 20 times, eliciting 20 neural responses):

N:     r_1  r_2  r_3  r_4  r_5
X_1      3   11    3    2    1
X_2      5   10    3    2    0
X_3      0    0    0    0    0
X_4      0    0    0    0    0
X_5      0    0    0    0    0

43 Metric Space Approach
2. Let r_1, r_2, …, r_C be response classes. Let N be the classification matrix.
3. Suppose that Y_i was elicited by X_α. Assign Y_i to response class r_β if ( Σ_{Y elicited by X_β} D[q](Y_i, Y)^z )^{1/z} is the minimum over all X_k for k = 1, …, C.
4. Increment N_{α,β} by 1.
5. Repeat steps 3 and 4 for each Y_i, i = 1, …, T.
… until we have repeated the process for all Y, including the ones elicited by X_C. In this example, the T = 100 responses (20 elicited by each stimulus) were quantized, or clustered, into 5 classes:

N:     r_1  r_2  r_3  r_4  r_5
X_1      3   11    3    2    1
X_2      5   10    3    2    0
X_3      1    1   15    1    2
X_4      1    0    4    2   13
X_5      2    3    2    5    8

44 Metric Space Approach
Note that by normalizing the columns of the matrix N, we get the decoder P(X|r). Decode a neural response Y(t) by looking up its response class r in the normalized matrix:

P(X|r):  r_1   r_2   r_3   r_4   r_5
X_1      .25   .44   .11   .17   .04
X_2      .42   .40   .11   .17     0
X_3      .08   .04   .56   .08   .08
X_4      .08     0   .15   .17   .54
X_5      .17   .12   .07   .42   .33
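The normalization step above can be sketched directly, using the example's classification matrix (values as read from the slides):

```python
import numpy as np

# Classification matrix N: rows are stimuli X_1..X_5, columns are
# response classes r_1..r_5 (counts from the worked example).
N = np.array([
    [3, 11,  3, 2,  1],
    [5, 10,  3, 2,  0],
    [1,  1, 15, 1,  2],
    [1,  0,  4, 2, 13],
    [2,  3,  2, 5,  8],
], dtype=float)

# Normalizing each column gives the decoder P(X | r).
P = N / N.sum(axis=0)

# Decoding: a response that falls in class r_3 most likely came from X_3.
print(np.round(P[:, 2], 2))
```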

45 Metric Space Approach
Pros
- The responses are clustered together: P(X|r) estimates P(X|Y).
- Considers the temporal pattern of the spikes.
- Minimizing the cost function D[q] is an intuitive way to quantify jitter in the spike trains. In information theory, this type of cost function is called a distortion function.
- What to choose for q and z? The values that maximize the transmitted information from stimulus to response.
Cons
- D[q] imposes our assumptions about what is important in the structure of spike trains (namely, that spike shifts and insertions/deletions are what matter).
- The space of possible stimuli for some systems is very large, so it is not possible to present all possible stimuli in experiments.

46 So we're looking for a decoding algorithm that:
- produces an estimate of P(X|Y) as well as of X(τ)|Y(t);
- considers the temporal structure of the spike trains Y(t);
- makes no assumptions about the linearity of decoding;
- does not require that all stimuli be presented (that is, X(t) ought to be randomly chosen and continuously varying, such as a GWN stimulus);
- considers a population of neurons;
- deals with the problem of never having enough data by clustering the neural responses.
Tune in next week, on Friday at the CBS seminar, to see how our method deals with these issues.
