
1 Information Theory and Neural Coding. PhD Oral Examination, November 29, 2001. Albert E. Parker, Complex Biological Systems, Department of Mathematical Sciences, Center for Computational Biology, Montana State University. Collaborators: Tomas Gedeon, Alexander Dimitrov, John P. Miller, Zane Aldworth.

2 Outline: The Problem; Our Approach: build a model (probability and information theory) and use the model (optimization); Results; Bifurcation Theory; Future Work.

3 Why are we interested in neural coding? We are computationalists: all of the computations underlying an animal's behavioral decisions are carried out within the context of neural codes. Neural prosthetics: understanding the code is what would enable a silicon device (an artificial retina, cochlea, etc.) to interface with the human nervous system.

4 Neural Coding and Decoding. The problem: determine a coding scheme, i.e. how does neural activity represent information about environmental stimuli? Demands: an animal needs to recognize the same object on repeated exposures, so coding has to be deterministic at this level; yet the code must deal with uncertainties introduced by the environment and the neural architecture, so coding is by necessity stochastic at this finer scale. Major obstacle: the search for a coding scheme requires large amounts of data.

5 How to determine a coding scheme? Idea: model a part of a neural system as a communication channel using Information Theory. This model lets us meet the demands of a coding scheme: define a coding scheme as a relation between stimulus and neural response classes, and construct a coding scheme that is stochastic on the finer scale yet almost deterministic on the classes. It also lets us deal with the major obstacle: use whatever quantity of data is available to construct coarse but optimally informative approximations of the coding scheme, and refine the coding scheme as more data becomes available.

6 Probability Framework (coding scheme ~ encoder). (1) What we want to find: the encoder Q(Y|X) from the environmental stimuli X to the neural responses Y. The coding scheme between X and Y is defined by this conditional probability Q.

7 Probability Framework (elements of the respective probability spaces). (2) What we have: data, i.e. realizations of the random variables X and Y related by Q(Y=y|X=x). A stimulus X = x is a 25 ms window of the discretized stimulus (a vector in R^k), and a neural response Y = y is a 10 ms window of the discretized spike train (a binary word in {0,1}^k). We assume that X_n(ω) = X(T^n ω) and Y_n(ω) = Y(T^n ω) are stationary ergodic random variables, where T is a time shift.
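
To make the discretization concrete, here is a minimal numpy sketch that bins a spike train into binary response words. The spike times, the 1 ms bin width, and the helper name bin_spike_train are hypothetical choices for illustration, not the preprocessing actually used in the talk.

```python
import numpy as np

def bin_spike_train(spike_times_ms, t_start_ms, t_end_ms, bin_ms=1.0, word_ms=10.0):
    """Discretize a spike train into binary words: each word is a {0,1}^k vector
    covering word_ms of time at bin_ms resolution (hypothetical helper; the window
    length follows the slide's 10 ms response windows)."""
    edges = np.arange(t_start_ms, t_end_ms + bin_ms, bin_ms)
    counts = np.histogram(spike_times_ms, bins=edges)[0]
    binary = (counts > 0).astype(int)      # 1 if at least one spike fell in the bin
    k = int(word_ms / bin_ms)              # bins per response word
    n_words = len(binary) // k
    return binary[: n_words * k].reshape(n_words, k)

# toy usage: spike times (ms) from a hypothetical 100 ms recording
spikes = np.array([3.2, 4.1, 17.8, 42.0, 43.5, 91.3])
words = bin_spike_train(spikes, 0.0, 100.0)
print(words.shape)   # (10, 10): ten 10 ms response words, one row per word
```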

8 Overview of Our Approach. Given a joint probability p(X,Y), how do we determine the stimulus/response classes? [Figure: the joint probability of environmental stimuli X and neural responses Y, with candidate classes 1-4 along each axis.]

9 The Stimulus and Response Classes. [Figure: distinguishable stimulus/response classes 1-4 for the environmental stimuli X and the neural responses Y.]

10 Information Theory: The Foundation of the Model. A signal x is produced by a source (a random variable) X with probability p(X=x); a signal y is produced by another source Y with probability p(Y=y). A communication channel is a relation between two r.v.'s X and Y, described by the (finite) conditional probability, or quantizer, Q(Y|X). Entropy, the uncertainty or self-information of a r.v.: H(X) = -Σ_x p(x) log p(x). Conditional entropy: H(X|Y) = -Σ_{x,y} p(x,y) log p(x|y). Mutual information, the amount of information that one r.v. contains about another r.v.: I(X;Y) = Σ_{x,y} p(x,y) log [p(x,y) / (p(x)p(y))] = H(X) - H(X|Y).
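
As a concrete companion to these definitions, here is a minimal numpy sketch that computes entropy, conditional entropy, and mutual information from a joint distribution (base-2 logarithms; the 2x2 joint distribution is made up for illustration).

```python
import numpy as np

def entropy(p):
    """H = -sum p log2 p, ignoring zero-probability entries."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def conditional_entropy(pxy):
    """H(X|Y) = H(X,Y) - H(Y) for a joint distribution pxy[x, y]."""
    return entropy(pxy.ravel()) - entropy(pxy.sum(axis=0))

def mutual_information(pxy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(pxy.sum(axis=1)) + entropy(pxy.sum(axis=0)) - entropy(pxy.ravel())

# toy joint distribution over 2 stimuli x 2 responses (made-up numbers)
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
print(entropy(pxy.sum(axis=1)))    # H(X) = 1 bit
print(conditional_entropy(pxy))    # H(X|Y) ~ 0.72 bits
print(mutual_information(pxy))     # I(X;Y) ~ 0.28 bits
```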

11 Why Information Theory? The entropy and mutual information estimated from the data asymptotically approach the true population entropy and mutual information, respectively. Shannon-McMillan-Breiman Theorem (i.i.d. case): if X_1, X_2, ... are i.i.d., then -(1/n) log p(X_1, ..., X_n) → H(X) almost surely. Proof: let Y_i = -log p(X_i); the Y_i are i.i.d., and the theorem follows from the Strong Law of Large Numbers. The result also holds if the sequence is stationary ergodic. This is important for us since our data is not i.i.d., but we do assume that X and Y are stationary ergodic.
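
A quick simulation of the i.i.d. case, for intuition: the empirical quantity -(1/n) log2 p(X_1, ..., X_n) approaches H(X) as n grows, by the Strong Law of Large Numbers. The source distribution below is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

p = np.array([0.5, 0.25, 0.125, 0.125])     # a made-up i.i.d. source
H = -np.sum(p * np.log2(p))                 # true entropy: 1.75 bits

n = 100_000
x = rng.choice(len(p), size=n, p=p)         # sample X_1, ..., X_n
sample_entropy = -np.mean(np.log2(p[x]))    # -(1/n) log2 p(X_1, ..., X_n)

print(H, sample_entropy)                    # the estimate converges to H(X)
```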

12 Conceptual Model. Major obstacle: determining the coding scheme Q(Y|X) between the environmental stimuli X and the neural responses Y requires large amounts of data. Idea: determine instead the coding scheme Q*(Y_N|X) between X and Y_N, a quantization of the neural responses obtained through a quantizer q*(Y_N|Y), such that Y_N preserves as much mutual information with X as possible. New goal: find the quantizer q*(Y_N|Y) that maximizes I(X, Y_N).
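
A minimal sketch of this new goal, assuming the joint distribution p(X,Y) and a candidate quantizer q(Y_N|Y) are given as numpy arrays: the quantizer induces the joint distribution p(x, y_N) = Σ_y p(x,y) q(y_N|y), from which I(X, Y_N) follows. The toy numbers are invented for illustration.

```python
import numpy as np

def quantized_mutual_information(pxy, q):
    """I(X; Y_N) for a joint distribution pxy[x, y] and a quantizer
    q[y, y_N] = q(Y_N = y_N | Y = y), using base-2 logarithms."""
    pxyn = pxy @ q                                    # induced joint p(x, y_N)
    outer = pxyn.sum(axis=1, keepdims=True) @ pxyn.sum(axis=0, keepdims=True)
    mask = pxyn > 0                                   # skip zero-probability cells
    return np.sum(pxyn[mask] * np.log2(pxyn[mask] / outer[mask]))

# toy example: 4 stimuli, 4 responses, quantized into N = 2 classes
pxy = np.full((4, 4), 0.25 / 4)
pxy[np.arange(4), np.arange(4)] += 0.1                 # correlate X and Y (made up)
pxy /= pxy.sum()
q = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], float)  # a deterministic quantizer
print(quantized_mutual_information(pxy, q))
```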

13 Mathematical Model. We search for the maximizer q*(Y_N|Y) which satisfies max_{q ∈ Δ} H(Y_N|Y) constrained by I(X, Y_N) = I_max, where I_max := max_{q ∈ Δ} I(X, Y_N). The feasible region Δ assures that q*(Y_N|Y) is a true conditional probability: Δ is the product over y in Y of simplices Δ_y (each simplex is a discrete probability space).

14 Justification. We begin our search for the maximizer q*(Y_N|Y) by solving (1) q* = argmax_{q ∈ Δ} I(X, Y_N). If there are multiple solutions to (1), then, by Jaynes' maximum entropy principle, we take the one that maximizes the entropy: (2) max_{q ∈ Δ} H(Y_N|Y) constrained by I(X, Y_N) = I_max. To solve (2), use the method of Lagrange multipliers to get (3) max_{q ∈ Δ} H(Y_N|Y) + β I(X, Y_N). Annealing: in practice, we increment β in small steps toward ∞; for each β we solve q*_β = argmax_{q ∈ Δ} H(Y_N|Y) + β I(X, Y_N). Note that lim_{β → ∞} q*_β = q* from (2).

15 Some nice properties of the model. The information functions are nice. Theorem 1: H(Y_N|Y) is concave and I(X, Y_N) is convex (as functions of q). The feasible region Δ is really nice. Lemma 2: Δ is the convex hull of its vertices, vertices(Δ). We can reformulate the problem as two different optimization problems. Theorem 3: an equivalent problem is to solve q*(Y_N|Y) = argmax_{q ∈ vertices(Δ)} I(X, Y_N). Proof: this follows from Theorem 1 and Lemma 2. Corollary 4: the extrema lie on the vertices of Δ. Theorem 5: if q*_M is the maximizer of the corresponding problem over Δ_M constrained by I(X, Y_N) = I_max, then q* = (1/M) q*_M. Proof: by Theorem 3 and the fact that Δ_M is the convex hull of vertices(Δ_M).
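
Theorem 3 suggests a brute-force check on tiny problems: enumerate the vertices of Δ (the deterministic quantizers) and keep the one with the largest I(X, Y_N). The sketch below does that for a made-up 4x4 joint distribution; as noted on the next slide, this is only feasible for very small |Y| and N.

```python
import numpy as np
from itertools import product

def mi(pxy, q):
    """I(X; Y_N) for joint pxy[x, y] and quantizer q[y, y_N], in bits."""
    pj = pxy @ q
    outer = pj.sum(axis=1, keepdims=True) @ pj.sum(axis=0, keepdims=True)
    m = pj > 0
    return np.sum(pj[m] * np.log2(pj[m] / outer[m]))

def vertices(n_y, n_classes):
    """The vertices of Δ: deterministic assignments of each of the |Y|
    responses to one of the N classes (there are N**|Y| of them)."""
    for assignment in product(range(n_classes), repeat=n_y):
        q = np.zeros((n_y, n_classes))
        q[np.arange(n_y), assignment] = 1.0
        yield q

# brute-force search over vertices for a made-up 4x4 joint distribution
pxy = np.eye(4) * 0.15 + 0.025
best = max(vertices(4, 2), key=lambda q: mi(pxy, q))
print(best, mi(pxy, best))
```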

16 Optimization Schemes. Goal: build a sequence of maximizers q*_β as β increases toward ∞. Annealing: q*_β = argmax_{q ∈ Δ} H(Y_N|Y) + β I(X, Y_N), computed either by an augmented Lagrangian method or by an implicit solution (set the derivative of the Lagrangian to zero and solve implicitly for q); drawback: the current choice of the β increments is ad hoc. Vertex search algorithm: max_{q ∈ vertices(Δ)} I(X, Y_N); drawback: |vertices(Δ)| = N^|Y|.
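
A rough sketch of the annealing idea, as described on this slide: slowly increase β and re-maximize H(Y_N|Y) + β I(X, Y_N) over Δ, warm-starting each maximization from the previous solution. The row-wise softmax parameterization (which keeps q inside the product of simplices) and the generic BFGS inner solver are stand-ins chosen for brevity; they are not the augmented Lagrangian or implicit-solution methods of the talk.

```python
import numpy as np
from scipy.optimize import minimize

def softmax(theta):
    e = np.exp(theta - theta.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def neg_objective(theta_flat, pxy, n_classes, beta):
    """-(H(Y_N|Y) + beta * I(X, Y_N)), with q parameterized row-wise by softmax."""
    n_y = pxy.shape[1]
    q = softmax(theta_flat.reshape(n_y, n_classes))
    py = pxy.sum(axis=0)                                       # p(y)
    h_cond = -np.sum(py[:, None] * q * np.log2(q + 1e-300))    # H(Y_N|Y)
    pj = pxy @ q                                               # p(x, y_N)
    outer = pj.sum(axis=1, keepdims=True) @ pj.sum(axis=0, keepdims=True)
    mi = np.sum(pj * np.log2((pj + 1e-300) / (outer + 1e-300)))
    return -(h_cond + beta * mi)

def anneal(pxy, n_classes, betas, rng):
    """Increase beta in small steps, warm-starting each maximization from the
    previous solution plus a tiny perturbation (a sketch, not the talk's method)."""
    n_y = pxy.shape[1]
    theta = rng.normal(scale=0.01, size=n_y * n_classes)       # near-uniform start
    for beta in betas:
        res = minimize(neg_objective, theta, args=(pxy, n_classes, beta))
        theta = res.x + rng.normal(scale=0.01, size=theta.size)
    return softmax(res.x.reshape(n_y, n_classes))

rng = np.random.default_rng(1)
pxy = np.eye(4) * 0.15 + 0.025                                  # made-up joint p(X,Y)
print(anneal(pxy, n_classes=2, betas=np.linspace(0.1, 50.0, 40), rng=rng))
```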

17 Results: application to synthetic data where p(X,Y) is known (the Four Blob Problem). [Figures: the optimal quantizers q*(Y_N|Y) for N = 2, 3, 4, 5, and I(X, Y_N) as a function of N.]
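
For readers who want to reproduce a similar experiment, here is one plausible way to build a synthetic "four blob" joint distribution; the grid size, blob placement, and widths are guesses, and the construction actually used in the talk may differ.

```python
import numpy as np

def four_blob_joint(n=52, spread=3.0):
    """A synthetic joint distribution p(X, Y) on an n x n grid made of four
    Gaussian blobs along the diagonal (a stand-in for the four-blob test)."""
    centers = [n * (2 * i + 1) // 8 for i in range(4)]   # four blob centers
    grid = np.arange(n)
    pxy = np.zeros((n, n))
    for c in centers:
        g = np.exp(-0.5 * ((grid - c) / spread) ** 2)
        pxy += np.outer(g, g)                            # one blob at (c, c)
    return pxy / pxy.sum()

pxy = four_blob_joint()
# with this p(X, Y), the optimal quantizer for N = 4 should recover the four
# blobs as the response classes, while N = 2 merges them into pairs
```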

18 Modeling the cricket cercal sensory system as a communication channel. [Diagram: signal → nervous system, viewed as a communication channel.]

19 Why the cricket? The structure and details of the cricket cercal sensory system are well known. All of the neural activity (about 20 neurons) that encodes the wind stimuli can be measured. Other sensory systems (e.g. the mammalian visual cortex) consist of millions of neurons, which are impossible (today) to measure in their totality.


21 Preparation: the cricket cercal sensory system. [Figure labels: cerci, sensory afferent, sensory interneuron.]

22 Wind stimulus and neural response in the cricket cercal system. [Figures: neural responses Y (all doublets) within a 10 ms window, taken from a 30 minute recording driven by a white-noise wind stimulus, and some of the air-current stimuli X preceding one of the neural responses; time is in ms, and the first spike occurs at T = 0.]

23 Quantization for real data. A quantizer is any map f: Y → Y_N from Y to a set Y_N with finitely many elements. Quantizers can be deterministic or probabilistic, and a quantization of Y can be refined.
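
Two toy quantizers written as conditional probability matrices q(Y_N|Y), one deterministic and one probabilistic; the numbers are invented for illustration.

```python
import numpy as np

# quantizers for |Y| = 4 responses and N = 2 classes, as matrices q[y, y_N]

# deterministic: each response y is assigned to exactly one class
q_det = np.array([[1.0, 0.0],
                  [1.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 1.0]])

# probabilistic: each response is assigned to the classes with some probability
q_prob = np.array([[0.9, 0.1],
                   [0.7, 0.3],
                   [0.2, 0.8],
                   [0.1, 0.9]])

# both are valid points of Δ: every row is a probability distribution
assert np.allclose(q_det.sum(axis=1), 1.0)
assert np.allclose(q_prob.sum(axis=1), 1.0)
```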


25 Optimization problem for real data: max_{q ∈ Δ} H(Y_N|Y) constrained by H(X) - H_G(X|Y_N) = I_max. The equivalent annealing problem: max_{q ∈ Δ} H(Y_N|Y) - β H_G(X|Y_N).

26 Applying the algorithm to cricket sensory data. [Figures: the responses Y quantized into Y_N with two classes (Class 1, Class 2) and with three classes (Class 1, Class 2, Class 3).]

27 Investigating the Bifurcation Structure. Goal: to efficiently solve q*_β = argmax_{q ∈ Δ} H(Y_N|Y) + β I(X, Y_N) for each β as β → ∞. Idea: choose the values of β wisely. Method: study the equilibria of the flow, which are precisely the maximizers q*_β; the first equilibrium is q*_{β=0} ≡ 1/N. Search for bifurcations of the equilibria and use numerical continuation to choose β. Conjecture: there are only pitchfork bifurcations. [Figure: bifurcation diagram of q*_β(Y_N|Y) versus β.]
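
A naive sketch of the numerical-continuation idea on this slide: advance β, warm-start each maximization at the previous equilibrium, and shrink the β step whenever the solution changes abruptly (a hint that a bifurcation, i.e. a class splitting, is nearby). The solve_at callable is assumed to return the maximizer q*_β for a given β (any optimizer, such as the annealing sketch above, could play that role); this adaptive stepping is only a cartoon of the equilibrium tracking described here.

```python
import numpy as np

def continuation(solve_at, q0, beta0, beta_max, dbeta0=0.1, tol=0.05):
    """Natural-parameter continuation sketch.  Returns the branch of
    (beta, q*_beta) pairs; solve_at(beta, q_init) is assumed to return
    the maximizer of H(Y_N|Y) + beta * I(X, Y_N) started from q_init."""
    beta, dbeta, q = beta0, dbeta0, q0
    branch = [(beta, q)]
    while beta < beta_max:
        q_new = solve_at(beta + dbeta, q)
        if np.abs(q_new - q).max() > tol and dbeta > 1e-3:
            dbeta /= 2.0                      # refine near a suspected bifurcation
            continue
        beta, q = beta + dbeta, q_new
        branch.append((beta, q))
        dbeta = min(2.0 * dbeta, 1.0)         # re-expand the step afterwards
    return branch
```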

28 Bifurcations of q*_β. [Figures: observed bifurcations for the Four Blob Problem, and the conceptual bifurcation structure, plotted as q*_β(Y_N|Y) versus β.]

29 Other Applications. Problems of the form x*_β = argmax H(Y|X) + β D arise in many fields: clustering; compression and communications (GLA); pattern recognition (ISODATA, K-means); regression.

30 Future Work. Bifurcation structure: capitalize on the symmetries of q*_β (singularity and group theory). Annealing algorithm improvements: perform the optimization only at the values of β where bifurcations occur; use numerical continuation to choose β; understand why the implicit-solution iteration q_{n+1} = f(q_n, β) converges reliably and quickly (investigate the superattracting directions); perform the optimization over a product of M-simplices. Joint quantization: quantize X and Y simultaneously. Better maximum entropy models for real data. Compare our method to others.


