Presentation transcript:

Information Theory and Neural Coding
PhD Oral Examination, November 29, 2001
Albert E. Parker
Complex Biological Systems, Department of Mathematical Sciences, Center for Computational Biology, Montana State University
Collaborators: Tomas Gedeon, Alexander Dimitrov, John P. Miller, Zane Aldworth

Outline
- The Problem
- Our Approach
  - Build a Model: Probability and Information Theory
  - Use the Model: Optimization
- Results
- Bifurcation Theory
- Future Work

Why are we interested in neural coding?
- We are computationalists: all computations underlying an animal's behavioral decisions are carried out within the context of neural codes.
- Neural prosthetics: to enable a silicon device (artificial retina, cochlea, etc.) to interface with the human nervous system.

Neural Coding and Decoding
The Problem: Determine a coding scheme: how does neural activity represent information about environmental stimuli?
Demands:
- An animal needs to recognize the same object on repeated exposures. Coding has to be deterministic at this level.
- The code must deal with uncertainties introduced by the environment and neural architecture. Coding is by necessity stochastic at this finer scale.
Major Obstacle: The search for a coding scheme requires large amounts of data.

How to determine a coding scheme?
Idea: Model a part of a neural system as a communication channel using Information Theory. This model enables us to:
Meet the demands of a coding scheme:
- Define a coding scheme as a relation between stimulus and neural response classes.
- Construct a coding scheme that is stochastic on the finer scale yet almost deterministic on the classes.
Deal with the major obstacle:
- Use whatever quantity of data is available to construct coarse but optimally informative approximations of the coding scheme.
- Refine the coding scheme as more data becomes available.

Probability Framework (coding scheme ~ encoder)
(1) Want to find: the encoder Q(Y|X) relating environmental stimuli X to neural responses Y. The coding scheme between X and Y is defined by the conditional probability Q.

Probability Framework (elements of the respective probability spaces)
(2) We have data: realizations of the r.v.'s X and Y. A stimulus X = x is a vector in R^k taken from a 25 ms window of the discretized stimulus; a neural response Y = y is a binary vector in {0,1}^k taken from a 10 ms window of the discretized spike train. The coding scheme is Q(Y = y | X = x). We assume that X_n(w) = X(T^n(w)) and Y_n(w) = Y(T^n(w)) are stationary ergodic r.v.'s, where T is a time shift.

Overview of Our Approach
Given a joint probability p(X,Y) between environmental stimuli X and neural responses Y, how do we determine stimulus/response classes?

The Stimulus and Response Classes
(Figure: distinguishable stimulus classes in X paired with distinguishable response classes in Y.)

Information Theory: The Foundation of the Model
A signal x is produced by a source (r.v.) X with probability p(X = x). A signal y is produced by another source Y with probability p(Y = y). A communication channel is a relation between two r.v.'s X and Y. It is described by the (finite) conditional probability, or quantizer, Q(Y|X).
Entropy, the uncertainty or self-information of a r.v.: H(X) = -sum_x p(x) log p(x).
Conditional entropy: H(Y|X) = -sum_{x,y} p(x,y) log p(y|x).
Mutual information, the amount of information that one r.v. contains about another r.v.: I(X,Y) = sum_{x,y} p(x,y) log [ p(x,y) / (p(x)p(y)) ] = H(Y) - H(Y|X).
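The quantities above can be checked numerically. Below is a minimal sketch (my own illustration, not part of the original slides) that computes them for a discrete joint distribution stored as a NumPy array; the example table is invented.

```python
# Minimal sketch (not from the slides) of the quantities defined above, for a
# discrete joint distribution stored as a 2-D array pxy[x, y]; log base 2 gives bits.
import numpy as np

def entropy(p):
    """H = -sum p log2 p over the nonzero entries of a probability array."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def conditional_entropy(pxy):
    """H(Y|X) = H(X,Y) - H(X)."""
    return entropy(pxy) - entropy(pxy.sum(axis=1))

def mutual_information(pxy):
    """I(X,Y) = H(X) + H(Y) - H(X,Y) = H(Y) - H(Y|X)."""
    return entropy(pxy.sum(axis=1)) + entropy(pxy.sum(axis=0)) - entropy(pxy)

# Invented example: two correlated binary sources.
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
print(mutual_information(pxy))   # about 0.28 bits
```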

Why Information Theory?
The entropy and mutual information of the data asymptotically approach the true population entropy and mutual information, respectively.
Shannon-McMillan-Breiman Theorem (i.i.d. case): If X_1, X_2, ... are i.i.d. with distribution p, then -(1/n) log p(X_1, ..., X_n) converges to H(X) almost surely.
Proof: Let Y_i = log p(X_i); the Y_i are i.i.d., so the theorem follows from the Strong Law of Large Numbers.
This result also holds for stationary ergodic sequences. This is important for us since our data is not i.i.d., but we do assume that X and Y are stationary ergodic.
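As a quick numerical illustration of this convergence (an i.i.d. toy example of my own, not from the dissertation), the empirical quantity -(1/n) log2 p(X_1, ..., X_n) can be compared with H(X) as n grows:

```python
# Illustration of -(1/n) log2 p(X_1,...,X_n) -> H(X) for an i.i.d. source
# (the toy distribution and sample sizes are illustrative choices).
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.25, 0.25])                 # true source distribution
H_true = -np.sum(p * np.log2(p))                # 1.5 bits

for n in (100, 10_000, 1_000_000):
    x = rng.choice(len(p), size=n, p=p)         # i.i.d. samples
    H_emp = -np.mean(np.log2(p[x]))             # -(1/n) sum_i log2 p(x_i)
    print(f"n={n:>9}: empirical {H_emp:.4f} bits vs true {H_true:.4f} bits")
```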

Conceptual Model
Major Obstacle: To determine a coding scheme, Q(Y|X), between the environmental stimuli X and the neural responses Y requires large amounts of data.
Idea: Determine the coding scheme Q*(Y_N|X) between X and Y_N, a quantization of the neural responses produced by a quantizer q*(Y_N|Y), such that Y_N preserves as much mutual information with X as possible.
New Goal: Find the quantizer q*(Y_N|Y) that maximizes I(X,Y_N).

Mathematical Model
We search for the maximizer q*(Y_N|Y) which satisfies
    max_{q in Δ} H(Y_N|Y) constrained by I(X,Y_N) = I_max, where I_max := max_{q in Δ} I(X,Y_N).
The feasible region Δ assures that q*(Y_N|Y) is a true conditional probability: Δ is a product of simplices Δ_y, one for each y (each simplex is a discrete probability space).
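To make the objective and constraint concrete, here is a sketch (my notation: pxy[x, y] for p(X,Y) and q[y, t] for q(Y_N = t | Y = y); not the author's code) of the two quantities appearing in the problem:

```python
# Sketch of the two quantities in the problem above: I(X, Y_N) and H(Y_N|Y),
# for a joint table pxy[x, y] and a quantizer q[y, t] = q(Y_N = t | Y = y).
# A point q in the feasible region Delta has rows that each sum to one.
import numpy as np

def I_X_YN(pxy, q):
    """Mutual information between X and the quantized response Y_N."""
    pxt = pxy @ q                                  # p(x, t) = sum_y p(x,y) q(t|y)
    px = pxt.sum(axis=1, keepdims=True)
    pt = pxt.sum(axis=0, keepdims=True)
    nz = pxt > 0
    return np.sum(pxt[nz] * np.log2(pxt[nz] / (px * pt)[nz]))

def H_YN_given_Y(pxy, q):
    """Conditional entropy H(Y_N|Y) = -sum_y p(y) sum_t q(t|y) log2 q(t|y)."""
    py = pxy.sum(axis=0)                           # p(y)
    w = py[:, None] * q
    nz = q > 0
    return -np.sum(w[nz] * np.log2(q[nz]))
```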

Justification
We begin our search for the maximizer q*(Y_N|Y) by solving
(1) q* = argmax_{q in Δ} I(X,Y_N).
If there are multiple solutions to (1), then, by Jaynes' maximum entropy principle, we take the one that maximizes the entropy:
(2) max_{q in Δ} H(Y_N|Y) constrained by I(X,Y_N) = I_max.
In order to solve (2), use the method of Lagrange multipliers to get
(3) max_{q in Δ} H(Y_N|Y) + β I(X,Y_N).
Annealing: In practice, we increment β in small steps toward infinity. For each β, we solve
    q*_β = argmax_{q in Δ} H(Y_N|Y) + β I(X,Y_N).
Note that lim_{β→∞} q*_β is the solution q* of (2).

Some nice properties of the model
The information functions are nice:
Theorem 1. H(Y_N|Y) is concave and I(X,Y_N) is convex in q.
Δ is really nice:
Lemma 2. Δ is the convex hull of its vertices.
We can reformulate the problem as two different optimization problems:
Theorem 3. An equivalent problem is to solve q*(Y_N|Y) = argmax_{q in vertices(Δ)} I(X,Y_N).
Proof: This result follows from Theorem 1 and Lemma 2.
Corollary 4. The extrema lie on the vertices of Δ.
Theorem 5. If q*_M is the maximizer of the problem constrained by I(X,Y_N) = I_max, then q* = (1/M) q*_M.
Proof: By Theorem 3 and the fact that Δ_M is the convex hull of vertices(Δ_M).

Optimization Schemes
Goal: Build a sequence of solutions q*_β as β increases toward infinity.
- Annealing: q*_β = argmax_{q in Δ} H(Y_N|Y) + β I(X,Y_N).
- Augmented Lagrangian method.
- Implicit solution: set the gradient of the Lagrangian to zero and solve implicitly for q. Drawback: the current choice of β is ad hoc.
- Vertex search algorithm: max_{q in vertices(Δ)} I(X,Y_N). Drawback: |vertices(Δ)| = N^|Y|.
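The sketch below illustrates the annealing and implicit-solution ideas together. Setting the gradient of the Lagrangian of H(Y_N|Y) + β I(X,Y_N) to zero yields a self-consistent condition of the form q(t|y) proportional to exp(β sum_x p(x|y) log p(x|t)); the update, the β schedule, and the small perturbation below are my reconstruction under that assumption, not the author's implementation.

```python
# Hedged sketch of annealing with an implicit (self-consistent) update for
# q*_beta = argmax H(Y_N|Y) + beta * I(X, Y_N). The stationarity condition gives
# q(t|y) proportional to exp(beta * sum_x p(x|y) log p(x|t)); iterating it while
# warm-starting from the previous beta's solution is the annealing idea above.
import numpy as np

def fixed_point_step(pxy, q, beta, eps=1e-12):
    py = pxy.sum(axis=0)                            # p(y)
    px_given_y = pxy / (py + eps)                   # column y holds p(x|y)
    pxt = pxy @ q                                   # p(x, t)
    pt = pxt.sum(axis=0)
    px_given_t = pxt / (pt + eps)                   # column t holds p(x|t)
    s = px_given_y.T @ np.log(px_given_t + eps)     # s[y, t] = sum_x p(x|y) log p(x|t)
    logits = beta * s
    logits -= logits.max(axis=1, keepdims=True)     # for numerical stability
    q_new = np.exp(logits)
    return q_new / q_new.sum(axis=1, keepdims=True)

def anneal(pxy, N, betas, n_inner=200, seed=0):
    """Slowly increase beta, warm-starting each solve from the previous solution."""
    rng = np.random.default_rng(seed)
    q = np.full((pxy.shape[1], N), 1.0 / N)         # uniform quantizer q = 1/N
    for beta in betas:
        q += 1e-3 * rng.random(q.shape)             # tiny kick to allow symmetry breaking
        q /= q.sum(axis=1, keepdims=True)
        for _ in range(n_inner):
            q = fixed_point_step(pxy, q, beta)
    return q
```

Note that the uniform quantizer q = 1/N is always a fixed point of this update; the small random kick lets the iteration leave it once it becomes unstable, which is the symmetry-breaking behavior studied later in the talk.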

Results: Application to synthetic data (p(X,Y) is known)
The Four Blob Problem. (Figures: the optimal quantizers q*(Y_N|Y) for N = 2, 3, 4, 5, and a plot of I(X,Y_N) vs. N.)
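Since the original synthetic data set is not reproduced here, the following is a generic "four blob" stand-in (grid size, blob centers, and spread are invented) that can be fed to the sketches above; because the four blobs are well separated, I(X,Y) is close to 2 bits, and a quantizer with N >= 4 classes should capture nearly all of it.

```python
# An invented four-blob joint distribution p(X,Y): four well-separated Gaussian
# blobs on a grid, so I(X,Y) is close to log2(4) = 2 bits. The anneal() sketch
# above, run with N = 4, should recover most of this information.
import numpy as np

def four_blob_joint(n=20, spread=1.0):
    centers = [(3, 3), (8, 8), (13, 13), (18, 18)]
    x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    p = np.zeros((n, n))
    for cx, cy in centers:
        p += np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * spread ** 2))
    return p / p.sum()

pxy = four_blob_joint()
px, py = pxy.sum(axis=1), pxy.sum(axis=0)
nz = pxy > 0
print(np.sum(pxy[nz] * np.log2(pxy[nz] / np.outer(px, py)[nz])))  # I(X,Y), ~2 bits
```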

Modeling the cricket cercal sensory system as a communication channel
(Diagram: signal → nervous system, viewed as a communication channel.)

Why the cricket?
- The structure and details of the cricket cercal sensory system are well known.
- All of the neural activity (about 20 neurons) that encodes the wind stimuli can be measured.
- Other sensory systems (e.g. mammalian visual cortex) consist of millions of neurons, which are impossible (today) to measure in totality.

Preparation: the cricket cercal sensory system.
(Figure: cerci with sensory afferents projecting to a sensory interneuron.)

Wind stimulus and neural response in the cricket cercal system
(Figures: neural responses Y, recorded over 30 minutes, caused by a white-noise wind stimulus; the responses shown are all doublets within a 10 ms window, aligned so that the first spike occurs at T = 0; and some of the air-current stimuli X preceding one of the neural responses, with time in ms.)

Quantization for real data
A quantizer is any map f: Y → Y_N from Y to a space Y_N with finitely many elements. Quantizers can be deterministic or probabilistic, and a quantizer can be refined.
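A small sketch of the two kinds of quantizers mentioned above, both stored as the matrix q[y, t] = q(Y_N = t | Y = y) used in the earlier sketches; the sizes and labels are arbitrary examples of my own.

```python
# Sketch of deterministic vs. probabilistic quantizers f: Y -> Y_N, both stored
# as a |Y| x N matrix q[y, t] = q(Y_N = t | Y = y); the sizes are arbitrary.
import numpy as np

Y_size, N = 6, 3

# Deterministic quantizer: every response y is assigned to exactly one class,
# so each row of q is a vertex of the simplex (a one-hot vector).
labels = np.array([0, 0, 1, 1, 2, 2])
q_deterministic = np.eye(N)[labels]

# Probabilistic quantizer: each row is an arbitrary point in the simplex.
rng = np.random.default_rng(1)
q_probabilistic = rng.random((Y_size, N))
q_probabilistic /= q_probabilistic.sum(axis=1, keepdims=True)

# Both are valid conditional probabilities: every row sums to one.
assert np.allclose(q_deterministic.sum(axis=1), 1)
assert np.allclose(q_probabilistic.sum(axis=1), 1)
```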

Optimization problem for real data
    max_{q in Δ} H(Y_N|Y) constrained by H(X) - H_G(X|Y_N) = I_max.
The equivalent annealing problem:
    max_{q in Δ} H(Y_N|Y) - β H_G(X|Y_N).
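If H_G(X|Y_N) denotes the conditional entropy computed from a Gaussian model of the stimuli assigned to each class (my reading of the subscript G; the weighting, variable names, and regularizer below are illustrative, not the author's code), it can be estimated roughly as follows:

```python
# Hedged sketch: H_G(X|Y_N) as the weighted differential entropy of Gaussians
# fitted to the stimuli in each class (an assumption about what the subscript G
# means in the slide; names and regularization are illustrative choices).
import numpy as np

def gaussian_cond_entropy(stimuli, weights, reg=1e-6):
    """stimuli: (n, d) stimulus vectors; weights: (n, N) with weights[i, t]
    proportional to p(y_i) q(Y_N = t | Y = y_i). Returns H_G(X|Y_N) in bits."""
    n, d = stimuli.shape
    H = 0.0
    for t in range(weights.shape[1]):
        w = weights[:, t]
        pt = w.sum() / weights.sum()                       # p(Y_N = t)
        if pt <= 0:
            continue
        mu = np.average(stimuli, axis=0, weights=w)
        xc = stimuli - mu
        cov = (w[:, None] * xc).T @ xc / w.sum() + reg * np.eye(d)
        # Entropy of a d-dimensional Gaussian: 0.5 * log2((2 pi e)^d det(cov)).
        H_t = 0.5 * (d * np.log2(2 * np.pi * np.e)
                     + np.linalg.slogdet(cov)[1] / np.log(2))
        H += pt * H_t
    return H
```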

Applying the algorithm to cricket sensory data
(Figures: quantizations of the responses Y into Y_N with 2 classes (Class 1, Class 2) and with 3 classes (Class 1, Class 2, Class 3), together with the corresponding stimulus classes.)

Investigating the Bifurcation Structure
Goal: To efficiently solve q*_β = argmax_{q in Δ} H(Y_N|Y) + β I(X,Y_N) for each β as β → ∞.
Idea: Choose β wisely.
Method: Study the equilibria of the flow, which are precisely the maximizers q*_β.
- The first equilibrium is q*_{β=0} = 1/N (the uniform quantizer).
- Search for bifurcations of the equilibria.
- Use numerical continuation to choose β.
Conjecture: There are only pitchfork bifurcations.
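A crude way to see the first symmetry-breaking bifurcation numerically (reusing fixed_point_step() and four_blob_joint() from the earlier sketches; the β grid and iteration counts are arbitrary) is to sweep β and watch when the solution leaves the uniform branch:

```python
# Crude bifurcation scan (reuses fixed_point_step() and four_blob_joint() from the
# earlier sketches): sweep beta, warm-start from the previous solution, and record
# how far q*_beta has moved from the uniform branch q = 1/N.
import numpy as np

pxy = four_blob_joint()
N = 4
q = np.full((pxy.shape[1], N), 1.0 / N)
rng = np.random.default_rng(0)

for beta in np.arange(0.25, 5.01, 0.25):
    q += 1e-4 * rng.random(q.shape)                 # tiny kick off the symmetric branch
    q /= q.sum(axis=1, keepdims=True)
    for _ in range(500):
        q = fixed_point_step(pxy, q, beta)
    print(f"beta={beta:4.2f}  max |q - 1/N| = {np.abs(q - 1.0 / N).max():.3f}")
# Below the first bifurcation the deviation collapses back toward zero (the uniform
# quantizer is stable); past it the deviation jumps, signalling symmetry breaking.
```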

Bifurcations of q*_β
(Figures: the conceptual bifurcation structure, and the observed bifurcations for the Four Blob Problem, shown as q*_β(Y_N|Y) against β.)

Other Applications
Solving problems of the form x*_β = argmax H(Y|X) + β D is common in many fields:
- Clustering
- Compression and communications (GLA)
- Pattern recognition (ISODATA, K-means)
- Regression

Future Work
- Bifurcation structure
  - Capitalize on the symmetries of q*_β (Singularity and Group Theory).
- Annealing algorithm improvement
  - Perform optimization only at β where bifurcations occur.
  - Use numerical continuation to choose β.
  - The implicit solution method q_{n+1} = f(q_n, β) converges reliably and quickly. Why? Investigate the superattracting directions.
  - Perform optimization over a product of M-simplices.
- Joint quantization
  - Quantize X and Y simultaneously.
- Better maximum entropy models for real data.
- Compare our method to others.