Building Soft Clusters of Data via the Information Bottleneck Method

Russell J. Ricker¹, Albert E. Parker¹, Tomáš Gedeon¹, Alexander G. Dimitrov²
¹Department of Mathematical Sciences, Montana State University; ²Department of Mathematics, Washington State University

Abstract
Information-based distortion methods have been used successfully to analyze the relationship between stimulus and response spaces, such as in neural coding and text classification. Distortion methods make few assumptions about the correspondence between the two spaces, providing maximally informative relationships between them. I used the Information Bottleneck technique to create a soft clustering of a synthetic data set of 50 stimuli and 50 neural responses, drawn from a multivariate Gaussian distribution with four blobs, which reconstructed their hypothetical relationship. The algorithm utilized an annealing method to solve the high-dimensional nonlinear problem and was implemented in Matlab®. As the annealing parameter increased, the solution to the problem underwent a series of phase transitions, or bifurcations, that eventually stabilized to a nearly deterministic clustering. By calculating the matrix of second derivatives (the Hessian), we can determine when the bifurcations occur. By calculating arrays of third and fourth derivatives, we can determine whether the bifurcations are subcritical or supercritical. The existence of subcritical branches implies that several solutions not found by the method of annealing exist. Because the method of annealing is guaranteed to converge, the subcritical branch must turn at a later bifurcation and become optimal. I show that we can obtain nearly optimal information preservation for N = 4 classes, and provide numerical evidence that subcritical bifurcations exist.

Theoretical Background
The Information Bottleneck Method clusters the response space, Y, into a much smaller space, T. To cluster the response space informatively, the mutual information between Y and T is minimized subject to the constraint that the mutual information between the stimulus space, X, and T is at least some information rate I_0.

The Basic Optimization Problem [2,4]:

    min_{q ∈ Δ} I(Y;T)  subject to  I(X;T) ≥ I_0

where
- I_0 is an information rate;
- q ≔ q(T|Y) is a conditional probability that maps Y to T;
- Δ ≔ {q(T|Y) : Σ_t q(t|y) = 1 and q(t|y) ≥ 0 for every y ∈ Y};
- I(Y;T) ≔ Σ_{y,t} p(y) q(t|y) log [q(t|y) / q(t)], where q(t) = Σ_y q(t|y) p(y);
- I(X;T) ≔ Σ_{x,t} p(x,t) log [p(x,t) / (p(x) q(t))], where p(x,t) = Σ_y p(x,y) q(t|y) by the Markov relation X ↔ Y ↔ T.

To solve this, we utilize the method of annealing applied to max_{q ∈ Δ} F(q, β) ≔ −I(Y;T) + β I(X;T). The algorithm for this method is [2,3] (a code sketch follows the list):
1. Perform a β-step: let β_{j+1} = β_j + s_j, where s_j > 0.
2. Take q^{(0)}_{j+1} = q_j + η, where η is a small perturbation, as an initial guess for the solution q_{j+1} at β_{j+1}.
3. Solve max_{q ∈ Δ} F(q, β_{j+1}) to get the maximizer q_{j+1}.
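To make the loop concrete, here is a minimal Python/NumPy sketch of steps 1-3, using the standard self-consistent Information Bottleneck fixed-point update in place of a general-purpose optimizer. It is not the authors' Matlab® implementation; the β schedule, perturbation size, and function names are illustrative assumptions, and the joint distribution pxy is assumed strictly positive so no logarithm of zero occurs.

    import numpy as np

    def mutual_information(pxy):
        # I(X;Y) in nats for a joint distribution pxy[x, y].
        px = pxy.sum(axis=1, keepdims=True)
        py = pxy.sum(axis=0, keepdims=True)
        mask = pxy > 0
        return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px * py)[mask])))

    def ib_step(pxy, q, beta, n_iter=300):
        # Self-consistent fixed-point iteration for q[t, y] = q(t|y) at fixed beta.
        py = pxy.sum(axis=0)                   # p(y)
        px_given_y = pxy / py                  # p(x|y); column y holds p(.|y)
        for _ in range(n_iter):
            qt = q @ py                        # q(t) = sum_y q(t|y) p(y)
            px_given_t = pxy @ q.T / qt        # p(x|t) = sum_y p(x,y) q(t|y) / q(t)
            # D_KL[p(x|y) || p(x|t)] for every pair (t, y); result has shape (T, Y)
            log_ratio = np.log(px_given_y[:, None, :] / px_given_t[:, :, None])
            kl = np.sum(px_given_y[:, None, :] * log_ratio, axis=0)
            q = qt[:, None] * np.exp(-beta * kl)   # q(t|y) proportional to q(t) e^{-beta KL}
            q /= q.sum(axis=0, keepdims=True)      # renormalize over t for each y
        return q

    def anneal(pxy, n_clusters=4, beta_max=10.0, n_steps=100, seed=0):
        # Anneal in beta: perturb the previous solution and re-solve at each step.
        rng = np.random.default_rng(seed)
        q = np.full((n_clusters, pxy.shape[1]), 1.0 / n_clusters)  # uniform branch q_{1/N}
        solutions = []
        for beta in np.linspace(0.1, beta_max, n_steps):   # beta-step (step 1)
            q = q + 1e-4 * rng.random(q.shape)             # initial guess q_j + eta (step 2)
            q /= q.sum(axis=0, keepdims=True)
            q = ib_step(pxy, q, beta)                      # maximizer at beta_{j+1} (step 3)
            solutions.append(q.copy())
        return solutions

At small β the iteration stays near the uniform, uninformative solution; as β grows, the clusters resolve through the phase transitions described below.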
How to Determine Solutions [2,4]:

The Lagrangian: L(q, λ, β) ≔ F(q, β) + Σ_y λ_y (Σ_t q(t|y) − 1).
The Hessian: Δ_q L, the matrix of second derivatives of the Lagrangian with respect to q.
If the Hessian is negative definite on ker(J_1) or ker(J_2), the kernels of the Jacobians of the constraints, for a certain q*, then that q* is a solution to the initial problem.

How to Determine Bifurcations [2]:

The gradient flow: d/dt (q, λ) = ∇L(q, λ, β).
If the Hessian of the Lagrangian is either positive or negative definite on the kernel of the constraint Jacobian at a point (q*, λ*, β*), then (q*, λ*, β*) is a singularity of the gradient flow if and only if (q*, λ*, β*) is a bifurcation point.

Results
1. The Information Bottleneck Method provides nearly deterministic clusters of the Gaussian four-blob distribution.
2. Evaluating a bifurcation discriminator, we have numerical evidence that subcritical bifurcations exist, contrary to the hypotheses of others working with this method.

Conclusions
The Information Bottleneck Method provides clusters of the information that accurately model the original system. However, we have numerical evidence that subcritical branches exist; in other words, optimal clustering schemes not found by the method of annealing may exist. This will be explored further by annealing along branches predicted to be subcritical, using continuation techniques.

References
[1] T. Cover and J. Thomas. Elements of Information Theory. New York: Wiley Series in Communication, 1991.
[2] A. E. Parker, A. G. Dimitrov, and T. Gedeon. "Symmetry Breaking in Soft Clustering Decoding of Neural Codes." IEEE Trans. Info. Theory 56.2 (2010).
[3] K. Rose. "Deterministic annealing for clustering, compression, classification, regression, and related optimization problems." Proc. IEEE (1998).
[4] T. Gedeon, A. E. Parker, and A. G. Dimitrov. "The Mathematical Structure of Information Bottleneck Methods." Entropy 14.3 (2012).

Figure 1. A 52-by-52 class Gaussian distribution with well-separated peaks. This was the test distribution used to obtain the results. The vertical axis is the input space, the horizontal axis is the output space, and the dark areas correspond to high-probability regions.

Figure 2. The annealing parameter β versus the mutual information between the input space X and the clustered output space T for the Information Bottleneck Method. The dotted line is the value of the mutual information between X and Y.

Figure 3. The optimal clusterings for each of N = 2, 3, 4, and 5 clusters. N = 2 and N = 3 display mismatched clusters; N = 5 shows a cluster that has not yet resolved. N = 4 displays well-divided and nearly deterministic clusters.

Figure 4. Bifurcation diagram for N = 4. The panels illustrate the sequence of symmetry-breaking bifurcations from the branch (q_{1/N}, λ, β) with symmetry S_4, to symmetry S_3 (blue), to symmetry S_2 (black), to symmetry S_1 (cyan). From [2], p. 17.

Figure 5. This image, from [2], demonstrates the bifurcating branch. This figure was derived from the Information Distortion Method.

Table. Values of the bifurcation discriminator for each N.
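To reproduce the qualitative behavior of Figures 1 and 2, the following sketch builds a four-blob joint distribution of the kind shown in Figure 1 and traces I(X;T) along the annealing schedule, reusing the anneal and mutual_information functions sketched above. The grid size, blob centers, and widths are illustrative guesses, not the authors' actual test distribution.

    import numpy as np

    def four_blob_joint(n=52, centers=((13, 13), (13, 39), (39, 13), (39, 39)), sigma=4.0):
        # Discretized mixture of four Gaussian blobs as a joint distribution p(x, y).
        # Rows index the input space X, columns the output space Y; the four peaks
        # play the role of the dark high-probability areas in Figure 1.
        xs, ys = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
        pxy = np.zeros((n, n))
        for cx, cy in centers:
            pxy += np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
        return pxy / pxy.sum()   # normalize; strictly positive everywhere

    # Trace I(X;T) along the annealing schedule (compare the curve in Figure 2).
    pxy = four_blob_joint()
    for beta, q in zip(np.linspace(0.1, 10.0, 100), anneal(pxy, n_clusters=4)):
        pxt = pxy @ q.T          # joint p(x, t) induced by the soft clustering
        print(f"beta={beta:.2f}  I(X;T)={mutual_information(pxt):.4f}")

The printed curve rises in distinct jumps at the symmetry-breaking bifurcations before leveling off near I(X;Y), matching the staircase shape described for Figure 2.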