
July 11, 2006

Probing the covariance matrix

Kenneth M. Hanson
T-16, Nuclear Physics; Theoretical Division
Los Alamos National Laboratory

Bayesian Inference and Maximum Entropy Workshop, July 9-13, 2006
LA-UR-06-xxxx
This presentation available at

Overview

Analogy between minus-log-probability and a physical potential
Gaussian approximation
Probing the covariance matrix with an external force
► deterministic technique to replace stochastic calculations
Examples
Potential applications

Analogy to physical system

Analogy between the minus-log-posterior φ and a physical potential
► a represents the parameters, d the data, and I the background information, essential for modeling
The gradient ∂φ/∂a corresponds to forces acting on the parameters
Maximum a posteriori (MAP) estimate of the parameters, â_MAP
► condition is ∂φ/∂a = 0
► the optimized model may be interpreted as a mechanical system in equilibrium – the net force on each parameter is zero
This analogy is very useful for Bayesian inference
► conceptualization
► developing algorithms
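
The slide's formula for the potential is not reproduced in this transcript; a minimal reconstruction consistent with the definitions above (a = parameters, d = data, I = background information) is

    φ(a) = − ln p(a | d, I)
    force on the parameters: F = − ∂φ/∂a

so that the MAP condition ∂φ/∂a = 0 is simply force balance.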

Gaussian approximation

The posterior distribution is very often well approximated by a Gaussian
Then φ is quadratic in the perturbations δa = a – â of the model parameters from the minimum of φ at â:
    φ(â + δa) ≈ φ(â) + ½ δaᵀ K δa,
where K is the φ curvature matrix (aka Hessian)
Uncertainties in the estimated parameters are summarized by the covariance matrix:
    C = K⁻¹
The inference process is reduced to finding â and C

External force

Consider applying a constant external force f to the parameters
Effect is to add a linearly varying piece to the potential:
    φ′(a) = φ(a) – fᵀa
Gradient of the perturbed potential is
    ∂φ′/∂a = ∂φ/∂a – f
At the new minimum the gradient is zero, or K δa = f, so that
    δa = K⁻¹ f = C f
Displacement of the minimum is proportional to the covariance matrix times the force
With the external force, one may “probe” the covariance
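
As a concrete illustration of this step (not part of the original presentation; the quadratic potential and all numbers below are made up for the sketch), one can verify numerically that re-minimizing the perturbed potential displaces the minimum by C f:

```python
# Sketch: probe the covariance matrix by applying an external force.
# Toy quadratic potential phi(a) = 0.5 * (a - a_hat)^T K (a - a_hat);
# the exact covariance is C = inv(K), so the displacement of the
# perturbed minimum should equal C @ f.
import numpy as np
from scipy.optimize import minimize

K = np.array([[4.0, 1.5],
              [1.5, 1.0]])          # curvature (Hessian) of phi, illustrative
a_hat = np.array([1.0, 2.0])        # unperturbed MAP estimate, illustrative

def phi(a):
    d = a - a_hat
    return 0.5 * d @ K @ d

f = np.array([0.0, 1.0])            # constant external force on the second parameter

# Perturbed potential: phi'(a) = phi(a) - f . a
res = minimize(lambda a: phi(a) - f @ a, a_hat)
delta_a = res.x - a_hat

C = np.linalg.inv(K)
print("displacement      :", delta_a)
print("covariance * force:", C @ f)   # should agree: delta_a = C f
```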

Effect of external force

Displacement of the minimizer of φ is generally in a different direction than the applied force
► its direction is affected by the covariance matrix
[Figure: 2-D parameter space (a, b) showing a φ contour, the applied force f, and the resulting displacement δa]

Fit straight line to data

Linear model: y = a + b x
Simulate 10 data points from exact parameter values
Determine the parameters, intercept a and slope b, by minimizing chi-squared (standard least-squares analysis)
Result: strong correlation between a and b
[Figures: best fit to the 10 data points; scatter plot of a vs. b]

Apply force to solution

Apply an upward force to the solution line at x = 0 and find the new minimum of φ
Effect is to pull the line upward at x = 0 and reduce its slope
► the data constrain the solution
Conclude that the parameters a (intercept) and b (slope) are anti-correlated
Furthermore, these relationships yield quantified results
[Figure: pull upward on line]

Straight line fit

Family of lines for forces applied upward at x = 0: f = ±1, ±2 σ_a⁻¹
[Figure: fitted lines for upward forces at x = 0, f = ±1, ±2 σ_a⁻¹]

Straight line fit

Family of lines for forces f applied upward at x = 0
Plot on top shows
► perturbations proportional to f
► slope of δa vs. f is σ_a² = C_aa
► slope of δb vs. f is C_ab
Plot below shows that φ (or χ²) is a quadratic function of the force
► for a force of f = ± σ_a⁻¹, min φ increases by 0.5, or min χ² increases by 1
Either dependence provides a way to quantify the variance
[Plots: δa and δb vs. force f at x = 0; φ vs. f]
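
A sketch of this straight-line experiment (my own reconstruction, not the original code; the simulated true values, noise level, and x grid are illustrative): pull up on the intercept a, re-minimize φ = χ²/2, and compare the responses with the analytic covariance.

```python
# Sketch: probe C_aa and C_ab for a straight-line fit by pulling on the
# intercept a (an upward force on the line at x = 0).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
sigma = 0.1
d = 0.5 + 1.0 * x + rng.normal(0.0, sigma, x.size)   # simulated data (illustrative values)

def phi(p):                       # phi = chi^2 / 2
    a, b = p
    return 0.5 * np.sum(((d - a - b * x) / sigma) ** 2)

p_hat = minimize(phi, [0.0, 0.0]).x

# Analytic covariance: inverse of the phi Hessian for the linear model
A = np.column_stack([np.ones_like(x), x]) / sigma
C = np.linalg.inv(A.T @ A)

f = 1.0 / np.sqrt(C[0, 0])        # force of one "natural" unit, f = 1/sigma_a
p_new = minimize(lambda p: phi(p) - f * p[0], p_hat).x
delta = p_new - p_hat

print("delta_a / f =", delta[0] / f, "  vs  C_aa =", C[0, 0])
print("delta_b / f =", delta[1] / f, "  vs  C_ab =", C[0, 1])
print("increase in phi =", phi(p_new) - phi(p_hat))   # ~0.5 for |f| = 1/sigma_a
```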

Simple spectrum

Simulate a simple spectrum:
► Gaussian peak (ampl = 2, w = 0.2)
► quadratic background
► add random noise (rmsdev = 0.2)
Fit involves 6 parameters
► nonlinear problem
► results for the parameters of interest, amplitude and width, show a fair degree of correlation

Simple spectrum – apply force to area

To probe the area under the Gaussian peak, apply a force appropriate to the area
Force should be proportional to the derivatives of the area with respect to the parameters a = amplitude and w = rms width
Plot shows the result of applying a force to these two parameters in this proportion
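
The derivative formula itself is not preserved in this transcript; for a Gaussian peak of amplitude a and rms width w the area is A = √(2π) a w, so (my reconstruction) the force should be applied along

    ∂A/∂a = √(2π) w,   ∂A/∂w = √(2π) a,

i.e., to the amplitude and width in the proportion w : a.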

Simple spectrum

Plots for +/– forces applied to the area
Plot below shows a nonlinear response, but approximately linear for small f
► slope at 0 is σ_A² (the variance of the area)
► φ increases by 0.5 for |f| = σ_A⁻¹
Other displacements give covariances with respect to the area
[Figure: fits for applied forces f = 3.4 σ_A⁻¹ and f = −8 σ_A⁻¹]
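
A sketch of this nonlinear spectrum example (again my own reconstruction, not the talk's code; the x grid, background coefficients, and force strength f are illustrative choices):

```python
# Sketch: probe the variance of the peak area in a nonlinear spectrum fit
# by applying a force along the gradient of the area.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 100)
sigma = 0.2                                   # rms measurement noise (as on the slide)

def model(p, x):
    ampl, ctr, w, c0, c1, c2 = p              # 6 parameters: peak + quadratic background
    return ampl * np.exp(-0.5 * ((x - ctr) / w) ** 2) + c0 + c1 * x + c2 * x ** 2

p_true = np.array([2.0, 0.0, 0.2, 0.3, 0.1, 0.2])   # illustrative true values
d = model(p_true, x) + rng.normal(0.0, sigma, x.size)

def phi(p):                                   # phi = chi^2 / 2
    return 0.5 * np.sum(((d - model(p, x)) / sigma) ** 2)

p_hat = minimize(phi, p_true, method="BFGS").x

def area(p):                                  # area under the Gaussian peak
    return np.sqrt(2.0 * np.pi) * p[0] * p[2]

grad_A = np.zeros(6)                          # dA/d(ampl) and dA/d(width); others zero
grad_A[0] = np.sqrt(2.0 * np.pi) * p_hat[2]
grad_A[2] = np.sqrt(2.0 * np.pi) * p_hat[0]

# Apply a force of strength f along grad_A, re-minimize, and read off the
# change in area; for small f, delta_A / f -> grad_A^T C grad_A = sigma_A^2,
# and phi increases by about 0.5 * f^2 * sigma_A^2.
f = 0.5
p_new = minimize(lambda p: phi(p) - f * grad_A @ p, p_hat, method="BFGS").x
print("delta_A / f  (~ sigma_A^2):", (area(p_new) - area(p_hat)) / f)
print("increase in phi           :", phi(p_new) - phi(p_hat))
```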

Tomographic reconstruction from two views

Problem – reconstruct a uniform-density object from two projections
► 2 orthogonal, parallel projections (128 samples in each view)
► Gaussian noise added
[Figures: original object; two orthogonal projections with 5% rms noise]

The Bayes Inference Engine

BIE data-flow diagram to find the maximum a posteriori (MAP) solution
► the optimizer uses gradients that are efficiently calculated by adjoint differentiation, a key capability of the BIE
[Figure: BIE data-flow diagram, with boundary description and input projections]

MAP reconstruction – two views

Model the object in terms of:
► a deformable polygonal boundary with 50 vertices
► a smoothness constraint
► constant interior density
Determine the boundary that maximizes the posterior probability
Not perfect, but very good for only two projections
Question is: how do we quantify the uncertainty in the reconstruction?
[Figure: reconstructed boundary (gray-scale) compared with the original object (red line)]

Tomographic reconstruction from two views

Stiffness of the model is proportional to the curvature of φ
The displacement obtained by applying a force to the MAP model and re-minimizing φ is proportional to a row of the covariance matrix
Displacement divided by force
► at the position of the force, is proportional to the variance there
► elsewhere, is proportional to the covariance
[Figure: applying a force (white bar) to the MAP boundary (red) moves it to a new location (yellow dashed)]
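
In matrix form (a restatement, not written out on the slide): if the force acts on a single parameter k, so that f = f e_k, then δa = C f e_k, i.e. δa_i = C_ik f; hence δa_k / f is the variance C_kk at the point of application and δa_i / f is the covariance C_ik elsewhere.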

Situations where probing the covariance is useful

The technique will be most useful when
► the posterior can be well approximated by a Gaussian pdf in the parameters
► interest is in the uncertainty of one or a few quantities, but there are many parameters
► optimization is easy to do
► the gradient calculation can be done efficiently, e.g. by adjoint differentiation of the forward simulation code
► the system is a self-optimizing natural system (populations, bacteria, traffic)
May be useful in contexts other than probabilistic inference where Gaussian pdfs are used

Summary

A technique has been presented that
► is based on interpreting the minus-log-posterior as a physical potential
► probes the covariance matrix by applying a force to the estimated model
► replaces a stochastic calculation with a deterministic one
► may be related to the fluctuation-dissipation relation from statistical mechanics

Bibliography

► "The hard truth," K. M. Hanson and G. S. Cunningham, in Maximum Entropy and Bayesian Methods, J. Skilling and S. Sibisi, eds. (Kluwer Academic, Dordrecht, 1996)
► "Uncertainty assessment for reconstructions based on deformable models," K. M. Hanson et al., Int. J. Imaging Syst. Technol. 8 (1997)
► "Operation of the Bayes Inference Engine," K. M. Hanson and G. S. Cunningham, in Maximum Entropy and Bayesian Methods, W. von der Linden et al., eds. (Kluwer Academic, Dordrecht, 1999)
► "Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation," A. Griewank (SIAM, 2000)
This presentation available at

Likelihood analysis – chi squared

When the errors in each measurement are Gaussian distributed and independent, the likelihood is related to chi squared:
    −ln L = ½ χ² + const,   χ² = Σ_i [d_i − y_i(a)]² / σ_i²
Near its minimum, χ² is approximately quadratic in the parameters a:
    χ²(a) ≈ χ²(â) + ½ (a − â)ᵀ K (a − â)
► where â is the parameter vector at minimum χ² and K is the χ² curvature matrix (aka the Hessian)
The covariance matrix for the uncertainties in the estimated parameters is C = 2 K⁻¹ (equivalently, C = K_φ⁻¹ for φ = χ²/2, as in the Gaussian-approximation slide)

Apply force to solution

Apply an upward force to the solution line at x = 0: f = σ_a⁻¹
[Figure: pull upward on line]

July 11, 2006Bayesian Inference and Maximum Entropy example Apply force to just amplitude, a