Computing and Statistical Data Analysis / Stat 7


Computing and Statistical Data Analysis
Stat 7: Parameter Estimation, ML, LS

London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515

Glen Cowan
Physics Department, Royal Holloway, University of London
g.cowan@rhul.ac.uk
www.pp.rhul.ac.uk/~cowan

Course web page: www.pp.rhul.ac.uk/~cowan/stat_course.html

The likelihood function

Suppose the entire result of an experiment (set of measurements) is a collection of numbers x, and suppose the joint pdf for the data x is a function that depends on a set of parameters θ:

    f(x; θ)

Now evaluate this function with the data obtained and regard it as a function of the parameter(s). This is the likelihood function:

    L(θ) = f(x; θ)    (x constant)

The likelihood function for i.i.d.* data

* i.i.d. = independent and identically distributed

Consider n independent observations of x: x_1, ..., x_n, where x follows f(x; θ). The joint pdf for the whole data sample is:

    f(x_1, ..., x_n; θ) = ∏_{i=1}^{n} f(x_i; θ)

In this case the likelihood function is

    L(θ) = ∏_{i=1}^{n} f(x_i; θ)    (x_i constant)

Maximum likelihood estimators

If the hypothesized θ is close to the true value, then we expect a high probability of getting data like that which we actually found. So we define the maximum likelihood (ML) estimator(s) to be the parameter value(s) for which the likelihood is maximum. ML estimators are not guaranteed to have any 'optimal' properties (but in practice they are very good).

ML example: parameter of exponential pdf

Consider the exponential pdf

    f(x; τ) = (1/τ) e^{−x/τ},    x ≥ 0,

and suppose we have i.i.d. data x_1, ..., x_n. The likelihood function is

    L(τ) = ∏_{i=1}^{n} (1/τ) e^{−x_i/τ}

The value of τ for which L(τ) is maximum also gives the maximum value of its logarithm (the log-likelihood function):

    ln L(τ) = ∑_{i=1}^{n} ln f(x_i; τ) = ∑_{i=1}^{n} ( −ln τ − x_i/τ )

ML example: parameter of exponential pdf (2)

Find its maximum by setting

    ∂ ln L(τ) / ∂τ = 0    →    τ̂ = (1/n) ∑_{i=1}^{n} x_i

i.e., the ML estimator is the sample mean. Monte Carlo test: generate 50 values using τ = 1 and compute the ML estimate from their sample mean.
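A minimal sketch of this Monte Carlo test in Python (assuming NumPy; the seed and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(seed=1)          # fixed seed so the test is reproducible
tau_true = 1.0
x = rng.exponential(scale=tau_true, size=50)  # 50 simulated measurements

tau_hat = x.mean()  # for the exponential pdf the ML estimator is the sample mean
print(f"tau_hat = {tau_hat:.3f}")
```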

Functions of ML estimators

Suppose we had written the exponential pdf as

    f(x; λ) = λ e^{−λx},

i.e., we use λ = 1/τ. What is the ML estimator for λ? For a function a(θ) of a parameter θ, it doesn't matter whether we express L as a function of a or of θ. The ML estimator of a function a(θ) is simply

    â = a(θ̂)

So for the decay constant we have

    λ̂ = 1/τ̂ = n / ∑_{i=1}^{n} x_i

Caveat: λ̂ is biased, even though τ̂ is unbiased. One can show

    E[λ̂] = λ n/(n−1)

(bias → 0 for n → ∞).
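The invariance property and the bias are easy to check numerically; a sketch assuming NumPy, with an artificially small sample size so the bias is visible:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
tau, n, n_exp = 1.0, 5, 100_000   # small n makes the bias of lambda_hat visible

tau_hats = rng.exponential(tau, size=(n_exp, n)).mean(axis=1)
lam_hats = 1.0 / tau_hats         # invariance: lambda_hat = 1 / tau_hat

print(tau_hats.mean())  # ~ 1.0: tau_hat is unbiased
print(lam_hats.mean())  # ~ n/(n-1) = 1.25: lambda_hat is biased
```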

Example of ML: parameters of Gaussian pdf

Consider independent x_1, ..., x_n, with x_i ~ Gaussian(μ, σ²). The log-likelihood function is

    ln L(μ, σ²) = ∑_{i=1}^{n} ln f(x_i; μ, σ²) = ∑_{i=1}^{n} ( −½ ln 2π − ½ ln σ² − (x_i − μ)²/(2σ²) )

Example of ML: parameters of Gaussian pdf (2)

Set the derivatives with respect to μ and σ² to zero and solve:

    μ̂ = (1/n) ∑_{i=1}^{n} x_i,    σ̂² = (1/n) ∑_{i=1}^{n} (x_i − μ̂)²

We already know that the estimator for μ is unbiased. For σ̂², however, we find

    E[σ̂²] = ((n−1)/n) σ²,

so the ML estimator for σ² has a bias, but b → 0 for n → ∞. Recall, however, that

    s² = (1/(n−1)) ∑_{i=1}^{n} (x_i − μ̂)²

is an unbiased estimator for σ².
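A quick numerical check of the bias (a sketch assuming NumPy; np.var with ddof=0 is the ML estimator, ddof=1 the unbiased one):

```python
import numpy as np

rng = np.random.default_rng(seed=3)
mu, sigma, n, n_exp = 0.0, 1.0, 5, 100_000

x = rng.normal(mu, sigma, size=(n_exp, n))
var_ml = x.var(axis=1, ddof=0)        # (1/n) * sum (x_i - mean)^2
var_unbiased = x.var(axis=1, ddof=1)  # (1/(n-1)) * sum (x_i - mean)^2

print(var_ml.mean())        # ~ (n-1)/n * sigma^2 = 0.8
print(var_unbiased.mean())  # ~ sigma^2 = 1.0
```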

Variance of estimators: Monte Carlo method

Having estimated our parameter, we now need to report its 'statistical error', i.e., how widely distributed the estimates would be if we were to repeat the entire measurement many times. One way to do this is to simulate the entire experiment many times with a Monte Carlo program (using the ML estimate as the true parameter value in the MC). For the exponential example, the statistical error is found from the sample standard deviation of the estimates. Note that the distribution of the estimates is roughly Gaussian; this is (almost) always true for ML in the large sample limit.
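A sketch of this procedure for the exponential example (NumPy assumed; the value of tau_hat stands in for the estimate obtained from the real data):

```python
import numpy as np

rng = np.random.default_rng(seed=4)
tau_hat, n, n_exp = 1.0, 50, 10_000  # tau_hat: ML estimate from data (illustrative)

# Simulate n_exp independent experiments of n events each, with tau_hat as true value
estimates = rng.exponential(tau_hat, size=(n_exp, n)).mean(axis=1)

sigma_tau = estimates.std(ddof=1)    # sample standard deviation of the estimates
print(f"statistical error: {sigma_tau:.3f}")  # ~ tau_hat / sqrt(n)
```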

Variance of estimators from information inequality

The information inequality (RCF) sets a lower bound on the variance of any estimator (not only ML):

    V[θ̂] ≥ (1 + ∂b/∂θ)² / E[ −∂² ln L / ∂θ² ]    (Minimum Variance Bound, MVB)

Often the bias b is small, and equality either holds exactly or is a good approximation (e.g., in the large data sample limit). Then

    V[θ̂] ≈ 1 / E[ −∂² ln L / ∂θ² ]

Estimate this using the 2nd derivative of ln L at its maximum:

    σ̂²_θ̂ = ( −∂² ln L/∂θ² |_{θ̂} )^{−1}
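A sketch of the second-derivative estimate for the exponential example (NumPy assumed; the derivative is taken numerically by central differences):

```python
import numpy as np

rng = np.random.default_rng(seed=5)
x = rng.exponential(1.0, size=50)
tau_hat = x.mean()

def lnL(tau):
    # ln L(tau) = -n ln(tau) - sum(x_i)/tau for the exponential pdf
    return -len(x) * np.log(tau) - x.sum() / tau

h = 1e-4 * tau_hat   # step size for the central-difference 2nd derivative
d2 = (lnL(tau_hat + h) - 2.0 * lnL(tau_hat) + lnL(tau_hat - h)) / h**2

sigma_tau = np.sqrt(-1.0 / d2)
print(sigma_tau)     # agrees with the analytic result tau_hat / sqrt(n)
```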

Variance of estimators: graphical method

Expand ln L(θ) about its maximum:

    ln L(θ) = ln L(θ̂) + [∂ ln L/∂θ]_{θ̂} (θ − θ̂) + ½ [∂² ln L/∂θ²]_{θ̂} (θ − θ̂)² + ...

The first term is ln L_max, the second term is zero, and for the third term we use the information inequality (assuming equality):

    ln L(θ) ≈ ln L_max − (θ − θ̂)² / (2σ̂²_θ̂)

i.e.,

    ln L(θ̂ ± σ̂_θ̂) ≈ ln L_max − 1/2

→ to get σ̂_θ̂, change θ away from θ̂ until ln L decreases by 1/2.
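A sketch of the graphical (Δ ln L = 1/2) method, assuming NumPy and SciPy; the root finder locates the crossing on each side of the maximum:

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(seed=6)
x = rng.exponential(1.0, size=50)
tau_hat = x.mean()

def lnL(tau):
    return -len(x) * np.log(tau) - x.sum() / tau

# ln L relative to its maximum, shifted so the roots sit at the -1/2 crossings
delta = lambda tau: lnL(tau) - (lnL(tau_hat) - 0.5)

lo = brentq(delta, 0.5 * tau_hat, tau_hat)  # crossing below the maximum
hi = brentq(delta, tau_hat, 2.0 * tau_hat)  # crossing above the maximum
print(tau_hat - lo, hi - tau_hat)           # slightly asymmetric errors
```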

Example of variance by graphical method

ML example with the exponential pdf: ln L is not quite parabolic because of the finite sample size (n = 50).

Information inequality for n parameters

Suppose we have estimated n parameters θ = (θ_1, ..., θ_n). The (inverse) minimum variance bound is given by the Fisher information matrix:

    I(θ)_{ij} = −E[ ∂² ln L / ∂θ_i ∂θ_j ]

The information inequality then states that V − I⁻¹ is a positive semi-definite matrix, where V_{ij} = cov[θ̂_i, θ̂_j]. Therefore

    σ(θ̂_i) ≥ √( (I⁻¹)_{ii} )

Often I⁻¹ is used as an approximation for the covariance matrix, estimated using, e.g., the matrix of 2nd derivatives of ln L at its maximum.

Example of ML with 2 parameters

Consider a scattering angle distribution with x = cos θ,

    f(x; α, β) = (1 + αx + βx²) / (2 + 2β/3),    −1 ≤ x ≤ 1,

or, if x_min < x < x_max, we always need to normalize so that

    ∫_{x_min}^{x_max} f(x; α, β) dx = 1

Example: α = 0.5, β = 0.5, x_min = −0.95, x_max = 0.95; generate n = 2000 events with Monte Carlo.
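A sketch of the event generation with the accept-reject method (NumPy assumed; for these parameter values the pdf is an upward-opening parabola, so its maximum lies at an endpoint of the interval):

```python
import numpy as np

alpha, beta = 0.5, 0.5
xlo, xhi = -0.95, 0.95
n = 2000

def f(x):
    # 1 + alpha*x + beta*x^2, normalized over [xlo, xhi]
    norm = (xhi - xlo) + 0.5 * alpha * (xhi**2 - xlo**2) + beta * (xhi**3 - xlo**3) / 3.0
    return (1.0 + alpha * x + beta * x**2) / norm

rng = np.random.default_rng(seed=7)
fmax = max(f(xlo), f(xhi))  # maximum of the pdf, used as the accept-reject envelope

events = []
while len(events) < n:
    xt = rng.uniform(xlo, xhi)
    if rng.uniform(0.0, fmax) < f(xt):   # accept with probability f(xt)/fmax
        events.append(xt)
x = np.asarray(events)
```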

Example of ML with 2 parameters: fit result

Finding the maximum of ln L(α, β) numerically (MINUIT) gives the estimates α̂ and β̂. N.B. no binning of the data is used for the fit, but one can compare to a histogram for goodness of fit (e.g., 'visual' or χ²). The (co)variances are obtained from the matrix of second derivatives of ln L at the maximum (MINUIT routine HESSE).
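A sketch of the unbinned fit using iminuit, the Python interface to MINUIT (assumed available; x is the event array from the previous sketch):

```python
import numpy as np
from iminuit import Minuit

xlo, xhi = -0.95, 0.95
# x: array of unbinned measurements, e.g. generated by the accept-reject sketch above

def nll(alpha, beta):
    # negative log-likelihood for the normalized angular distribution
    norm = (xhi - xlo) + 0.5 * alpha * (xhi**2 - xlo**2) + beta * (xhi**3 - xlo**3) / 3.0
    return -np.sum(np.log((1.0 + alpha * x + beta * x**2) / norm))

m = Minuit(nll, alpha=0.0, beta=0.0)
m.errordef = Minuit.LIKELIHOOD  # Delta(-ln L) = 1/2 defines the errors
m.migrad()                      # minimization (MIGRAD)
m.hesse()                       # (co)variances from 2nd derivatives (HESSE)
print(m.values, m.errors)
print(m.covariance)
```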

Two-parameter fit: MC study

Repeat the ML fit with 500 experiments, all with n = 2000 events. The estimates average to approximately the true values; the (co)variances are close to the previous estimates; the marginal pdfs of the estimators are approximately Gaussian.

The ln L_max − 1/2 contour

For large n, ln L takes on a quadratic form near its maximum:

    ln L(α, β) ≈ ln L_max − 1/(2(1−ρ²)) [ ((α−α̂)/σ_α̂)² + ((β−β̂)/σ_β̂)² − 2ρ ((α−α̂)/σ_α̂)((β−β̂)/σ_β̂) ]

The contour ln L(α, β) = ln L_max − 1/2 is an ellipse.

(Co)variances from ln L contour

The (α, β) plane for the first MC data set:

→ Tangent lines to the contour give the standard deviations σ_α̂ and σ_β̂.
→ The angle φ of the ellipse is related to the correlation:

    tan 2φ = 2ρ σ_α̂ σ_β̂ / (σ_α̂² − σ_β̂²)

Correlations between estimators result in an increase in their standard deviations (statistical errors).


Extended ML

Sometimes we regard n not as fixed but as a Poisson random variable with mean ν. The result of the experiment is then defined as: n, x_1, ..., x_n. The (extended) likelihood function is:

    L(ν, θ) = (νⁿ/n!) e^{−ν} ∏_{i=1}^{n} f(x_i; θ)

Suppose theory gives ν = ν(θ); then the log-likelihood is

    ln L(θ) = −ν(θ) + ∑_{i=1}^{n} ln[ ν(θ) f(x_i; θ) ] + C

where C represents terms not depending on θ.
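In code, the extended log-likelihood differs from the ordinary one only by the Poisson term; a minimal sketch (NumPy assumed, with a generic normalized pdf f(x, theta) supplied by the user):

```python
import numpy as np

def extended_nll(nu, theta, x, f):
    """Extended negative log-likelihood, dropping the constant ln(n!) term.

    nu    : expected number of events (may itself depend on theta)
    theta : shape parameter(s) passed to the pdf
    x     : array of observed values
    f     : normalized pdf, called as f(x, theta)
    """
    return nu - len(x) * np.log(nu) - np.sum(np.log(f(x, theta)))
```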

Extended ML (2)

Example: the expected number of events is ν = σ(θ) × (integrated luminosity), where the total cross section σ(θ) is predicted as a function of the parameters of a theory, as is the distribution of a variable x. Extended ML uses more information → smaller errors for θ̂. This is important, e.g., for anomalous couplings in e⁺e⁻ → W⁺W⁻. If ν does not depend on θ but remains a free parameter, extended ML gives ν̂ = n and the same θ̂ as the ordinary likelihood.

Extended ML example

Consider two types of events (e.g., signal and background), each of which predicts a given pdf for the variable x: f_s(x) and f_b(x). We observe a mixture of the two event types, with signal fraction θ, expected total number ν, and observed total number n. Let

    μ_s = θν,    μ_b = (1−θ)ν

The goal is to estimate μ_s and μ_b. The extended log-likelihood is

    ln L(μ_s, μ_b) = −(μ_s + μ_b) + ∑_{i=1}^{n} ln[ μ_s f_s(x_i) + μ_b f_b(x_i) ] + C

Extended ML example (2)

Monte Carlo example with a combination of exponential and Gaussian pdfs: maximize the log-likelihood in terms of μ_s and μ_b. Here the errors reflect the total Poisson fluctuation as well as the fluctuation in the proportion of signal and background.
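A sketch of such a fit with iminuit (assumed available); the shape parameters, yields, and seed below are illustrative placeholders, not the values used on the slide:

```python
import numpy as np
from iminuit import Minuit
from scipy.stats import norm, expon

rng = np.random.default_rng(seed=8)

# Illustrative shapes: Gaussian signal, exponential background, fixed shape parameters
f_s = lambda x: norm.pdf(x, loc=2.0, scale=0.3)
f_b = lambda x: expon.pdf(x, scale=1.5)

# Generate one "experiment" with Poisson-fluctuating yields
mu_s_true, mu_b_true = 100.0, 400.0
n_s = rng.poisson(mu_s_true)
n_b = rng.poisson(mu_b_true)
x = np.concatenate([rng.normal(2.0, 0.3, n_s), rng.exponential(1.5, n_b)])

def nll(mu_s, mu_b):
    # extended -ln L = (mu_s + mu_b) - sum_i ln[mu_s f_s(x_i) + mu_b f_b(x_i)]
    return (mu_s + mu_b) - np.sum(np.log(mu_s * f_s(x) + mu_b * f_b(x)))

m = Minuit(nll, mu_s=50.0, mu_b=300.0)
m.errordef = Minuit.LIKELIHOOD
m.limits["mu_s"] = (0, None)
m.limits["mu_b"] = (0, None)
m.migrad()
m.hesse()
print(m.values, m.errors)  # errors include the Poisson fluctuation in n
```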