G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 1 Statistical Data Analysis: Lecture 10 1Probability, Bayes’ theorem 2Random variables and.

Slides:



Advertisements
Similar presentations
Modeling of Data. Basic Bayes theorem Bayes theorem relates the conditional probabilities of two events A, and B: A might be a hypothesis and B might.
Advertisements

Probability and Statistics Basic concepts II (from a physicist point of view) Benoit CLEMENT – Université J. Fourier / LPSC
1 Methods of Experimental Particle Physics Alexei Safonov Lecture #21.
Statistical Data Analysis Stat 3: p-values, parameter estimation
G. Cowan Lectures on Statistical Data Analysis 1 Statistical Data Analysis: Lecture 10 1Probability, Bayes’ theorem, random variables, pdfs 2Functions.
G. Cowan Statistics for HEP / NIKHEF, December 2011 / Lecture 3 1 Statistical Methods for Particle Physics Lecture 3: Limits for Poisson mean: Bayesian.
G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 1 Statistical Data Analysis: Lecture 2 1Probability, Bayes’ theorem 2Random variables and.
1 Nuisance parameters and systematic uncertainties Glen Cowan Royal Holloway, University of London IoP Half.
G. Cowan Lectures on Statistical Data Analysis Lecture 12 page 1 Statistical Data Analysis: Lecture 12 1Probability, Bayes’ theorem 2Random variables and.
G. Cowan Lectures on Statistical Data Analysis 1 Statistical Data Analysis: Lecture 8 1Probability, Bayes’ theorem, random variables, pdfs 2Functions of.
7. Least squares 7.1 Method of least squares K. Desch – Statistical methods of data analysis SS10 Another important method to estimate parameters Connection.
G. Cowan 2011 CERN Summer Student Lectures on Statistics / Lecture 41 Introduction to Statistics − Day 4 Lecture 1 Probability Random variables, probability.
G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL.
G. Cowan Lectures on Statistical Data Analysis Lecture 14 page 1 Statistical Data Analysis: Lecture 14 1Probability, Bayes’ theorem 2Random variables and.
G. Cowan Discovery and limits / DESY, 4-7 October 2011 / Lecture 1 1 Statistical Methods for Discovery and Limits Lecture 1: Introduction, Statistical.
G. Cowan Lectures on Statistical Data Analysis Lecture 13 page 1 Statistical Data Analysis: Lecture 13 1Probability, Bayes’ theorem 2Random variables and.
Rao-Cramer-Frechet (RCF) bound of minimum variance (w/o proof) Variance of an estimator of single parameter is limited as: is called “efficient” when the.
G. Cowan Statistics for HEP / NIKHEF, December 2011 / Lecture 1 1 Statistical Methods for Particle Physics Lecture 1: Introduction, Statistical Tests,
G. Cowan Lectures on Statistical Data Analysis 1 Statistical Data Analysis: Lecture 7 1Probability, Bayes’ theorem, random variables, pdfs 2Functions of.
G. Cowan Discovery and limits / DESY, 4-7 October 2011 / Lecture 3 1 Statistical Methods for Discovery and Limits Lecture 3: Limits for Poisson mean: Bayesian.
G. Cowan RHUL Physics Bayesian Higgs combination page 1 Bayesian Higgs combination using shapes ATLAS Statistics Meeting CERN, 19 December, 2007 Glen Cowan.
Lecture II-2: Probability Review
ECE 8443 – Pattern Recognition LECTURE 06: MAXIMUM LIKELIHOOD AND BAYESIAN ESTIMATION Objectives: Bias in ML Estimates Bayesian Estimation Example Resources:
880.P20 Winter 2006 Richard Kass 1 Confidence Intervals and Upper Limits Confidence intervals (CI) are related to confidence limits (CL). To calculate.
880.P20 Winter 2006 Richard Kass 1 Maximum Likelihood Method (MLM) Does this procedure make sense? The MLM answers this question and provides a method.
Model Inference and Averaging
Prof. Dr. S. K. Bhattacharjee Department of Statistics University of Rajshahi.
G. Cowan 2009 CERN Summer Student Lectures on Statistics1 Introduction to Statistics − Day 4 Lecture 1 Probability Random variables, probability densities,
G. Cowan Lectures on Statistical Data Analysis Lecture 3 page 1 Lecture 3 1 Probability (90 min.) Definition, Bayes’ theorem, probability densities and.
ECE 8443 – Pattern Recognition ECE 8423 – Adaptive Signal Processing Objectives: Deterministic vs. Random Maximum A Posteriori Maximum Likelihood Minimum.
ECE 8443 – Pattern Recognition LECTURE 07: MAXIMUM LIKELIHOOD AND BAYESIAN ESTIMATION Objectives: Class-Conditional Density The Multivariate Case General.
G. Cowan Lectures on Statistical Data Analysis Lecture 1 page 1 Lectures on Statistical Data Analysis London Postgraduate Lectures on Particle Physics;
Uniovi1 Some statistics books, papers, etc. G. Cowan, Statistical Data Analysis, Clarendon, Oxford, 1998 see also R.J. Barlow,
G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 41 Statistics for the LHC Lecture 4: Bayesian methods and further topics Academic.
G. Cowan Lectures on Statistical Data Analysis Lecture 4 page 1 Statistical Data Analysis: Lecture 4 1Probability, Bayes’ theorem 2Random variables and.
1 Introduction to Statistics − Day 4 Glen Cowan Lecture 1 Probability Random variables, probability densities, etc. Lecture 2 Brief catalogue of probability.
G. Cowan Lectures on Statistical Data Analysis Lecture 8 page 1 Statistical Data Analysis: Lecture 8 1Probability, Bayes’ theorem 2Random variables and.
1 Introduction to Statistics − Day 3 Glen Cowan Lecture 1 Probability Random variables, probability densities, etc. Brief catalogue of probability densities.
G. Cowan TAE 2010 / Statistics for HEP / Lecture 11 Statistical Methods for HEP Lecture 1: Introduction Taller de Altas Energías 2010 Universitat de Barcelona.
G. Cowan Lectures on Statistical Data Analysis Lecture 4 page 1 Lecture 4 1 Probability (90 min.) Definition, Bayes’ theorem, probability densities and.
G. Cowan Computing and Statistical Data Analysis / Stat 9 1 Computing and Statistical Data Analysis Stat 9: Parameter Estimation, Limits London Postgraduate.
1 Chapter 8: Model Inference and Averaging Presented by Hui Fang.
1 Introduction to Statistics − Day 2 Glen Cowan Lecture 1 Probability Random variables, probability densities, etc. Brief catalogue of probability densities.
G. Cowan Lectures on Statistical Data Analysis Lecture 9 page 1 Statistical Data Analysis: Lecture 9 1Probability, Bayes’ theorem 2Random variables and.
G. Cowan Lectures on Statistical Data Analysis Lecture 6 page 1 Statistical Data Analysis: Lecture 6 1Probability, Bayes’ theorem 2Random variables and.
G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1 Statistical Data Analysis: Lecture 5 1Probability, Bayes’ theorem 2Random variables and.
G. Cowan Lectures on Statistical Data Analysis Lecture 12 page 1 Statistical Data Analysis: Lecture 12 1Probability, Bayes’ theorem 2Random variables and.
Parameter Estimation. Statistics Probability specified inferred Steam engine pump “prediction” “estimation”
G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 1 Statistical Data Analysis: Lecture 10 1Probability, Bayes’ theorem 2Random variables and.
G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1 Lecture 5 1 Probability Definition, Bayes’ theorem, probability densities and their properties,
G. Cowan Statistical methods for HEP / Freiburg June 2011 / Lecture 1 1 Statistical Methods for Discovery and Limits in HEP Experiments Day 1: Introduction,
G. Cowan INFN School of Statistics, Vietri Sul Mare, 3-7 June Statistical Methods (Lectures 2.1, 2.2) INFN School of Statistics Vietri sul Mare,
G. Cowan SOS 2010 / Statistical Tests and Limits -- lecture 31 Statistical Tests and Limits Lecture 3 IN2P3 School of Statistics Autrans, France 17—21.
Model Inference and Averaging
Lecture 4 1 Probability (90 min.)
TAE 2018 / Statistics Lecture 1
Graduierten-Kolleg RWTH Aachen February 2014 Glen Cowan
Statistical Data Analysis Stat 3: p-values, parameter estimation
Statistical Methods: Parameter Estimation
Computing and Statistical Data Analysis / Stat 8
Lecture 3 1 Probability Definition, Bayes’ theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests general.
Terascale Statistics School DESY, February, 2018
Lecture 4 1 Probability Definition, Bayes’ theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests general.
Statistics for Particle Physics Lecture 3: Parameter Estimation
Statistical Tests and Limits Lecture 2: Limits
Computing and Statistical Data Analysis / Stat 7
Introduction to Unfolding
Introduction to Statistics − Day 4
Lecture 4 1 Probability Definition, Bayes’ theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests general.
Computing and Statistical Data Analysis / Stat 10
Presentation transcript:

G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 1 Statistical Data Analysis: Lecture 10 1Probability, Bayes’ theorem 2Random variables and probability densities 3Expectation values, error propagation 4Catalogue of pdfs 5The Monte Carlo method 6Statistical tests: general concepts 7Test statistics, multivariate methods 8Goodness-of-fit tests 9Parameter estimation, maximum likelihood 10More maximum likelihood 11Method of least squares 12Interval estimation, setting limits 13Nuisance parameters, systematic uncertainties 14Examples of Bayesian approach

G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 2 Information inequality for n parameters Suppose we have estimated n parameters The (inverse) minimum variance bound is given by the Fisher information matrix: The information inequality then states that V  I  is a positive semi-definite matrix, where Therefore Often use I  as an approximation for covariance matrix, estimate using e.g. matrix of 2nd derivatives at maximum of L.

G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 3 Example of ML with 2 parameters Consider a scattering angle distribution with x = cos , or if x min < x < x max, need always to normalize so that Example:  = 0.5,  = 0.5, x min =  0.95, x max = 0.95, generate n = 2000 events with Monte Carlo.

G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 4 Example of ML with 2 parameters: fit result Finding maximum of ln L( ,  ) numerically ( MINUIT ) gives N.B. No binning of data for fit, but can compare to histogram for goodness-of-fit (e.g. ‘visual’ or  2 ). (Co)variances from ( MINUIT routine HESSE )

G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 5 Two-parameter fit: MC study Repeat ML fit with 500 experiments, all with n = 2000 events: Estimates average to ~ true values; (Co)variances close to previous estimates; marginal pdfs approximately Gaussian.

G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 6 The ln L max  1/2 contour For large n, ln L takes on quadratic form near maximum: The contour is an ellipse:

G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 7 (Co)variances from ln L contour → Tangent lines to contours give standard deviations. → Angle of ellipse  related to correlation: Correlations between estimators result in an increase in their standard deviations (statistical errors). The ,  plane for the first MC data set

G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 8 Extended ML Sometimes regard n not as fixed, but as a Poisson r.v., mean. Result of experiment defined as: n, x 1,..., x n. The (extended) likelihood function is: Suppose theory gives = (  ), then the log-likelihood is where C represents terms not depending on .

G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 9 Extended ML (2) Extended ML uses more info → smaller errors for Example: expected number of events where the total cross section  (  ) is predicted as a function of the parameters of a theory, as is the distribution of a variable x. If does not depend on  but remains a free parameter, extended ML gives: Important e.g. for anomalous couplings in e  e  → W + W 

G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 10 Extended ML example Consider two types of events (e.g., signal and background) each of which predict a given pdf for the variable x: f s (x) and f b (x). We observe a mixture of the two event types, signal fraction = , expected total number =, observed total number = n. Let goal is to estimate  s,  b. →

G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 11 Extended ML example (2) Maximize log-likelihood in terms of  s and  b : Monte Carlo example with combination of exponential and Gaussian: Here errors reflect total Poisson fluctuation as well as that in proportion of signal/background.

G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 12 Extended ML example: an unphysical estimate A downwards fluctuation of data in the peak region can lead to even fewer events than what would be obtained from background alone. Estimate for  s here pushed negative (unphysical). We can let this happen as long as the (total) pdf stays positive everywhere.

G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 13 Unphysical estimators (2) Here the unphysical estimator is unbiased and should nevertheless be reported, since average of a large number of unbiased estimates converges to the true value (cf. PDG). Repeat entire MC experiment many times, allow unphysical estimates:

G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 14 ML with binned data Often put data into a histogram: Hypothesis iswhere If we model the data as multinomial (n tot constant), then the log-likelihood function is:

G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 15 ML example with binned data Previous example with exponential, now put data into histogram: Limit of zero bin width → usual unbinned ML. If n i treated as Poisson, we get extended log-likelihood:

G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 16 Relationship between ML and Bayesian estimators In Bayesian statistics, both  and x are random variables: Recall the Bayesian method: Use subjective probability for hypotheses (  ); before experiment, knowledge summarized by prior pdf  (  ); use Bayes’ theorem to update prior in light of data: Posterior pdf (conditional pdf for  given x)

G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 17 ML and Bayesian estimators (2) Purist Bayesian: p(  | x) contains all knowledge about . Pragmatist Bayesian: p(  | x) could be a complicated function, → summarize using an estimator Take mode of p(  | x), (could also use e.g. expectation value) What do we use for  (  )? No golden rule (subjective!), often represent ‘prior ignorance’ by  (  ) = constant, in which case But... we could have used a different parameter, e.g., = 1/ , and if prior   (  ) is constant, then  ( ) is not! ‘Complete prior ignorance’ is not well defined.

G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 18 Wrapping up lecture 10 We’ve now seen several examples of the method of Maximum Likelihood: multiparameter case variable sample size (extended ML) histogram-based data and we’ve seen the connection between ML and Bayesian parameter estimation. Next we will consider a special case of ML with Gaussian data and show how this leads to the method of Least Squares.

G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 19 Extra slides

G. Cowan Lectures on Statistical Data Analysis 20 Priors from formal rules Because of difficulties in encoding a vague degree of belief in a prior, one often attempts to derive the prior from formal rules, e.g., to satisfy certain invariance principles or to provide maximum information gain for a certain set of measurements. Often called “objective priors” Form basis of Objective Bayesian Statistics The priors do not reflect a degree of belief (but might represent possible extreme cases). In a Subjective Bayesian analysis, using objective priors can be an important part of the sensitivity analysis.

G. Cowan Lectures on Statistical Data Analysis 21 Priors from formal rules (cont.) In Objective Bayesian analysis, can use the intervals in a frequentist way, i.e., regard Bayes’ theorem as a recipe to produce an interval with certain coverage properties. For a review see: Formal priors have not been widely used in HEP, but there is recent interest in this direction; see e.g. L. Demortier, S. Jain and H. Prosper, Reference priors for high energy physics, arxiv: (Feb 2010)

G. Cowan Lectures on Statistical Data Analysis 22 Jeffreys’ prior According to Jeffreys’ rule, take prior according to where is the Fisher information matrix. One can show that this leads to inference that is invariant under a transformation of parameters. For a Gaussian mean, the Jeffreys’ prior is constant; for a Poisson mean  it is proportional to 1/√ .

G. Cowan Lectures on Statistical Data Analysis 23 Jeffreys’ prior for Poisson mean Suppose n ~ Poisson(  ). To find the Jeffreys’ prior for , So e.g. for  = s + b, this means the prior  (s) ~ 1/√(s + b), which depends on b. But this is not designed as a degree of belief about s.