Computing and Statistical Data Analysis / Stat 10

Slides:



Advertisements
Similar presentations
Modeling of Data. Basic Bayes theorem Bayes theorem relates the conditional probabilities of two events A, and B: A might be a hypothesis and B might.
Advertisements

1 Statistical Methods in HEP, in particular... Nuisance parameters and systematic uncertainties Glen Cowan Royal Holloway, University of London
1 Bayesian statistical methods for parton analyses Glen Cowan Royal Holloway, University of London DIS 2006.
G. Cowan Statistics for HEP / LAL Orsay, 3-5 January 2012 / Lecture 2 1 Statistical Methods for Particle Physics Lecture 2: Tests based on likelihood ratios.
G. Cowan Lectures on Statistical Data Analysis 1 Statistical Data Analysis: Lecture 10 1Probability, Bayes’ theorem, random variables, pdfs 2Functions.
G. Cowan Statistics for HEP / NIKHEF, December 2011 / Lecture 3 1 Statistical Methods for Particle Physics Lecture 3: Limits for Poisson mean: Bayesian.
1 Nuisance parameters and systematic uncertainties Glen Cowan Royal Holloway, University of London IoP Half.
G. Cowan Lectures on Statistical Data Analysis Lecture 12 page 1 Statistical Data Analysis: Lecture 12 1Probability, Bayes’ theorem 2Random variables and.
G. Cowan 2011 CERN Summer Student Lectures on Statistics / Lecture 41 Introduction to Statistics − Day 4 Lecture 1 Probability Random variables, probability.
G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL.
G. Cowan Lectures on Statistical Data Analysis Lecture 14 page 1 Statistical Data Analysis: Lecture 14 1Probability, Bayes’ theorem 2Random variables and.
G. Cowan Lectures on Statistical Data Analysis Lecture 13 page 1 Statistical Data Analysis: Lecture 13 1Probability, Bayes’ theorem 2Random variables and.
G. Cowan RHUL Physics Moriond QCD 2007 page 1 Bayesian analysis and problems with the frequentist approach Rencontres de Moriond (QCD) La Thuile,
G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 1 Statistical Data Analysis: Lecture 10 1Probability, Bayes’ theorem 2Random variables and.
G. Cowan Lectures on Statistical Data Analysis 1 Statistical Data Analysis: Lecture 7 1Probability, Bayes’ theorem, random variables, pdfs 2Functions of.
G. Cowan Discovery and limits / DESY, 4-7 October 2011 / Lecture 3 1 Statistical Methods for Discovery and Limits Lecture 3: Limits for Poisson mean: Bayesian.
G. Cowan RHUL Physics Bayesian Higgs combination page 1 Bayesian Higgs combination using shapes ATLAS Statistics Meeting CERN, 19 December, 2007 Glen Cowan.
G. Cowan SUSSP65, St Andrews, August 2009 / Statistical Methods 1 page 1 Statistical Methods in Particle Physics Lecture 1: Bayesian methods SUSSP65.
G. Cowan 2009 CERN Summer Student Lectures on Statistics1 Introduction to Statistics − Day 4 Lecture 1 Probability Random variables, probability densities,
G. Cowan Lectures on Statistical Data Analysis Lecture 3 page 1 Lecture 3 1 Probability (90 min.) Definition, Bayes’ theorem, probability densities and.
G. Cowan Lectures on Statistical Data Analysis Lecture 1 page 1 Lectures on Statistical Data Analysis London Postgraduate Lectures on Particle Physics;
G. Cowan RHUL Physics Bayesian Higgs combination page 1 Bayesian Higgs combination based on event counts (follow-up from 11 May 07) ATLAS Statistics Forum.
G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 41 Statistics for the LHC Lecture 4: Bayesian methods and further topics Academic.
1 Introduction to Statistics − Day 4 Glen Cowan Lecture 1 Probability Random variables, probability densities, etc. Lecture 2 Brief catalogue of probability.
1 Methods of Experimental Particle Physics Alexei Safonov Lecture #24.
G. Cowan Lectures on Statistical Data Analysis Lecture 8 page 1 Statistical Data Analysis: Lecture 8 1Probability, Bayes’ theorem 2Random variables and.
1 Introduction to Statistics − Day 3 Glen Cowan Lecture 1 Probability Random variables, probability densities, etc. Brief catalogue of probability densities.
G. Cowan Lectures on Statistical Data Analysis Lecture 4 page 1 Lecture 4 1 Probability (90 min.) Definition, Bayes’ theorem, probability densities and.
G. Cowan CERN-JINR 2009 Summer School / Topics in Statistical Data Analysis page 1 Topics in Statistical Data Analysis for HEP Lecture 1: Bayesian Methods.
G. Cowan Computing and Statistical Data Analysis / Stat 9 1 Computing and Statistical Data Analysis Stat 9: Parameter Estimation, Limits London Postgraduate.
G. Cowan Lectures on Statistical Data Analysis Lecture 9 page 1 Statistical Data Analysis: Lecture 9 1Probability, Bayes’ theorem 2Random variables and.
G. Cowan Statistical Methods in Particle Physics1 Statistical Methods in Particle Physics Day 5: Bayesian Methods 清华大学高能物理研究中心 2010 年 4 月 12—16 日 Glen.
G. Cowan SUSSP65, St Andrews, August 2009 / Statistical Methods 1 page 1 Statistical Methods in Particle Physics Lecture 1: Bayesian methods SUSSP65.
G. Cowan Lectures on Statistical Data Analysis Lecture 12 page 1 Statistical Data Analysis: Lecture 12 1Probability, Bayes’ theorem 2Random variables and.
G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 1 Statistical Data Analysis: Lecture 10 1Probability, Bayes’ theorem 2Random variables and.
G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1 Lecture 5 1 Probability Definition, Bayes’ theorem, probability densities and their properties,
1 Nuisance parameters and systematic uncertainties Glen Cowan Royal Holloway, University of London IoP Half.
G. Cowan SOS 2010 / Statistical Tests and Limits -- lecture 31 Statistical Tests and Limits Lecture 3 IN2P3 School of Statistics Autrans, France 17—21.
G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1 Lecture 5 1 Probability Definition, Bayes’ theorem, probability densities and their properties,
Data Modeling Patrice Koehl Department of Biological Sciences
Statistics for HEP Lecture 1: Introduction and basic formalism
Statistics for HEP Lecture 1: Introduction and basic formalism
Computing and Statistical Data Analysis / Stat 11
Statistics for HEP Lecture 3: More on Discovery and Limits
Statistics for the LHC Lecture 3: Setting limits
Lecture 4 1 Probability (90 min.)
Remember that our objective is for some density f(y|) for observations where y and  are vectors of data and parameters,  being sampled from a prior.
Graduierten-Kolleg RWTH Aachen February 2014 Glen Cowan
Computing and Statistical Data Analysis / Stat 8
Discussion on fitting and averages including “errors on errors”
Statistical Methods for Particle Physics (II)
TAE 2017 / Statistics Lecture 3
Statistical Methods for Particle Physics Lecture 3: further topics
Computing and Statistical Data Analysis Stat 3: The Monte Carlo Method
HCPSS 2016 / Statistics Lecture 1
Lecture 3 1 Probability Definition, Bayes’ theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests general.
Statistics for Particle Physics Lecture 2: Statistical Tests
Lecture 4 1 Probability Definition, Bayes’ theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests general.
Statistics for Particle Physics Lecture 3: Parameter Estimation
TAE 2018 Benasque, Spain 3-15 Sept 2018 Glen Cowan Physics Department
TAE 2018 / Statistics Lecture 4
Statistical Tests and Limits Lecture 2: Limits
Some Prospects for Statistical Data Analysis in High Energy Physics
Computing and Statistical Data Analysis / Stat 7
TAE 2017 / Statistics Lecture 2
Statistics for HEP Lecture 1: Introduction and basic formalism
Introduction to Statistics − Day 4
Lecture 4 1 Probability Definition, Bayes’ theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests general.
Bayesian statistical methods for parton analyses
Statistical Methods for Particle Physics Lecture 3: further topics
Presentation transcript:

Computing and Statistical Data Analysis / Stat 10 Computing and Statistical Data Analysis Stat 10: Limits, nuisance parameters London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan Course web page: www.pp.rhul.ac.uk/~cowan/stat_course.html G. Cowan Computing and Statistical Data Analysis / Stat 10

Frequentist upper limit on Poisson parameter Consider again the case of observing n ~ Poisson(s + b). Suppose b = 4.5, nobs = 5. Find upper limit on s at 95% CL. Relevant alternative is s = 0 (critical region at low n) p-value of hypothesized s is P(n ≤ nobs; s, b) Upper limit sup at CL = 1 – α found from G. Cowan Computing and Statistical Data Analysis / Stat 10

n ~ Poisson(s+b): frequentist upper limit on s For low fluctuation of n formula can give negative result for sup; i.e. confidence interval is empty. G. Cowan Computing and Statistical Data Analysis / Stat 10

Limits near a physical boundary Suppose e.g. b = 2.5 and we observe n = 0. If we choose CL = 0.9, we find from the formula for sup Physicist: We already knew s ≥ 0 before we started; can’t use negative upper limit to report result of expensive experiment! Statistician: The interval is designed to cover the true value only 90% of the time — this was clearly not one of those times. Not uncommon dilemma when testing parameter values for which one has very little experimental sensitivity, e.g., very small s. G. Cowan Computing and Statistical Data Analysis / Stat 10

Computing and Statistical Data Analysis / Stat 10 Expected limit for s = 0 Physicist: I should have used CL = 0.95 — then sup = 0.496 Even better: for CL = 0.917923 we get sup = 10-4 ! Reality check: with b = 2.5, typical Poisson fluctuation in n is at least √2.5 = 1.6. How can the limit be so low? Look at the mean limit for the no-signal hypothesis (s = 0) (sensitivity). Distribution of 95% CL limits with b = 2.5, s = 0. Mean upper limit = 4.44 G. Cowan Computing and Statistical Data Analysis / Stat 10

The Bayesian approach to limits In Bayesian statistics need to start with ‘prior pdf’ p(q), this reflects degree of belief about q before doing the experiment. Bayes’ theorem tells how our beliefs should be updated in light of the data x: Integrate posterior pdf p(q | x) to give interval with any desired probability content. For e.g. n ~ Poisson(s+b), 95% CL upper limit on s from G. Cowan Computing and Statistical Data Analysis / Stat 10

Bayesian prior for Poisson parameter Include knowledge that s ≥ 0 by setting prior p(s) = 0 for s < 0. Could try to reflect ‘prior ignorance’ with e.g. Not normalized but this is OK as long as L(s) dies off for large s. Not invariant under change of parameter — if we had used instead a flat prior for, say, the mass of the Higgs boson, this would imply a non-flat prior for the expected number of Higgs events. Doesn’t really reflect a reasonable degree of belief, but often used as a point of reference; or viewed as a recipe for producing an interval whose frequentist properties can be studied (coverage will depend on true s). G. Cowan Computing and Statistical Data Analysis / Stat 10

Bayesian interval with flat prior for s Solve to find limit sup: where For special case b = 0, Bayesian upper limit with flat prior numerically same as one-sided frequentist case (‘coincidence’). G. Cowan Computing and Statistical Data Analysis / Stat 10

Bayesian interval with flat prior for s For b > 0 Bayesian limit is everywhere greater than the (one sided) frequentist upper limit. Never goes negative. Doesn’t depend on b if n = 0. G. Cowan Computing and Statistical Data Analysis / Stat 10

Priors from formal rules Because of difficulties in encoding a vague degree of belief in a prior, one often attempts to derive the prior from formal rules, e.g., to satisfy certain invariance principles or to provide maximum information gain for a certain set of measurements. Often called “objective priors” Form basis of Objective Bayesian Statistics The priors do not reflect a degree of belief (but might represent possible extreme cases). In Objective Bayesian analysis, can use the intervals in a frequentist way, i.e., regard Bayes’ theorem as a recipe to produce an interval with certain coverage properties. G. Cowan Computing and Statistical Data Analysis / Stat 10

Priors from formal rules (cont.) For a review of priors obtained by formal rules see, e.g., Formal priors have not been widely used in HEP, but there is recent interest in this direction, especially the reference priors of Bernardo and Berger; see e.g. L. Demortier, S. Jain and H. Prosper, Reference priors for high energy physics, Phys. Rev. D 82 (2010) 034002, arXiv:1002.1111. D. Casadei, Reference analysis of the signal + background model in counting experiments, JINST 7 (2012) 01012; arXiv:1108.4270. G. Cowan Computing and Statistical Data Analysis / Stat 10

Computing and Statistical Data Analysis / Stat 10 Jeffreys’ prior According to Jeffreys’ rule, take prior according to where is the Fisher information matrix. One can show that this leads to inference that is invariant under a transformation of parameters. For a Gaussian mean, the Jeffreys’ prior is constant; for a Poisson mean m it is proportional to 1/√m. G. Cowan Computing and Statistical Data Analysis / Stat 10

Jeffreys’ prior for Poisson mean Suppose n ~ Poisson(m). To find the Jeffreys’ prior for m, So e.g. for m = s + b, this means the prior p(s) ~ 1/√(s + b), which depends on b. But this is not designed as a degree of belief about s. G. Cowan Computing and Statistical Data Analysis / Stat 10

Systematic uncertainties and nuisance parameters L (x|θ) In general our model of the data is not perfect: model: truth: x Can improve model by including additional adjustable parameters. Nuisance parameter ↔ systematic uncertainty. Some point in the parameter space of the enlarged model should be “true”. Presence of nuisance parameter decreases sensitivity of analysis to the parameter of interest (e.g., increases variance of estimate). G. Cowan Computing and Statistical Data Analysis / Stat 10

Example: fitting a straight line Data: Model: yi independent and all follow yi ~ Gauss(μ(xi ), σi ) assume xi and si known. Goal: estimate q0 Here suppose we don’t care about q1 (example of a “nuisance parameter”) G. Cowan Computing and Statistical Data Analysis / Stat 10

Maximum likelihood fit with Gaussian data In this example, the yi are assumed independent, so the likelihood function is a product of Gaussians: Maximizing the likelihood is here equivalent to minimizing i.e., for Gaussian data, ML same as Method of Least Squares (LS) G. Cowan Computing and Statistical Data Analysis / Stat 10

Computing and Statistical Data Analysis / Stat 10 θ1 known a priori For Gaussian yi, ML same as LS Minimize χ2 → estimator Come up one unit from to find G. Cowan Computing and Statistical Data Analysis / Stat 10

Computing and Statistical Data Analysis / Stat 10 ML (or LS) fit of θ0 and θ1 Standard deviations from tangent lines to contour Correlation between causes errors to increase. G. Cowan Computing and Statistical Data Analysis / Stat 10

If we have a measurement t1 ~ Gauss (θ1, σt1) The information on θ1 improves accuracy of G. Cowan Computing and Statistical Data Analysis / Stat 10

Computing and Statistical Data Analysis / Stat 10 Bayesian method We need to associate prior probabilities with q0 and q1, e.g., ‘non-informative’, in any case much broader than ← based on previous measurement Putting this into Bayes’ theorem gives: posterior  likelihood  prior G. Cowan Computing and Statistical Data Analysis / Stat 10

Bayesian method (continued) We then integrate (marginalize) p(q0, q1 | x) to find p(q0 | x): In this example we can do the integral (rare). We find Usually need numerical methods (e.g. Markov Chain Monte Carlo) to do integral. G. Cowan Computing and Statistical Data Analysis / Stat 10

Digression: marginalization with MCMC Bayesian computations involve integrals like often high dimensionality and impossible in closed form, also impossible with ‘normal’ acceptance-rejection Monte Carlo. Markov Chain Monte Carlo (MCMC) has revolutionized Bayesian computation. MCMC (e.g., Metropolis-Hastings algorithm) generates correlated sequence of random numbers: cannot use for many applications, e.g., detector MC; effective stat. error greater than if all values independent . Basic idea: sample multidimensional look, e.g., only at distribution of parameters of interest. G. Cowan Computing and Statistical Data Analysis / Stat 10

MCMC basics: Metropolis-Hastings algorithm Goal: given an n-dimensional pdf generate a sequence of points Proposal density e.g. Gaussian centred about 1) Start at some point 2) Generate 3) Form Hastings test ratio 4) Generate 5) If move to proposed point else old point repeated 6) Iterate G. Cowan Computing and Statistical Data Analysis / Stat 10

Metropolis-Hastings (continued) This rule produces a correlated sequence of points (note how each new point depends on the previous one). For our purposes this correlation is not fatal, but statistical errors larger than if points were independent. The proposal density can be (almost) anything, but choose so as to minimize autocorrelation. Often take proposal density symmetric: Test ratio is (Metropolis-Hastings): I.e. if the proposed step is to a point of higher , take it; if not, only take the step with probability If proposed step rejected, hop in place. G. Cowan Computing and Statistical Data Analysis / Stat 10

Example: posterior pdf from MCMC Sample the posterior pdf from previous example with MCMC: Summarize pdf of parameter of interest with, e.g., mean, median, standard deviation, etc. Although numerical values of answer here same as in frequentist case, interpretation is different (sometimes unimportant?) G. Cowan Computing and Statistical Data Analysis / Stat 10

Bayesian method with alternative priors Suppose we don’t have a previous measurement of q1 but rather, e.g., a theorist says it should be positive and not too much greater than 0.1 "or so", i.e., something like From this we obtain (numerically) the posterior pdf for q0: This summarizes all knowledge about q0. Look also at result from variety of priors. G. Cowan Computing and Statistical Data Analysis / Stat 10