page 1: Systematic uncertainties in statistical data analysis for particle physics
DESY Seminar, Hamburg, 31 March 2009
Glen Cowan, Physics Department, Royal Holloway, University of London

page 2: Outline
Preliminaries: role of probability in data analysis (frequentist, Bayesian)
Systematic errors and nuisance parameters
A simple fitting problem: frequentist solution / Bayesian solution; when does σ_tot² = σ_stat² + σ_sys² make sense?
Systematic uncertainties in a search: example of search for Higgs (ATLAS)
Examples of nuisance parameters in fits: b → sγ with recoil method (BaBar)
Towards a general strategy for nuisance parameters
Conclusions

page 3: Data analysis in HEP
Particle physics experiments are expensive (e.g. LHC, ~$10^10 for accelerator and experiments), the competition is intense (ATLAS vs. CMS, vs. Tevatron), and the stakes are high: a 4 sigma effect vs. a 5 sigma effect. So there is a strong motivation to know precisely whether one's signal is a 4 sigma or 5 sigma effect.

page 4: Frequentist vs. Bayesian approaches
In frequentist statistics, probabilities are associated only with the data, i.e., outcomes of repeatable observations. Probability = limiting frequency. The preferred hypotheses (theories, models, parameter values, ...) are those for which our observations would be considered 'usual'.
In Bayesian statistics, the interpretation of probability is extended to degree of belief (subjective probability). Use Bayes' theorem to relate the (posterior) probability for hypothesis H given data x to the probability of x given H (the likelihood):
P(H|x) = P(x|H) π(H) / ∫ P(x|H') π(H') dH'
Need the prior probability π(H), i.e., the probability of H before seeing the data.

page 5: Statistical vs. systematic errors
Statistical errors: how much would the result fluctuate upon repetition of the measurement? Implies some set of assumptions to define the probability of the outcome of the measurement.
Systematic errors: what is the uncertainty in my result due to uncertainty in my assumptions, e.g., model (theoretical) uncertainty, modelling of the measurement apparatus? Usually taken to mean that the sources of error do not vary upon repetition of the measurement. Often result from uncertain values of calibration constants, efficiencies, etc.

page 6: Systematic errors and nuisance parameters
The model prediction (including e.g. detector effects) is never the same as the "true prediction" of the theory. (Figure: model vs. truth as curves y(x).)
The model can be made to approximate the truth better by including more free parameters:
systematic uncertainty ↔ nuisance parameters

page 7: Example: fitting a straight line
Data: (x_i, y_i), i = 1, ..., n.
Model: the measured y_i are independent and Gaussian, y_i ~ N(μ(x_i), σ_i²) with μ(x; θ_0, θ_1) = θ_0 + θ_1 x; assume the x_i and σ_i are known.
Goal: estimate θ_0 (don't care about θ_1).
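A minimal numerical sketch of such a straight-line fit, assuming some invented data points and errors (none of the values below come from the slides); for Gaussian y_i the least-squares fit is equivalent to maximum likelihood:

```python
import numpy as np
from scipy.optimize import curve_fit

# Illustrative data: x_i, y_i with known Gaussian errors sigma_i (assumed values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.92, 1.05, 1.18, 1.32, 1.44])
sigma = np.full_like(y, 0.05)

def line(x, theta0, theta1):
    """Straight-line model mu(x; theta0, theta1) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Least-squares fit (equivalent to ML for Gaussian measurements)
popt, pcov = curve_fit(line, x, y, sigma=sigma, absolute_sigma=True)
theta0_hat, theta1_hat = popt
errs = np.sqrt(np.diag(pcov))
print(f"theta0 = {theta0_hat:.3f} +/- {errs[0]:.3f}")
print(f"theta1 = {theta1_hat:.3f} +/- {errs[1]:.3f}")
# The off-diagonal covariance illustrates the correlation discussed on the next slide
print("correlation(theta0, theta1) =", pcov[0, 1] / (errs[0] * errs[1]))
```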

page 8: Frequentist approach
The correlation between the estimators θ̂_0 and θ̂_1 causes the errors to increase.
Standard deviations are found from the tangent lines to the contour χ² = χ²_min + 1.

page 9: Frequentist case with a measurement t_1 of θ_1
The information on θ_1 improves the accuracy of θ̂_0.
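A sketch of how such an auxiliary measurement t_1 of θ_1 can be included in the frequentist fit, by adding a Gaussian constraint term to the χ²; the values of t_1 and σ_t below are invented for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# Same illustrative data as above (assumed values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.92, 1.05, 1.18, 1.32, 1.44])
sigma = np.full_like(y, 0.05)

# Hypothetical auxiliary measurement of theta1 with uncertainty sigma_t
t1, sigma_t = 0.12, 0.02

def chi2(params):
    theta0, theta1 = params
    resid = (y - (theta0 + theta1 * x)) / sigma
    # Gaussian constraint term from the auxiliary measurement of theta1
    return np.sum(resid**2) + ((theta1 - t1) / sigma_t) ** 2

result = minimize(chi2, x0=[1.0, 0.1])
print("theta0_hat, theta1_hat =", result.x)
```

The extra term tightens the constraint on θ_1 and hence, through the correlation, reduces the uncertainty on θ̂_0.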

page 10: Bayesian method
We need to associate prior probabilities with θ_0 and θ_1, e.g.,
π(θ_0) ~ constant (reflects 'prior ignorance'; in any case much broader than the likelihood),
π(θ_1) Gaussian ← based on the previous measurement.
Putting this into Bayes' theorem gives:
posterior ∝ likelihood × prior

page 11: Bayesian method (continued)
We then integrate (marginalize) p(θ_0, θ_1 | x) to find p(θ_0 | x):
p(θ_0 | x) = ∫ p(θ_0, θ_1 | x) dθ_1
Usually one needs numerical methods (e.g. Markov Chain Monte Carlo) to do the integral. In this example we can do the integral analytically (rare), and the result is obtained in closed form.
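Where the marginalization cannot be done analytically, a brute-force numerical version illustrates the idea; this sketch assumes the same invented data as above, a flat prior for θ_0 and a Gaussian prior for θ_1:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.92, 1.05, 1.18, 1.32, 1.44])
sigma = np.full_like(y, 0.05)
t1, sigma_t = 0.12, 0.02          # hypothetical previous measurement of theta1

theta0_grid = np.linspace(0.6, 1.0, 400)
theta1_grid = np.linspace(0.0, 0.25, 400)
T0, T1 = np.meshgrid(theta0_grid, theta1_grid, indexing="ij")

# log-likelihood (Gaussian measurements) + log-prior (flat in theta0, Gaussian in theta1)
log_like = -0.5 * np.sum(((y - (T0[..., None] + T1[..., None] * x)) / sigma) ** 2, axis=-1)
log_prior = -0.5 * ((T1 - t1) / sigma_t) ** 2
log_post = log_like + log_prior

post = np.exp(log_post - log_post.max())
post_theta0 = post.sum(axis=1)                 # marginalize over theta1
post_theta0 /= np.trapz(post_theta0, theta0_grid)

mean = np.trapz(theta0_grid * post_theta0, theta0_grid)
std = np.sqrt(np.trapz((theta0_grid - mean) ** 2 * post_theta0, theta0_grid))
print(f"posterior mean(theta0) = {mean:.3f}, std = {std:.3f}")
```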

page 12: Bayesian method with alternative priors
Suppose we don't have a previous measurement of θ_1 but rather, e.g., a theorist says it should be positive and not too much greater than 0.1 "or so", i.e., a prior concentrated on small positive values.
From this we obtain (numerically) the posterior pdf for θ_0. This summarizes all knowledge about θ_0. Look also at the result from a variety of priors.

page 13: A more general fit (symbolic)
Given measurements y_i and (usually) covariances V_ij.
Predicted value (expectation value): E[y_i] = μ(x_i; θ) + b_i, where x is a control variable, θ are the parameters and b is a bias.
Minimize
χ²(θ, b) = (y − μ(θ) − b)ᵀ V⁻¹ (y − μ(θ) − b).
Equivalent to maximizing L(θ) ∝ e^{−χ²/2}, i.e., least squares is the same as maximum likelihood using a Gaussian likelihood function.

page 14: Its Bayesian equivalent
Take the joint probability for all parameters,
p(θ, b | y) ∝ L(y | θ, b) π(θ) π(b),
and use Bayes' theorem. To get the desired probability for θ, integrate (marginalize) over b:
p(θ | y) = ∫ p(θ, b | y) db
→ The posterior is Gaussian with mode the same as the least-squares estimator, and σ_θ the same as from χ² = χ²_min + 1. (Back where we started!)

page 15: Alternative priors for systematic errors
A Gaussian prior for the bias b is often not realistic, especially if one considers the "error on the error". Incorporating this can give a prior π_b(b) with longer tails. Here s represents the "error on the error": the standard deviation of π_s(s) is σ_s.

page 16: A simple test
Suppose the fit effectively averages four measurements. Take σ_sys = σ_stat = 0.1, uncorrelated.
Case #1: the data appear compatible.
Usually summarize the posterior p(μ|y) with its mode and standard deviation.
(Figure: the four measurements by experiment, and the posterior p(μ|y).)

page 17: Simple test with inconsistent data
Case #2: there is an outlier.
→ The Bayesian fit is less sensitive to the outlier.
(Figure: the measurements by experiment, and the posterior p(μ|y).)
(See also D'Agostini 1999; Dose & von der Linden 1999.)

page 18: Example of systematics in a search
Combination of Higgs search channels (ATLAS): Expected Performance of the ATLAS Experiment: Detector, Trigger and Physics, arXiv: , CERN-OPEN.
Standard Model Higgs channels considered (more to be used later):
H → γγ
H → WW(*) → eνμν
H → ZZ(*) → 4ℓ (ℓ = e, μ)
H → ττ → ℓℓ, ℓh
Used the profile likelihood method for systematic uncertainties: background rates, signal & background shapes.

page 19: Statistical model for Higgs search
Bin i of a given channel has n_i events; the expectation value is
E[n_i] = μ s_i + b_i.
The expected signal and background in bin i are obtained from the total rates and the signal and background shape pdfs integrated over the bin.
μ is a global strength parameter, common to all channels: μ = 0 means background only, μ = 1 is the SM hypothesis.
b_tot, θ_s, θ_b are nuisance parameters.

page 20: The likelihood function
The single-channel likelihood function uses a Poisson model for the events in the signal and control histograms. Here θ represents all nuisance parameters, e.g., background rate and shapes; the signal rate μ is the only parameter of interest.
There is a likelihood L_i(μ, θ_i) for each channel, i = 1, ..., N, built from the data in the signal histogram and the data in the control histogram. The full likelihood function is the product
L(μ, θ) = ∏_{i=1}^{N} L_i(μ, θ_i).

page 21: Profile likelihood ratio
To test a hypothesized value of μ, construct the profile likelihood ratio
λ(μ) = (maximized L for the given μ) / (maximized L over both μ and θ).
Equivalently use q_μ = −2 ln λ(μ):
data agree well with the hypothesized μ → q_μ small;
data disagree with the hypothesized μ → q_μ large.
The distribution of q_μ under the assumption of μ is related to chi-square (Wilks' theorem; here the approximation is valid for roughly L > 2 fb⁻¹).
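As a concrete (and much simplified) illustration of these two slides, the sketch below builds a Poisson likelihood for a single counting bin with one nuisance parameter constrained by a control measurement, and evaluates q_μ by profiling. All names and numbers (n_obs, m_obs, s, tau) are invented for illustration; this is not the ATLAS model.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

# Simplified single-bin counting sketch: n ~ Poisson(mu*s + b),
# with b constrained by a control measurement m ~ Poisson(tau*b).
n_obs, m_obs = 15, 20
s, tau = 6.0, 2.0

def nll(mu, b):
    """Negative log-likelihood for (mu, b)."""
    return -(poisson.logpmf(n_obs, mu * s + b) + poisson.logpmf(m_obs, tau * b))

def profiled_nll(mu):
    """Minimize the NLL over the nuisance parameter b for fixed mu."""
    res = minimize_scalar(lambda b: nll(mu, b), bounds=(1e-6, 100.0), method="bounded")
    return res.fun

# Global minimum over (mu, b) by scanning mu (crude, but sufficient for a sketch)
mu_scan = np.linspace(0.0, 5.0, 501)
nll_min = min(profiled_nll(mu) for mu in mu_scan)

def q_mu(mu):
    """Profile likelihood ratio test statistic q_mu = -2 ln lambda(mu)."""
    return 2.0 * (profiled_nll(mu) - nll_min)

print("q(mu=0) =", q_mu(0.0))   # discovery-type test
print("q(mu=1) =", q_mu(1.0))   # test of the nominal signal hypothesis
```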

page 22: p-value / significance of hypothesized μ
Test the hypothesized μ by giving a p-value: the probability to see data with equal or lesser compatibility with μ than the data observed,
p_μ = ∫_{q_μ,obs}^{∞} f(q_μ | μ) dq_μ.
Equivalently use the significance Z, defined as the equivalent number of sigmas for a Gaussian fluctuation in one direction:
Z = Φ⁻¹(1 − p).
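The p-value to significance conversion quoted here is easy to script; a minimal sketch using SciPy (the numerical thresholds shown are the conventional ones):

```python
from scipy.stats import norm

def z_from_p(p):
    """Significance Z (one-sided Gaussian) from a p-value: Z = Phi^-1(1 - p)."""
    return norm.isf(p)          # inverse survival function = Phi^-1(1 - p)

def p_from_z(z):
    """p-value from a significance Z."""
    return norm.sf(z)           # survival function = 1 - Phi(z)

print(z_from_p(2.87e-7))        # ~5.0: the conventional discovery threshold
print(p_from_z(1.64))           # ~0.05: the conventional 95% CL exclusion threshold
```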

page 23: Sensitivity
Discovery: generate data under the s+b (μ = 1) hypothesis; test the hypothesis μ = 0 → p-value → Z.
Exclusion: generate data under the background-only (μ = 0) hypothesis; test the hypothesized μ. If μ = 1 has p-value < 0.05, exclude m_H at 95% CL.
The presence of nuisance parameters leads to a broadening of the profile likelihood, reflecting the loss of information, and gives appropriately reduced discovery significance and weaker limits.

page 24: Combined discovery significance
Discovery significance (in colour) vs. L and m_H.
The approximations used here are not always accurate for L < 2 fb⁻¹, but in most cases conservative.

page 25: Combined 95% CL exclusion limits
p-value of m_H (in colour) vs. L and m_H.

page 26: Fit example: b → sγ (BaBar)
B. Aubert et al. (BaBar), Phys. Rev. D 77, (R) (2008).
"Recoil method": the decay of one B is fully reconstructed (B_tag); look for a high-energy γ from the rest of the event (B_signal → X_s γ).
Signal and background yields from a fit to m_ES in bins of E_γ.
(Event diagram: B_tag side with D*, B_signal side with X_s and the high-energy γ.)

page 27: Fitting the m_ES distribution for b → sγ
Fit the m_ES distribution using a superposition of Crystal Ball (signal shape) and Argus (background shape) functions. The shape parameters and rates determine the predicted number of events ν_i in the ith bin; with observed events n_i the binned log-likelihood is
ln L = Σ_i (n_i ln ν_i − ν_i) + constant.
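For reference, a sketch of the two shape functions named on this slide, written from their standard definitions (unnormalized); all parameter values below are illustrative, not the BaBar fit results:

```python
import numpy as np

def crystal_ball(x, m0, sigma, alpha, n):
    """Unnormalized Crystal Ball shape: Gaussian core with a power-law low-side tail."""
    t = (x - m0) / sigma
    a = abs(alpha)
    A = (n / a) ** n * np.exp(-0.5 * a ** 2)
    B = n / a - a
    tail = A * np.power(np.maximum(B - t, 1e-12), -n)   # guard against t > B
    return np.where(t > -a, np.exp(-0.5 * t ** 2), tail)

def argus(m, m_end, c):
    """Unnormalized Argus background shape with endpoint m_end and curvature c."""
    z = np.clip(1.0 - (m / m_end) ** 2, 0.0, None)
    return m * np.sqrt(z) * np.exp(c * z)

# Illustrative evaluation over an m_ES range typical of B decays (parameter values assumed)
m_es = np.linspace(5.20, 5.29, 200)
signal = crystal_ball(m_es, m0=5.279, sigma=0.003, alpha=1.5, n=4.0)
background = argus(m_es, m_end=5.29, c=-20.0)
```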

page 28: Simultaneous fit of all m_ES distributions
Need fits of the m_ES distributions in 14 bins of E_γ. At high E_γ there are not enough events to constrain the shape, so combine all E_γ bins into a global fit.
The shape parameters could vary (smoothly) with E_γ, so make an Ansatz for the shape parameters, such as a polynomial dependence on E_γ.
Start with no energy dependence, and include more parameters one by one until the data are well described.

page 29: Finding appropriate model flexibility
Here, for the Argus ξ parameter, a linear E_γ dependence gives a significant improvement; fitted coefficient of the linear term: … ± … .
Inclusion of additional free parameters (e.g., a quadratic E_γ dependence for the parameter ξ) does not bring significant improvement.
χ²(1) − χ²(2) = 3.48, p-value of (1) = … → the data want the extra parameter.
So including the additional energy dependence for the shape parameters converts the systematic uncertainty into a statistical uncertainty on the parameters of interest.
D. Hopkins, PhD thesis, RHUL (2007).
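Under Wilks' theorem, the χ² improvement from one extra nested parameter can be converted to a p-value in one line; a sketch using the value quoted on this slide:

```python
from scipy.stats import chi2

# chi-square improvement quoted on the slide when one extra (linear) parameter is added
delta_chi2 = 3.48
p_value = chi2.sf(delta_chi2, df=1)   # survival function = 1 - CDF, 1 degree of freedom
print(f"p-value for the simpler model: {p_value:.3f}")  # ~0.06
```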

page 30: Towards a general strategy (frequentist)
Suppose one needs to know the shape of a distribution. An initial model (e.g. MC) is available, but is known to be imperfect.
Q: How can one incorporate the systematic error arising from use of the incorrect model?
A: Improve the model. That is, introduce more adjustable parameters into the model so that for some point in the enlarged parameter space it is very close to the truth. Then profile the likelihood with respect to the additional (nuisance) parameters. The correlations with the nuisance parameters will inflate the errors on the parameters of interest.
The difficulty is deciding how to introduce the additional parameters.
In progress together with: S. Caron, S. Horner, J. Sundermann, E. Gross, O. Vitells, A. Alam.

page 31: A simple example
The naive (0th order) model could have been, e.g., from MC (here statistical errors are suppressed; the point is to illustrate how to incorporate systematics).
(Figure: 0th order model, true model (Nature), and data.)

page 32: Comparing model vs. data
In the example shown, the model and data clearly don't agree well. To compare, model the number of entries n_i in the ith bin as ~Poisson(ν_i) and use, e.g., Pearson's chi-square,
χ²_P = Σ_i (n_i − ν_i)² / ν_i,
which will follow a chi-square distribution for N degrees of freedom for sufficiently large n_i.

page 33: Model-data comparison with likelihood ratio
This is very similar to a comparison based on the likelihood ratio
λ = L(ν) / L(ν̂),
where L(ν) = P(n; ν) is the likelihood and the hat indicates the ML estimator (the value that maximizes the likelihood). Equivalently use the logarithmic variable q = −2 ln λ. Here it is easy to show that
q = 2 Σ_i [ν_i − n_i + n_i ln(n_i / ν_i)].
If the model is correct, q ~ chi-square for N degrees of freedom.

page 34: p-values
Using either χ²_P or q, state the level of data-model agreement by giving the p-value: the probability, under assumption of the model, of obtaining an equal or greater incompatibility with the data relative to that found with the actual data,
p = ∫_{q_obs}^{∞} f_N(q) dq,
where (in both cases) the integrand f_N is the chi-square distribution for N degrees of freedom.
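A sketch of this binned model-data comparison, computing both χ²_P and q and their chi-square p-values for some invented bin contents:

```python
import numpy as np
from scipy.stats import chi2

# Illustrative binned data n_i and model predictions nu_i
n = np.array([12, 19, 28, 33, 25, 17, 9, 5])
nu = np.array([10.0, 17.5, 25.0, 30.0, 27.0, 18.0, 10.0, 4.5])
N = len(n)

# Pearson chi-square
chi2_P = np.sum((n - nu) ** 2 / nu)

# Likelihood-ratio statistic q = -2 ln lambda for Poisson-distributed bins
# (a bin with n_i = 0 contributes just nu_i)
log_term = np.where(n > 0, n * np.log(n / nu), 0.0)
q = 2.0 * np.sum(nu - n + log_term)

print(f"chi2_P = {chi2_P:.2f}, p = {chi2.sf(chi2_P, N):.3f}")
print(f"q      = {q:.2f}, p = {chi2.sf(q, N):.3f}")
```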

page 35: Comparison with the 0th order model
The 0th order model gives q = 258.8, p = … × 10^… .

page 36: Enlarging the model
Here try to enlarge the model by multiplying the 0th order distribution by a function s:
ν_i = s(x_i) ν_i^(0),
where s(x) is a linear superposition of Bernstein basis polynomials of order m:
s(x) = Σ_{k=0}^{m} α_k b_{k,m}(x).

page 37: Bernstein basis polynomials
The Bernstein basis polynomials of order m are
b_{k,m}(x) = C(m, k) x^k (1 − x)^{m−k}, k = 0, ..., m, for x in [0, 1].
Using an increasingly high order for the basis polynomials gives an increasingly flexible function.
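A short sketch of how the Bernstein basis and the multiplicative correction s(x) could be coded; the coefficient values and bin contents below are invented:

```python
import numpy as np
from math import comb

def bernstein_basis(k, m, x):
    """Bernstein basis polynomial b_{k,m}(x) = C(m,k) x^k (1-x)^(m-k) on [0, 1]."""
    return comb(m, k) * x**k * (1.0 - x) ** (m - k)

def correction(x, alpha):
    """Multiplicative correction s(x) = sum_k alpha_k b_{k,m}(x), with m = len(alpha) - 1."""
    m = len(alpha) - 1
    return sum(a * bernstein_basis(k, m, x) for k, a in enumerate(alpha))

# Illustrative: a gentle order-2 correction applied to a 0th-order model prediction
x = np.linspace(0.0, 1.0, 6)                            # bin centres mapped onto [0, 1]
nu0 = np.array([20.0, 35.0, 42.0, 38.0, 26.0, 12.0])    # assumed 0th-order prediction
alpha = [1.0, 1.2, 0.9]                                 # hypothetical coefficients
nu = correction(x, alpha) * nu0                         # enlarged-model prediction
```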

page 38: Enlarging the parameter space
Using an increasingly high order for the basis polynomials gives an increasingly flexible function. At each stage compare the p-value to some threshold, e.g., 0.1 or 0.2, to decide whether to include the additional parameter.
Iterate this procedure and stop when the data do not require the addition of further parameters based on the likelihood ratio test. (And the overall goodness of fit should also be good.)
Once the enlarged model has been found, simply include it in any further statistical procedures; the statistical errors from the additional parameters will account for the systematic uncertainty in the original model.

page 39: Fits using increasing numbers of parameters
(Figures: fits with successively more parameters; "stop here" marks the chosen order.)

page 40: Deciding appropriate level of flexibility
Two p-values guide the decision: one says whether the data prefer an additional parameter, the other says whether the data are well described overall. When the p-value exceeds ~0.1 to 0.2, the fit is good enough; stop there.

page 41: Issues with finding an improved model
Sometimes, e.g., if the data set is very large, the total χ² can be very high (bad), even though the absolute deviation between model and data may be small.
It may be that including additional parameters "spoils" the parameter of interest and/or leads to an unphysical fit result well before it succeeds in improving the overall goodness of fit.
Possible remedies: include the new parameters in a clever (physically motivated, local) way, so that they affect only the required regions; or use a Bayesian approach, assigning priors to the new nuisance parameters that constrain them from moving too far (or use equivalent frequentist penalty terms in the likelihood).
Unfortunately these solutions may not be practical and one may be forced to use ad hoc recipes (last resort).

page 42: Summary and conclusions
The key to covering a systematic uncertainty is to include the appropriate nuisance parameters, constrained by all available information. Enlarge the model so that for at least one point in its parameter space, its difference from the truth is negligible.
In the frequentist approach one can use the profile likelihood (similar with the integrated product of likelihood and prior; not discussed today). Too many nuisance parameters spoils the information about the parameter(s) of interest.
In the Bayesian approach, one needs to assign priors to (all) parameters. This can provide important flexibility over frequentist methods, but it can be difficult to encode uncertainty in the priors. Exploit recent progress in Bayesian computation (MCMC).
Finally, when the LHC announces a 5 sigma effect, it's important to know precisely what the "sigma" means.

page 43: Extra slides

page 44: Digression: marginalization with MCMC
Bayesian computations involve integrals like
p(θ_0 | x) = ∫ p(θ_0, θ_1 | x) dθ_1,
often of high dimensionality and impossible in closed form, and also impossible with "normal" acceptance-rejection Monte Carlo.
Markov Chain Monte Carlo (MCMC) has revolutionized Bayesian computation. MCMC (e.g., the Metropolis-Hastings algorithm) generates a correlated sequence of random numbers: it cannot be used for many applications, e.g., detector MC, and the effective statistical error is greater than the naive √n would suggest.
Basic idea: sample the full multidimensional parameter space, then look, e.g., only at the distribution of the parameters of interest.

page 45: Example: posterior pdf from MCMC
Sample the posterior pdf from the previous example with MCMC. Summarize the pdf of the parameter of interest with, e.g., mean, median, standard deviation, etc.
Although the numerical values of the answer here are the same as in the frequentist case, the interpretation is different (sometimes unimportant?).

page 46: MCMC basics: Metropolis-Hastings algorithm
Goal: given an n-dimensional pdf p(θ), generate a sequence of points θ_1, θ_2, θ_3, ...
1) Start at some point θ_0.
2) Generate a proposed point θ from a proposal density q(θ; θ_0), e.g. a Gaussian centred about θ_0.
3) Form the Hastings test ratio α = min[1, p(θ) q(θ_0; θ) / (p(θ_0) q(θ; θ_0))].
4) Generate u uniformly in [0, 1].
5) If u ≤ α, move to the proposed point; else stay at the old point (the old point is repeated).
6) Iterate.

page 47: Metropolis-Hastings (continued)
This rule produces a correlated sequence of points (note how each new point depends on the previous one). For our purposes this correlation is not fatal, but the statistical errors are larger than the naive √n would suggest.
The proposal density can be (almost) anything, but choose it so as to minimize the autocorrelation. Often the proposal density is taken symmetric, q(θ; θ_0) = q(θ_0; θ); the test ratio is then (Metropolis-Hastings)
α = min[1, p(θ) / p(θ_0)].
I.e. if the proposed step is to a point of higher probability density, take it; if not, only take the step with probability p(θ)/p(θ_0). If the proposed step is rejected, hop in place.
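A compact sketch of the algorithm described on these two slides, using a symmetric Gaussian proposal so the simple Metropolis test ratio applies; the target pdf and all numbers are illustrative:

```python
import numpy as np

def metropolis_hastings(log_p, theta0, n_steps, step_size, rng=None):
    """Metropolis-Hastings with a symmetric Gaussian proposal centred on the current point."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float)
    chain = np.empty((n_steps, theta.size))
    log_p_curr = log_p(theta)
    for i in range(n_steps):
        proposal = theta + step_size * rng.standard_normal(theta.size)
        log_p_prop = log_p(proposal)
        # Symmetric proposal: accept with probability min(1, p(prop)/p(curr))
        if np.log(rng.uniform()) < log_p_prop - log_p_curr:
            theta, log_p_curr = proposal, log_p_prop
        chain[i] = theta          # if rejected, the old point is repeated ("hop in place")
    return chain

# Illustrative target: a correlated 2D Gaussian posterior in (theta0, theta1)
cov = np.array([[0.04, -0.015], [-0.015, 0.01]])
cov_inv = np.linalg.inv(cov)
log_p = lambda th: -0.5 * th @ cov_inv @ th

chain = metropolis_hastings(log_p, theta0=[0.0, 0.0], n_steps=20000, step_size=0.1)
burned = chain[2000:]                       # discard a burn-in period
print("marginal mean/std of theta0:", burned[:, 0].mean(), burned[:, 0].std())
```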

page 48: Metropolis-Hastings caveats
Actually one can only prove that the sequence of points follows the desired pdf in the limit where it runs forever. There may be a "burn-in" period where the sequence does not initially follow the desired pdf.
Unfortunately there are few useful theorems to tell us when the sequence has converged. Look at trace plots and autocorrelation; check the result with a different proposal density. If you think it's converged, try starting from a different point and see if the result is similar.