1 Bayesian statistical methods for parton analyses
Glen Cowan, Royal Holloway, University of London
DIS2006, Tsukuba, 22 April, 2006
In collaboration with Clare Quarman

2 Outline
I. Data analysis difficulties for prediction of LHC observables: some problems with frequentist statistical methods.
II. Bayesian statistics: quick review of the basic formalism and tools; application to incompatible data sets and model (theoretical) uncertainties.
III. Prospects for LHC predictions.

3 Some uncertainties in predicted cross sections
I. PDFs are based on fits to data with imperfectly understood systematics, and not all data are compatible.
II. The perturbative prediction is available only to limited order: PDF evolution and cross sections to NLO, NNLO, ...
III. Modelling of nonperturbative physics: parametrization of the PDF at low Q², details of flavour composition, ...

4 LHC game plan
Understanding the uncertainties in predicted cross sections is a recognized Crucial Issue for LHC analyses, e.g. extra dimensions, parton substructure, sin²θ_W.
For LHC observables we have both uncertainties in the PDFs and uncertainties in the parton-level cross sections.

5 PDF fit (symbolic)
Given measurements y_i and (usually) their covariances V_ij, the predicted value for each point is μ_i(x_i; θ), where x_i is a control variable and θ collects the PDF parameters, α_s, etc.; allowing also a bias b_i, the expectation value is E[y_i] = μ_i + b_i. Often one minimizes
χ²(θ) = [y − μ(θ)]ᵀ V⁻¹ [y − μ(θ)],
which is equivalent to maximizing L(θ) ∝ e^(−χ²/2), i.e., least squares is the same as maximum likelihood using a Gaussian likelihood function.
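To make this concrete, here is a minimal numerical sketch (my illustration, not from the talk): a toy straight-line fit with a full covariance matrix, minimizing the χ² with SciPy. The model and all numbers are invented.

```python
import numpy as np
from scipy.optimize import minimize

# Toy model (hypothetical): prediction mu_i = theta0 + theta1 * x_i
rng = np.random.default_rng(1)
x = np.linspace(0.1, 1.0, 10)
V = 0.05**2 * np.eye(10) + 0.03**2 * np.ones((10, 10))  # stat + correlated sys
Vinv = np.linalg.inv(V)
theta_true = np.array([0.5, 1.2])
y = theta_true[0] + theta_true[1] * x + rng.multivariate_normal(np.zeros(10), V)

def chi2(theta):
    r = y - (theta[0] + theta[1] * x)   # residuals y - mu(theta)
    return r @ Vinv @ r                 # (y - mu)^T V^-1 (y - mu)

fit = minimize(chi2, x0=[0.0, 1.0])     # minimizing chi^2 = maximizing L
print("theta_hat =", fit.x, " chi2_min =", fit.fun)
```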

6 Uncertainties from PDF fits
If we have incompatible data or an incorrect model, the minimized χ² will be high, but this does not automatically result in larger estimates of the PDF parameter errors.
Frequentist statistics provides a rule to obtain the standard deviation of estimators (1σ statistical errors): χ² = χ²_min + 1. But in PDF fits this results in unrealistically small uncertainties. Try e.g. χ² = χ²_min + 50, 75, 100?
The problem lies in the application of a rule for statistical errors to a situation dominated by systematic and model uncertainties. → Try a Bayesian statistical approach.

7 The Bayesian approach
In Bayesian statistics we can associate a probability with a hypothesis, e.g., a parameter value θ. We interpret the probability of θ as a 'degree of belief' (subjective).
We need to start with a prior pdf π(θ), which reflects our degree of belief about θ before doing the experiment. The experiment yields data x, and hence a likelihood function L(x|θ). Bayes' theorem tells us how our beliefs should be updated in light of the data x:
p(θ|x) = L(x|θ) π(θ) / ∫ L(x|θ′) π(θ′) dθ′.
The posterior pdf p(θ|x) contains all our knowledge about θ.
[Portrait: Rev. Thomas Bayes]

8 A possible Bayesian analysis
Take the joint probability for all parameters,
p(θ, b | y) ∝ L(y | θ, b) π_θ(θ) π_b(b),
and use Bayes' theorem. To get the desired probability for θ, integrate (marginalize) over b:
p(θ | y) = ∫ p(θ, b | y) db.
→ With Gaussian priors for b, the posterior is Gaussian with mode the same as the least-squares estimator, and σ_θ the same as from χ² = χ²_min + 1. (Back where we started!)

9 Marginalizing with Markov Chain Monte Carlo
In a Bayesian analysis we usually need to integrate over some (or all) of the parameters, e.g., to get the probability density for the prediction of an observable Ω(θ):
p(Ω | y) = ∫ δ(Ω − Ω(θ)) p(θ | y) dθ.
The integrals are often of high dimension and usually cannot be done in closed form or with acceptance-rejection Monte Carlo.
Markov Chain Monte Carlo (MCMC) has revolutionized Bayesian computation. (Google: Metropolis-Hastings, MCMC.) It produces a correlated sequence of points in the sampled space. The correlations here are not fatal, but the statistical error is larger than the naive 1/√n.
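Once a chain of posterior samples exists, the marginalization is trivial in practice: evaluate Ω(θ) at each sampled point and summarize the resulting distribution. A minimal sketch (the chain and the observable are invented stand-ins):

```python
import numpy as np

# Hypothetical stand-in for an MCMC chain: one row per posterior sample,
# columns = (theta1, theta2, b1); here simply faked with Gaussian draws.
rng = np.random.default_rng(0)
chain = rng.multivariate_normal([1.0, 0.5, 0.0],
                                np.diag([0.01, 0.04, 0.01]), size=10000)

def omega(theta):
    # Hypothetical observable of interest, e.g. a predicted cross section
    return theta[0] * np.exp(-theta[1])

# Integrating out all the other parameters happens automatically:
omega_samples = np.array([omega(p) for p in chain])
print("Omega = %.3f +/- %.3f" % (omega_samples.mean(), omega_samples.std()))
```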

10 Systematic uncertainty and nuisance parameters
In general we can describe the data better by including more parameters in the model (nuisance parameters), e.g., the low-Q² PDF; unknown coefficients of higher-order and higher-twist terms; experimental biases; ...
But more parameters → correlations → bigger errors.
Bayesian approach: include the additional parameters along with prior probabilities that reflect how widely they can vary. It is difficult (impossible) to agree on priors, but remember the 'if-then' nature of the result. The usefulness to the community comes from sensitivity analysis: vary the prior and see what effect this has on the posterior.

11 The Full Bayesian Machine
A full Bayesian PDF analysis could involve: the usual two dozen PDF parameters; a bias parameter for each systematic; more parameters to quantify model uncertainties; ...; as well as a meaningful assignment of priors, made in consultation with experimenters and theorists, and finally an integration over the entire parameter space to extract the posterior probability for a parameter of interest, e.g., a predicted cross section.
This is an ongoing effort; the primary difficulties are with the MCMC.

12 The error on the error
Some systematic errors are well determined, e.g., the error from a finite Monte Carlo sample.
Some are less obvious: do the analysis in n 'equally valid' ways and extract the systematic error from the 'spread' in the results.
Some are educated guesses: guess the possible size of missing terms in the perturbation series; vary the renormalization scale.
Can we incorporate the 'error on the error'? (cf. G. D'Agostini 1999; Dose & von der Linden 1999)

13 Motivating a non-Gaussian prior π_b(b)
Suppose now the experiment is characterized by
y_i = μ + b_i + (statistical fluctuation of size σ_i^stat),  b_i ~ Gauss(0, s_i σ_i^sys),
where s_i is an (unreported) factor by which the systematic error is over- or under-estimated. Assume the correct error for a Gaussian π_b(b) would be s_i σ_i^sys, so that
π_b(b) = ∫ Gauss(b; 0, s σ^sys) π_s(s) ds.
The width of π_s(s) reflects the 'error on the error'.

14 Error-on-error function π_s(s)
A simple unimodal probability density for 0 < s < ∞ with adjustable mean and variance is the Gamma distribution:
π_s(s) = (a^b / Γ(b)) s^(b−1) e^(−as),  mean = b/a,  variance = b/a².
We want e.g. an expectation value of 1 and an adjustable standard deviation σ_s, i.e., set a = b = 1/σ_s².
In fact if we took π_s(s) ~ inverse Gamma, we could integrate π_b(b) in closed form (cf. D'Agostini, Dose, von der Linden). But the Gamma seems more natural, and the numerical treatment is not too painful.
[Plot: π_s(s) vs. s]
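A quick numerical check of this choice (a sketch, not from the talk), using SciPy's Gamma distribution with shape b and scale 1/a:

```python
import numpy as np
from scipy.stats import gamma

sigma_s = 0.5                   # desired 'error on the error'
a = b = 1.0 / sigma_s**2        # mean b/a = 1, variance b/a^2 = sigma_s^2

pi_s = gamma(b, scale=1.0 / a)  # SciPy's shape/scale convention
print(pi_s.mean(), pi_s.std())  # -> 1.0, 0.5

s = np.linspace(0.01, 3.0, 300)
density = pi_s.pdf(s)           # pi_s(s), e.g. for plotting
```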

15 Prior for bias π_b(b) now has longer tails
Gaussian (σ_s = 0): P(|b| > 4σ_sys) = 6.3 × 10⁻⁵.
σ_s = 0.5: P(|b| > 4σ_sys) = 0.65%.
[Plot: π_b(b) vs. b for the two cases.]
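These tail probabilities are straightforward to reproduce numerically; a sketch, assuming the marginal bias prior π_b(b) = ∫ Gauss(b; 0, sσ_sys) π_s(s) ds from the previous slides:

```python
import numpy as np
from scipy.stats import gamma, norm
from scipy.integrate import quad

sigma_s = 0.5
a = b = 1.0 / sigma_s**2
pi_s = gamma(b, scale=1.0 / a)

# P(|b| > 4 sigma_sys) = int ds pi_s(s) * 2 * (1 - Phi(4/s))
tail, _ = quad(lambda s: pi_s.pdf(s) * 2.0 * norm.sf(4.0 / s), 0.0, 10.0)
print(tail)                # ~0.0065, i.e. 0.65%
print(2.0 * norm.sf(4.0))  # Gaussian case: ~6.3e-5
```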

16 A simple test
Suppose the fit effectively averages four measurements. Take σ_sys = σ_stat = 0.1, uncorrelated.
Case #1: the data appear compatible.
[Plot: measurement vs. experiment, and the posterior p(μ|y) vs. μ.]
We usually summarize the posterior p(μ|y) with its mode and standard deviation.
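A minimal numerical sketch of this toy study (my reconstruction, assuming the model of the previous slides: y_i ~ Gauss(μ + b_i, σ_stat), b_i ~ Gauss(0, s_i σ_sys), s_i ~ Gamma with mean 1 and standard deviation σ_s, flat prior for μ; the data values are invented). Marginalizing each b_i analytically leaves a one-dimensional integral over s_i:

```python
import numpy as np
from scipy.stats import gamma, norm
from scipy.integrate import quad

sigma_stat = sigma_sys = 0.1
sigma_s = 0.5
a = b = 1.0 / sigma_s**2
pi_s = gamma(b, scale=1.0 / a)

def marginal_like(y_i, mu):
    # p(y_i|mu) = int ds pi_s(s) Gauss(y_i; mu, sqrt(stat^2 + (s*sys)^2))
    f = lambda s: pi_s.pdf(s) * norm.pdf(
        y_i, mu, np.hypot(sigma_stat, s * sigma_sys))
    return quad(f, 0.0, 10.0)[0]

def posterior(mu_grid, y):
    p = np.array([np.prod([marginal_like(yi, m) for yi in y])
                  for m in mu_grid])
    return p / (p.sum() * (mu_grid[1] - mu_grid[0]))  # flat prior in mu

mu = np.linspace(0.5, 1.5, 201)
dmu = mu[1] - mu[0]
for y in ([0.95, 1.00, 1.02, 1.05],   # compatible data (invented)
          [0.95, 1.00, 1.02, 1.40]):  # one outlier (invented)
    p = posterior(mu, y)
    mean = (mu * p).sum() * dmu
    sd = np.sqrt(((mu - mean) ** 2 * p).sum() * dmu)
    print("mean = %.3f, sd = %.3f" % (mean, sd))
```

The outlier case should come out with a visibly larger standard deviation, which is the behaviour discussed on the next two slides.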

17 Simple test with inconsistent data
Case #2: there is an outlier.
[Plot: measurement vs. experiment, and the posterior p(μ|y) vs. μ.]
→ The Bayesian fit is less sensitive to the outlier.
→ The error is now connected to the goodness-of-fit.

18 Goodness-of-fit vs. size of error
In a least-squares fit, the value of the minimized χ² does not affect the size of the error on the fitted parameter. In the Bayesian analysis with a non-Gaussian prior for the systematics, a high χ² corresponds to a larger error (and vice versa).
[Scatter plot: posterior σ_μ vs. χ² from least squares, for repetitions of the experiment with σ_s = 0.5 and no actual bias.]

19 Is this workable for PDF fits?
It is straightforward to generalize to include correlations, with a prior on the correlation coefficients ~ π(ρ). (Myth: ρ = 1 is "conservative".)
One can separate out the different systematics for the same measurement; some will have small σ_s, others larger.
Remember the "if-then" nature of a Bayesian result: we can (and should) vary the priors and see what effect this has on the conclusions.

20 Uncertainty from parametrization of PDFs
Try e.g. the MRST- or CTEQ-style functional forms (representative versions are sketched below). The form should be flexible enough to describe the data; a frequentist analysis has to decide how many parameters are justified.
In a Bayesian analysis we can insert as many parameters as we want, but constrain them with priors. Suppose, e.g., based on a theoretical bias for things not too bumpy, that a certain parametrization "should hold to 2%". How do we translate this into a set of prior probabilities?
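The exact expressions are not reproduced above; as an assumption on my part, representative input-scale parametrizations from that era look like:

```latex
% MRST-style form at the input scale Q_0^2:
xf(x, Q_0^2) = A \, x^{\eta_1} (1 - x)^{\eta_2} \left( 1 + \epsilon \sqrt{x} + \gamma x \right)

% CTEQ-style form:
xf(x, Q_0^2) = A_0 \, x^{A_1} (1 - x)^{A_2} \, e^{A_3 x} \left( 1 + e^{A_4} x \right)^{A_5}
```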

21 Residual function
Try e.g. a "residual function" r(x) modifying the nominal parametrization, where r(x) is something very flexible, e.g., a superposition of Bernstein polynomials with coefficients λ_i (basis functions illustrated at mathworld.wolfram.com).
Assign priors for the λ_i centred around 0, with width chosen to reflect the uncertainty in xf(x) (e.g. a couple of percent). → Ongoing effort.
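A sketch of one possible realization (my illustration; the slide does not spell out exactly how r(x) enters, so here it multiplies a toy shape):

```python
import numpy as np
from scipy.special import comb

def bernstein(i, n, x):
    # Bernstein basis polynomial B_{i,n}(x) = C(n,i) x^i (1-x)^(n-i) on [0,1]
    return comb(n, i) * x**i * (1.0 - x)**(n - i)

def residual(x, lam):
    # r(x) = sum_i lambda_i B_{i,n}(x); priors on the lambda_i would be
    # centred on 0 with width ~ the allowed distortion (a couple of percent)
    n = len(lam) - 1
    return sum(l * bernstein(i, n, x) for i, l in enumerate(lam))

x = np.linspace(0.0, 1.0, 101)
lam = [0.0, 0.02, -0.01, 0.015, 0.0]   # hypothetical coefficients
xf_nominal = x**0.5 * (1.0 - x)**3     # toy PDF shape, invented
xf_modified = xf_nominal * (1.0 + residual(x, lam))
```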

22 Wrapping up
A discovery at the LHC may depend crucially on assessing the uncertainty in a predicted cross section.
Systematic uncertainties are difficult to treat in frequentist statistics and often wind up being handled with ad hoc recipes.
The Bayesian approach tries to encapsulate the uncertainties in prior probabilities for an enlarged set of model parameters. Bayes' theorem says where these parameters should lie in light of the data; marginalize to give the probability of the parameter of interest (new tool: MCMC).
Very much still an ongoing effort!

23 Extra slides

24 Marginalizing the posterior probability density
Bayes' theorem gives the joint probability for all the parameters. The crucial difference compared to a frequentist analysis is the ability to marginalize over the nuisance parameters, e.g.,
p(θ|y) = ∫ p(θ, b|y) db.
Do this e.g. with MC: sample the full (θ, b) space and look at the distribution only of the parameters of interest.
In the end we are interested not in the probability of θ but of some observable Ω(θ) (e.g. a cross section), i.e.,
p(Ω|y) = ∫ δ(Ω − Ω(θ)) p(θ|y) dθ.
Similarly do this with MC: sample (θ, b) and look at the distribution of Ω(θ).

25 Digression: marginalization with MCMC
Bayesian computations involve integrals like p(θ|y) = ∫ p(θ, b|y) db, often of high dimensionality and impossible in closed form, and also impossible with "normal" acceptance-rejection Monte Carlo.
Markov Chain Monte Carlo (MCMC) has revolutionized Bayesian computation. Google for "MCMC", "Metropolis", "Bayesian computation", ...
MCMC generates a correlated sequence of random numbers: it cannot be used for many applications, e.g., detector MC, and the effective statistical error is greater than the naive 1/√n.
Basic idea: sample the full multidimensional parameter space, then look, e.g., only at the distribution of the parameters of interest.

26 MCMC basics: Metropolis-Hastings algorithm
Goal: given an n-dimensional pdf p(x), generate a sequence of points x_1, x_2, x_3, ...
1) Start at some point x_0.
2) Generate a proposed point x′ from a proposal density q(x′; x_k), e.g. a Gaussian centred about the current point x_k.
3) Form the Hastings test ratio: α = min[1, p(x′) q(x_k; x′) / (p(x_k) q(x′; x_k))].
4) Generate u uniformly in [0, 1].
5) If u ≤ α, move to the proposed point: x_{k+1} = x′; else repeat the old point: x_{k+1} = x_k.
6) Iterate.

27 Metropolis-Hastings (continued)
This rule produces a correlated sequence of points (note how each new point depends on the previous one). For our purposes this correlation is not fatal, but the statistical errors are larger than the naive 1/√n.
The proposal density can be (almost) anything, but it should be chosen so as to minimize the autocorrelation. Often the proposal density is taken symmetric, q(x′; x_k) = q(x_k; x′); the test ratio is then (Metropolis-Hastings): α = min[1, p(x′)/p(x_k)].
I.e., if the proposed step is to a point of higher p(x), take it; if not, take the step only with probability p(x′)/p(x_k). If the proposed step is rejected, hop in place.
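A compact, runnable version of the algorithm as just described (a sketch with an invented two-dimensional target; the symmetric Gaussian proposal makes the test ratio reduce to p(x′)/p(x_k)):

```python
import numpy as np

def metropolis(log_p, x0, n_steps, step_size=0.5, seed=0):
    """Metropolis sampler with a symmetric Gaussian proposal density."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_p(x)
    chain = np.empty((n_steps, x.size))
    for k in range(n_steps):
        x_prop = x + step_size * rng.standard_normal(x.size)
        lp_prop = log_p(x_prop)
        # Accept with probability min(1, p(x')/p(x)); work in logs
        if np.log(rng.uniform()) <= lp_prop - lp:
            x, lp = x_prop, lp_prop      # take the step
        chain[k] = x                     # if rejected, 'hop in place'
    return chain

# Invented target: a correlated two-dimensional Gaussian 'posterior'
def log_p(x):
    return -0.5 * (x[0]**2 + (x[1] - 0.8 * x[0])**2 / 0.36)

chain = metropolis(log_p, x0=[0.0, 0.0], n_steps=20000)
burned = chain[2000:]                    # discard a burn-in period
print(burned.mean(axis=0), burned.std(axis=0))
```

In practice one tunes step_size to balance the acceptance rate against the autocorrelation, and applies the convergence checks listed on the next slide.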

28 Metropolis-Hastings caveats
Actually one can only prove that the sequence of points follows the desired pdf in the limit where it runs forever. There may be a "burn-in" period where the sequence does not initially follow p(x).
Unfortunately there are few useful theorems to tell us when the sequence has converged. Look at trace plots and autocorrelation; check the result with a different proposal density. If you think it has converged, try it again with 10 times more points.

29 Example: posterior pdf from MCMC
Sample the posterior pdf from the previous example with MCMC, and summarize the pdf of the parameter of interest with, e.g., the mean, median, standard deviation, etc.
Although the numerical values of the answer here are the same as in the frequentist case, the interpretation is different (sometimes unimportant?).

30 Bayesian approach to model uncertainty
[Plot: y (measurement) vs. x, comparing the model curve with the truth.]
The model can be made to approximate the truth better by including more free parameters: systematic uncertainty ↔ nuisance parameters.
In a frequentist analysis, the correlations between the fitted parameters will result in large errors for the parameters of interest. In the Bayesian approach, constrain the nuisance parameters with priors.