# Practical Statistics for LHC Physicists: Bayesian Inference

Harrison B. Prosper, Florida State University. CERN Academic Training Lectures, 9 April 2015.


Confidence Intervals – Recap

[Figure: sample space versus parameter space, with s, u and l marked.] Any questions about this figure?

Outline

- Frequentist Hypothesis Tests Continued…
- Bayesian Inference

Hypothesis Tests

In order to perform a realistic hypothesis test, we first need to rid ourselves of nuisance parameters. Here are the two primary ways:

1. Use a likelihood averaged over the nuisance parameters.
2. Use the profile likelihood.

Example 1: W±W±jj Production (ATLAS)

Recall that for a background of B = 3.0 events (ignoring its uncertainty) and an observed count of D = 12 events, we found a p-value equivalent to 3.8σ on the “n-σ” scale (sometimes called the Z-value).
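As a minimal sketch of the calculation behind this statement, the snippet below sums the Poisson probabilities for D ≥ 12 with the mean fixed at B = 3.0 and converts the one-sided p-value into a Z-value. The function names are illustrative and are not taken from the lecture.

```python
import math
from statistics import NormalDist

def poisson_pmf(n, mean):
    """Poisson probability of observing n counts for the given mean."""
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))

def p_value(observed, background):
    """P(D >= observed | background), computed as one minus the lower terms."""
    return 1.0 - sum(poisson_pmf(n, background) for n in range(observed))

def z_value(p):
    """Convert a one-sided p-value into the equivalent number of Gaussian sigmas."""
    return -NormalDist().inv_cdf(p)

p = p_value(12, 3.0)   # D = 12 observed, B = 3.0 with its uncertainty ignored
print(f"p-value = {p:.2e}, Z = {z_value(p):.2f}")
```

With these inputs the Z-value should come out close to the 3.8σ quoted in this lecture for the fixed-background case.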

Example 1: W±W±jj Production (ATLAS)

Method 1: We eliminate b from the problem by averaging the likelihood over b* [equation not shown], where L(s) = P(12 | s) is the average likelihood.

Exercise 10: Show this.

(*R.D. Cousins and V.L. Highland, Nucl. Instrum. Meth. A320:331–335, 1992)

Example 1: W±W±jj Production (ATLAS)

Background: B = 3.0 ± 0.6 events; D is the observed count. The resulting p-value [equation not shown] is equivalent to 3.5σ, which may be compared with the 3.8σ obtained earlier.

Exercise 11: Verify this calculation.
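One way to check Exercise 11 numerically is sketched below. It assumes that the background uncertainty is described by a gamma density in b with mean B and standard deviation δB (shape (B/δB)², rate B/δB²). The slide's own formula is not reproduced in this transcript, so that choice is an assumption. Averaging Poisson(D | b) over such a density gives a negative binomial distribution for the count under s = 0.

```python
import math
from statistics import NormalDist

D, B, dB = 12, 3.0, 0.6
shape = (B / dB) ** 2     # gamma shape (assumed): gives mean B and std. dev. dB
rate = B / dB ** 2        # gamma rate (assumed)

def averaged_pmf(n):
    """P(n | s = 0): Poisson(n | b) averaged over the assumed gamma density of b."""
    log_p = (math.lgamma(n + shape) - math.lgamma(n + 1) - math.lgamma(shape)
             + shape * math.log(rate / (1.0 + rate)) - n * math.log(1.0 + rate))
    return math.exp(log_p)

# One-sided p-value: probability of observing D or more events under s = 0.
p_value = 1.0 - sum(averaged_pmf(n) for n in range(D))
Z = -NormalDist().inv_cdf(p_value)
print(f"p-value = {p_value:.2e}, Z = {Z:.2f}")
```

Under this assumption Z should land near the quoted 3.5σ. A different convention for the density of b (for example, one that raises the gamma shape by one) shifts the result slightly.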

An Aside on s / √b

The quantity s / √b is often used as a rough measure of significance on the “n-σ” scale, but it should be used with caution. In our example, s ≈ 12 − 3.0 = 9.0 events. According to this measure, the ATLAS W±W±jj result is a 9.0 / √3 ≈ 5.2σ effect, which is to be compared with 3.8σ! Beware of s / √b!

The Profile Likelihood Revisited

Recall that the profile likelihood is just the likelihood with all nuisance parameters replaced by their conditional maximum likelihood estimates (CMLE). In our example, $L_p(s) = L(s, \hat{b}(s))$, where $\hat{b}(s)$ is the CMLE of b for a given s. We also saw that the quantity

$$t(s) = -2 \ln \frac{L_p(s)}{L_p(\hat{s})}$$

can be used to compute approximate confidence intervals.

The Profile Likelihood Revisited

t(s) can also be used to test hypotheses, in particular s = 0. Wilks’ theorem, applied to our example, states that for large samples the density of the signal estimate $\hat{s}$ will be approximately Gaussian, centered on s [equation not shown], if s is the true expected signal count. Then t(s) will be distributed approximately as a χ² density of one degree of freedom, that is, as a density that is independent of all the parameters of the problem!

The Profile Likelihood Revisited

Since we now know the analytical form of the probability density of t, we can calculate the observed p-value = $P[t(0) \geq t_{\text{obs}}(0)]$ for the s = 0 hypothesis, where $t_{\text{obs}}(0)$ is the observed value of t(0). If we find that the p-value < α, the significance level of our test, we reject the s = 0 hypothesis. Furthermore, $Z = \sqrt{t_{\text{obs}}(0)}$, so we can skip the p-value calculation!
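The link between t, the p-value and Z can be sketched with the standard library alone. The value `t_obs = 9.0` below is a made-up input, chosen only so that Z = 3. For one degree of freedom, the χ² tail probability equals twice the one-sided Gaussian tail at Z = √t_obs, which is why Z can be read off directly.

```python
import math

def chi2_tail_1dof(t_obs):
    """P[t >= t_obs] for a chi-squared density with one degree of freedom."""
    return math.erfc(math.sqrt(t_obs / 2.0))

def gaussian_tail(z):
    """One-sided Gaussian tail probability, P[x >= z]."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

t_obs = 9.0                # hypothetical observed value of t(0)
z = math.sqrt(t_obs)       # Z-value read off directly from t
print(chi2_tail_1dof(t_obs), gaussian_tail(z), z)
```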

Example 1: W±W±jj Production (ATLAS)

Background: B = 3.0 ± 0.6 events. For this example, $t_{\text{obs}}(0) = 12.65$; therefore, Z = 3.6.

Exercise 12: Verify this calculation.
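A possible numerical check of Exercise 12 is sketched below. It assumes the two-count model Poisson(D | s + b) × Poisson(Q | k b), with Q and k fixed by B = Q / k and δB = √Q / k. The closed form for the conditional estimate of b comes from setting the derivative of the log-likelihood with respect to b to zero. All function names are illustrative.

```python
import math

D, B, dB = 12, 3.0, 0.6
Q = (B / dB) ** 2          # effective auxiliary count (25), assumed model
k = B / dB ** 2            # scale factor between b and the auxiliary count

def log_likelihood(s, b):
    """ln[Poisson(D | s + b) * Poisson(Q | k b)], dropping constant terms."""
    return D * math.log(s + b) - (s + b) + Q * math.log(k * b) - k * b

def b_hat(s):
    """Conditional maximum likelihood estimate of b for fixed s."""
    g = D + Q - (1.0 + k) * s
    return (g + math.sqrt(g * g + 4.0 * (1.0 + k) * Q * s)) / (2.0 * (1.0 + k))

def t(s):
    """t(s) = -2 ln[L_p(s) / L_p(s_hat)], with s_hat = D - B and b_hat = B."""
    return -2.0 * (log_likelihood(s, b_hat(s)) - log_likelihood(D - B, B))

t0 = t(0.0)
Z = math.sqrt(t0)
print(f"t_obs(0) = {t0:.2f}, Z = {Z:.2f}")
```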

Example 4: Higgs to γγ (CMS)

Example 4: Higgs to γγ (CMS)

This example mimics part of the CMS Run 1 Higgs to γγ data. We simulate 20,000 background di-photon masses and 200 signal masses. The signal is chosen to be a Gaussian bump with a standard deviation of 1.5 GeV and a mean of 125 GeV. [Equations for the background model and the signal model not shown.]
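A toy data set of this kind could be generated as in the sketch below. The signal parameters (200 events, mean 125 GeV, width 1.5 GeV) and the background count come from the slide. The background shape is not given in this transcript, so the falling exponential, its slope and the 100 to 180 GeV mass window are assumptions made purely for illustration.

```python
import random

def simulate_diphoton_masses(n_bkg=20_000, n_sig=200, m_higgs=125.0, width=1.5,
                             m_min=100.0, m_max=180.0, slope=0.03, seed=1):
    """Return (background, signal) lists of di-photon masses in GeV."""
    rng = random.Random(seed)
    background = []
    while len(background) < n_bkg:
        # Assumed background shape: exponential fall-off above m_min,
        # truncated at m_max by rejecting masses outside the window.
        m = m_min + rng.expovariate(slope)
        if m <= m_max:
            background.append(m)
    # Signal: Gaussian bump at the Higgs mass, as described on the slide.
    signal = [rng.gauss(m_higgs, width) for _ in range(n_sig)]
    return background, signal

background, signal = simulate_diphoton_masses()
print(len(background), len(signal))
```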

Example 4: Higgs to γγ (CMS)

Fitting using Minuit (via RooFit) yields: [fit results not shown]

Bayesian Inference

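Bayesian Inference – 1

Definition: A method is Bayesian if

1. it is based on the degree of belief interpretation of probability and if
2. it uses Bayes’ theorem for all inferences:

$$p(\theta, \omega \mid D) = \frac{p(D \mid \theta, \omega)\, \pi(\theta, \omega)}{\int p(D \mid \theta, \omega)\, \pi(\theta, \omega)\, d\theta\, d\omega}$$

- D: observed data
- θ: parameter of interest
- ω: nuisance parameters
- π: prior density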

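Bayesian Inference – 2

Nuisance parameters are removed by marginalization:

$$p(\theta \mid D) = \int p(\theta, \omega \mid D)\, d\omega$$

in contrast to profiling, which can be viewed as marginalization with the data-dependent prior $\pi(\theta, \omega) = \delta[\omega - \hat{\omega}(\theta, D)]$.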

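Bayesian Inference – 3

Bayes’ theorem can be used to compute the probability of a model. First compute the posterior density:

$$p(\theta_H, \omega, H \mid D) = \frac{p(D \mid \theta_H, \omega, H)\, \pi(\theta_H, \omega, H)}{\sum_H \int p(D \mid \theta_H, \omega, H)\, \pi(\theta_H, \omega, H)\, d\theta_H\, d\omega}$$

- D: observed data
- H: model or hypothesis
- θ_H: parameters of model H
- ω: nuisance parameters
- π: prior density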

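Bayesian Inference – 4

1. Factorize the priors: $\pi(\theta_H, \omega, H) = \pi(\theta_H, \omega \mid H)\, \pi(H)$.
2. Then, for each model H, compute the function $p(D \mid H) = \int p(D \mid \theta_H, \omega, H)\, \pi(\theta_H, \omega \mid H)\, d\theta_H\, d\omega$.
3. Then compute the probability of each model H: $p(H \mid D) = p(D \mid H)\, \pi(H) / \sum_H p(D \mid H)\, \pi(H)$.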

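Bayesian Inference – 5

In order to compute p(H | D), however, two things are needed:

1. Proper priors over the parameter spaces.
2. The priors π(H).

In practice, we compute the Bayes factor from

$$\frac{p(H_1 \mid D)}{p(H_0 \mid D)} = \left[\frac{p(D \mid H_1)}{p(D \mid H_0)}\right] \left[\frac{\pi(H_1)}{\pi(H_0)}\right]$$

The Bayes factor, $B_{10}$, is the ratio in the first bracket.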
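How a Bayes factor updates prior odds can be sketched as follows, assuming only two competing models, H₁ and H₀. The evidence values passed in at the bottom are hypothetical numbers chosen for illustration, and the function names are not from the lecture.

```python
def bayes_factor(evidence_1, evidence_0):
    """B10 = p(D | H1) / p(D | H0)."""
    return evidence_1 / evidence_0

def posterior_probability_h1(b10, prior_h1=0.5):
    """p(H1 | D) for two models, given B10 and the prior probability of H1."""
    prior_odds = prior_h1 / (1.0 - prior_h1)
    posterior_odds = b10 * prior_odds      # posterior odds = B10 * prior odds
    return posterior_odds / (1.0 + posterior_odds)

b10 = bayes_factor(0.2, 0.1)               # hypothetical evidences
print(b10, posterior_probability_h1(b10))
```

The Bayes factor itself does not depend on π(H); the prior model probabilities enter only when the odds are converted into p(H | D).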

Example 1: W±W±jj Production (ATLAS)

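Example 1: W±W±jj Production (ATLAS)

Step 1: Construct a probability model for the observations and insert the data

- D = 12 events
- B = 3.0 ± 0.6 background events, with B = Q / k and δB = √Q / k (so Q = 25 and k ≈ 8.33)

to arrive at the likelihood

$$p(D \mid s, b) = \text{Poisson}(D \mid s + b)\; \text{Poisson}(Q \mid k b)$$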

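Example 1: W±W±jj Production (ATLAS)

Step 2: Write down Bayes’ theorem:

$$p(s, b \mid D) = \frac{p(D \mid s, b)\, \pi(s, b)}{p(D)}$$

and specify the prior:

$$\pi(s, b) = \pi(b \mid s)\, \pi(s)$$

It is often convenient first to compute the marginal likelihood by integrating over the nuisance parameters:

$$p(D \mid s) = \int p(D \mid s, b)\, \pi(b \mid s)\, db$$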

Example 1: W±W±jj Production (ATLAS)

The Prior: What do π(b | s) and π(s) represent? They encode what we know, or assume, about the expected background and signal in the absence of new observations. We shall assume that s and b are non-negative and finite. After a century of argument, the consensus today is that there is no unique way to represent such vague information.

Example 1: W±W±jj Production (ATLAS)

For simplicity, we shall take π(b | s) = 1*. We may now eliminate b from the problem by integrating over it [equation not shown], which, of course, gives exactly the same function we found earlier! H₁ represents the background + signal hypothesis.

(*Luc Demortier, Supriya Jain, Harrison B. Prosper, Reference priors for high energy physics, Phys. Rev. D82:034002, 2010)

Example 1: W±W±jj Production (ATLAS)

L(s) = P(12 | s, H₁) is the marginal likelihood for the expected signal s. Here we compare the marginal and profile likelihoods [figure not shown]. For this problem they are found to be almost identical. But this happy thing does not always happen!
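The comparison can be reproduced approximately with the sketch below. It assumes the model Poisson(D | s + b) × Poisson(Q | k b) with a flat prior in b, so that averaging over b amounts to weighting the background count with a negative binomial distribution (gamma shape Q + 1, rate k). Both curves are scaled to a maximum of one before they are compared. The names used are illustrative.

```python
import math

D, B, dB = 12, 3.0, 0.6
Q = (B / dB) ** 2            # auxiliary count, from B = Q/k and dB = sqrt(Q)/k
k = B / dB ** 2

def poisson(n, mean):
    if mean == 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))

def background_weight(r):
    """P(r background events) after averaging over b (assumed flat prior)."""
    return math.exp(math.lgamma(r + Q + 1) - math.lgamma(r + 1) - math.lgamma(Q + 1)
                    + (Q + 1) * math.log(k / (1 + k)) - r * math.log(1 + k))

def marginal_likelihood(s):
    """L(s) = P(D | s, H1): signal and background counts must add up to D."""
    return sum(background_weight(r) * poisson(D - r, s) for r in range(D + 1))

def profile_likelihood(s):
    g = D + Q - (1 + k) * s
    b = (g + math.sqrt(g * g + 4 * (1 + k) * Q * s)) / (2 * (1 + k))  # CMLE of b
    return poisson(D, s + b) * math.exp(Q * math.log(k * b) - k * b - math.lgamma(Q + 1))

grid = [0.5 * i for i in range(41)]                     # s from 0 to 20
marginal = [marginal_likelihood(s) for s in grid]
profile = [profile_likelihood(s) for s in grid]
marginal = [v / max(marginal) for v in marginal]        # scale peak to one
profile = [v / max(profile) for v in profile]
max_gap = max(abs(m - p) for m, p in zip(marginal, profile))
print(f"largest difference between the scaled curves: {max_gap:.3f}")
```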

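Example 1: W±W±jj Production (ATLAS)

Given the likelihood, we can compute the posterior density

$$p(s \mid D, H_1) = \frac{p(D \mid s, H_1)\, \pi(s \mid H_1)}{p(D \mid H_1)}$$

where

$$p(D \mid H_1) = \int_0^\infty p(D \mid s, H_1)\, \pi(s \mid H_1)\, ds$$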

Example 1: W±W±jj Production (ATLAS)

Assuming a flat prior for the signal, π(s | H₁) = 1, the posterior density is given by [equation not shown]. The posterior density of the parameter (or parameters) of interest is the complete answer to the inference problem and should be made available. Better still, publish the likelihood and the prior.

Exercise 13: Derive an expression for p(s | D, H₁) assuming a gamma prior Gamma(qs, U + 1) for π(s | H₁).

Example 1: W±W±jj Production (ATLAS)

By solving [equation not shown] for the interval bounds, we obtain the 68% interval s ∈ [6.3, 13.5]. Since this is a Bayesian calculation, this statement means: the probability (that is, the degree of belief) that s lies in [6.3, 13.5] is 0.68.
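A credible interval of this kind could be computed as sketched below. The sketch assumes flat priors in both b and s, which makes the posterior for s a mixture of gamma densities with negative binomial weights, and it assumes a central interval (16% of the probability in each tail). The slide does not say which interval convention was used, so the output need not match [6.3, 13.5] exactly.

```python
import math

D, B, dB = 12, 3.0, 0.6
Q = (B / dB) ** 2
k = B / dB ** 2

def background_weight(r):
    """P(r background events) after averaging over b (assumed flat prior)."""
    return math.exp(math.lgamma(r + Q + 1) - math.lgamma(r + 1) - math.lgamma(Q + 1)
                    + (Q + 1) * math.log(k / (1 + k)) - r * math.log(1 + k))

def gamma_cdf(s, n):
    """CDF at s of a gamma density with integer shape n and unit rate."""
    return 1.0 - math.exp(-s) * sum(s ** j / math.factorial(j) for j in range(n))

weights = [background_weight(r) for r in range(D + 1)]
norm = sum(weights)

def posterior_cdf(s):
    # Poisson(D - r | s), viewed as a function of s, is a gamma density
    # with shape D - r + 1, so the posterior is a weighted sum of gammas.
    return sum(w * gamma_cdf(s, D - r + 1) for r, w in enumerate(weights)) / norm

def quantile(prob, lo=0.0, hi=100.0):
    """Invert the posterior CDF by bisection."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if posterior_cdf(mid) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lower, upper = quantile(0.16), quantile(0.84)
print(f"68% central credible interval: [{lower:.1f}, {upper:.1f}]")
```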

Example 1: W±W±jj Production (ATLAS)

As noted, the number p(D | H₁) can be used to perform a hypothesis test. But to do so, we need to specify a proper prior for the signal, that is, a prior π(s | H₁) that integrates to one. The simplest such prior is a δ-function, e.g., π(s | H₁) = δ(s − 9), which yields p(D | H₁) = p(D | 9, H₁) = 1.13 × 10⁻¹.

Example 1: W±W±jj Production (ATLAS)

From p(D | H₁) = 1.13 × 10⁻¹ and p(D | H₀) = 2.23 × 10⁻⁴, we conclude that the odds in favor of the hypothesis s = 9 have increased by a factor of 506 relative to whatever prior odds you started with. It is useful to convert this Bayes factor into a (signed) measure akin to the “n-sigma” [equation not shown] (Sezen Sekmen, HBP).

Exercise 14: Verify this number.
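The two evidences and their ratio can be checked with the sketch below, which assumes the marginal likelihood obtained from a flat prior in b (negative binomial weights for the background count). The conversion to an “n-sigma”-like number uses Z = sign(ln B₁₀) √(2 |ln B₁₀|); that formula is not legible in this transcript, so it should be read as an assumed definition rather than the slide's own.

```python
import math

D, B, dB = 12, 3.0, 0.6
Q = (B / dB) ** 2
k = B / dB ** 2

def poisson(n, mean):
    if mean == 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))

def background_weight(r):
    """P(r background events) after averaging over b (assumed flat prior)."""
    return math.exp(math.lgamma(r + Q + 1) - math.lgamma(r + 1) - math.lgamma(Q + 1)
                    + (Q + 1) * math.log(k / (1 + k)) - r * math.log(1 + k))

def marginal_likelihood(s):
    """p(D | s): signal and background counts must add up to D."""
    return sum(background_weight(r) * poisson(D - r, s) for r in range(D + 1))

p_h1 = marginal_likelihood(9.0)    # delta-function prior at s = 9
p_h0 = marginal_likelihood(0.0)    # background-only hypothesis
b10 = p_h1 / p_h0
log_b = math.log(b10)
z = math.copysign(math.sqrt(2.0 * abs(log_b)), log_b)   # assumed definition of Z
print(f"p(D|H1) = {p_h1:.3g}, p(D|H0) = {p_h0:.3g}, B10 = {b10:.0f}, Z = {z:.2f}")
```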

Example 4: Higgs to γγ (CMS)

Here is a plot of Z vs. $m_H$ as we scan through different hypotheses about the expected signal s [figure not shown]. For simplicity, the signal width and background parameters have been fixed to their maximum likelihood estimates.

Summary – 1

Probability. Two main interpretations:

1. Degree of belief
2. Relative frequency

Likelihood Function. Main ingredient in any full-scale statistical analysis.

Frequentist Principle. Construct statements such that a fraction f ≥ C.L. of them will be true over a specified ensemble of statements.

Summary – 2

Frequentist Approach

1. Use the likelihood function only.
2. Eliminate nuisance parameters by profiling.
3. Decide on a fixed threshold α for rejection and reject the null if p-value < α, but do so only if rejecting the null makes scientific sense, e.g., the probability of the alternative is judged to be high enough.

Bayesian Approach

1. Model all uncertainty using probabilities and use Bayes’ theorem to make all inferences.
2. Eliminate nuisance parameters through marginalization.

The End

“Have the courage to use your own understanding!” Immanuel Kant
