Practical Statistics for Particle Physicists Lecture 3 Harrison B. Prosper Florida State University European School of High-Energy Physics Parádfürdő,

Slides:



Advertisements
Similar presentations
Week 11 Review: Statistical Model A statistical model for some data is a set of distributions, one of which corresponds to the true unknown distribution.
Advertisements

CHAPTER 8 More About Estimation. 8.1 Bayesian Estimation In this chapter we introduce the concepts related to estimation and begin this by considering.
Bayesian inference “Very much lies in the posterior distribution” Bayesian definition of sufficiency: A statistic T (x 1, …, x n ) is sufficient for 
Practical Statistics for LHC Physicists Bayesian Inference Harrison B. Prosper Florida State University CERN Academic Training Lectures 9 April, 2015.
Chap 9: Testing Hypotheses & Assessing Goodness of Fit Section 9.1: INTRODUCTION In section 8.2, we fitted a Poisson dist’n to counts. This chapter will.
Introduction  Bayesian methods are becoming very important in the cognitive sciences  Bayesian statistics is a framework for doing inference, in a principled.
G. Cowan Statistics for HEP / NIKHEF, December 2011 / Lecture 3 1 Statistical Methods for Particle Physics Lecture 3: Limits for Poisson mean: Bayesian.
Basics of Statistical Estimation. Learning Probabilities: Classical Approach Simplest case: Flipping a thumbtack tails heads True probability  is unknown.
Business Statistics for Managerial Decision
G. Cowan Lectures on Statistical Data Analysis Lecture 12 page 1 Statistical Data Analysis: Lecture 12 1Probability, Bayes’ theorem 2Random variables and.
G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 1 Statistical Data Analysis: Lecture 10 1Probability, Bayes’ theorem 2Random variables and.
G. Cowan Lectures on Statistical Data Analysis 1 Statistical Data Analysis: Lecture 7 1Probability, Bayes’ theorem, random variables, pdfs 2Functions of.
G. Cowan Discovery and limits / DESY, 4-7 October 2011 / Lecture 3 1 Statistical Methods for Discovery and Limits Lecture 3: Limits for Poisson mean: Bayesian.
G. Cowan RHUL Physics Bayesian Higgs combination page 1 Bayesian Higgs combination using shapes ATLAS Statistics Meeting CERN, 19 December, 2007 Glen Cowan.
Statistical Analysis of Systematic Errors and Small Signals Reinhard Schwienhorst University of Minnesota 10/26/99.
Statistical aspects of Higgs analyses W. Verkerke (NIKHEF)
Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.
Chap 8-1 Copyright ©2013 Pearson Education, Inc. publishing as Prentice Hall Chapter 8 Confidence Interval Estimation Business Statistics: A First Course.
1 Probability and Statistics  What is probability?  What is statistics?
Bayesian Inference, Basics Professor Wei Zhu 1. Bayes Theorem Bayesian statistics named after Thomas Bayes ( ) -- an English statistician, philosopher.
Harrison B. Prosper Workshop on Top Physics, Grenoble Bayesian Statistics in Analysis Harrison B. Prosper Florida State University Workshop on Top Physics:
ECE 8443 – Pattern Recognition LECTURE 07: MAXIMUM LIKELIHOOD AND BAYESIAN ESTIMATION Objectives: Class-Conditional Density The Multivariate Case General.
G. Cowan Lectures on Statistical Data Analysis Lecture 1 page 1 Lectures on Statistical Data Analysis London Postgraduate Lectures on Particle Physics;
Week 71 Hypothesis Testing Suppose that we want to assess the evidence in the observed data, concerning the hypothesis. There are two approaches to assessing.
Practical Statistics for Particle Physicists Lecture 3 Harrison B. Prosper Florida State University European School of High-Energy Physics Anjou, France.
Maximum Likelihood - "Frequentist" inference x 1,x 2,....,x n ~ iid N( ,  2 ) Joint pdf for the whole random sample Maximum likelihood estimates.
Statistics In HEP Helge VossHadron Collider Physics Summer School June 8-17, 2011― Statistics in HEP 1 How do we understand/interpret our measurements.
Correlation Assume you have two measurements, x and y, on a set of objects, and would like to know if x and y are related. If they are directly related,
1 TTU report “Dijet Resonances” Kazim Gumus and Nural Akchurin Texas Tech University Selda Esen and Robert M. Harris Fermilab Jan. 26, 2006.
Confidence Intervals First ICFA Instrumentation School/Workshop At Morelia, Mexico, November 18-29, 2002 Harrison B. Prosper Florida State University.
Fall 2002Biostat Statistical Inference - Confidence Intervals General (1 -  ) Confidence Intervals: a random interval that will include a fixed.
1 Methods of Experimental Particle Physics Alexei Safonov Lecture #25.
Business Statistics for Managerial Decision Farideh Dehkordi-Vakil.
Bayes Theorem. Prior Probabilities On way to party, you ask “Has Karl already had too many beers?” Your prior probabilities are 20% yes, 80% no.
Experience from Searches at the Tevatron Harrison B. Prosper Florida State University 18 January, 2011 PHYSTAT 2011 CERN.
Practical Statistics for Particle Physicists Lecture 2 Harrison B. Prosper Florida State University European School of High-Energy Physics Parádfürdő,
G. Cowan Lectures on Statistical Data Analysis Lecture 8 page 1 Statistical Data Analysis: Lecture 8 1Probability, Bayes’ theorem 2Random variables and.
1 Introduction to Statistics − Day 3 Glen Cowan Lecture 1 Probability Random variables, probability densities, etc. Brief catalogue of probability densities.
Study of pair-produced doubly charged Higgs bosons with a four muon final state at the CMS detector (CMS NOTE 2006/081, Authors : T.Rommerskirchen and.
DIJET STATUS Kazim Gumus 30 Aug Our signal is spread over many bins, and the background varies widely over the bins. If we were to simply sum up.
G. Cowan Lectures on Statistical Data Analysis Lecture 4 page 1 Lecture 4 1 Probability (90 min.) Definition, Bayes’ theorem, probability densities and.
G. Cowan Computing and Statistical Data Analysis / Stat 9 1 Computing and Statistical Data Analysis Stat 9: Parameter Estimation, Limits London Postgraduate.
1 27 March 2000CL Workshop, Fermilab, Harrison B. Prosper The Reverend Bayes and Solar Neutrinos Harrison B. Prosper Florida State University 27 March,
In Bayesian theory, a test statistics can be defined by taking the ratio of the Bayes factors for the two hypotheses: The ratio measures the probability.
1 CMS Sensitivity to Quark Contact Interactions with Dijets Selda Esen (Brown) Robert M. Harris (Fermilab) DPF Meeting Nov 1, 2006.
G. Cowan Lectures on Statistical Data Analysis Lecture 12 page 1 Statistical Data Analysis: Lecture 12 1Probability, Bayes’ theorem 2Random variables and.
G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 1 Statistical Data Analysis: Lecture 10 1Probability, Bayes’ theorem 2Random variables and.
Systematics in Hfitter. Reminder: profiling nuisance parameters Likelihood ratio is the most powerful discriminant between 2 hypotheses What if the hypotheses.
From Small-N to Large Harrison B. Prosper SCMA IV, June Bayesian Methods in Particle Physics: From Small-N to Large Harrison B. Prosper Florida State.
Outline Historical note about Bayes’ rule Bayesian updating for probability density functions –Salary offer estimate Coin trials example Reading material:
Practical Statistics for LHC Physicists Frequentist Inference Harrison B. Prosper Florida State University CERN Academic Training Lectures 8 April, 2015.
Status of the Higgs to tau tau
arXiv:physics/ v3 [physics.data-an]
Exploring SUSY Parameter Space A New Bayesian Approach
The expected confident intervals for triple gauge coupling parameter
ESTIMATION.
BAYES and FREQUENTISM: The Return of an Old Controversy
Proposals for near-future BG determinations from control regions
Confidence Intervals and Limits
Chapter 8: Inference for Proportions
Lecture 4 1 Probability (90 min.)
Bayesian Inference, Basics
Confidence Interval Estimation
Lecture 4 1 Probability Definition, Bayes’ theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests general.
Statistical Tests and Limits Lecture 2: Limits
Computing and Statistical Data Analysis / Stat 7
Lecture 4 1 Probability Definition, Bayes’ theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests general.
Computing and Statistical Data Analysis / Stat 10
Presentation transcript:

Practical Statistics for Particle Physicists Lecture 3 Harrison B. Prosper Florida State University European School of High-Energy Physics Parádfürdő, Hungary 5 – 18 June, ESHEP2013 Practical Statistics Harrison B. Prosper

Outline  Lecture 1  Descriptive Statistics  Probability & Likelihood  Lecture 2  The Frequentist Approach  The Bayesian Approach  Lecture 3  The Bayesian Approach  Analysis Examples 2ESHEP2013 Practical Statistics Harrison B. Prosper

The Bayesian Approach

The Bayesian Approach – 1 Definition: A method is Bayesian if 1.it is based on the degree of belief interpretation of probability and if 2.it uses Bayes’ theorem for all inferences. Dobserved data θparameter of interest ωnuisance parameters πprior density 4ESHEP2013 Practical Statistics Harrison B. Prosper

The Bayesian Approach – 2 Nuisance parameters are removed by marginalization: in contrast to profiling, which can be viewed as marginalization with the (data-dependent) prior 5ESHEP2013 Practical Statistics Harrison B. Prosper

The Bayesian Approach – 3 Bayes’ theorem can be used to compute the probability of a model. First compute the posterior density: Dobserved data θ H parameters of model, or hypothesis, H Hmodel or hypothesis ωnuisance parameters πprior density ESHEP2013 Practical Statistics Harrison B. Prosper6

7 The Bayesian Approach – 4 1.Factorize the priors:  (  , ω, H) =  (θ H, ω | H)  (H) 2.Then, for each model, H, compute the function 3.Then, compute the probability of each model, H

8 The Bayesian Approach – 5 In order to compute p(H |D), however, two things are needed: 1.Proper priors over the parameter spaces 2.The priors  (H). In practice, we compute the Bayes factor: which is the ratio in the first bracket, B 10. ESHEP2013 Practical Statistics Harrison B. Prosper

Examples 1. Top Quark Discovery 2. Search for Contact Interactions

Example – Top Quark Discovery – 1 D Top Discovery Data D = 17 events B= 3.8 ± 0.6 events 10

Step 1: Construct a probability model for the observations then put in the data D = 17 events B = 3.8 ± 0.6 background events B = Q / k δB = √Q / k to arrive at the likelihood. 11 Example – Top Quark Discovery – 2

Example – Top Quark Discovery – 3 Step 2: Write down Bayes’ theorem: and specify the prior: It is useful to compute the following marginal likelihood: sometimes referred to as the evidence for s. 12

Example – Top Quark Discovery – 4 The Prior: What do and represent? They encode what we know, or assume, about the mean background and signal in the absence of new observations. We shall assume that s and b are non-negative. After a century of argument, the consensus today is that there is no unique way to represent such vague information. 13

14 Example – Top Quark Discovery – 5 For simplicity, we take π(b | s) = 1. We may now eliminate b from the problem: where, and where we have introduced the symbol H 1 to denote the background + signal hypothesis. Exercise 10: Show this

p(17|s, H 1 ) as a function of the expected signal s. 15 Example – Top Quark Discovery – 6

16 Example – Top Quark Discovery – 7 Given the marginal likelihood we can compute the the posterior density and the evidence for hypothesis H 1

Assuming a flat prior for the signal π (s | H 1 ) = 1, the posterior density is given by The posterior density of the parameter (or parameters) of interest is the complete answer to the inference problem and should be made available. Better still, publish the likelihood and the prior Example – Top Quark Discovery – 8 17 Exercise 11: Derive an expression for p(s | D, H 1 ) assuming a gamma prior Gamma(qs, U +1) for π(s | H 1 )

The current practice is to report summaries of the posterior density, such as Note, since this is a Bayesian calculation, this statement means: the probability (that is, the degree of belief) that s lies in [9.9, 19.8] is 0.95 Example – p(s | 17, H 1 ) ESHEP2013 Practical Statistics Harrison B. Prosper18

As noted, the number can be used to perform a hypothesis test. But, to do so, we need to specify a proper prior for the signal, that is, a prior π(s| H 1 ) that integrates to one. The simplest such prior is a δ-function, e.g.: π (s | H 1 ) = δ(s – 14), which yields p(D | H 1 ) = p(D |14, H 1 )= 9.28 x Example – Top Quark Discovery – 9 19

Since, p(D | H 1 )= 9.28 x and p(D | H 0 )= 3.86 x we conclude that the hypothesis s = 14 events is favored over the hypothesis s = 0 by 24,000 to 1. To avoid big numbers, the Bayes factor can be mapped to a (signed) measure akin to “n-sigma” (Sezen Sekmen, HBP) Example – Top Quark Discovery – Exercise 12: Compute Z for the D0 results

Example – Z vs m H for pp to γγ events 21 Here is a plot of Z(m H ) as we scan through different hypotheses about the expected signal s. The signal width and background parameters have been fixed to their maximum likelihood estimates

Hypothesis Testing – A Hybrid Approach 22 Background, B = 3.8 ± 0.6 events D is observed count This is equivalent to 4.4 σ which may be compared with the 4.5 σ obtained with B 10 Exercise 13: Verify this calculation

Example 2 CMS Search for Contact Interactions using Inclusive Jet Events CMS Exotica/QCD Group PhD work of Jeff Haas (FSU, PhD, 2013)

In our current theories, all interactions are said to arise through the exchange of bosons: But,… Contact Interactions – 1 24ESHEP2013 Practical Statistics Harrison B. Prosper

Contact Interactions – 2 … when the experimentally available energies are << than the mass of the exchanged particles, the interactions can be modeled as contact interactions (CI). Here is the most famous example: 25ESHEP2013 Practical Statistics Harrison B. Prosper

Contact Interactions – 3 The modern view of the Standard Model (SM) is that it is an effective theory: the low-energy limit of a more general (unknown) theory. For the strong interactions, we assume that the Lagrangian of the unknown theory can be approximated as follows where the O i are a set of dim-6 operators, λ = 1/Λ 2 defines the scale of the new physics, and β i are coefficients defined by the new theory. 26ESHEP2013 Practical Statistics Harrison B. Prosper

Contact Interactions – 4 The CMS contact interaction analysis, using inclusive jet events, that is, events of the form where X can be any collection of particles, was a search for deviations from the prediction of QCD, calculated at next- to-leading order (NLO) accuracy We searched for new QCD-like physics that can be modeled with a set of dim-6 operators of the form* *Eichten, Hinchliffe, Lane, Quigg, Rev. Mod. Phys. 56, 579 (1984) 27ESHEP2013 Practical Statistics Harrison B. Prosper

Contact Interactions – 5 At NLO* the cross section per jet p T bin can be written as where, c, b, a, b ’, a ’ are calculable and μ 0 is p T –dependent scale. At leading order (LO) the primed terms vanish. The 7 TeV CMS jet data, however, were analyzed using the model with c and CI(Λ) computed at NLO at LO accuracy, respectively *J. Gao, CIJET, arXiv: ESHEP2013 Practical Statistics Harrison B. Prosper

Contact Interactions – 6 The CI spectra were calculated with PYTHIA and the QCD spectrum with fastNLO This is an instructive example of physics in which the signal can be both positive and negative 29ESHEP2013 Practical Statistics Harrison B. Prosper

Analysis

Inclusive Jet Data Data M = 20 bins 507 ≤ p T ≤ 2116 GeV D = 73,792 to 3 The plot compares the observed dN/dp T spectrum with the NLO QCD prediction (using CTEQ6.6 PDFs) convolved with the CMS jet response function ESHEP2013 Practical Statistics Harrison B. Prosper31

Analysis Goal Data/QCD spectrum compared with (QCD+CI)/QCD spectra for several values of the scale Λ Analysis Goal: Determine if there is a significant deviation from QCD and, if so, measure it; if not, set a lower bound on the scale Λ ESHEP2013 Practical Statistics Harrison B. Prosper32

Analysis – 1 First Attempt Assume the following probability model for the observations where ESHEP2013 Practical Statistics Harrison B. Prosper33

Analysis Issues 1.Counts range from ~70,000 to 3! This causes the limits on Λ to be very sensitive to the normalization α. For example, increasing α by 1% decreases the limit by 25%! 2.Spectrum sensitive to the jet energy scale (JES) 3.And to the parton distribution functions (PDF) 4.Simulated CI models (using PYTHIA) were available for only 4 values of Λ, namely, Λ = 3, 5, 8, and 12 TeV for destructive interference models only 5.Insane deadlines and the need, occasionally, to sleep! ESHEP2013 Practical Statistics Harrison B. Prosper34

Analysis Issues Solution: (Channel the Reverend Thomas Bayes!) 1.Integrate the likelihood over the scale factor α 2.Integrate the likelihood over the JES 3.Integrate the likelihood over the PDF parameters 4.Interpolate over the 4 PYTHIA CI models 1.Ignore insane deadlines and sleep as needed! ESHEP2013 Practical Statistics Harrison B. Prosper35

Analysis – 2 Step 1: Re-write as where ESHEP2013 Practical Statistics Harrison B. Prosper36 Exercise 14: Show this

Analysis – 3 Step 2: Now eliminate α by integrating with respect to α. To do so, we need a prior density for α. In the absence of reliable information about this parameter, we use which is an example of a reference prior*. *L. Demortier, S. Jain, HBP, arxiv: (2010) ESHEP2013 Practical Statistics Harrison B. Prosper37

Analysis – 4 Step 3: The integration with respect to α yields But, after more thought, we realized that almost all the information about the models is contained in the shapes of their jet p T spectra, especially given that the total jet count is large (~200,000). This causes the multinomial to be particularly sensitive to the spectral shapes Therefore, we could simply start with the multinomial and sidestep the normalization problem ESHEP2013 Practical Statistics Harrison B. Prosper38

Analysis – 5 Step 4: Fit a 4-parameter interpolation function f to the four spectral ratios (QCD NLO +CI LO )/QCD NLO simultaneously. The cross section (per p T bin) is then modeled with σ = f (λ, p 1,…, p 4 ) σ QCD where ESHEP2013 Practical Statistics Harrison B. Prosper39

Analysis – 6 List of nuisance parameters (“systematics” in HEP jargon): 1.the jet energy scale (JES), 2.jet energy resolution (JER), 3.the PDF parameters (PDF), 4.the factorization an renormalization scales (μ F, μ R ) 5.the parameters ω = p 1 …p 4 of the function f (λ, ω) ESHEP2013 Practical Statistics Harrison B. Prosper40

Analysis – 7 Step 5: We use Bayes’ theorem to calculate the posterior density of the parameter of interest λ, VIP (Very Important Point): whatever the nature or provenance of nuisance parameters, whatever words we use to describe them, statistical, systematic, best guess, gut feeling…, in a Bayesian calculation we “simply” integrate them out of the problem. ESHEP2013 Practical Statistics Harrison B. Prosper41

Analysis – 8 Bayesian Hierarchical Modeling The parameters ω = p 1 …p 4 that appear in the likelihood depend on φ = JES, JER, PDFs, μ F, and μ R. This fact can be modeled hierarchically as follows and the density π(ω | λ, φ) models how ω depends on φ ESHEP2013 Practical Statistics Harrison B. Prosper42

Analysis – 9 Step 6: Simultaneously sample: 1.the jet energy scale, 2.the jet energy resolution, 3.the (CTEQ6.6) PDF parameters, 4.the factorization and renormalization scales and, for each set of parameters, fit the parameters ω = p 1 …p 4 thereby creating points {ω i } that constitute the prior π(ω | λ) ESHEP2013 Practical Statistics Harrison B. Prosper43

Analysis – Ensemble of Coefficients Ensemble of coefficients c, b, a, as a function of jet p T, created by simultaneous sampling of all “systematics” ESHEP2013 Practical Statistics Harrison B. Prosper44 c b a

Analysis – 10 Step 7: Finally, we compute a 95% Bayesian interval by solving for λ UP, from which we compute Λ = 1/√λ UP. The published limits were calculated for π(λ) = 1 and for π(λ) = a reference prior* (and annoyingly, using CL s ): Λ > 10.1 TeV or Λ > % C.L. for models with destructive or constructive interference, respectively *L. Demortier, S. Jain, HBP, arxiv: (2010) ESHEP2013 Practical Statistics Harrison B. Prosper45

46 Summary – 1 Probability Two main interpretations: 1.Degree of belief 2.Relative frequency Likelihood Function Main ingredient in any non-trivial statistical analysis Frequentist Principle Construct statements such that a fraction p ≥ CL of them will be true over a specified ensemble of statements. ESHEP2013 Practical Statistics Harrison B. Prosper

Summary – 2 Frequentist Approach 1.Use likelihood function only 2.Eliminate nuisance parameters by profiling 3.Fisher: Reject null if p-value is judged to be small enough 4.Neyman: Decide on a fixed threshold α for rejection and reject null if p-value < α, but do so only if the probability of the alternative is judged to be high enough Bayesian Approach 1.Model all uncertainty using probabilities and use Bayes’ theorem to make inferences 2.Eliminate nuisance parameters through marginalization ESHEP2013 Practical Statistics Harrison B. Prosper47

The End ESHEP2013 Practical Statistics Harrison B. Prosper48 “Have the courage to use your own understanding!” Immanuel Kant