Easy Limit Statistics
Andreas Hoecker
CAT Physics, Mar 25, 2011
The Goals
– In a discovery test, one wants to measure the probability of an upward fluctuation of the background only (B)
– In an upper-limit test, one wants to measure the probability of a downward fluctuation of signal + background (S + B)
[Figure: expected number-of-events distributions for the background-only hypothesis (4) and the signal + background hypothesis (10), indicating the N_obs needed for a 5σ discovery (p ≈ 2.9 × 10⁻⁷) and the N_obs corresponding to a 95% CL upper limit.]
Funny Parameters
– μ denotes the signal-strength parameter (μ = 0: background only, μ = 1: nominal signal)
– θ denotes the nuisance parameters (background abundance, efficiencies, …)
Likelihood Function
L can be very simple, e.g., for a counting experiment:
– Number counting: L(μ) = Poisson(N_obs | μS + B)
– Number counting with background uncertainty (nuisance parameter θ): L(μ, θ) = Poisson(N_obs | μS + B(θ)) · Gauss(θ)
– The signal prediction (expected number of events) usually also has nuisance parameters: cross section, selection efficiency, luminosity uncertainties, etc.
Likelihood Function
L can also be complex:
– Several distinct signal and background contributions
– Several discriminating variables (use a product of PDFs)
– Some variables may have event-by-event scaling factors
– Signal, background and PDF shape parameters may be floating
– Physical parameters may be numbers of events but also signal properties
– The likelihood may be split into categories with different subpopulations of events, with common and non-common parameters
Most ATLAS search analyses so far dealt with counting likelihoods in the presence of signal cross-section and efficiency uncertainties, as well as background abundance uncertainties
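As a concrete illustration of the simple counting case, here is a minimal sketch of the negative log-likelihood for a single-bin counting experiment with a Gaussian-constrained background (the function name and the example numbers are invented for illustration, not taken from any ATLAS package):

```python
import math

def counting_nll(mu, theta, n_obs, s, b0, sigma_b):
    """Negative log-likelihood for a single-bin counting experiment:
    Poisson(n_obs | mu*s + b) x Gauss(theta), with b = b0 + theta*sigma_b."""
    lam = mu * s + b0 + theta * sigma_b
    if lam <= 0.0:
        return float("inf")
    # -ln Poisson (dropping the constant ln n_obs!) plus the Gaussian constraint
    return lam - n_obs * math.log(lam) + 0.5 * theta**2

# Example: N_obs = 120, S = 20, B = 100 +- 10; the NLL is minimal near mu = 1
print(counting_nll(1.0, 0.0, 120, 20.0, 100.0, 10.0))
```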
One-sided Test Statistic
q_μ = −2 ln λ(μ), with the profile likelihood ratio λ(μ) = L(μ, θ̂̂(μ)) / L(μ̂, θ̂)
– Large values of q_μ correspond to increasing disagreement between data and hypothesis
– This test statistic behaves asymptotically like a χ² for large data samples and Gaussian nuisance parameters
One-sided Test Statistic
A “ratio of likelihoods”, why? Why not simply use L(μ, θ) as the test statistic?
– The number of degrees of freedom of the fit would be N_θ + 1
– However, we are not interested in the values of θ (they are nuisances!)
– Additional degrees of freedom dilute the interesting information on μ
– The “profile likelihood” (= ratio of maximised likelihoods) concentrates the information on what we are interested in
It is just as we usually do for χ²: Δχ²(m) = χ²(m, θ′_best) − χ²(m_best, θ_best)
The number of d.o.f. of Δχ²(m) is 1, and the value of χ²(m_best, θ_best) measures the goodness-of-fit
One-sided Test Statistic
The “one-sided” upper-limit condition (q_μ = 0 for μ̂ > μ), why?
– For an upper limit, an upward fluctuation of the data (μ̂ larger than the tested μ) should not be counted as evidence against the signal hypothesis
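For the counting example (no nuisance parameters, for brevity) the two test statistics above can be sketched as follows; the function names are illustrative, not from any ATLAS tool:

```python
import math

def nll(mu, n, s, b):
    # -ln L for Poisson(n | mu*s + b), constant terms dropped
    lam = mu * s + b
    return lam - n * math.log(lam)

def q0(n, s, b):
    """Discovery test statistic: set to 0 if the best fit mu_hat < 0 (a deficit
    must not count as evidence FOR a signal)."""
    mu_hat = (n - b) / s
    if mu_hat < 0.0:
        return 0.0
    return 2.0 * (nll(0.0, n, s, b) - nll(mu_hat, n, s, b))

def q_mu(mu, n, s, b):
    """One-sided upper-limit statistic: set to 0 if mu_hat > mu (an upward
    fluctuation must not count as evidence AGAINST the signal hypothesis)."""
    mu_hat = max(0.0, (n - b) / s)
    if mu_hat > mu:
        return 0.0
    return 2.0 * (nll(mu, n, s, b) - nll(mu_hat, n, s, b))
```

For the deck's standard numbers (S = 20, B = 100): q0(120, 20, 100) ≈ 3.76, and q_mu(1, 100, 20, 100) ≈ 3.54.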
Consider Discovery Case
Want to test the significance of a signal excess ➠ test the p-value of the background-only hypothesis
Produce toy experiments with μ = 0 (fluctuate N_obs around B, and fluctuate θ), maximise both likelihoods, determine PDF(q_0 | B) and compute:
p_0 = P(q_0 ≥ q_0,obs | background only)
Consider Discovery Case
Want to test the significance of a signal excess ➠ test the p-value of the background-only hypothesis
If new physics cannot destructively interfere with the SM (background), one can inject the constraint S ≥ 0
Consider Discovery Case
Example: N_obs = 120, B = 100, no uncertainty on B
– p_1-sided = 0.028 (toy experiments with N_obs − B ≥ 20), p_2-sided = 0.058 (toy experiments with |N_obs − B| ≥ 20)
– Injecting the S ≥ 0 information has reduced the p-value by a factor of ≈ 2 and thus enhanced the discovery reach
– The S < 0 solution represents a dilution of the statistical information in the data
Consider Discovery Case
Example: N_obs = 120, B = 100, no uncertainty on B: p_1-sided = 0.028, p_2-sided = 0.058
The deviation of their ratio from a factor of exactly 2 is due to the asymmetric Poisson statistics. Compare:
– N_obs = 1062, B = 1000: p_1-sided = 0.027
– N_obs = 15, B = 9: p_1-sided = 0.041
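The quoted p-values can be reproduced by direct Poisson summation, a back-of-the-envelope check assuming the convention that toys at least as extreme as the data are counted:

```python
import math

def pois_pmf(n, lam):
    # Poisson probability, computed in log space for numerical safety
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))

def p_one_sided(n_obs, b):
    # probability of an equal or larger upward fluctuation: P(N >= n_obs | B)
    return sum(pois_pmf(n, b) for n in range(n_obs, int(b + 20 * math.sqrt(b))))

def p_two_sided(n_obs, b):
    # also count equally extreme downward fluctuations: |N - B| >= |n_obs - B|
    d = abs(n_obs - b)
    lo = sum(pois_pmf(n, b) for n in range(0, int(b - d) + 1))
    return p_one_sided(int(b + d), b) + lo

# N_obs = 120, B = 100 gives p_1-sided ~ 0.028 and p_2-sided ~ 0.058
print(p_one_sided(120, 100.0), p_two_sided(120, 100.0))
```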
Upper Limit Case
No signal excess, want to obtain an upper limit ➠ test the p-value of the signal + background hypothesis
Produce toy experiments with μ = μ_hypo (fluctuate N_obs around S + B, and fluctuate θ), maximise the likelihoods, determine PDF(q_μ | S + B) and compute:
CL_s+b = P(q_μ ≥ q_μ,obs | S + B)
Upper Limit Case
Example: N_obs = 100, B = 100 (no error), S_hypo = 20
– Two-sided: CL_s+b = 0.062 (experiments with N_obs ≤ B or N_obs ≥ B + 2 S_hypo); the one-sided CL_s+b (experiments with N_obs ≤ B) is smaller
– Again, injecting the S ≥ 0 information has improved the sensitivity of the analysis (95% CL limit of the one-sided vs. the two-sided construction)
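A quick cross-check of the one-sided case, using exact Poisson summation as a stand-in for toys (the function names are illustrative): here CL_s+b = P(N ≤ N_obs | S + B) ≈ 0.035, smaller than the two-sided 0.062, and scanning S_hypo until CL_s+b crosses 5% gives a limit near S_95 = 18.1, as expected since N_obs = B = 100 is the median background-only outcome:

```python
import math

def pois_pmf(n, lam):
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))

def cls_b(n_obs, s, b):
    # one-sided CL_s+b: probability of a fluctuation at least as far below
    # the S+B expectation as the data, P(N <= n_obs | S + B)
    return sum(pois_pmf(n, s + b) for n in range(0, n_obs + 1))

def s95(n_obs, b, step=0.1):
    # scan the signal hypothesis until CL_s+b drops below 5%
    s = 0.0
    while cls_b(n_obs, s, b) > 0.05:
        s += step
    return s

print(cls_b(100, 20.0, 100.0))  # ~ 0.035
print(s95(100, 100.0))          # ~ 18
```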
Nuisance Parameters
So far only discrete cases were considered: nothing other than Poisson probability summation
The problems come when maximising the likelihoods with respect to nuisance parameters
The additional Gaussian constraint terms make L (and hence the test statistic) continuous
Discovery Case with Error on B
Example: N_obs = 120, B = 100 ± 0.1
– p_1-sided = 0.025 (vs. 0.028 in the discrete case): half of the toy experiments with q_0,toy ≈ q_0,obs are unaccounted for in the continuous case
– This gives a better (!) discovery reach, and also a more stringent upper limit
Discovery Case with Error on B
Example: N_obs = 120, B = 100 ± 1
– p_1-sided = 0.025: half of the toy experiments with q_0,toy ≈ q_0,obs remain unaccounted for in the continuous case
– With increasing background uncertainty the p-value gets larger again
Discovery Case with Error on B
Example: N_obs = 120, B = 100 ± 5
– p_1-sided = 0.041
– With increasing background uncertainty the p-value gets larger again
Discovery Case with Error on B
Example: N_obs = 120, B = 100 ± 10
– p_1-sided = 0.084
– Eventually, the discovery reach becomes worse than in the discrete case
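The trend with σ_B can be illustrated with a simplified stand-in for the toy procedure: marginalise the one-sided Poisson p-value over a Gaussian background expectation by numerical integration. This is not the profile-likelihood construction of the slides, just a sketch of the same qualitative behaviour, with invented helper names:

```python
import math

def pois_sf(n_obs, lam):
    # P(N >= n_obs | lam), summed term by term in a numerically stable way
    term = math.exp(n_obs * math.log(lam) - lam - math.lgamma(n_obs + 1))
    total, n = 0.0, n_obs
    while term > 1e-15 * (total + 1e-300):
        total += term
        n += 1
        term *= lam / n
    return total

def p_with_b_error(n_obs, b0, sigma_b, n_grid=2001):
    """One-sided p-value with a Gaussian-smeared background expectation,
    p = E_B[ P(N >= n_obs | B) ], integrated numerically over +-5 sigma."""
    if sigma_b == 0.0:
        return pois_sf(n_obs, b0)
    total, norm = 0.0, 0.0
    for i in range(n_grid):
        b = b0 - 5.0 * sigma_b + 10.0 * sigma_b * i / (n_grid - 1)
        if b <= 0.0:
            continue  # truncate unphysical negative backgrounds
        w = math.exp(-0.5 * ((b - b0) / sigma_b) ** 2)
        total += w * pois_sf(n_obs, b)
        norm += w
    return total / norm

# p grows from ~0.028 at sigma_B = 0.1 toward ~0.08 at sigma_B = 10
print(p_with_b_error(120, 100.0, 0.1), p_with_b_error(120, 100.0, 10.0))
```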
Discrete vs. Continuous Test Statistics
To bring the discrete and continuous cases together for negligible error on B, compute the p-value as follows:
p = P(q > q_obs) + ½ · P(q = q_obs), where P is the Poisson probability
In that case, the p-value of the previous example decreases from 0.028 to 0.025 (= the continuous case with small σ_B)
Justification: the discrete case “overcovers”; will get back to coverage later
See: document on the discreteness problem (Glen + Eilam)
Upper Limit with Null Observation
Naïve solution: CL_s+b = P(N = 0 | S + B) = e^−(S+B) = 0.05 ➠ S_95 = 3.0 − B
With the new prescription: ½ · e^−(S+B) = 0.05 ➠ S_95 = 2.3 − B
[Figure: 95% CL upper limit for N_obs = 0, B = 0 ± σ_B, comparing the discrete limit without background uncertainty and the limit with the new prescription.]
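Both prescriptions reduce to one-line formulas in the counting case (the helper names below are just for illustration):

```python
import math

# Naive 95% CL limit: solve P(0 | S + B) = exp(-(S+B)) = 0.05
#   ->  S95 = ln(20) - B ~ 3.0 - B
def s95_naive(b):
    return math.log(20.0) - b

# Mid-p prescription: p = P(N < 0) + 0.5*P(N = 0) = 0.5*exp(-(S+B)) = 0.05
#   ->  S95 = ln(10) - B ~ 2.3 - B
def s95_midp(b):
    return math.log(10.0) - b

print(s95_naive(0.0), s95_midp(0.0))
```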
Expected Limits – Median Sensitivity
Prescription to compute the “green & yellow bands” – the median sensitivity is based on the background-only hypothesis:
1. Create toy experiments where N_obs fluctuates around B only
2. Scan through S_hypo
3. For each toy experiment compute CL_s+b(S_hypo) [from another toy!]
4. Determine the median and the 68% and 95% error bands for CL_s+b(S_hypo)
5. Plot the bands and publish yet another limit
Standard example: B = 100, σ(B) = 0 ➠ median sensitivity: S_95 = 18.1 (95% CL limit)
Expected Limits – Median Sensitivity
Same prescription as before, now with background uncertainty.
New example: B = 100, σ(B) = 20 ➠ median sensitivity: S_95 ≈ 37 (95% CL limit)
Being a Good Citizen
Our CL_s+b upper limit benefits from upward fluctuations of the background (remember the N_obs = 0 case: S_95 = 2.3 − B)
[This would not be the case for a prescription whose null-observation limit increases with B!]
With some luck, limits (far) better than the sensitivity could be obtained
Discuss two remedies here: CL_s and PCL
Modified Frequentist Method
LEP (A. Read) & Tevatron: CL_s = CL_s+b / CL_b, where CL_b = P(q_μ ≥ q_μ,obs | B) is computed with the same test statistic under the background-only hypothesis
– This is not a statistical method in the proper sense: the ratio of two probabilities is not a probability
– CL_s(S_95,obs) = 0.05 determines the 95% CL upper limit S_95,obs
– Dividing by CL_b is a penalty: in case of a fluctuation away from the expected B, both CL_s+b and CL_b will be small, but not CL_s
– CL_s overcovers in general
Reuse the previous example to illustrate CL_s
Standard example: B = 100, σ(B) = 0
– CL_s+b: median 95% CL limit S_95 ≈ 18.1
– CL_s: median 95% CL limit S_95 ≈ 21
New example: B = 100, σ(B) = 20
– CL_s+b: median 95% CL limit S_95 ≈ 37
– CL_s: median 95% CL limit S_95 ≈ 44
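The CL_s penalty can be seen directly in the counting example, again using exact Poisson summation as a stand-in for toys (with N_obs = 100 = B, the observed limit coincides with the median sensitivity; the function names are illustrative):

```python
import math

def pois_pmf(n, lam):
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))

def pois_cdf(n_obs, lam):
    # P(N <= n_obs | lam)
    return sum(pois_pmf(n, lam) for n in range(0, n_obs + 1))

def s95(n_obs, b, use_cls, step=0.1):
    """Scan the signal hypothesis until CL_s+b (or CL_s = CL_s+b/CL_b) < 5%."""
    cl_b = pois_cdf(n_obs, b)  # p-value of the B-only hypothesis, same statistic
    s = 0.0
    while True:
        cl_sb = pois_cdf(n_obs, s + b)
        cl = cl_sb / cl_b if use_cls else cl_sb
        if cl <= 0.05:
            return s
        s += step

# CL_s+b gives S95 ~ 18, CL_s the (over-covering) larger S95 ~ 21
print(s95(100, 100.0, False), s95(100, 100.0, True))
```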
Power-Constrained Limit (PCL)
Keep CL_s+b and solve the problem of over-exclusion by introducing a “power constraint”:
– CL_s+b(S_95) = 0.05 determines the 95% CL upper limit S_95,obs
– However, apply the constraint: S_95,obs → max(S_95,obs, S_95,median − 1σ)
– The choice of the power constraint is arbitrary, but fixed
– PCL has the advantage of proper coverage, and protects against excluding non-testable hypotheses
– CL_s is also arbitrary and overcovers, but has the advantage of being smooth and may appear less ad hoc to non-experts (at conferences)
Remark on Coverage
“CL_s+b, if obtained from toy experiments, has correct coverage.” Correct? No!
– It only has proper coverage if the nuisance parameters used to create the toys correspond to the truth
– This assumption can only be wrong
– The limits obtained will depend on the truth values used
– A customary but not unique choice is to use the best-fit values θ_fit
– A conservative limit should include truth variations, but a full Neyman construction is impossible because the truth is unbounded
– Try the ad hoc variation θ_truth = θ_fit ± 1σ and redetermine the limits ➠ the effect on the standard example is very small (N_obs = 100, B = 100 ± 20): ΔS_95 = 1.3%
How to Generate Toy Experiments
The way the toy experiments are generated matters. To obtain the upper limit for a given signal hypothesis:
1. Compute the observed test statistic in the data
2. For each toy {i}, generate N_obs,i around the expected background + signal hypothesis, using the best-fit values for the nuisance parameters (unsmeared!)
3. Generate Gaussian-smeared nuisance parameters θ_i around the best-fit values for the hypothesis (“unconditional ensemble”)
4. Compute the test statistic using N_obs,i and the smeared θ_i, representing the measurements of that toy experiment
5. Count how often the toy test statistic is larger than or equal to the data test statistic, and compute CL_s+b
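The five steps above can be sketched for a counting experiment whose background is constrained by a Gaussian auxiliary measurement. Everything here (function names, the analytic profiling of the background, the toy counts) is an illustrative assumption, not the ATLAS implementation:

```python
import math
import random

def profiled_b(mu, n, m, s, sigma):
    """Conditional MLE of b for fixed mu: stationary point of
    -(mu*s+b) + n*ln(mu*s+b) - (b-m)^2/(2 sigma^2), a quadratic in b."""
    bq = mu * s - m + sigma**2
    cq = mu * s * (sigma**2 - m) - n * sigma**2
    return 0.5 * (-bq + math.sqrt(bq * bq - 4.0 * cq))

def q_mu(mu, n, m, s, sigma):
    """One-sided profile-likelihood statistic for an upper limit."""
    def nll(mu_, b):
        lam = mu_ * s + b
        return lam - n * math.log(lam) + 0.5 * ((b - m) / sigma) ** 2
    mu_hat = max(0.0, (n - m) / s)        # unconditional fit (mu >= 0 enforced)
    if mu_hat > mu:
        return 0.0                         # upward fluctuation: no evidence against mu
    b_hat = m if mu_hat > 0.0 else profiled_b(0.0, n, m, s, sigma)
    return 2.0 * (nll(mu, profiled_b(mu, n, m, s, sigma)) - nll(mu_hat, b_hat))

def rpois(lam, rng):
    """Poisson sample (Knuth's method; fine for moderate lam)."""
    k, p, cut = 0, 1.0, math.exp(-lam)
    while True:
        p *= rng.random()
        if p <= cut:
            return k
        k += 1

def cl_splusb(mu, n_obs, m_obs, s, sigma, n_toys=20000, seed=12345):
    rng = random.Random(seed)
    # step 1: observed test statistic
    q_obs = q_mu(mu, n_obs, m_obs, s, sigma)
    # best-fit (conditional) nuisance parameter for this hypothesis
    b_fit = profiled_b(mu, n_obs, m_obs, s, sigma)
    n_pass = 0
    for _ in range(n_toys):
        # step 2: fluctuate the count around the S+B expectation (unsmeared b_fit)
        n_i = rpois(mu * s + b_fit, rng)
        # step 3: Gaussian-smear the auxiliary background measurement
        m_i = rng.gauss(b_fit, sigma)
        # steps 4+5: toy test statistic; count toys at least as extreme as data
        if q_mu(mu, n_i, m_i, s, sigma) >= q_obs:
            n_pass += 1
    return n_pass / n_toys

# N_obs = 100, aux measurement B = 100 with two uncertainties, S_hypo = 20
print(cl_splusb(1.0, 100, 100.0, 20.0, 2.0))
print(cl_splusb(1.0, 100, 100.0, 20.0, 15.0))
```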
Short Cuts – Asymptotic Behaviour
One could decide not to bother with toys and use “Wilks' theorem” instead, i.e., postulate q_μ ~ χ²(1 d.o.f.), and compute CL_s+b(μ) = TMath::Prob(q_μ, 1)
– Usually not good in the presence of small numbers
– Should preferably not be used for the observed limit or a small evidence p-value
– For a 5σ discovery, one would need at least 10M toys to see a few events in the tail – impractical
– Could be used to derive the median sensitivity and error bands, which may be necessary in the case of very complex, CPU-intensive fits
Short Cuts – Asymptotic Behaviour
The test statistic has a well-defined asymptotic behaviour for sufficiently large data samples:
– The asymptotic PDF for a given hypothesis is known analytically
– The PDF requires the standard deviation σ of the floating signal-strength parameter μ̂, which can be obtained for a given μ
– Very useful for the expected-limit (“yellow & green band”) computation
– This is nicely described in G. Cowan et al., arXiv:1007.1727
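For the one-sided discovery statistic the asymptotic distribution is a half-χ² (half a delta function at zero plus half a χ² with 1 d.o.f.), so the p-value is simply 1 − Φ(√q). A minimal, standalone check of this formula:

```python
import math

def p_asymptotic(q):
    """Asymptotic p-value of the one-sided test statistic:
    PDF(q) = 0.5*delta(q) + 0.5*chi2(q; 1 dof), so p = 1 - Phi(sqrt(q))."""
    return 0.5 * math.erfc(math.sqrt(q / 2.0))

# Note: TMath::Prob(q, 1) = erfc(sqrt(q/2)) is the two-sided chi2 tail
# probability, i.e. twice the one-sided p-value above.

# For the N_obs = 120, B = 100 example q0 ~ 3.76, giving p ~ 0.026, close to
# the exact Poisson 0.028; and q = 25 (5 sigma) gives p ~ 2.9e-7, which is
# why ~10M toys would be needed to populate that tail.
print(p_asymptotic(3.757), p_asymptotic(25.0))
```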
References
– ATLAS SCs Frequentist Limit Recommendation
– Document on the discreteness problem (Glen + Eilam)
– Paper on asymptotic formulae (G. Cowan et al., arXiv:1007.1727)
– 1st ATLAS Physics & Statistics meeting, Mar 15, 2011
– ATLAS Physics & Statistics workshop, April 15
– Nicolas Berger's asymptotic-behaviour study for H
– Most recent CDF + D0 Higgs combination paper