Statistical Issues in Searches for New Physics


Statistical Issues in Searches for New Physics
Louis Lyons, Imperial College London and Oxford
Hypothesis Testing, Warwick, Sept 2016
Version of 12/9/2016

Theme: Using data to make judgements about H1 (New Physics) versus H0 (Standard Model with nothing new).

Why? HEP is expensive and time-consuming, so it is worth investing effort in statistical analysis to get better information from the data.

Topics:
- Examples of Particle Physics hypotheses
- Why we don't use Bayes for Hypothesis Testing
- Are p-values useful?
- Why p ≠ L-ratio
- LEE = Look Elsewhere Effect
- Why 5σ for discovery?
- Why limits?
- CLs for excluding H1
- Background systematics
- Coverage
- p0 v p1 plots
- Example: Higgs – Discovery, Mass and Spin
- Conclusions

Examples of Hypotheses

1) Event selector (Event = particle interaction)
Events are produced at the CERN LHC at an enormous rate. An online 'trigger' selects events for recording (~1 kilohertz), e.g. events with many particles. Offline selection is based on required features, e.g.
  H0: at least 2 electrons
  H1: 0 or 1 electron
Possible outcomes: events assigned as H0 or H1.

2) Result of experiment
  H0 = nothing new
  H1 = a new particle is produced as well (Higgs, SUSY, 4th neutrino, …)
Possible outcomes:

  Consistent with H0?   Consistent with H1?   Outcome
  yes                   no                    Exclude H1
  no                    yes                   Discovery
  yes                   yes                   No decision
  no                    no                    ?

Bayesian methods? Particle Physicists like Frequentism: "Avoid personal beliefs"; "Let the data speak for themselves".

For determination of a parameter ϕ:
a) The range Δ of the prior for ϕ is unimportant, provided ……
b) The Bayesian upper limit for a Poisson rate (with a constant prior) agrees with the Frequentist UL
c) It is easier to incorporate systematics / nuisance parameters

BUT for Hypothesis Testing (e.g. smooth background versus background + peak), Δ does not cancel, so the posterior odds etc. depend on Δ.

Compare: the Frequentist local p → global p conversion via the Look Elsewhere Effect. The effects are similar but not the same (not surprising):
- The LEE can also allow for selection options, different plots, etc.
- The Bayesian prior also includes a prior on the signal strength (unless it is a δ-function at the expected value)
- Usually we don't perform a prior sensitivity analysis
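A toy numerical sketch (my own construction, not from the slide) of why Δ matters for hypothesis testing: observe x ~ N(s, 1), with H0: s = 0 and H1: s uniform on [0, Δ]. Widening the prior range Δ dilutes the marginal likelihood of H1 and hence the Bayes factor, for exactly the same data.

```python
import math

def norm_cdf(x):
    # Standard normal cumulative distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    # Standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def bayes_factor_10(x_obs, delta):
    # H0: s = 0; H1: s uniform on [0, delta]; data x ~ N(s, 1).
    # Marginal likelihood under H1: (1/delta) * integral_0^delta N(x; s, 1) ds
    marginal_h1 = (norm_cdf(x_obs) - norm_cdf(x_obs - delta)) / delta
    return marginal_h1 / norm_pdf(x_obs)

# The same 2 sigma excess gives weaker and weaker evidence for H1 as the
# prior range grows -- the Delta-dependence that does not cancel:
for delta in (1.0, 10.0, 100.0):
    print(delta, bayes_factor_10(2.0, delta))
```

The monotonic fall of the Bayes factor with Δ is the slide's point: unlike parameter estimation, the prior range never drops out.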

pdf's and p-values

[Plots (a), (b), (c): pdfs of the test statistic t under H0 and H1, with tobs marked and the tail areas p0 and p1 shaded]

With 2 hypotheses, each with its own pdf, the p-values are defined as tail areas, pointing in towards each other: p0 is the tail of the H0 pdf beyond tobs, and p1 is the tail of the H1 pdf beyond tobs.

Are p-values useful?

Particle Physicists use p-values for exclusion and for discovery. They have come in for strong criticism:
1) "People think p is prob(theory|data)" – should we stop using relativity because it is misunderstood?
2) "p-values over-emphasize the evidence" (they are much smaller than the L-ratio) – is mass or height 'better' for the sizes of mice and elephants?
3) "Over 50% of results with p0 < 5% are wrong" – this confuses p(A;B) with p(B;A).

In Particle Physics, we use the L-ratio as the data statistic for p-values. One can regard this as:
- a p-value method which just happens to use the L-ratio as test statistic, or
- an L-ratio method with p-values used as calibration.

Why p ≠ Likelihood ratio

They measure different things: p0 refers just to H0, while L01 compares H0 and H1.

The L-ratio also depends on the amount of data. Example: a Poisson counting experiment.
Little data: for H0, μ0 = 1.0; for H1, μ1 = 10.0. Observe n = 10:
  p0 ~ 10^-7, L01 ~ 10^-5
Now with 100 times as much data: μ0 = 100.0, μ1 = 1000.0. Observe n = 160:
  p0 ~ 10^-7, L01 ~ 10^+14
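A minimal sketch of this Poisson comparison (my own code; the exact values need not reproduce the slide's rounded figures, but the qualitative point survives: p0 stays tiny in both cases, while log10 of L01 = L(H0)/L(H1) changes enormously with the amount of data):

```python
import math

def log_poisson(n, mu):
    # log of the Poisson probability P(n; mu) = exp(-mu) mu^n / n!
    return n * math.log(mu) - mu - math.lgamma(n + 1)

def p0(n_obs, mu0):
    # p-value under H0: probability of observing n_obs or more counts
    return 1.0 - sum(math.exp(log_poisson(k, mu0)) for k in range(n_obs))

def log10_L01(n_obs, mu0, mu1):
    # log10 of the likelihood ratio L(H0)/L(H1) at the observed count
    return (log_poisson(n_obs, mu0) - log_poisson(n_obs, mu1)) / math.log(10)

# Little data: mu0 = 1, mu1 = 10, observe n = 10
print(p0(10, 1.0), log10_L01(10, 1.0, 10.0))
# 100x more data: mu0 = 100, mu1 = 1000, observe n = 160
print(p0(160, 100.0), log10_L01(160, 100.0, 1000.0))
```

Both observations are roughly equally improbable under H0 alone, yet the likelihood ratio flips from strongly favouring H1 to overwhelmingly favouring H0.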

Look Elsewhere Effect (LEE)

Prob of a background fluctuation at that place = local p-value
Prob of a background fluctuation 'anywhere' = global p-value
Global p > local p

Where is 'anywhere'?
a) Any location in this histogram in a sensible range
b) Any location in this histogram
c) Also in histograms produced with different cuts, binning, etc.
d) Also in other plausible histograms for this analysis
e) Also in other searches in this physics group (e.g. SUSY at CMS)
f) In any search in this experiment (e.g. CMS)
g) In all CERN experiments (e.g. LHC expts + NA62 + OPERA + ASACUSA + …)
h) In all HEP experiments
etc.

d) is relevant for the graduate student doing the analysis; f) is relevant for the experiment's Spokesperson.

INFORMAL CONSENSUS: Quote the local p, and the global p according to a) above. Explain which global p.
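Not on the slide, but a common back-of-envelope correction, under the (crude) assumption of N independent places to look:

```python
def global_p(p_local, n_trials):
    # Crude Look-Elsewhere correction: probability that at least one of
    # n_trials independent locations fluctuates at least this far up.
    return 1.0 - (1.0 - p_local) ** n_trials

# A local 5 sigma effect (p ~ 2.9e-7) searched for in ~1000 independent
# mass windows corresponds to a much weaker global significance:
print(global_p(2.9e-7, 1000))
```

For small p this is approximately n_trials * p_local; real analyses estimate the effective number of trials (e.g. from the histogram range divided by the mass resolution, or with the Gross-Vitells method) rather than assuming independence.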

Why 5σ for Discovery?

Statisticians ridicule our belief in extreme tails (especially for systematics). Our reasons:
1) Past history (many 3σ and 4σ effects have gone away)
2) LEE
3) Worries about underestimated systematics
4) A subconscious Bayes calculation:

  p(H1|x) / p(H0|x) = [p(x|H1) / p(x|H0)] × [π(H1) / π(H0)]
  (posterior odds = likelihood ratio × prior odds)

"Extraordinary claims require extraordinary evidence."

N.B. Points 2), 3) and 4) are experiment-dependent.

Alternative suggestion: L.L., "Discovering the significance of 5σ", http://arxiv.org/abs/1310.1284
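For reference, the conventional translation between a z-sigma excess and the one-sided Gaussian tail probability, which is how "5σ" is defined in these searches:

```python
import math

def p_value(z):
    # One-sided Gaussian tail probability for a z-sigma excess
    return 0.5 * math.erfc(z / math.sqrt(2.0))

for z in (3, 4, 5):
    print(z, p_value(z))
```

3σ corresponds to p ≈ 1.3e-3 ("evidence" in HEP parlance), while 5σ corresponds to p ≈ 2.9e-7, the conventional discovery threshold.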

How many ’s for discovery? SEARCH SURPRISE IMPACT LEE SYSTEMATICS No. σ Higgs search Medium Very high M 5 Single top No Low 3 SUSY Yes Very large 7 Bs oscillations Medium/Low Δm 4 Neutrino osc High sin22ϑ, Δm2 Bs μ μ Low/Medium Pentaquark High/V. high M, decay mode (g-2)μ anom H spin ≠ 0 4th gen q, l, ν M, mode 6 Dark energy Strength Grav Waves Enormous 8 Suggestions to provoke discussion, rather than `delivered on Mt. Sinai’ Bob Cousins: “2 independent expts each with 3.5σ better than one expt with 5σ”

Combining different p-values

Several results quote independent p-values for the same effect: p1, p2, p3, … e.g. 0.9, 0.001, 0.3, … What is the combined significance? It is not just p1·p2·p3·…: if 10 experiments each have p ~ 0.5, the product is ~0.001, which is clearly NOT the correct combined p.

One recipe:
  S = z · Σ_{j=0}^{n-1} (−ln z)^j / j! ,  with z = p1·p2·…·pn
(e.g. for 2 measurements, S = z · (1 − ln z) ≥ z)

Problems:
1) The recipe is not unique (it maps a uniform distribution in the n-D hypercube to a uniform one in 1-D, but other maps exist)
2) The formula is not associative: combining {{p1 and p2}, and then p3} gives a different answer from {{p3 and p2}, and then p1}, or from combining all together. This is due to the different options for "more extreme than x1, x2, x3".
3) Small p's may be due to different discrepancies
4) Correlated systematics

******* Better to combine the data ************
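The recipe above can be sketched directly (S is the probability that a product of n independent uniform variables falls at or below the observed product z):

```python
import math

def combined_p(p_values):
    # S = z * sum_{j=0}^{n-1} (-ln z)^j / j!, with z = product of the p-values
    n = len(p_values)
    z = math.prod(p_values)
    log_term = -math.log(z)
    return z * sum(log_term ** j / math.factorial(j) for j in range(n))

# Ten mediocre results do not add up to a discovery: the raw product of
# ten p = 0.5 values is ~0.001, but the combined p stays unremarkable:
print(combined_p([0.5] * 10))
# Two-measurement special case, S = z * (1 - ln z):
print(combined_p([0.05, 0.05]))
```

Note the non-associativity the slide warns about: combining pairwise and then combining the results does not reproduce combining all the p-values at once.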

WHY LIMITS?

Michelson-Morley experiment → death of the aether.

HEP experiments: if the upper limit on the rate for a new particle is below the expected (predicted) rate, the particle is excluded.

CERN CLW (Jan 2000); FNAL CLW (March 2000); Heinrich, PHYSTAT-LHC, "Review of Banff Challenge".

[Plot: production rate versus MX, showing the theoretical prediction and the experimental upper limit; the particle is excluded where the UL lies below the prediction]

CMS data: excludes X → gg with MX < 2800 GeV

Ilya Narsky, FNAL CLW 2000

CLs for exclusion

CLs = p1/(1 − p0) → diagonal line (in the p0 v p1 plot)

Provides protection against excluding H1 when there is little or no sensitivity. A 'conservative Frequentist' approach.

[Plot: pdfs of the test statistic t under H0 and H1, separated by Δµ, with the tails p0 and p1 marked]

N.B. p0 = tail towards H1; p1 = tail towards H0.
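A toy sketch of this protection (my own Gaussian model, assuming t ~ N(0, σ) under H0 and t ~ N(μ1, σ) under H1, with the tail conventions in the N.B. above):

```python
import math

def norm_cdf(x):
    # Standard normal cumulative distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def exclusion(t_obs, mu1, sigma=1.0):
    # p0: tail towards H1 (t >= t_obs under H0)
    # p1: tail towards H0 (t <= t_obs under H1)
    p0 = 1.0 - norm_cdf(t_obs / sigma)
    p1 = norm_cdf((t_obs - mu1) / sigma)
    cls = p1 / (1.0 - p0)
    return p1, cls

# Good sensitivity (hypotheses well separated): both p1 and CLs exclude H1
print(exclusion(1.0, mu1=5.0))
# No sensitivity (mu1 ~ 0): a downward fluctuation gives p1 < 0.05,
# but CLs stays large, preventing a spurious exclusion
print(exclusion(-1.8, mu1=0.1))
```

In the second case a plain p1 < 0.05 criterion would "exclude" a signal the experiment cannot possibly see; dividing by 1 − p0 removes that pathology, at the price of conservatism.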

Background systematics

[Plot: −2 ln L versus signal strength s, showing the interval Δ]

Coverage

* What it is: for a given statistical method applied to many sets of data to extract confidence intervals for a parameter µ, the coverage C is the fraction of ranges that contain the true value of the parameter. It can vary with µ.

* It does not apply to your data: it is a property of the statistical method used. It is NOT a probability statement about whether µtrue lies in your confidence range for µ.

* Coverage plot for a Poisson counting experiment: observe n counts; estimate µbest from the maximum of the likelihood L(µ) = e^−µ µ^n / n!, and obtain the range of µ from ln{L(µbest)/L(µ)} ≤ 0.5. For each µtrue calculate the coverage C(µtrue), and compare with the nominal 68%.

[Plot: ideal coverage plot, C(µ) constant at 68%]

Coverage: ΔlnL intervals for μ

P(n;μ) = e^−μ μ^n / n!  (Joel Heinrich, CDF note 6438)
Interval: −2 ln λ < 1, where λ = P(n;μ) / P(n;μbest)

[Coverage plot versus µtrue: discontinuities because the data n are discrete; the method UNDERCOVERS because ΔlnL intervals are not equivalent to a Neyman construction]
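The coverage curve described above can be computed exactly by summing Poisson probabilities (a sketch of the standard construction; n_max is my own truncation choice):

```python
import math

def in_interval(n, mu):
    # Is mu inside the (Delta lnL < 0.5) likelihood interval when n is observed?
    # The likelihood L(mu) = exp(-mu) mu^n / n! is maximised at mu_best = n.
    if n == 0:
        return mu < 0.5                      # ln L(mu) = -mu, maximum 0 at mu = 0
    log_l_best = -n + n * math.log(n)
    log_l_mu = -mu + n * math.log(mu)
    return log_l_best - log_l_mu < 0.5

def coverage(mu_true, n_max=300):
    # C(mu_true): total Poisson probability of the n values whose
    # likelihood interval contains mu_true
    c, pmf = 0.0, math.exp(-mu_true)
    for n in range(n_max):
        if in_interval(n, mu_true):
            c += pmf
        pmf *= mu_true / (n + 1)
    return c

# Coverage is jagged in mu_true and typically below the nominal 68.3%:
for mu in (1.0, 3.0, 10.0):
    print(mu, coverage(mu))
```

Scanning mu_true finely reproduces the discontinuous, undercovering plot on the slide.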

p0 v p1 plots

Preprint by Luc Demortier and L.L., "Testing Hypotheses in Particle Physics: Plots of p0 versus p1". For hypotheses H0 and H1, p0 and p1 are the tail probabilities for the data statistic t.

These plots provide insights on:
- CLs for exclusion
- the Punzi definition of sensitivity
- the relation of p-values and likelihoods
- the probability of misleading evidence
- the Jeffreys-Lindley paradox

Search for the Higgs, H → γγ: low S/B, high statistics

HZ Z  4 l: high S/B, low statistics

p-value for ‘No Higgs’ versus mH

Mass of Higgs: Likelihood versus mass

Comparing 0+ versus 0- for Higgs (like Neutrino Mass Hierarchy) http://cms.web.cern.ch/news/highlights-cms-results-presented-hcp

Conclusions

Many interesting open statistical issues. Some problems are similar to those in other fields, e.g. Astrophysics. Help from Statisticians is welcome: we would rather use your circular "wheels" than our own square ones.

Come and help us!