Statistical Analysis Tools for Particle Physics

Slides:



Advertisements
Similar presentations
Computing and Statistical Data Analysis / Stat 4
Advertisements

G. Cowan RHUL Physics Profile likelihood for systematic uncertainties page 1 Use of profile likelihood to determine systematic uncertainties ATLAS Top.
8. Statistical tests 8.1 Hypotheses K. Desch – Statistical methods of data analysis SS10 Frequent problem: Decision making based on statistical information.
Optimization of Signal Significance by Bagging Decision Trees Ilya Narsky, Caltech presented by Harrison Prosper.
G. Cowan 2007 CERN Summer Student Lectures on Statistics1 Introduction to Statistics − Day 3 Lecture 1 Probability Random variables, probability densities,
G. Cowan Lectures on Statistical Data Analysis 1 Statistical Data Analysis: Lecture 6 1Probability, Bayes’ theorem, random variables, pdfs 2Functions of.
G. Cowan 2011 CERN Summer Student Lectures on Statistics / Lecture 41 Introduction to Statistics − Day 4 Lecture 1 Probability Random variables, probability.
G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics CERN-FNAL Hadron.
G. Cowan Lectures on Statistical Data Analysis 1 Statistical Data Analysis: Lecture 7 1Probability, Bayes’ theorem, random variables, pdfs 2Functions of.
G. Cowan SUSSP65, St Andrews, August 2009 / Statistical Methods 3 page 1 Statistical Methods in Particle Physics Lecture 3: Multivariate Methods.
G. Cowan Lectures on Statistical Data Analysis Lecture 7 page 1 Statistical Data Analysis: Lecture 7 1Probability, Bayes’ theorem 2Random variables and.
G. Cowan 2009 CERN Summer Student Lectures on Statistics1 Introduction to Statistics − Day 4 Lecture 1 Probability Random variables, probability densities,
G. Cowan Statistical Methods in Particle Physics1 Statistical Methods in Particle Physics Day 3: Multivariate Methods (II) 清华大学高能物理研究中心 2010 年 4 月 12—16.
B-tagging Performance based on Boosted Decision Trees Hai-Jun Yang University of Michigan (with X. Li and B. Zhou) ATLAS B-tagging Meeting February 9,
G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 11 Statistics for the LHC Lecture 1: Introduction Academic Training Lectures CERN,
Statistics In HEP Helge VossHadron Collider Physics Summer School June 8-17, 2011― Statistics in HEP 1 How do we understand/interpret our measurements.
G. Cowan CLASHEP 2011 / Topics in Statistical Data Analysis / Lecture 21 Topics in Statistical Data Analysis for HEP Lecture 2: Statistical Tests CERN.
Computational Intelligence: Methods and Applications Lecture 23 Logistic discrimination and support vectors Włodzisław Duch Dept. of Informatics, UMK Google:
1 Glen Cowan Multivariate Statistical Methods in Particle Physics Machine Learning and Multivariate Statistical Methods in Particle Physics Glen Cowan.
G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 1 Lecture 2 1 Probability Definition, Bayes’ theorem, probability densities and their properties,
G. Cowan 2011 CERN Summer Student Lectures on Statistics / Lecture 31 Introduction to Statistics − Day 3 Lecture 1 Probability Random variables, probability.
Classification (slides adapted from Rob Schapire) Eran Segal Weizmann Institute.
G. Cowan S0S 2010 / Statistical Tests and Limits 1 Statistical Tests and Limits Lecture 1: general formalism IN2P3 School of Statistics Autrans, France.
1 Introduction to Statistics − Day 4 Glen Cowan Lecture 1 Probability Random variables, probability densities, etc. Lecture 2 Brief catalogue of probability.
G. Cowan Lectures on Statistical Data Analysis Lecture 8 page 1 Statistical Data Analysis: Lecture 8 1Probability, Bayes’ theorem 2Random variables and.
1 Introduction to Statistics − Day 3 Glen Cowan Lecture 1 Probability Random variables, probability densities, etc. Brief catalogue of probability densities.
G. Cowan TAE 2010 / Statistics for HEP / Lecture 11 Statistical Methods for HEP Lecture 1: Introduction Taller de Altas Energías 2010 Universitat de Barcelona.
G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools 1 Statistical Analysis Tools for Particle Physics IDPASC.
Analysis of H  WW  l l Based on Boosted Decision Trees Hai-Jun Yang University of Michigan (with T.S. Dai, X.F. Li, B. Zhou) ATLAS Higgs Meeting September.
1 Introduction to Statistics − Day 2 Glen Cowan Lecture 1 Probability Random variables, probability densities, etc. Brief catalogue of probability densities.
G. Cowan Lectures on Statistical Data Analysis Lecture 6 page 1 Statistical Data Analysis: Lecture 6 1Probability, Bayes’ theorem 2Random variables and.
G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1 Statistical Data Analysis: Lecture 5 1Probability, Bayes’ theorem 2Random variables and.
G. Cowan Aachen 2014 / Statistics for Particle Physics, Lecture 21 Statistical Methods for Particle Physics Lecture 2: statistical tests, multivariate.
G. Cowan Statistical methods for HEP / Freiburg June 2011 / Lecture 1 1 Statistical Methods for Discovery and Limits in HEP Experiments Day 1: Introduction,
G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 21 Statistics for the LHC Lecture 2: Discovery Academic Training Lectures CERN,
G. Cowan CERN Academic Training 2012 / Statistics for HEP / Lecture 21 Statistics for HEP Lecture 2: Discovery and Limits Academic Training Lectures CERN,
Helge VossAdvanced Scientific Computing Workshop ETH Multivariate Methods of data analysis Helge Voss Advanced Scientific Computing Workshop ETH.
Statistics for HEP Lecture 1: Introduction and basic formalism
Discussion on significance
Lecture 1.31 Criteria for optimal reception of radio signals.
Statistics for HEP Lecture 1: Introduction and basic formalism
Multivariate Methods of
iSTEP 2016 Tsinghua University, Beijing July 10-20, 2016
Tutorial on Statistics TRISEP School 27, 28 June 2016 Glen Cowan
Statistics for the LHC Lecture 3: Setting limits
LAL Orsay, 2016 / Lectures on Statistics
Comment on Event Quality Variables for Multivariate Analyses
Recent progress in multivariate methods for particle physics
Tutorial on Multivariate Methods (TMVA)
ISTEP 2016 Final Project— Project on
School on Data Science in (Astro)particle Physics
TAE 2017 / Statistics Lecture 1
Discrete Event Simulation - 4
Statistical Methods for the LHC Discovering New Physics
Where did we stop? The Bayes decision rule guarantees an optimal classification… … But it requires the knowledge of P(ci|x) (or p(x|ci) and P(ci)) We.
Statistical Methods for Particle Physics (I)
Lectures on Statistics TRISEP School 27, 28 June 2016 Glen Cowan
TAE 2018 / Statistics Lecture 3
Computing and Statistical Data Analysis Stat 5: Multivariate Methods
TAE 2018 Benasque, Spain 3-15 Sept 2018 Glen Cowan Physics Department
Computing and Statistical Data Analysis / Stat 6
Some Prospects for Statistical Data Analysis in High Energy Physics
TRISEP 2016 / Statistics Lecture 2
Statistical Methods for HEP Lecture 3: Discovery and Limits
SUSSP65, St Andrews, August 2009 / Statistical Methods 3
Statistics for HEP Lecture 1: Introduction and basic formalism
Parametric Methods Berlin Chen, 2005 References:
Introduction to Statistics − Day 4
Machine Learning and Multivariate Statistical Methods in Particle Physics Glen Cowan RHUL Physics RHUL Computer Science Seminar.
Presentation transcript:

Statistical Analysis Tools for Particle Physics IDPASC School of Flavour Physics Valencia, 2-7 May, 2013 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAAAA

Outline Multivariate methods for HEP Event selection as a statistical test Neyman-Pearson lemma and likelihood ratio test Some multivariate classifiers: Linear Neural networks Boosted Decision Trees Support Vector Machines Use of multivariate methods in a search (brief overview) G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

2nd ed., 2009 G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

A simulated SUSY event in ATLAS high pT jets of hadrons high pT muons p p missing transverse energy G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Background events This event from Standard Model ttbar production also has high pT jets and muons, and some missing transverse energy. → can easily mimic a SUSY event. G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Event selection as a statistical test For each event we measure a set of numbers: x1 = jet pT x2 = missing energy x3 = particle i.d. measure, ... follows some n-dimensional joint probability density, which depends on the type of event produced, i.e., was it E.g. hypotheses H0, H1, ... Often simply “signal” (s), “background” (b) G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Finding an optimal decision boundary H0 In particle physics usually start by making simple “cuts”: xi < ci xj < cj H1 Maybe later try some other type of decision boundary: H0 H0 H1 H1 G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

α R1 G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Constructing a test statistic The Neyman-Pearson lemma states: to obtain the highest signal efficiency for a given background efficiency (highest power for a test of a given size), choose the critical region for a test of the background hypothesis (acceptance region for signal) such that where c is a constant determined by the background efficiency. Equivalently, the optimal test statistic is given by the likelihood ratio: N.B. any monotonic function of this is just as good. G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

2 G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Overtraining If decision boundary is too flexible it will conform too closely to the training points → overtraining. Monitor by applying classifier to independent validation sample. training sample independent validation sample G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Choose classifier that minimizes error function for validation sample. G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Neural network example from LEP II Signal: e+e- → W+W- (often 4 well separated hadron jets) Background: e+e- → qqgg (4 less well separated hadron jets) ← input variables based on jet structure, event shape, ... none by itself gives much separation. Neural network output: (Garrido, Juste and Martinez, ALEPH 96-144) G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Kernel-based PDE (KDE, Parzen window) Consider d dimensions, N training events, x1, ..., xN, estimate f (x) with bandwidth (smoothing parameter) kernel Use e.g. Gaussian kernel: Need to sum N terms to evaluate function (slow); faster algorithms only count events in vicinity of x (k-nearest neighbor, range search). G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Particle i.d. in MiniBooNE Detector is a 12-m diameter tank of mineral oil exposed to a beam of neutrinos and viewed by 1520 photomultiplier tubes: Search for nm to ne oscillations required particle i.d. using information from the PMTs. H.J. Yang, MiniBooNE PID, DNP06 G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Decision trees Out of all the input variables, find the one for which with a single cut gives best improvement in signal purity: where wi. is the weight of the ith event. Resulting nodes classified as either signal/background. Iterate until stop criterion reached based on e.g. purity or minimum number of events in a node. The set of cuts defines the decision boundary. Example by MiniBooNE experiment, B. Roe et al., NIM 543 (2005) 577 G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Finding the best single cut The level of separation within a node can, e.g., be quantified by the Gini coefficient, calculated from the (s or b) purity as: For a cut that splits a set of events a into subsets b and c, one can quantify the improvement in separation by the change in weighted Gini coefficients: where, e.g., Choose e.g. the cut to the maximize D; a variant of this scheme can use instead of Gini e.g. the misclassification rate: G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Decision trees (2) The terminal nodes (leaves) are classified a signal or background depending on majority vote (or e.g. signal fraction greater than a specified threshold). This classifies every point in input-variable space as either signal or background, a decision tree classifier, with discriminant function f(x) = 1 if x in signal region, -1 otherwise Decision trees tend to be very sensitive to statistical fluctuations in the training sample. Methods such as boosting can be used to stabilize the tree. G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

1 1 G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

< G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Monitoring overtraining From MiniBooNE example: Performance stable after a few hundred trees. G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Comparison of boosting algorithms ,B. Roe et al., NIM 543 (2005) 577 Comparison of boosting algorithms A number of boosting algorithms on the market; differ in the update rule for the weights. G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

earlier G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Support Vector Machines Map input variables into high dimensional feature space: x → f Maximize distance between separating hyperplanes (margin) subject to constraints allowing for some misclassification. Final classifier only depends on scalar products of f(x): So only need kernel Bishop ch 7 G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

margin margin. G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Summary on multivariate methods Particle physics has used several multivariate methods for many years: linear (Fisher) discriminant neural networks naive Bayes and has in recent years started to use a few more k-nearest neighbour boosted decision trees support vector machines The emphasis is often on controlling systematic uncertainties between the modeled training data and Nature to avoid false discovery. Although many classifier outputs are "black boxes", a discovery at 5s significance with a sophisticated (opaque) method will win the competition if backed up by, say, 4s evidence from a cut-based method. G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Quotes I like “Keep it simple. As simple as possible. Not any simpler.” – A. Einstein “If you believe in something you don't understand, you suffer,...” – Stevie Wonder G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Extra slides G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Frequentist statistical tests – quick review Consider a hypothesis H0 and alternative H1, which each specify the probability for data x., i.e., P(x|H0), P(x|H1). A test of H0 is defined by specifying a critical region w of the data space such that there is no more than some (small) probability a, assuming H0 is correct, to observe the data there, i.e., P(x  w | H0 ) ≤ a Need inequality if data are discrete. α is called the size or significance level of the test. If x is observed in the critical region, reject H0. data space Ω critical region w G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Role of the alternative hypothesis But in general there are an infinite number of possible critical regions that give the same significance level a. So the choice of the critical region for a test of H0 needs to take into account the alternative hypothesis H1. Roughly speaking, place the critical region where there is a low probability to be found if H0 is true, but high if H1 is true: G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

p-values for a set of Suppose hypothesis H predicts pdf observations We observe a single point in this space: What can we say about the validity of H in light of the data? Express level of compatibility by giving the p-value for H: p = probability, under assumption of H, to observe data with equal or lesser compatibility with H relative to the data we got. This is not the probability that H is true! Requires one to say what part of data space constitutes lesser compatibility with H than the observed data (implicitly this means that region gives better agreement with some alternative). G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Using a p-value to define test of H0 One can show the distribution of the p-value of H, under assumption of H, is uniform in [0,1]. So the probability to find the p-value of H0, p0, less than a is We can define the critical region of a test of H0 with size a as the set of data space where p0 ≤ a. Formally the p-value relates only to H0, but the resulting test will have a given power with respect to a given alternative H1. G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Significance from p-value Often define significance Z as the number of standard deviations that a Gaussian variable would fluctuate in one direction to give the same p-value. 1 - TMath::Freq TMath::NormQuantile E.g. Z = 5 (a “5 sigma effect”) corresponds to p = 2.9 × 10-7. G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Two distinct event selection problems In some cases, the event types in question are both known to exist. Example: separation of different particle types (electron vs muon) Use the selected sample for further study. In other cases, the null hypothesis H0 means "Standard Model" events, and the alternative H1 means "events of a type whose existence is not yet established" (to do so is the goal of the analysis). Many subtle issues here, mainly related to the heavy burden of proof required to establish presence of a new phenomenon. Typically require p-value of background-only hypothesis below ~ 10-7 (a 5 sigma effect) to claim discovery of "New Physics". G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Using classifier output for discovery signal search region y f(y) y N(y) background background excess? ycut Normalized to unity Normalized to expected number of events Discovery = number of events found in search region incompatible with background-only hypothesis. p-value of background-only hypothesis can depend crucially distribution f(y|b) in the "search region". G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Prototype search Suppose # of events found in search region: strength parameter expected signal expected background For each event measure x: (classifier output or kin. var.) where each x follows Likelihood function is: G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Test of signal strength μ Can test a value of μ using: , E.g. to test H0 : μ = 0 versus H1 : μ > 0 use I.e., large q0 means increasing disagreement between data and hypothesis of μ = 0. In this case, we are assuming alternative to background-only hypothesis is a positive signal strength. G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

p-value for discovery Large q0 means increasing incompatibility between the data and hypothesis, therefore p-value for an observed q0,obs is For large sample related to chi-square distribution. From p-value get equivalent significance, G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Cumulative distribution of q0, significance Cowan, Cranmer, Gross, Vitells, arXiv:1007.1727, EPJC 71 (2011) 1554 Cumulative distribution of q0, significance From the pdf, the cumulative distribution of q0 is found to be The special case μ′ = 0 is The p-value of the μ = 0 hypothesis is Therefore the discovery significance Z is simply G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Alternative likelihood ratio for full experiment We can define a test statistic Q monotonic in the likelihood ratio as To compute p-values for the b and s+b hypotheses given an observed value of Q we need the distributions f(Q|b) and f(Q|s+b). Note that the term –s in front is a constant and can be dropped. The rest is a sum of contributions for each event, and each term in the sum has the same distribution. Can exploit this to relate distribution of Q to that of single event terms using (Fast) Fourier Transforms (Hu and Nielsen, physics/9906010). G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools

Distribution of Q Suppose in real experiment Q is observed here. Take e.g. b = 100, s = 20. f (Q|b) f (Q|s+b) p-value of s+b p-value of b only If ps+b < α, reject signal model s at confidence level 1 – α. If pb < 2.9 × 10-7, reject background-only model (signif. Z = 5). G. Cowan IDPASC School of Flavour Physics, Valencia, 2-7 May 2013 / Statistical Analysis Tools