G. Cowan RHUL Physics Statistical Issues for Higgs Search page 1 Statistical Issues for Higgs Search ATLAS Statistics Forum CERN, 16 April, 2007 Glen Cowan.


1 G. Cowan RHUL Physics Statistical Issues for Higgs Search page 1 Statistical Issues for Higgs Search ATLAS Statistics Forum CERN, 16 April, 2007 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan

2 Outline
1 General framework
2 Histogram-based analysis
3 “LEP-style” analysis
4 Fit method
5 Systematic uncertainties
6 Thoughts on Feldman-Cousins limits

3 Initial thoughts
PHYSTAT papers and LEP, FNAL and ATLAS notes already contain a lot of well-worked-out material on statistics for LHC searches.
Much of the draft note I posted just summarizes well-known things (→ skim quickly).
But many areas are still not completely clear (to me), and important choices remain to be made.

4 General framework
Assume N channels; the data from each are sets of numbers $x_1, \ldots, x_N$.
Joint set of all data: $x = (x_1, \ldots, x_N)$.
Joint pdf for the full experiment: $f(x; \theta) = \prod_{i=1}^{N} f_i(x_i; \theta)$ (if all channels are statistically independent).
$\theta = (m, \nu)$ is a set of parameters: $m = m_H$ is the parameter of interest, and $\nu$ are nuisance parameters.

5 Test of hypothesized mass m
The likelihood function $L(m)$ is the joint pdf evaluated with the observed data and regarded as a function of m.
Define the likelihood ratio: $l(m) = L(m) / L(\hat{m})$, where $\hat{m}$ maximizes L.
This can be used to construct a test of the hypothesized value m (and then do this for all m).
Take the critical region for the test (the region with low compatibility with the hypothesis) to correspond to low values of $l(m)$.
Set the size of the critical region such that the probability for the data to be there under the null hypothesis is $\alpha$ (the significance level of the test).
If the data fall in the critical region, reject the hypothesis m.

6 Confidence interval from test
Now carry out the test for all m.
The set of values not rejected at significance level $\alpha$ is a confidence interval at confidence level $1 - \alpha$.
Often this is e.g. from a lower limit $m_\mathrm{lo}$ to ∞.

7 Discovery, p-values
To discover the Higgs, try to reject the background-only (null) hypothesis ($H_0$).
Define a statistic t whose value reflects the compatibility of the data with $H_0$.
p-value = Prob(data with ≤ compatibility with $H_0$ when compared to the data we got | $H_0$)
For example, if high values of t mean less compatibility, $p = P(t \ge t_\mathrm{obs} \mid H_0) = \int_{t_\mathrm{obs}}^{\infty} f(t \mid H_0)\, dt$.
If the p-value comes out small, then this is evidence against the background-only hypothesis → discovery made!

8 Significance from p-value
Define the significance Z as the number of standard deviations that a Gaussian variable would fluctuate in one direction to give the same p-value: $p = 1 - \Phi(Z)$, i.e. $Z = \Phi^{-1}(1 - p)$, where $\Phi$ is the standard Gaussian cumulative distribution.
Relevant ROOT functions: TMath::Prob, TMath::NormQuantile.
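The conversion between p-value and significance can be sketched in a few lines of Python. This is an illustration using only the standard library rather than the ROOT functions named on the slide; the function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard Gaussian: mean 0, sigma 1


def p_from_significance(z):
    """One-sided tail probability p = 1 - Phi(Z), via erfc for accuracy at large Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def significance_from_p(p):
    """Z = Phi^-1(1 - p), computed as -Phi^-1(p) to avoid rounding 1 - p to 1."""
    return -_STD_NORMAL.inv_cdf(p)


if __name__ == "__main__":
    p5 = p_from_significance(5.0)
    print(f"Z = 5 -> one-sided p = {p5:.3e}, two-sided p = {2 * p5:.3e}")
    print(f"p = {p5:.3e} -> Z = {significance_from_p(p5):.3f}")
```

Computing Z as minus the quantile of p, rather than the quantile of 1 - p, keeps the result accurate for the very small p-values relevant to discovery claims.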

9 When to publish
HEP folklore is to claim discovery when $p = 2.87 \times 10^{-7}$, corresponding to a significance Z = 5.
This is very subjective and really should depend on the prior probability of the phenomenon in question, e.g.,
phenomenon: reasonable p-value for discovery
$D^0 \bar{D}^0$ mixing: ~0.05
Higgs: ~$10^{-7}$ (?)
Life on Mars: ~$10^{-10}$
Astrology: ~$10^{-20}$
Note some groups have defined 5σ to refer to a two-sided fluctuation, i.e., $p = 5.7 \times 10^{-7}$.

10 Likelihood ratio as test statistic
Take as test statistic: $q(m) = -2 \ln l(m)$.
The sampling distribution of $q(m)$ depends on the hypothesized mass.
We need e.g. $f(q \mid b)$ for the background-only hypothesis and $f(q \mid s+b)$ for signal plus background.
Assume for now that these pdfs can be determined with MC and clever tricks.

11 Histogram-based analysis
Unlike at LEP, expect lots of background, so put the data in a histogram with bin contents $n_i$.

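12 Histogram-based analysis (2)
Assume $n_i \sim \mathrm{Poisson}(s_i + b_i)$, so the likelihood is $L(m) = \prod_i \frac{(s_i + b_i)^{n_i}}{n_i!} e^{-(s_i + b_i)}$,
or the log-likelihood (up to a constant), $\ln L(m) = \sum_i \left[ n_i \ln(s_i + b_i) - (s_i + b_i) \right]$.
For N independent channels this becomes $\ln L(m) = \sum_{j=1}^{N} \sum_i \left[ n_{ji} \ln(s_{ji} + b_{ji}) - (s_{ji} + b_{ji}) \right]$.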
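A minimal sketch of the binned Poisson log-likelihood, assuming the expected signal and background per bin are already available as lists. The function names and the toy numbers are illustrative only, not from the slides.

```python
import math


def binned_log_likelihood(n, s, b):
    """ln L = sum_i [n_i ln(s_i + b_i) - (s_i + b_i)], dropping the constant -ln n_i!."""
    total = 0.0
    for n_i, s_i, b_i in zip(n, s, b):
        mu_i = s_i + b_i
        if mu_i <= 0.0:
            raise ValueError("expected bin content s_i + b_i must be positive")
        total += n_i * math.log(mu_i) - mu_i
    return total


def multichannel_log_likelihood(channels):
    """Sum of ln L over independent channels; each channel is a tuple (n, s, b)."""
    return sum(binned_log_likelihood(n, s, b) for n, s, b in channels)


if __name__ == "__main__":
    # Hypothetical two-bin channel: observed counts, expected signal, expected background.
    n, s, b = [2, 0], [1.0, 0.5], [1.0, 0.5]
    print(binned_log_likelihood(n, s, b))
    print(multichannel_log_likelihood([(n, s, b), (n, s, b)]))
```

Because the channels are independent, their log-likelihoods simply add, which is all the second function does.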

13 Histogram-based analysis (3)
From the likelihood, construct $l(m)$ and $q(m)$ as before.
These are used to construct tests and intervals.

14 LEP-style analysis: $CL_b$
Same basic idea: $L(m) \to l(m) \to q(m) \to$ test of m, etc.
For a chosen m, find the p-value of the background-only hypothesis from the pdf $f(q \mid b)$; in the LEP convention this p-value is written $1 - CL_b$.

15 LEP-style analysis: $CL_{s+b}$
The ‘normal’ way to get an interval would be to reject the hypothesized m if $CL_{s+b} < \alpha$, where $CL_{s+b}$ is the p-value of the signal-plus-background hypothesis for that m.
By construction this interval will cover the true value of m with probability $1 - \alpha$.

16 LEP-style analysis: $CL_s$
The problem with the $CL_{s+b}$ method is that for high m, the distribution of q approaches that of the background-only hypothesis: $f(q \mid s+b) \to f(q \mid b)$.
So a low fluctuation in the number of background events can give $CL_{s+b} < \alpha$.
This rejects a high m value even though we are not sensitive to Higgs production with that mass; the reason was a low fluctuation in the background.

17 $CL_s$
A solution is to define $CL_s = CL_{s+b} / CL_b$ and reject the hypothesized m if $CL_s < \alpha$.
Since $CL_b \le 1$, $CL_s \ge CL_{s+b}$, so the $CL_s$ intervals ‘over-cover’; they are conservative.
This method avoids the unwanted exclusion of high masses, but it is not obvious to me that there is not a better way, i.e., intervals that have correct (or close to correct) coverage but are on average more stringent. I want to think about this more.
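To illustrate the behaviour described here, the sketch below uses a single-bin counting experiment in which the observed count n itself plays the role of the test statistic, and low n counts as less signal-like. This is a simplifying assumption rather than the q(m) of the slides, and the function names and numbers are hypothetical.

```python
import math


def poisson_cdf(n_obs, mu):
    """P(n <= n_obs) for a Poisson variable with mean mu."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n_obs + 1):
        term *= mu / k
        total += term
    return total


def cl_values(n_obs, s, b):
    """Return (CL_s+b, CL_b, CL_s) for a counting experiment with n ~ Poisson(s + b)."""
    cl_sb = poisson_cdf(n_obs, s + b)  # p-value of the s+b hypothesis
    cl_b = poisson_cdf(n_obs, b)       # same tail probability under background only
    return cl_sb, cl_b, cl_sb / cl_b


if __name__ == "__main__":
    # Weak signal (s = 1) on b = 3 with a downward fluctuation to n_obs = 0.
    cl_sb, cl_b, cl_s = cl_values(n_obs=0, s=1.0, b=3.0)
    alpha = 0.05
    print(f"CL_s+b = {cl_sb:.4f} -> excluded: {cl_sb < alpha}")
    print(f"CL_s   = {cl_s:.4f} -> excluded: {cl_s < alpha}")
```

In this toy case CL_s+b falls below 0.05 purely because the background fluctuated low, while CL_s stays well above it, so the signal hypothesis is not excluded.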

18 Fit method
Treat m and s as independent parameters (not related à la SM).
Maximize L with respect to both to obtain $\hat{m}$ and $\hat{s}$.
Now consider the background-only hypothesis, i.e., s = 0 (m doesn’t enter).
Define a test statistic from the ratio of the background-only likelihood to the maximized likelihood and find its pdf.
Use this to get p-values and limits (regions in the (m, s) plane) as before.

19 Systematics
The response of the measurement apparatus is never modelled perfectly.
[Figure: measured value y versus true value x, showing the model and the truth.]
The model can be made to approximate the truth better by including more free parameters.
systematic uncertainty ↔ nuisance parameters

20 Nuisance parameters
Techniques for treating nuisance parameters were discussed at recent PHYSTAT meetings (Cranmer, Cousins, Reid, ...).
Here consider two methods:
Profile likelihood
Modified profile likelihood (~ Cousins-Highland)

21 Profile likelihood
Suppose the likelihood contains a parameter of interest, m, and some number of nuisance parameters $\nu$.
Define the profile likelihood as $L_p(m) = L(m, \hat{\hat{\nu}}(m))$, where $\hat{\hat{\nu}}(m)$ maximizes L for the given m.
Using this, construct $l(m) = L_p(m) / L_p(\hat{m})$ and construct p-values, intervals, etc. as before.
See e.g. the 2003 and 2005 PHYSTAT papers by Kyle Cranmer.

22 Modified profile likelihood
Treat the nuisance parameters $\nu$ as random in the Bayesian sense, i.e. having a prior $\pi(\nu)$ (e.g. based on other measurements).
Define the modified profile likelihood: $L_m(m) = \int L(m, \nu)\, \pi(\nu)\, d\nu$.
Use this to find the (modified profile) likelihood ratio, determine tests, p-values, intervals, etc.
This is equivalent to having Nature repeat the experiment by resampling $\nu$ each time from $\pi(\nu)$, and is essentially (I believe) the ‘prior predictive ensemble’ approach used by CDF.

23 Modified profile likelihood (2)
This approach effectively averages over p-values, which is essentially the Cousins-Highland method.
Kyle Cranmer has pointed out that the intervals derived from this approach undercover, i.e., one would need more data to exclude the background-only hypothesis than would otherwise be needed.
This issue needs to be understood in detail.

24 Extra slides

25 Likelihood ratio limits (Feldman-Cousins)
Define the likelihood ratio for a hypothesized parameter value s: $l(s) = L(s) / L(\hat{s})$.
Here $\hat{s}$ is the ML estimator; note $0 \le l(s) \le 1$.
The critical region is defined by low values of the likelihood ratio.
The resulting intervals can be one- or two-sided (depending on n).
(Re)discovered for HEP by Feldman and Cousins, Phys. Rev. D 57 (1998) 3873.
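The likelihood-ratio ordering can be sketched for a Poisson count n with known background b. This is a brute-force illustration under stated assumptions: s is scanned on a grid, the sum over n is truncated at n_max, and the ML estimator is taken as max(0, n - b). The function names, grid step and truncation are hypothetical choices, not from the slides.

```python
import math


def poisson_pmf(n, mu):
    """Poisson probability P(n; mu), with the mu = 0 limit handled explicitly."""
    if mu == 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))


def fc_acceptance(s, b, cl=0.90, n_max=60):
    """Set of n accepted for signal s, adding n in order of decreasing likelihood ratio."""
    ranked = []
    for n in range(n_max + 1):
        s_best = max(0.0, n - b)  # ML estimate of s, constrained to s >= 0
        ratio = poisson_pmf(n, s + b) / poisson_pmf(n, s_best + b)
        ranked.append((ratio, n))
    ranked.sort(reverse=True)
    accepted, total = set(), 0.0
    for _, n in ranked:
        accepted.add(n)
        total += poisson_pmf(n, s + b)
        if total >= cl:
            break
    return accepted


def fc_interval(n_obs, b, cl=0.90, s_max=10.0, step=0.01):
    """Smallest and largest grid values of s whose acceptance region contains n_obs."""
    n_steps = int(round(s_max / step))
    kept = [i * step for i in range(n_steps + 1)
            if n_obs in fc_acceptance(i * step, b, cl)]
    return min(kept), max(kept)


if __name__ == "__main__":
    print(fc_interval(n_obs=0, b=0.0))  # one-sided: lower edge at s = 0
    print(fc_interval(n_obs=8, b=3.0))  # two-sided: lower edge above 0
```

The scan range s_max has to sit well above the expected upper edge, otherwise the interval is silently truncated.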

26 More on intervals from LR test (Feldman-Cousins)
Caveat with coverage: suppose we find n >> b. Usually one then quotes a measurement, $\hat{s} = n - b$, with its standard deviation.
If, however, n isn’t large enough to claim discovery, one sets a limit on s.
FC pointed out that if this decision is made based on n, then the actual coverage probability of the interval can be less than the stated confidence level (‘flip-flopping’).
FC intervals remove this, providing a smooth transition from 1- to 2-sided intervals, depending on n.
But suppose FC gives e.g. 0.1 < s < 5 at 90% CL; the p-value of s = 0 is still substantial. Is part of the upper limit ‘wasted’?

27 Properties of upper limits
Example: take b = 5.0, $1 - \alpha = 0.95$.
[Figures: upper limit $s_\mathrm{up}$ vs. n; mean upper limit vs. s.]

28 Upper limit versus b
If n = 0 is observed, should the upper limit depend on b?
Classical: yes
Bayesian: no
FC: yes
[Figure: upper limit versus b, from Feldman & Cousins, PRD 57 (1998) 3873.]
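The classical and Bayesian answers for n = 0 can be checked numerically. The sketch assumes a Poisson count with known b, a flat prior on s ≥ 0 for the Bayesian case, and a simple bisection solver; all function names are hypothetical.

```python
import math


def poisson_cdf(n_obs, mu):
    """P(n <= n_obs) for a Poisson variable with mean mu."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n_obs + 1):
        term *= mu / k
        total += term
    return total


def solve_decreasing(func, lo, hi, iterations=200):
    """Bisection for the root of a decreasing function with func(lo) > 0 > func(hi)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if func(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def _search_range(n_obs):
    """Generous upper end for the root search (an ad hoc choice)."""
    return n_obs + 10.0 * math.sqrt(n_obs + 1.0) + 50.0


def classical_upper_limit(n_obs, b, alpha=0.05):
    """Solve P(n <= n_obs; s_up + b) = alpha; the result can be negative for low n."""
    mu_up = solve_decreasing(lambda mu: poisson_cdf(n_obs, mu) - alpha,
                             0.0, _search_range(n_obs))
    return mu_up - b


def bayesian_upper_limit(n_obs, b, alpha=0.05):
    """Flat prior on s >= 0: solve P(n <= n_obs; s_up + b) / P(n <= n_obs; b) = alpha."""
    norm = poisson_cdf(n_obs, b)
    return solve_decreasing(lambda s: poisson_cdf(n_obs, s + b) / norm - alpha,
                            0.0, _search_range(n_obs))


if __name__ == "__main__":
    for b in (0.0, 1.0, 2.0):
        print(b, classical_upper_limit(0, b), bayesian_upper_limit(0, b))
```

For n = 0 the classical limit is -ln(α) - b, so it falls as b grows, while the flat-prior Bayesian limit stays at -ln(α) ≈ 3.0 for any b.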

29 Coverage probability of confidence intervals
Because of the discreteness of Poisson data, the probability for the interval to include the true value is in general greater than the confidence level (‘over-coverage’).

30 Discussion on limits
Different sorts of limits answer different questions.
A frequentist confidence interval does not (necessarily) answer, “What do we believe the parameter’s value is?”
Coverage: nice, but crucial?
Look at sensitivity, e.g., $E[s_\mathrm{up} \mid s = 0]$.
Consider also: politics, need for consensus/conventions; convenience and ability to combine results, ...
For any result, the consumer will compute (mentally or otherwise): (posterior) ∝ (likelihood) × (consumer’s prior).
Need the likelihood (or a summary thereof).

31 Cousins-Highland method
Regard b as ‘random’, characterized by a pdf $\pi_b(b)$.
This makes sense in the Bayesian approach, but in the frequentist model b is constant (although unknown).
A measurement $b_\mathrm{meas}$ is random, but this is not the mean number of background events; rather, b is.
Compute anyway $P(n; s) = \int P(n; s, b)\, \pi_b(b)\, db$.
This would be the probability for n if Nature were to generate a new value of b upon repetition of the experiment with $\pi_b(b)$.
Now e.g. use this $P(n; s)$ in the classical recipe for the upper limit at CL = $1 - \alpha$: $\alpha = P(n \le n_\mathrm{obs}; s_\mathrm{up})$.
The result has a hybrid Bayesian/frequentist character.
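A minimal numerical sketch of the smeared probability P(n; s). The slide does not specify the prior for b, so a Gaussian centred on b_meas with width sigma_b, truncated at b = 0, is assumed here. The integral uses a simple midpoint rule, and the function names are hypothetical.

```python
import math


def poisson_pmf(n, mu):
    """Poisson probability P(n; mu), with the mu = 0 limit handled explicitly."""
    if mu == 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))


def smeared_poisson_pmf(n, s, b_meas, sigma_b, n_grid=2000):
    """P(n; s) = integral of P(n; s + b) * pi_b(b) db for a truncated Gaussian pi_b.

    Assumption: pi_b is Gaussian(b_meas, sigma_b) restricted to b >= 0 and
    renormalized; the integration range is b_meas +- 6 sigma_b.
    """
    lo = max(0.0, b_meas - 6.0 * sigma_b)
    hi = b_meas + 6.0 * sigma_b
    width = (hi - lo) / n_grid
    numerator = 0.0
    normalization = 0.0
    for i in range(n_grid):
        b = lo + (i + 0.5) * width  # midpoint of the i-th slice
        weight = math.exp(-0.5 * ((b - b_meas) / sigma_b) ** 2)
        numerator += weight * poisson_pmf(n, s + b)
        normalization += weight
    return numerator / normalization


if __name__ == "__main__":
    # Hypothetical inputs: s = 2, b_meas = 3 with 30% uncertainty.
    for n in range(6):
        print(n, poisson_pmf(n, 5.0), smeared_poisson_pmf(n, 2.0, 3.0, 0.9))
```

Dividing by the summed weights renormalizes the truncated prior, so the smeared probabilities still sum to one over n.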

32 ‘Integrated likelihoods’
Consider again signal s and background b, and suppose we have uncertainty in b characterized by a prior pdf $\pi_b(b)$.
Define the integrated likelihood as $L'(s) = \int L(s, b)\, \pi_b(b)\, db$, also called the modified profile likelihood; in any case it is not a real likelihood.
Now use this to construct a likelihood ratio test and invert it to obtain confidence intervals.
Feldman-Cousins & Cousins-Highland (FHC²): see e.g. J. Conrad et al., Phys. Rev. D67 (2003) 012002 and the Conrad/Tegenfeldt PHYSTAT05 talk.
Calculators available (Conrad, Tegenfeldt, Barlow).

33 Interval from inverting profile LR test
Suppose we have a measurement $b_\mathrm{meas}$ of b.
Build the likelihood ratio test with the profile likelihood, $l(s) = L(s, \hat{\hat{b}}(s)) / L(\hat{s}, \hat{b})$, and use this to construct confidence intervals.
See the PHYSTAT05 talks by Cranmer, Feldman, Cousins, Reid.

