
Program for evaluation of the significance, confidence intervals and limits by direct probabilities calculations S.Bityukov (IHEP,Protvino), S.Erofeeva(MSA.


1 Program for evaluation of the significance, confidence intervals and limits by direct probabilities calculations S.Bityukov (IHEP,Protvino), S.Erofeeva(MSA IECS,Moscow), N.Krasnikov(INR RAS, Moscow), A.Nikitenko(IC, London) September, 2005 PhyStat 2005 Oxford, UK S.Bityukov

2 Introduction. When planning an experiment or processing its data, we often test a statistical hypothesis H_0: new physics is present in Nature, against the alternative H_1: new physics is absent in Nature. The uncertainty of our conclusion is characterized by two probabilities: α = P(reject H_0 | H_0 is true), the Type I error, and β = P(accept H_0 | H_0 is false), the Type II error. There are many definitions of significance as a measure of the excess of signal events above background. There are also many approaches to constructing intervals and limits: confidence, tolerance, fiducial and so on. During one of the CMS meetings Gunter Quast formulated the practitioner's problem: "the only remaining problem: make a choice …"; the chosen method should be "as simple as possible, but not wrong!"

3 Motivation of the work (significance). 1. The Gaussian limit gives the wrong answer for low values of β (the tail of the Poisson distribution is heavier than the tail of the Gaussian). 2. Statistics like S_L (a likelihood-ratio-based test statistic) have poor statistical properties as estimators of significance. Here S_L = √(2(ln L_1 − ln L_2)) = √(2 ln Q), where Q is the ratio of the likelihoods from binned or unbinned fits for the hypotheses H_0 (signal present) and H_1 (no signal present). The simplest significance is S_cP, described on the next slide. S_cP is a quite natural significance that allows any uncertainties to be taken into account by direct calculation of probabilities.

4 Definition of the significance S_cP. S_cP is the probability, from a Poisson distribution with mean μ_b, of observing Nobs or more events, converted to the equivalent number of sigmas of a Gaussian distribution (see page 8 of the report by G. Quast at the CMS Physics analysis days, May 9-12, 2005, CERN, http://cmsdoc.cern.ch/~bityukov/talks/talks.html; see also I.Narsky, NIM A450(2000)444). The program ScP calculates this significance taking into account experimental systematics with statistical properties (Gaussian approximation) and theoretical systematics without any statistical properties. Optionally, the program also calculates the combined significance of several channels; all channels are assumed to be independent.
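The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Fortran code of ScP: the function names `poisson_tail` and `s_cp` are hypothetical, Nobs is assumed to be an integer, and no systematics are included.

```python
import math
from statistics import NormalDist

def poisson_tail(mu_b, n_obs):
    """P(N >= n_obs) for N ~ Poisson(mu_b), summed over the upper tail."""
    if n_obs <= 0:
        return 1.0
    total, k = 0.0, n_obs
    while True:
        # Poisson probability of exactly k events, computed in log space.
        term = math.exp(k * math.log(mu_b) - mu_b - math.lgamma(k + 1))
        total += term
        # Stop once past the mean and the terms no longer change the sum.
        if k > mu_b and term <= 1e-17 * total:
            break
        k += 1
    return min(total, 1.0)

def s_cp(mu_b, n_obs):
    """One-sided Gaussian equivalent (in sigmas) of the Poisson tail."""
    beta = poisson_tail(mu_b, n_obs)
    if beta <= 0.0:
        return float("inf")
    if beta >= 1.0:
        return float("-inf")
    # A tail probability beta corresponds to the point z with 1 - Phi(z) = beta.
    return -NormalDist().inv_cdf(beta)

print(s_cp(2.0, 7))  # background 2, seven observed events: about 2.61
```

Summing the upper tail directly, rather than computing one minus the lower sum, keeps some precision when the tail probability is very small.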

5 Conception. The probability of making a Type II error (β) in the test of the hypothesis that a signal is present in the experiment (H_0) is used to determine the number of sigmas (of the background distribution) between the expected background and the observed number of events Nobs (formula 8 in CMS CR 2002/05). This probability is used to determine the signal significance, i.e. the significance S_cP is found by solving the corresponding equations. It can also be used in combining results. Let us consider two possible approaches.

6 Combining of observed results: Approach 1. Suppose that the observed value is greater than the expected background. Let β_1 be the Type II error for channel 1 (event A = background ≥ Nobs_1, with P(A) = β_1) and β_2 be the Type II error for channel 2 (event B = background ≥ Nobs_2, with P(B) = β_2). Because event A is independent of event B, the probability that A and B occur simultaneously is β_12 = P(AB) = P(A)·P(B) = β_1·β_2. Once β_12 is determined, one can calculate S_cP.
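Approach 1 amounts to multiplying the per-channel tail probabilities and converting the product to sigmas. A minimal sketch, assuming integer Nobs per channel and no systematics (the names `poisson_tail` and `combined_s_cp` are hypothetical, not routines of ScP):

```python
import math
from statistics import NormalDist

def poisson_tail(mu_b, n_obs):
    """P(N >= n_obs) for N ~ Poisson(mu_b), as one minus the lower sum."""
    below = sum(math.exp(k * math.log(mu_b) - mu_b - math.lgamma(k + 1))
                for k in range(n_obs))
    return 1.0 - below

def combined_s_cp(channels):
    """channels: list of (mu_b, n_obs) pairs for independent channels."""
    beta = 1.0
    for mu_b, n_obs in channels:
        beta *= poisson_tail(mu_b, n_obs)  # beta_12 = beta_1 * beta_2 * ...
    return -NormalDist().inv_cdf(beta)

# Two channels: background 1 with 6 observed, background 5 with 6 observed.
print(combined_s_cp([(1.0, 6), (5.0, 6)]))  # about 3.5
```

The "one minus the lower sum" form is adequate for moderate significances; for very small β a direct sum over the upper tail would be numerically safer.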

7 Combining of expected signals and backgrounds: Approach 2. Suppose that Nobs is the sum of the expected signal (μ_s) and the expected background (μ_b), i.e. Nobs = μ_s + μ_b (the case of a planned experiment). Then the sums of the expected numbers of signal (μ_s_i) and background (μ_b_i) events over the channels are used as the total μ_s and μ_b for calculating the combined significance. Note that in this case we take into account fluctuations of both the expected background and the expected signal. Once the corresponding β is determined, one can calculate S_cP.
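Approach 2 can be sketched as follows. The function names are hypothetical, systematics are ignored, and truncating a non-integer Nobs = μ_s + μ_b to an integer is an assumption of this sketch, not a documented behaviour of the program.

```python
import math
from statistics import NormalDist

def poisson_tail(mu_b, n_obs):
    """P(N >= n_obs) for N ~ Poisson(mu_b), as one minus the lower sum."""
    below = sum(math.exp(k * math.log(mu_b) - mu_b - math.lgamma(k + 1))
                for k in range(n_obs))
    return 1.0 - below

def expected_combined_s_cp(channels):
    """channels: list of (mu_s_i, mu_b_i) pairs of expected yields."""
    mu_s = sum(s for s, _ in channels)  # total expected signal
    mu_b = sum(b for _, b in channels)  # total expected background
    n_obs = int(mu_s + mu_b)            # assumption: truncate non-integer Nobs
    return -NormalDist().inv_cdf(poisson_tail(mu_b, n_obs))

# Channel 1: signal 5, background 1; channel 2: signal 1, background 5.
print(expected_combined_s_cp([(5.0, 1.0), (1.0, 5.0)]))  # about 2.05
```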

8 Uncertainties. The program takes into account two types of uncertainties: a) experimental systematics with statistical properties (we assume that this systematics has a Gaussian distribution with known variance σ_b**2, according to the formula μ_b = expected background + N(0, σ_b)). Approach 2: when channels are combined, the total variance σ_b**2 is the sum of the partial variances σ_b_i**2. b) theoretical systematics (δ_b) without any statistical properties (we assume that the worst case occurs when the background is maximal, i.e. μ_b*(1+δ_b), while the signal plus the background is taken as Nobs; more information can be found in S.Bityukov, N.Krasnikov, CMS CR 2002/05 or S.Bityukov, N.Krasnikov, Mod.Phys.Lett.A 13 (1998)3235). Approach 2: the combined δ_b is the sum of the partial δ_b_i.

9 Main input and output parameters.
Main input parameters:
1. μ_b: expected background
2. μ_s: signal = observed value (Nobs) − expected background (μ_b)
3. σ_b: experimental uncertainty (r.m.s.) of the background with statistical properties
4. δ_b: systematics of theoretical origin in the background
Output parameters:
1. dsgnf: significance S_cP, calculated by formula
2. dsgnfm: significance S_cP_MC, calculated by Monte Carlo

10 Auxiliary input parameters.
1. iflag: switch selecting the type of calculation
   iflag = 1: calculation by formula (quick)
   iflag = 2: Monte Carlo calculation
   iflag = 12: calculation by formula and by Monte Carlo
2. nchan: number of channels for the calculation (from 1 up to 10)
3. ncombi: number of channels for the combined S_cP (from 1 up to nchan)
4. over: parameter of the Monte Carlo calculation. It is the number of Monte Carlo trials that yield a number of events greater than or equal to Nobs. This parameter (together with the internal value dbeta) determines the number of trials for given μ_s, μ_b, σ_b and δ_b in routine SCPMC.
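One way to read the role of `over` is as a stopping rule: pseudo-experiments are generated until `over` of them have at least Nobs events, and β is estimated as the ratio of that count to the number of trials. The sketch below follows this reading; it is an assumption, not the algorithm of SCPMC. The names `poisson_sample` and `s_cp_mc` and the cap `max_trials` are hypothetical, the role of dbeta is not modelled, and no systematics are smeared in.

```python
import math
import random
from statistics import NormalDist

def poisson_sample(mu, rng):
    """Draw one Poisson(mu) variate with Knuth's multiplication method."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def s_cp_mc(mu_b, n_obs, over=200, seed=1, max_trials=10_000_000):
    """Estimate S_cP by counting background-only trials with N >= n_obs."""
    rng = random.Random(seed)
    hits = trials = 0
    while hits < over and trials < max_trials:
        trials += 1
        if poisson_sample(mu_b, rng) >= n_obs:
            hits += 1
    if hits == 0:
        return float("inf"), trials   # tail never reached within the cap
    if hits == trials:
        return float("-inf"), trials  # every trial reached n_obs
    beta = hits / trials
    return -NormalDist().inv_cdf(beta), trials

z, n_trials = s_cp_mc(2.0, 7)
print(z, n_trials)
```

With this stopping rule the relative statistical error on β is roughly 1/√over, so the number of trials grows as over/β. That is why a Monte Carlo estimate becomes impractical for high significances. Knuth's sampler is suitable only for small means, because exp(−μ) underflows for large μ.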

11 The structure of program. Language: Fortran 77. Three different types of calculations are selected by iflag:
1: SCPFOR (calculation by formula)
2: SCPMC (Monte Carlo calculation)
12: SCPFOR + SCPMC
The main program processes the user requirements (defined in DATA statements) and calls routines SCPFOR and/or SCPMC.

12 Problem and approximation. A problem that arises in the calculations is the restricted range of applicability of the standard CERNLIB procedure DGAUSN. For values of S_cP > 6.2-7 the procedure gives incorrect results. In this case we use as a good approximation the significance (MPL A13 (1998)3235)
S_c12 = 2 (√(μ_s+μ_b) − √μ_b).
Accounting for the uncertainties is very simple:
theoretical systematics (δ_b): S_c12t = 2 (√(μ_s+μ_b) − √(μ_b(1+δ_b))),
experimental systematics (σ_b**2): S_c12e = 2 (√(μ_s+μ_b) − √μ_b) · √μ_b / √(μ_b+σ_b**2).
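The three approximate formulas translate directly into code. The sketch below uses hypothetical function names and treats each systematic separately, as the formulas above do.

```python
import math

def s_c12(mu_s, mu_b):
    """S_c12 = 2 (sqrt(mu_s + mu_b) - sqrt(mu_b))."""
    return 2.0 * (math.sqrt(mu_s + mu_b) - math.sqrt(mu_b))

def s_c12t(mu_s, mu_b, delta_b):
    """Theoretical systematics: background inflated to mu_b (1 + delta_b)."""
    return 2.0 * (math.sqrt(mu_s + mu_b) - math.sqrt(mu_b * (1.0 + delta_b)))

def s_c12e(mu_s, mu_b, sigma_b):
    """Experimental systematics: S_c12 scaled by sqrt(mu_b / (mu_b + sigma_b**2))."""
    return s_c12(mu_s, mu_b) * math.sqrt(mu_b / (mu_b + sigma_b ** 2))

print(round(s_c12(5.4, 2.0), 3))           # 2.612
print(round(s_c12e(5.4, 2.0, 1.4142), 3))  # 1.847
```

For background 2, signal 5.4 and σ_b = 1.4142 these reproduce the S_c12 values 2.612 and 1.847 quoted in the program output.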

13 Simplest example of program S_cP output. Example of G.Quast: bkg = 2, sig = 5.4. Here S_1 = 3.8, S_12 = 2.6, S_L = 2.7.
Significance S_cP and/or S_cP_MC:
NN of channels = 1, Combining channels from 1 up to 1
calculation type = 12
types: (1) S_cP by formula and (2) S_cP_MC by Monte Carlo
σ_b - experimental uncertainty, i.e. μ_b = background + N(0, σ_b)
δ_b - systematics of theoretical origin without statistical properties
#ch  backgr.  signal  σ_b     δ_b         S_cP    S_cP_MC  S_c12
1    2.00     5.40    0.0000  0.0000      2.6095  2.5923   2.612
                      1.4142  0.0000      1.8581  1.8759   1.847
                      1.4142  .50000E-01  1.8373  1.8677   1.811

14 Program output with combining of channels.
NN of channels = 2, Combining channels from 1 up to 2
#ch  backgr.  signal  σ_b     δ_b         S_cP    S_cP_MC  S_c12
1    1.00     5.00    0.0000  0.0000      3.2417  3.2363   2.899
                      1.0000  0.0000      2.2798  2.3116   2.050
                      1.0000  .50000E-01  2.2547  2.3177   1.990
2    5.00     1.00    0.0000  0.0000      .29489  .30434   .4248
                      2.2361  0.0000      .24597  .25775   .3018
                      2.2361  .25000      .17006  .17654   .2210
COMBINING of OBSERVED RESULTS
Combined channels(1-2) without errors      3.5051  3.5026
Combined channels(1-2) with stat. errors   2.6078  2.6402
Combined channels(1-2) both types of err.  2.5607  2.6198
COMBINING for EXPECTED SIGNAL and BACKGROUND
Sum  6.00     6.00    0.000   0.000       2.052   2.045    2.029
                      2.449   0.000       1.486   1.500    1.435
                      2.449   0.300       1.406   1.398    1.333

15 Range of applicability of the program ScP.
NN of channels = 3, Combining channels from 1 up to 1
calculation type = 12
#ch  backgr.  signal  σ_b     δ_b     S_cP    S_cP_MC  S_c12
1    500.00   100.00  0.0000  0.0000  4.3205  4.3144   4.268
                      22.361  0.0000  3.1131  3.0378   3.018
                      22.361  25.000  2.3040  2.3390   2.210
2    300.00   120.00  0.0000  0.0000  6.5145  0.0      6.347
                      17.321  0.0000  4.8231  4.6070   4.488
                      17.321  15.000  4.1453  4.0227   3.835
3    15000.0  1000.0  0.0000  0.0000  6.2873  0.0      8.033
                      122.47  0.0000  6.1213  0.0      5.680
                      122.47  575.00  2.4477  0.0      2.369

16 Motivation of the work (confidence intervals). Suppose f(n; μ) describes the Poisson distribution of probabilities and g(μ; n) is the density of the Gamma distribution Γ_{1,1+n}; then the identity (Eq. 1) holds, where f(n; μ) = g(μ; n) = μ^n e^(−μ)/n! and n is the observed number of random events appearing in a Poisson flow during a certain period of time. This identity shows that in our case the distribution of the probability of the true value of the Poisson parameter (the confidence density) for an observed value n is the Gamma distribution with mode n and mean n+1, i.e. the observed value n corresponds to the most probable value of the parameter. The Poisson and Gamma distributions are statistically dual distributions. As shown, for these distributions only a single confidence density can be reconstructed.
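Eq. 1 itself is not reproduced on this slide. One identity of this kind, which holds for any 0 < μ_1 ≤ μ_2 and integer n, is: the sum of f(i; μ_1) over i > n, plus the integral of g(μ; n) from μ_1 to μ_2, plus the sum of f(i; μ_2) over i ≤ n, equals 1. That this is the exact form of Eq. 1 is an assumption of this sketch. The identity can be checked numerically (function names are hypothetical):

```python
import math

def f(i, mu):
    """Poisson probability of i events for mean mu; equals g(mu; i) as a function of mu."""
    return math.exp(i * math.log(mu) - mu - math.lgamma(i + 1))

def integral_g(n, mu1, mu2, steps=20000):
    """Simpson's rule integral of g(mu; n) = f(n, mu) over [mu1, mu2]."""
    h = (mu2 - mu1) / steps
    total = f(n, mu1) + f(n, mu2)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * f(n, mu1 + j * h)
    return total * h / 3.0

def identity_lhs(n, mu1, mu2, i_max=200):
    """Upper Poisson tail at mu1 + Gamma mass on [mu1, mu2] + lower Poisson sum at mu2."""
    upper = sum(f(i, mu1) for i in range(n + 1, i_max))
    lower = sum(f(i, mu2) for i in range(n + 1))
    return upper + integral_g(n, mu1, mu2) + lower

print(identity_lhs(3, 1.5, 6.0))  # close to 1
```

The truncation at `i_max` terms is harmless for small μ_1, since the neglected Poisson probabilities are far below double precision.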

17 Program Limsb. The uniqueness of the confidence density allows confidence intervals to be constructed in the simplest (and correct) way: for the observed value n we reconstruct the corresponding confidence density and determine the confidence intervals and/or confidence limits by direct calculation of probabilities. At present the program Limsb constructs the central confidence interval and the confidence interval of minimal length for the observed value n. Input: the values EPS and CL and the array DLAMB. The test set of observed values is given in the data array DLAMB. EPS determines the precision of the calculations. CL determines the confidence level of the intervals.
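Both intervals can be sketched from the Gamma confidence density with shape n+1. The code below is not the algorithm of Limsb, which is not described here. It is an illustration that assumes a series evaluation of the Gamma distribution function, bisection for quantiles and a ternary search for the shortest interval. All function names are hypothetical, and the series is only suitable for modest n.

```python
import math

def gamma_cdf(x, a):
    """Regularized lower incomplete gamma P(a, x) via its power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / a
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= x / (a + k)
        total += term
    return min(1.0, total * math.exp(a * math.log(x) - x - math.lgamma(a)))

def quantile(p, a):
    """Solve gamma_cdf(x, a) = p for x by bisection."""
    lo, hi = 0.0, 1.0
    while gamma_cdf(hi, a) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, a) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def central_interval(n, cl):
    """Equal tail probabilities (1 - cl)/2 on each side."""
    a = n + 1.0
    return quantile(0.5 * (1.0 - cl), a), quantile(0.5 * (1.0 + cl), a)

def shortest_interval(n, cl, iters=60):
    """Minimize the length over the left-tail probability p by ternary search."""
    a = n + 1.0

    def length(p):
        return quantile(p + cl, a) - quantile(p, a)

    lo, hi = 0.0, 1.0 - cl - 1e-12
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if length(m1) < length(m2):
            hi = m2
        else:
            lo = m1
    p = 0.5 * (lo + hi)
    return quantile(p, a), quantile(p + cl, a)

print(central_interval(1.0, 0.9))   # roughly (0.355, 4.744)
print(shortest_interval(1.0, 0.9))  # roughly (0.084, 3.932)
```

The ternary search relies on the interval length being a unimodal function of the left-tail probability, which holds for a unimodal density such as the Gamma density.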

18 Simplest example of program Limsb output.
Confidence limits: eps, CL = 9.99999975E-05 0.899999976
central and shortes confidence intervals
NN ev       type     left bound    right bound  left tail     upper prob.  lenght
.10000E-01  Central  0.05308130    3.015071     0.04998137    0.9500018    2.961990
            Minimal  1.30385E-08   2.319871     1.082756E-08  0.9000000    2.319871
.10000      Central  0.07074530    3.186507     0.04999527    0.9500008    3.115762
            Minimal  6.053597E-09  2.473754     8.71932E-10   0.9000000    2.473754
.50000      Central  0.17588568    3.907293     0.04998513    0.9499978    3.731407
            Minimal  0.00512708    3.128773     0.00027532    0.9002753    3.123645
1.0000      Central  0.35530150    4.743777     0.04998505    0.94999635   4.3884754
            Minimal  0.08397551    3.932307     0.00333463    0.90290034   3.8483307
10000.      Central  9837.07715    10166.06     0.04999995    0.94999993   328.98340
            Minimal  9836.24023    10165.22     0.04913389    0.94913387   328.97656

19 Conclusion. The programs ScP and Limsb can be found on the Web page http://cmsdoc.cern.ch/~bityukov. We are ready to include in the program Limsb the calculation of confidence intervals for the Poisson distribution parameter for signal events in the presence of background (the formula of O.Helene, which appears naturally in our approach, see hep-ex/0108020). We are grateful to Vladimir Gavrilov, Vassili Katchanov, and Albert De Roeck for their interest in and support of this work. We would like to thank Bob Cousins, Vladimir Obraztsov and Claudia Wulz for discussions and useful comments.

