Provide tools for the statistical comparison of distributions: equivalent reference distributions, experimental measurements, data from reference sources.

Presentation transcript:

1 Data analysis in HEP
Provide tools for the statistical comparison of distributions:
- equivalent reference distributions
- experimental measurements
- data from reference sources
- functions deriving from theoretical calculations or fits
Applications: detector monitoring, simulation validation, reconstruction vs. expectation, regression testing, physics analysis.

2 GoF statistical toolkit
A project to develop a statistical comparison system.
Comparison of distributions:
- qualitative evaluation
- quantitative evaluation (goodness-of-fit testing)

3 Software process guidelines
- Unified Software Development Process, specifically tailored to the project
  - practical guidance and tools from the RUP
  - both rigorous and lightweight
  - mapping onto ISO 15504
- Guidance from ISO 15504
- Incremental and iterative life cycle model (spiral approach)

4 Architectural guidelines
The project adopts a solid architectural approach:
- to offer the functionality and the quality needed by the users
- to be maintainable over a large time scale
- to be extensible, to accommodate future evolutions of the requirements
Component-based approach:
- to facilitate re-use and integration in different frameworks
AIDA:
- adopt a (HEP) standard
- no dependence on any specific analysis tool


6 The algorithms are specialised by kind of distribution (binned/unbinned). Every algorithm has been rigorously tested. Documentation available: http://www.ge.infn.it/geant4/analysis/HEPstatistics/

7 Chi-squared test
- Applies to binned distributions.
- It can also be useful for unbinned distributions, but the data must first be grouped into classes.
- It cannot be applied if the theoretical (expected) frequency in any class is < 5.
  - When this happens, one can try to merge contiguous classes until the minimum theoretical frequency is reached.
  - Otherwise one can use Yates' formula.
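A minimal sketch of the class-merging rule described above, assuming the expected counts are already normalised to the observed total and that no parameters were fitted (so the degrees of freedom are the number of merged classes minus one). The function names `merge_classes` and `chi_squared` are hypothetical illustrations, not the GoF Toolkit's interface.

```python
def merge_classes(observed, expected, min_expected=5.0):
    """Merge contiguous classes, left to right, until each expected count >= min_expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    merged_obs, merged_exp = [], []
    acc_obs, acc_exp = 0.0, 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= min_expected:
            # The accumulated group is large enough: close it as one class.
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
            acc_obs, acc_exp = 0.0, 0.0
    if acc_exp > 0.0 or acc_obs > 0.0:
        if not merged_exp:
            raise ValueError("total expected count is below min_expected")
        # Fold a trailing group that is still too small into the last class.
        merged_obs[-1] += acc_obs
        merged_exp[-1] += acc_exp
    if not merged_exp:
        raise ValueError("no classes to compare")
    return merged_obs, merged_exp


def chi_squared(observed, expected, min_expected=5.0):
    """Pearson chi-squared statistic and degrees of freedom after merging.

    Assumption: expected counts are normalised to the observed total and no
    parameters were fitted, so dof = (number of merged classes) - 1.
    """
    obs, exp = merge_classes(observed, expected, min_expected)
    statistic = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    return statistic, len(obs) - 1


observed = [2, 3, 10, 12, 4, 1]
expected = [1.5, 3.5, 10.5, 11.0, 4.0, 1.5]
stat, dof = chi_squared(observed, expected)
```

Here the first two and the last two classes are merged, leaving four classes and three degrees of freedom. Merging left to right is only one possible policy; a real analysis may prefer to merge towards the neighbour that preserves the most physical information.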

8 More sophisticated algorithms: unbinned distributions
Supremum statistics:
- Kolmogorov-Smirnov test
- Goodman approximation of the KS test
- Kuiper test
[Figure: original distributions and their empirical distribution functions, with the distance D_mn indicated.]
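A minimal sketch of the two supremum statistics for unbinned samples, assuming plain Python lists as input rather than AIDA objects. The helper names `edf` and `ks_and_kuiper` are hypothetical. The Kolmogorov-Smirnov statistic D_mn is the largest absolute gap between the two empirical distribution functions; the Kuiper statistic V adds the largest gap in each direction.

```python
from bisect import bisect_right


def edf(sorted_sample, x):
    """Empirical distribution function: fraction of the sample that is <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_and_kuiper(sample1, sample2):
    """Return (D_mn, V) for two unbinned samples.

    D_mn = sup |F_m(x) - G_n(x)|                 (Kolmogorov-Smirnov)
    V    = sup (F_m - G_n) + sup (G_n - F_m)     (Kuiper)
    Both EDFs are step functions that only change at observed points, so the
    suprema are found by evaluating the difference at every pooled point.
    """
    a, b = sorted(sample1), sorted(sample2)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d_plus = d_minus = 0.0
    for x in a + b:
        diff = edf(a, x) - edf(b, x)
        d_plus = max(d_plus, diff)
        d_minus = max(d_minus, -diff)
    return max(d_plus, d_minus), d_plus + d_minus


d_mn, v = ks_and_kuiper([1.0, 4.0], [2.0, 3.0])
```

In this toy example the first sample leads by 0.5 at x = 1 and trails by 0.5 at x = 3, so D_mn = 0.5 while V = 1.0: the Kuiper statistic registers deviations in both directions, which KS reports only once.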

9 More powerful algorithms: tests containing a weighting function
Unbinned distributions:
- Cramer-von Mises test
- Anderson-Darling test
These algorithms are so powerful that we decided to implement their equivalents for binned distributions:
- Fisz-Cramer-von Mises test
- k-sample Anderson-Darling test
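To illustrate how these statistics integrate over the whole range instead of taking a supremum, here is a minimal sketch of the unbinned two-sample Cramer-von Mises statistic in Anderson's (1962) form. It is not the toolkit's binned Fisz-Cramer-von Mises variant, and the function names are hypothetical. The Anderson-Darling statistic follows the same idea but additionally weights each squared difference by 1/[H(1-H)], where H is the pooled distribution function, which emphasises the tails.

```python
from bisect import bisect_right


def edf(sorted_sample, x):
    """Empirical distribution function: fraction of the sample that is <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def cramer_von_mises_2samp(sample1, sample2):
    """Two-sample Cramer-von Mises statistic T.

    T = n*m/(n+m)^2 * sum over the pooled sample of (F_n(x) - G_m(x))^2.
    Every pooled point contributes, so small differences spread over the
    whole range of x accumulate instead of only the single largest gap.
    """
    a, b = sorted(sample1), sorted(sample2)
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    total = sum((edf(a, x) - edf(b, x)) ** 2 for x in a + b)
    return n * m / (n + m) ** 2 * total


t = cramer_von_mises_2samp([1.0, 4.0], [2.0, 3.0])
```

For this toy input the squared EDF differences sum to 0.5 and the prefactor is 4/16, giving T = 0.125, which agrees with the equivalent rank-based formula for the same data.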

10 Is χ² the most powerful algorithm?
In terms of power: χ² < supremum statistics tests < tests containing a weight function.
- χ² loses information in a test for unbinned distributions by grouping the data into cells.
- Kac, Kiefer and Wolfowitz (1955) showed that the Kolmogorov-Smirnov test requires n^(4/5) observations, compared to n observations for χ², to attain the same power.
- Cramer-von Mises and Anderson-Darling statistics are expected to be superior to Kolmogorov-Smirnov's, since they compare the two distributions along the whole range of x, rather than looking for a marked difference at one point.
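As a worked reading of the Kac-Kiefer-Wolfowitz result (an asymptotic statement, so the numbers below should be taken only as an order-of-magnitude guide), take a χ² test that needs 10^5 observations:

```latex
n_{\chi^2} = 10^{5}
\quad\Longrightarrow\quad
n_{\mathrm{KS}} \approx \left(10^{5}\right)^{4/5} = 10^{4},
\qquad
\frac{n_{\chi^2}}{n_{\mathrm{KS}}} = n^{1/5} = 10 .
```

The advantage of the unbinned test therefore grows with the sample size, as n^(1/5).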

11 User's point of view
Simple user layer:
- The user only deals with AIDA objects and the choice of comparison algorithm.
- The user is completely shielded from both statistical and computing complexity.
- The user extracts the algorithm by writing one line of code; the toolkit returns the statistical result.
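The "one line of code" idea can be sketched as a thin facade over interchangeable algorithm components. This is a hypothetical illustration only: `compare`, `ComparisonResult` and the algorithm keys are invented names, plain lists stand in for AIDA objects, and the real toolkit's interface may look quite different.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass
class ComparisonResult:
    algorithm: str
    statistic: float


def _edf(sorted_sample, x):
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def _kolmogorov_smirnov(s1, s2):
    a, b = sorted(s1), sorted(s2)
    return max(abs(_edf(a, x) - _edf(b, x)) for x in a + b)


def _cramer_von_mises(s1, s2):
    a, b = sorted(s1), sorted(s2)
    n, m = len(a), len(b)
    return n * m / (n + m) ** 2 * sum((_edf(a, x) - _edf(b, x)) ** 2 for x in a + b)


# Registry of algorithm components: adding a test means adding one entry,
# without touching user code.
_ALGORITHMS: Dict[str, Callable[[Sequence[float], Sequence[float]], float]] = {
    "KolmogorovSmirnov": _kolmogorov_smirnov,
    "CramerVonMises": _cramer_von_mises,
}


def compare(sample1, sample2, algorithm):
    """Single entry point: pick an algorithm by name and get a result back."""
    try:
        test = _ALGORITHMS[algorithm]
    except KeyError:
        raise ValueError(f"unknown algorithm: {algorithm!r}") from None
    return ComparisonResult(algorithm, test(sample1, sample2))


result = compare([1.0, 4.0], [2.0, 3.0], "KolmogorovSmirnov")
```

The design choice being illustrated is that the user names the comparison and supplies the data, while the statistical details stay inside the registered components.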

12 Examples of practical applications

13 Microscopic validation of physics
Chi-squared test: Geant4 simulations are statistically comparable with reference data (NIST database, http://www.nist.gov).
N-S: NIST vs. Geant4 Standard; N-L: NIST vs. Geant4 LowE.
- χ²_N-S = 0.267, ν = 28, p = 1; χ²_N-L = 1.315, ν = 28, p = 1
- χ²_N-S = 0.532, ν = 28, p = 1; χ²_N-L = 1.928, ν = 28, p = 1
- χ²_N-S = 0.373, ν = 28, p = 1; χ²_N-L = 5.882, ν = 28, p = 1
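The p-values reported above can be checked from the statistics alone. Because ν = 28 is even, the χ² survival function has a closed form, Q(x | 2k) = exp(-x/2) · Σ_{j=0}^{k-1} (x/2)^j / j!, so no special-function library is needed. This is a minimal sketch; `chi2_survival_even_dof` is a hypothetical helper, not part of the toolkit.

```python
import math


def chi2_survival_even_dof(statistic, dof):
    """P(X >= statistic) for X ~ chi-squared with an even number of degrees of freedom.

    Uses Q(x | 2k) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    if dof <= 0 or dof % 2:
        raise ValueError("this closed form needs a positive, even dof")
    if statistic < 0:
        raise ValueError("statistic must be non-negative")
    half = statistic / 2.0
    term = 1.0   # j = 0 term
    total = 1.0
    for j in range(1, dof // 2):
        term *= half / j   # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total


reported = [0.267, 1.315, 0.532, 1.928, 0.373, 5.882]
p_values = [chi2_survival_even_dof(x, 28) for x in reported]
```

Every value in `p_values` rounds to 1: with 28 degrees of freedom the expected value of χ² is 28, and all six statistics are far below it.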

14 Test beam at BESSY: BepiColombo mission
- χ² not appropriate (< 5 entries in some bins; physical information would be lost if rebinned).
- Anderson-Darling test: A_c(95%) = 0.752
[Figure: X-ray fluorescence spectrum in Iceland basalt (E_IN = 6.5 keV); counts vs. energy (keV).]
Very complex distributions: experimental measurements are comparable with Geant4 simulations.
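For reference, the textbook one-sample Anderson-Darling statistic A² against a fully specified CDF is sketched below. This is an assumption-laden illustration: the slide's comparison of experimental and simulated spectra uses the toolkit's binned k-sample variant, which this unbinned one-sample form does not reproduce, and `anderson_darling` is a hypothetical name.

```python
import math


def anderson_darling(sample, cdf):
    """One-sample A^2 statistic of `sample` against a fully specified CDF.

    A^2 = -n - (1/n) * sum_{i=1}^{n} (2i - 1) * [ln F(x_(i)) + ln(1 - F(x_(n+1-i)))]
    The logarithms implement the 1/[F(1-F)] weighting that stresses the tails.
    """
    x = sorted(sample)
    n = len(x)
    if n == 0:
        raise ValueError("empty sample")
    u = [cdf(v) for v in x]
    if any(p <= 0.0 or p >= 1.0 for p in u):
        raise ValueError("CDF values must lie strictly inside (0, 1)")
    s = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
            for i in range(1, n + 1))
    return -n - s / n


def uniform_cdf(v):
    return min(max(v, 0.0), 1.0)


a2 = anderson_darling([0.25, 0.75], uniform_cdf)
```

A computed A² is only meaningful against a critical value tabulated for the same conditions (sample sizes, whether parameters were estimated, binned or unbinned data).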

15 Medical applications: hadron therapy
- Kolmogorov-Smirnov test: D_EXP-GEANT4 = 0.11, p = n.s. (not significant)
- Goodman approximation of the Kolmogorov-Smirnov test: χ²_EXP-GEANT4 = 3.8, ν = 2, p = n.s.
Experimental measurements are comparable with Geant4 simulations.
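Goodman's approximation converts the two-sample KS distance into a χ² variable with 2 degrees of freedom, χ² = 4·D²·n·m/(n + m). With ν = 2 the survival function reduces to exp(-χ²/2), so the reported χ² = 3.8 gives p = exp(-1.9) ≈ 0.15, consistent with "n.s." at the usual 5% level. A minimal sketch with hypothetical helper names:

```python
import math


def goodman_chi2(d_max, n, m):
    """Goodman's chi-squared approximation to the two-sample KS statistic (2 dof)."""
    if n <= 0 or m <= 0:
        raise ValueError("sample sizes must be positive")
    return 4.0 * d_max ** 2 * n * m / (n + m)


def chi2_survival_2dof(statistic):
    """P(X >= statistic) for X ~ chi-squared with 2 degrees of freedom."""
    return math.exp(-statistic / 2.0)


p_reported = chi2_survival_2dof(3.8)
```

The sample sizes behind the reported D = 0.11 are not given on the slide, so the sketch does not attempt to reproduce the 3.8 value from D.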

16 Future developments

17 Work in progress (I)
Real-life distributions are not strictly limited to one dimension. For this reason the algorithms contained in the GoF Toolkit are going to be generalised to higher-dimensional distributions. This is a big step forward in statistics and in physics data analysis as well.

18 Work in progress (II)
- Theoretical reference distributions: the user will be able to compare their distributions with theoretical reference distributions such as uniform, Gaussian, Weibull, gamma, ...
- Data handling: filtering
- Treatment of errors (uncertainties)
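Comparing a sample with a theoretical reference distribution is the one-sample form of the tests above. A minimal sketch, assuming an unbinned sample and a Gaussian reference with known parameters; `normal_cdf` and `ks_one_sample` are hypothetical names, and any other reference (uniform, Weibull, gamma, ...) would be passed in the same way as a CDF function.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Gaussian CDF via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_one_sample(sample, cdf):
    """One-sample Kolmogorov-Smirnov distance between a sample's EDF and a reference CDF.

    At the i-th ordered point the EDF jumps from (i-1)/n to i/n, so both
    sides of each jump are checked against the reference CDF.
    """
    x = sorted(sample)
    n = len(x)
    if n == 0:
        raise ValueError("empty sample")
    d = 0.0
    for i, v in enumerate(x, start=1):
        f = cdf(v)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


d = ks_one_sample([-1.0, 1.0], normal_cdf)
```

If the reference parameters are instead estimated from the same data, the standard KS critical values no longer apply, which is one reason the treatment of such references is listed as work in progress.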

19 Status
The GoF Toolkit can be downloaded from the web: www.ge.infn.it/geant4/analysis/HEPstatistics/index.html
Recent developments:
- new algorithms added, improved design, improved documentation
- user examples, unit and system tests
- detailed statistical documentation

20 Conclusions
This is a new, up-to-date, easy-to-handle and powerful tool for statistical comparison in particle physics. It is the first tool supplying such a variety of sophisticated and powerful statistical tests in HEP. AIDA interfaces allow its integration in any other AIDA-compliant data analysis tool.
Applications in: HEP, astrophysics, medical physics, ...

