Published by Lester Cunningham. Modified over 9 years ago.
1
Data analysis in HEP
Provide tools for the statistical comparison of distributions:
– equivalent reference distributions
– experimental measurements
– data from reference sources
– functions deriving from theoretical calculations or fits
Applications: detector monitoring, simulation validation, reconstruction vs. expectation, regression testing, physics analysis
2
GoF statistical toolkit
A project to develop a statistical comparison system
– Comparison of distributions
– Goodness of fit testing
– Qualitative evaluation
– Quantitative evaluation
3
Software process guidelines
Unified Software Development Process, specifically tailored to the project
– practical guidance and tools from the RUP
– both rigorous and lightweight
– mapping onto ISO 15504
Guidance from ISO 15504
Incremental and iterative life cycle model (spiral approach)
4
Architectural guidelines
The project adopts a solid architectural approach
– to offer the functionality and the quality needed by the users
– to be maintainable over a large time scale
– to be extensible, to accommodate future evolutions of the requirements
Component-based approach
– to facilitate re-use and integration in different frameworks
AIDA
– adopt a (HEP) standard
– no dependence on any specific analysis tool
6
The algorithms are specialised on the kind of distribution (binned/unbinned)
Every algorithm has been rigorously tested!
Documentation available: http://www.ge.infn.it/geant4/analysis/HEPstatistics/
7
Chi-squared test
Applies to binned distributions
It can also be useful for unbinned distributions, but the data must be grouped into classes
Cannot be applied if the theoretical frequency in any class is < 5
– When this minimum is not met, one could try to merge contiguous classes until the minimum theoretical frequency is reached
– Otherwise one could use Yates' formula
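The merging recipe on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `merge_sparse_bins` and `chi_squared` and the left-to-right merging strategy are hypothetical and are not taken from the GoF Toolkit. The Yates option uses the usual continuity-correction form, which subtracts 0.5 from each |observed − expected|.

```python
def merge_sparse_bins(observed, expected, min_expected=5.0):
    """Merge contiguous bins, left to right, until each merged bin
    has an expected (theoretical) frequency of at least min_expected.
    Hypothetical helper: the merging strategy is an assumption."""
    merged_obs, merged_exp = [], []
    acc_obs = acc_exp = 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= min_expected:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
            acc_obs = acc_exp = 0.0
    # A sparse tail is folded into the last merged bin, if there is one.
    if acc_obs or acc_exp:
        if merged_exp:
            merged_obs[-1] += acc_obs
            merged_exp[-1] += acc_exp
        else:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
    return merged_obs, merged_exp


def chi_squared(observed, expected, yates=False):
    """Pearson chi-squared statistic; expected frequencies must be > 0.
    With yates=True, apply the continuity correction
    (|O - E| - 0.5)^2 / E, floored at zero."""
    total = 0.0
    for o, e in zip(observed, expected):
        diff = abs(o - e)
        if yates:
            diff = max(0.0, diff - 0.5)
        total += diff * diff / e
    return total


if __name__ == "__main__":
    obs, exp = merge_sparse_bins([2, 4, 10, 12, 3, 1], [3, 3, 10, 10, 3, 3])
    print(obs, exp, chi_squared(obs, exp))
```

When no parameters are fitted from the data, the statistic is usually compared with a chi-squared distribution with (number of merged bins − 1) degrees of freedom.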
8
More sophisticated algorithms (unbinned distributions)
Supremum statistics
– Kolmogorov-Smirnov test
– Goodman approximation of KS test
– Kuiper test
[Figure: original distributions and their empirical distribution functions, with the distance D_mn]
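As an illustration of the supremum statistics listed on this slide, the sketch below computes the two-sample Kolmogorov-Smirnov distance D_mn and the Kuiper distance from the empirical distribution functions. All function names are hypothetical and the code is not the toolkit's implementation. The Goodman approximation is given in its commonly quoted form, chi-squared = 4 D² mn / (m + n) with 2 degrees of freedom, which is an assumption about what the toolkit uses; for 2 degrees of freedom the chi-squared tail probability is exactly exp(−chi-squared / 2).

```python
from bisect import bisect_right
import math


def ecdf_differences(x, y):
    """F_m(t) - G_n(t) evaluated at every distinct pooled data point."""
    xs, ys = sorted(x), sorted(y)
    m, n = len(xs), len(ys)
    points = sorted(set(xs) | set(ys))
    return [bisect_right(xs, t) / m - bisect_right(ys, t) / n for t in points]


def ks_distance(x, y):
    """Kolmogorov-Smirnov distance: sup |F_m - G_n|."""
    return max(abs(d) for d in ecdf_differences(x, y))


def kuiper_distance(x, y):
    """Kuiper distance: sup (F_m - G_n) + sup (G_n - F_m)."""
    diffs = ecdf_differences(x, y)
    return max(0.0, max(diffs)) + max(0.0, -min(diffs))


def goodman_chi2(d, m, n):
    """Goodman approximation (assumed form): 4 D^2 m n / (m + n), 2 dof."""
    return 4.0 * d * d * m * n / (m + n)


def chi2_pvalue_2dof(chi2):
    """Tail probability of a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-chi2 / 2.0)


if __name__ == "__main__":
    a, b = [1.0, 4.0, 2.5, 3.1], [2.0, 3.0, 5.5]
    d = ks_distance(a, b)
    print(d, kuiper_distance(a, b), chi2_pvalue_2dof(goodman_chi2(d, len(a), len(b))))
```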
9
More powerful algorithms
Tests containing a weighting function
Unbinned distributions:
– Cramer-von Mises test
– Anderson-Darling test
These algorithms are so powerful that we decided to implement their equivalents for binned distributions:
– Fisz-Cramer-von Mises test
– k-sample Anderson-Darling test
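For the unbinned case, the two statistics on this slide can be sketched as follows. This is a minimal illustration, not the toolkit's code: the function names are hypothetical, the Cramer-von Mises statistic is written in its standard two-sample form (unit weight), and the Anderson-Darling statistic uses the two-sample rank formula for samples without ties, where the factor 1 / (i (N − i)) plays the role of the weighting function that emphasises the tails. The binned variants named on the slide are not covered here.

```python
from bisect import bisect_right


def cramer_von_mises(x, y):
    """Two-sample Cramer-von Mises statistic:
    m n / (m + n)^2 * sum over pooled points of (F_m - G_n)^2."""
    xs, ys = sorted(x), sorted(y)
    m, n = len(xs), len(ys)
    total = sum(
        (bisect_right(xs, t) / m - bisect_right(ys, t) / n) ** 2
        for t in xs + ys
    )
    return m * n / (m + n) ** 2 * total


def anderson_darling(x, y):
    """Two-sample Anderson-Darling statistic (no ties assumed):
    1 / (m n) * sum_{i=1}^{N-1} (M_i N - m i)^2 / (i (N - i)),
    where M_i counts x-values among the i smallest pooled values."""
    m, n = len(x), len(y)
    big_n = m + n
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    if any(a[0] == b[0] for a, b in zip(pooled, pooled[1:])):
        raise ValueError("this sketch assumes there are no tied values")
    total = 0.0
    from_x = 0
    for i, (_, label) in enumerate(pooled[:-1], start=1):
        if label == 0:
            from_x += 1
        total += (from_x * big_n - m * i) ** 2 / (i * (big_n - i))
    return total / (m * n)


if __name__ == "__main__":
    a, b = [1.0, 4.0, 2.5, 3.1], [2.0, 3.0, 5.5]
    print(cramer_von_mises(a, b), anderson_darling(a, b))
```

Both statistics accumulate the squared difference of the empirical distribution functions over the whole range of the data, instead of keeping only the largest difference.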
10
Is χ² the most powerful algorithm?
χ² loses information in a test for unbinned distributions by grouping the data into cells
Kac, Kiefer and Wolfowitz (1955) showed that the Kolmogorov-Smirnov test requires n^(4/5) observations compared to n observations for χ² to attain the same power
Cramer-von Mises and Anderson-Darling statistics are expected to be superior to Kolmogorov-Smirnov's, since they compare the two distributions all along the range of x, rather than looking for a marked difference at one point
In terms of power: χ² < supremum statistics tests < tests containing a weight function
11
User's point of view
Simple user layer
The user is completely shielded from both statistical and computing complexity.
Only deal with AIDA objects and choice of comparison algorithm
The user extracts the algorithm by writing one line of code
User → toolkit → statistical result
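To make the "one line of code" idea concrete, here is a purely hypothetical Python sketch of such a user layer. None of the names (`compare`, `ComparisonResult`, `ALGORITHMS`) come from the GoF Toolkit or from AIDA, and plain lists of bin contents stand in for AIDA histogram objects. The binned chi-squared form assumes both histograms share the same binning and the same total content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComparisonResult:
    """Hypothetical result object: algorithm name and its distance."""
    algorithm: str
    distance: float


def _chi2_binned(a, b):
    # Assumes equal binning and equal totals: sum (a_i - b_i)^2 / (a_i + b_i).
    return sum((p - q) ** 2 / (p + q) for p, q in zip(a, b) if p + q > 0)


def _ks_binned(a, b):
    # Largest distance between the cumulative fractions at the bin edges.
    total_a, total_b = sum(a), sum(b)
    cum_a = cum_b = worst = 0.0
    for p, q in zip(a, b):
        cum_a += p / total_a
        cum_b += q / total_b
        worst = max(worst, abs(cum_a - cum_b))
    return worst


# Hypothetical registry: the user only picks a name from it.
ALGORITHMS = {"chi2": _chi2_binned, "ks": _ks_binned}


def compare(a, b, algorithm):
    """User layer: the statistics stay hidden behind a single call."""
    return ComparisonResult(algorithm, ALGORITHMS[algorithm](a, b))


if __name__ == "__main__":
    # The single line a user would write:
    result = compare([5, 9, 6], [6, 8, 6], algorithm="chi2")
    print(result)
```

Keeping the algorithms behind a registry means a new test can be added without changing user code, which is in the spirit of the component-based approach described earlier in the slides.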
12
Examples of practical applications
13
Microscopic validation of physics
Chi-squared test; N = NIST, S = Geant4 Standard, L = Geant4 LowE
χ²(N-S) = 0.267, ν = 28, p = 1; χ²(N-L) = 1.315, ν = 28, p = 1
χ²(N-S) = 0.532, ν = 28, p = 1; χ²(N-L) = 1.928, ν = 28, p = 1
χ²(N-S) = 0.373, ν = 28, p = 1; χ²(N-L) = 5.882, ν = 28, p = 1
Geant4 simulations are statistically comparable with reference data (NIST database http://www.nist.gov)
14
Test beam at Bessy (Bepi-Colombo mission)
[Figure: X-ray fluorescence spectrum in Iceland basalt (E_IN = 6.5 keV); axes: Energy (keV), Counts]
Very complex distributions
χ² not appropriate (< 5 entries in some bins, physical information would be lost if rebinned)
Anderson-Darling: A_c (95%) = 0.752
Experimental measurements are comparable with Geant4 simulations
15
Medical applications: hadron therapy
Kolmogorov-Smirnov: D(EXP-GEANT4) = 0.11, p = n.s.
Kolmogorov-Smirnov, Goodman approximation: χ²(EXP-GEANT4) = 3.8, ν = 2, p = n.s.
Experimental measurements are comparable with Geant4 simulations
16
Future developments
17
Work in progress (I)
Real-life distributions are not strictly limited to one dimension.
For this reason the algorithms contained in the GoF Toolkit are going to be generalised to the case of higher dimensional distributions.
This is a big step forward in statistics and in physics data analysis as well.
18
Work in progress (II)
The user will be able to compare their distributions with theoretical reference distributions, such as:
– uniform
– gaussian
– Weibull
– gamma
– …
Data handling: filtering
Treatment of errors (uncertainties)
19
Status
The GoF Toolkit is downloadable from the web: www.ge.infn.it/geant4/analysis/HEPstatistics/index.html
Recent developments
– added new algorithms, improved design, improved documentation
– user examples, unit and system tests
– detailed statistical documentation
20
Conclusions
This is a new, up-to-date, easy to handle and powerful tool for statistical comparison in particle physics.
It is the first tool supplying such a variety of sophisticated and powerful statistical tests in HEP.
AIDA interfaces allow its integration in any other data analysis tool.
Applications in: HEP, astrophysics, medical physics, …