Barbara Mascialino, INFN Genova An update on the Goodness of Fit Statistical Toolkit B. Mascialino, A. Pfeiffer, M.G. Pia, A. Ribon, P. Viarengo


1 Barbara Mascialino, INFN Genova
An update on the Goodness of Fit Statistical Toolkit
B. Mascialino, A. Pfeiffer, M.G. Pia, A. Ribon, P. Viarengo
http://www.ge.infn.it/geant4/analysis/HEPstatistics
http://www.ge.infn.it/statisticaltoolkit
4th Geant4 Space Users’ Workshop

2 Goodness of Fit testing
Goodness-of-fit testing is the mathematical foundation for the comparison of data distributions
– One-sample problem: a sample is compared with a theoretical distribution
– Two-sample problem: sample 1 is compared with sample 2
Use cases in experimental physics
– Regression testing: throughout the software life-cycle
– Online DAQ: monitoring detector behaviour w.r.t. a reference
– Simulation validation: comparison with experimental data
– Reconstruction: comparison of reconstructed vs. expected distributions
– Physics analysis: comparison with theoretical distributions, comparisons of experimental distributions
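To make the one-sample and two-sample problems concrete, here is a minimal sketch of the Kolmogorov-Smirnov distance in both settings. It uses only the Python standard library; the function names are hypothetical and are not the Statistical Toolkit's API.

```python
from bisect import bisect_right


def ks_one_sample_distance(sample, cdf):
    """One-sample problem: distance between the sample EDF and a theoretical CDF."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The EDF jumps from (i - 1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_two_sample_distance(sample1, sample2):
    """Two-sample problem: largest distance between the two sample EDFs."""
    if not sample1 or not sample2:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample1), sorted(sample2)
    d = 0.0
    # The largest EDF difference is attained at one of the observed values.
    for x in a + b:
        f1 = bisect_right(a, x) / len(a)
        f2 = bisect_right(b, x) / len(b)
        d = max(d, abs(f1 - f2))
    return d
```

Both functions return only the test statistic; turning it into a p-value requires the corresponding null distribution, which is not shown here.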

3 G.A.P. Cirrone, S. Donadio, S. Guatelli, A. Mantero, B. Mascialino, S. Parlati, M.G. Pia, A. Pfeiffer, A. Ribon, P. Viarengo, “A Goodness-of-Fit Statistical Toolkit”, IEEE Transactions on Nuclear Science (2004), 51 (5): 2056-2063.
B. Mascialino, M.G. Pia, A. Pfeiffer, A. Ribon, P. Viarengo, “New developments of the Goodness-of-Fit Statistical Toolkit”, IEEE Transactions on Nuclear Science (2006), 53 (6), to be published.
http://www.ge.infn.it/statisticaltoolkit/

4 Software process guidelines
Adopt a process – software quality
– Unified Process, specifically tailored to the project
– practical guidance and tools from the RUP
– both rigorous and lightweight
– mapping onto ISO 15504 (and CMM)
Incremental and iterative life-cycle
– 1st cycle: 2-sample GoF tests
– 1-sample GoF in preparation

5 Architectural guidelines
The project adopts a solid architectural approach
– to offer the functionality and the quality needed by the users
– to be maintainable over a large time scale
– to be extensible, to accommodate future evolutions of the requirements
Component-based architecture
– to facilitate re-use and integration in diverse frameworks
– layer architecture pattern
– core component for statistical computation
– independent components for interface to user analysis environments
Dependencies
– no dependence on any specific analysis tool
– can be used by any analysis tools, or together with any analysis tools
– offer a (HEP) standard (AIDA) for the user layer


7 The algorithms are specialised according to the kind of distribution (binned or unbinned)
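For binned data, the comparison works on bin contents rather than on individual values. A minimal sketch of a chi-squared statistic for two histograms with identical binning follows. The function name is hypothetical, the normalisation shown is the common textbook form for histograms with different totals, and the degrees of freedom count is an assumption (non-empty bins minus one); the toolkit's own implementation may differ.

```python
def chi2_two_histograms(h1, h2):
    """Chi-squared statistic for two histograms of counts with identical binning.

    Returns (chi2, ndf). Bins that are empty in both histograms are skipped.
    """
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    n1, n2 = sum(h1), sum(h2)
    if n1 == 0 or n2 == 0:
        raise ValueError("histograms must not be empty")
    chi2 = 0.0
    used_bins = 0
    for a, b in zip(h1, h2):
        if a + b == 0:
            continue  # no information in this bin
        # Cross-multiplying by the totals accounts for different sample sizes.
        chi2 += (n2 * a - n1 * b) ** 2 / (a + b)
        used_bins += 1
    # Assumption: ndf is the number of non-empty bins minus one.
    return chi2 / (n1 * n2), used_bins - 1
```

The statistic relies on every bin being reasonably populated, so sparsely filled histograms are usually better served by an unbinned test.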

8 GoF algorithms in the Statistical Toolkit (two-sample problem)
Unbinned distributions
– Anderson-Darling test
– Anderson-Darling approximated test
– Cramer-von Mises test
– Generalised Girone test
– Goodman test (Kolmogorov-Smirnov test in chi-squared approximation)
– Kolmogorov-Smirnov test
– Kuiper test
– Tiku test (Cramer-von Mises test in chi-squared approximation)
– Weighted Kolmogorov-Smirnov test (2 flavours)
– Weighted Cramer-von Mises test
Binned distributions
– Anderson-Darling test
– Anderson-Darling approximated test
– Chi-squared test
– Fisz-Cramer-von Mises test
– Tiku test (Cramer-von Mises test in chi-squared approximation)
It is the most complete software for the comparison of two distributions, even among commercial/professional statistics tools. It provides all 2-sample (edf) GoF algorithms existing in the statistics literature.
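Most of the unbinned tests listed above are built from the difference between the two empirical distribution functions (EDFs). As a rough illustration, the sketch below computes the Kuiper and Cramer-von Mises two-sample statistics from that difference. Function names are hypothetical, ties between the samples are assumed to be absent, and no p-value computation is shown.

```python
from bisect import bisect_right


def edf_differences(sample1, sample2):
    """F1(x) - F2(x) evaluated at every point of the pooled sample."""
    if not sample1 or not sample2:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample1), sorted(sample2)
    return [bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b)
            for x in a + b]


def kuiper_statistic(sample1, sample2):
    """Kuiper V: largest positive plus largest negative EDF deviation."""
    diffs = edf_differences(sample1, sample2)
    return max(0.0, max(diffs)) + max(0.0, -min(diffs))


def cramer_von_mises_statistic(sample1, sample2):
    """Two-sample Cramer-von Mises T: scaled sum of squared EDF differences."""
    n, m = len(sample1), len(sample2)
    diffs = edf_differences(sample1, sample2)
    return n * m / (n + m) ** 2 * sum(d * d for d in diffs)
```

Kolmogorov-Smirnov looks only at the single largest deviation, whereas Cramer-von Mises accumulates deviations over the whole range, which is one reason the tests respond differently to different kinds of discrepancy.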

9 Simple user layer
Shields the user from the complexity of the underlying algorithms and design
Users only deal with their analysis objects and the choice of comparison algorithm
First release: user layer for AIDA analysis objects
– LCG Architecture Blueprint, Geant4 requirement
Second release: added user layer for ROOT analysis objects
– in response to user requirements
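The layered idea (a core component for the statistics, plus thin adapters for each analysis environment) can be sketched as follows. This is purely illustrative: every name here is hypothetical, the dictionary-based "analysis object" is an invented stand-in, and nothing below reflects the actual AIDA or ROOT user layers of the toolkit.

```python
from bisect import bisect_right


# Core component: statistical computation on plain sequences of numbers.
def kolmogorov_smirnov(a, b):
    a, b = sorted(a), sorted(b)
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
               for x in a + b)


CORE_ALGORITHMS = {"kolmogorov-smirnov": kolmogorov_smirnov}


# User layer: one small adapter per kind of analysis object.
def values_from_list(obj):
    return list(obj)


def values_from_dict_cloud(obj):
    # Hypothetical analysis object that stores its entries under "entries".
    return list(obj["entries"])


def compare(obj1, obj2, algorithm, to_values=values_from_list):
    """The user picks analysis objects and an algorithm name, nothing else."""
    try:
        test = CORE_ALGORITHMS[algorithm]
    except KeyError:
        raise ValueError(f"unknown algorithm: {algorithm}") from None
    return test(to_values(obj1), to_values(obj2))
```

Because the core functions only see plain numbers, supporting another analysis environment in this sketch means writing one more adapter, with no change to the statistical code.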

10 Which test to use?
Do we really need such a wide collection of GoF tests? Why?
Which is the most appropriate test to compare two distributions?
How “good” is a test at recognizing real equivalent distributions and rejecting fake ones?
The choice of the most suitable GoF test can be made on the basis of two different criteria:
– Computational performance
– Statistical performance (power)

11 A) Performance of the GoF tests
Average CPU time

Test                                   Binned distributions   Unbinned distributions
Anderson-Darling                       (0.69±0.01) ms         (16.9±0.2) ms
Anderson-Darling (approximated)        (0.60±0.01) ms         (16.1±0.2) ms
Chi-squared                            (0.55±0.01) ms         -
Cramer-von Mises                       (0.44±0.01) ms         (16.3±0.2) ms
Generalised Girone                     -                      (15.9±0.2) ms
Goodman                                -                      (11.9±0.1) ms
Kolmogorov-Smirnov                     -                      (8.9±0.1) ms
Kuiper                                 -                      (12.1±0.1) ms
Tiku                                   (0.69±0.01) ms         (16.7±0.2) ms
Watson                                 -                      (14.2±0.1) ms
Weighted Kolmogorov-Smirnov (AD)       -                      (14.0±0.1) ms
Weighted Kolmogorov-Smirnov (Buning)   -                      (14.0±0.1) ms
Weighted Cramer-von Mises              -                      (14.0±0.1) ms
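Averages of this kind, quoted as mean ± uncertainty, can be obtained by timing repeated calls. The sketch below is one possible way to do so, not the procedure used for the slide: the function name is hypothetical, it measures wall-clock time with `time.perf_counter` as a stand-in for CPU time, and it quotes the standard error of the mean as the uncertainty.

```python
import statistics
import time


def average_time_ms(func, repeats=20):
    """Return (mean, standard error of the mean) of func's run time in ms."""
    if repeats < 2:
        raise ValueError("need at least two repeats to estimate the spread")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append((time.perf_counter() - start) * 1e3)
    mean = statistics.mean(timings)
    sem = statistics.stdev(timings) / repeats ** 0.5
    return mean, sem


if __name__ == "__main__":
    # Example workload: sorting a list, the dominant cost of many EDF tests.
    data = list(range(10_000, 0, -1))
    mean, sem = average_time_ms(lambda: sorted(data))
    print(f"({mean:.3f} ± {sem:.3f}) ms")
```

Absolute numbers depend on the machine and on the sample size, so only comparisons made under identical conditions are meaningful.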

12 B) Power of GoF tests
The power of a test is the probability of correctly rejecting the null hypothesis, i.e. of rejecting it when it is false
Systematic study of all existing GoF tests in progress
– made possible by the extensive collection of tests in the Statistical Toolkit
– GoF tests power evaluated in the variety of alternative situations considered
No clear winner: the statistical performance of a test depends on the features of the distributions to be compared (skewness and tailweight) and on the sample size
Practical recommendations (general recipe)
1) first classify the type of the distributions in terms of skewness and tailweight
2) choose the most appropriate test for that type of distribution, using the proposed quantitative model to evaluate the best test
Topic still subject to research activity in the domain of statistics
Figure annotation: p<0.0001
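Power is usually estimated by simulation: draw many pairs of samples from two distributions known to differ and count how often the test rejects. The following is a minimal sketch under stated assumptions, not the study described on the slide. It uses the two-sample Kolmogorov-Smirnov test with the large-sample 5% critical value 1.358·sqrt((n+m)/(n·m)), Gaussian samples that differ only by a shift, and hypothetical function names.

```python
import math
import random
from bisect import bisect_right


def ks_distance(sample1, sample2):
    a, b = sorted(sample1), sorted(sample2)
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
               for x in a + b)


def ks_rejects(sample1, sample2, c_alpha=1.358):
    """Reject H0 at about the 5% level using the asymptotic critical value."""
    n, m = len(sample1), len(sample2)
    return ks_distance(sample1, sample2) > c_alpha * math.sqrt((n + m) / (n * m))


def estimate_power(shift, n=50, trials=200, seed=1):
    """Fraction of trials in which N(0,1) and N(shift,1) samples are told apart."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(shift, 1.0) for _ in range(n)]
        if ks_rejects(a, b):
            rejections += 1
    return rejections / trials
```

With `shift=0.0` the same function estimates the false-rejection rate instead, which should stay near the nominal level. Repeating the exercise for alternatives that differ in skewness or tailweight rather than location is what reveals that no single test wins everywhere.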

13 Examples of practical applications

14 Statistical Toolkit Usage
Geant4 physics validation
– rigorous approach: quantitative evaluation of Geant4 physics models with respect to established reference data
– see for instance: K. Amako et al., Comparison of Geant4 electromagnetic physics models against the NIST reference data, IEEE Trans. Nucl. Sci. 52-4 (2005) 910-918
LCG Simulation Validation project
– see for instance: A. Ribon, Testing Geant4 with a simplified calorimeter setup, http://www.ge.infn.it/geant4/events/july2005
CMS
– validation of “new” histograms w.r.t. “reference” ones in OSCAR Validation Suite
Usage also in space science, medicine, statistics, etc.

15 Validation of Geant4 e.m. physics models vs. NIST reference data
Electron Stopping Power
Physics models under test:
– Geant4 Standard
– Geant4 Low Energy – Livermore
– Geant4 Low Energy – Penelope
Reference data: NIST ESTAR - ICRU 37
χ² test (to include data uncertainties in the computation of the test statistic value)
[Figure: p-value vs. Z for Geant4 LowE Penelope, Geant4 Standard and Geant4 LowE EEDL, with the H0 rejection area marked; p-value stability study; legend also lists NIST - XCOM]
The three Geant4 models are equivalent
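A chi-squared statistic that includes data uncertainties can be sketched as below. This is a generic form, assuming independent Gaussian uncertainties on both the simulated and the reference values that add in quadrature; the function name and the choice of degrees of freedom (one per data point, no fitted parameters) are assumptions, not details taken from the validation study.

```python
def chi2_with_uncertainties(simulated, reference, sigma_sim, sigma_ref):
    """Sum of squared differences, each weighted by its combined variance.

    Returns (chi2, ndf), with ndf taken as the number of data points.
    """
    if not (len(simulated) == len(reference) == len(sigma_sim) == len(sigma_ref)):
        raise ValueError("all inputs must have the same length")
    chi2 = 0.0
    for s, r, es, er in zip(simulated, reference, sigma_sim, sigma_ref):
        variance = es * es + er * er  # uncertainties combined in quadrature
        if variance <= 0.0:
            raise ValueError("each point needs a non-zero uncertainty")
        chi2 += (s - r) ** 2 / variance
    return chi2, len(simulated)
```

For example, a single point with simulated value 1.0, reference value 6.0 and uncertainties 3.0 and 4.0 gives a difference of 5 against a combined uncertainty of 5, hence a chi-squared of 1.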

16 Validation of Geant4 Atomic Relaxation vs NIST reference data
Fluorescence - Shell-start 3 (plot legend: Geant4, ○ NIST)

Shell-end   Kolmogorov-Smirnov D   p-value
10          0.0192                 1
11          0.0175                 1
13          0.0250                 1
14          0.0256                 1
18          0.0294                 1
19          0.0312                 1
21          0.1429                 0.997085
22          0.0588                 1

17 Validation of Geant4 electromagnetic and hadronic models against proton data
Physics models:
– Low Energy EM – ICRU49: p, ions
– Low Energy EM – Livermore: γ, e-
– Standard EM: e+
– HadronElastic with BertiniElastic
– Bertini Inelastic
p-values (table columns: CvM, KS, AD): left branch 0.977, right branch 0.985, whole curve 0.994
CvM: Cramer-von Mises test; KS: Kolmogorov-Smirnov test; AD: Anderson-Darling test
[Figure: Geant4 (LowE EM – ICRU49, BertiniElastic, Bertini Inelastic; 0.5 M events) vs. experimental data; horizontal axis in mm]

18 Test beam at Bessy (Bepi-Colombo mission)
X-ray fluorescence spectrum in Iceland basalt (E_IN = 6.5 keV) [Figure: counts vs. energy (keV)]
Very complex distributions
χ² not appropriate (< 5 entries in some bins, physical information would be lost if rebinned)
Anderson-Darling: A_c (95%) = 0.752
Experimental measurements are comparable with Geant4 simulations

19 Inflatable habitat vs a conventional rigid habitat
Comparison of alternative vehicle concepts in human missions to Mars
Reference: rigid structures as in the ISS (2 - 4 cm Al)
Kolmogorov-Smirnov test
– Multi-layer + 10 cm water equivalent to 4 cm Al
– Multi-layer + 5 cm water equivalent to 2.15 cm Al
An inflatable habitat exhibits a shielding capability equivalent to a conventional rigid one

Energy deposited in phantom (MeV)
Shielding material   EM           Bertini       Binary
ML + 5 cm water      73.5 ± 0.3   130.2 ± 0.5   119.3 ± 0.4
ML + 10 cm water     71.9 ± 0.3   128.0 ± 0.5   117.3 ± 0.5
4 cm Al              72.9 ± 0.3   127.5 ± 0.5   117.0 ± 0.4
2.15 cm Al           73.9 ± 0.3   130.5 ± 0.5   119.3 ± 0.5

[Figure: average energy deposit (MeV) of GCR p vs. depth in the phantom (cm); curves: 4 cm Al - Binary set, 4 cm Al - Bertini, 4 cm Al - EM, 10 cm water - Binary, 10 cm water - Bertini, 10 cm water - EM]

20 Conclusions
A novel, complete software toolkit for statistical analysis is being developed
– all the two-sample GoF tests available in the statistical domain + chi-squared test
– rigorous architectural design
– rigorous software process
It is the most complete software for the comparison of two distributions, even among commercial/professional statistics tools.
A systematic study of the power of GoF tests is in progress
– unexplored area of research
Application in various domains
– Geant4, HEP, space science, medicine…
Feedback and suggestions are very much appreciated

