Maria Grazia Pia, INFN Genova Update on the Goodness of Fit Toolkit M.G. Pia B. Mascialino, A. Pfeiffer, M.G. Pia, A. Ribon, P. Viarengo

Maria Grazia Pia, INFN Genova Update on the Goodness of Fit Toolkit M.G. Pia B. Mascialino, A. Pfeiffer, M.G. Pia, A. Ribon, P. Viarengo http://www.ge.infn.it/geant4/analysis/HEPstatistics http://www.ge.infn.it/statisticaltoolkit PHYSTAT 2005 Oxford, 11-15 September 2005

Maria Grazia Pia, INFN Genova Historical background… Validation of Geant4 physics models through comparison of simulation vs experimental data or reference databases Fluorescence spectrum from Icelandic basalt (Mars-like rock): experimental data and simulation ESA Bepi Colombo mission to Mercury test beam at Bessy Photon attenuation coefficient, Al Geant4 Standard Geant4 LowE NIST data Electromagnetic models in Geant4 w.r.t. NIST reference

Maria Grazia Pia, INFN Genova Some use cases Regression testing –Throughout the software life-cycle Online DAQ –Monitoring detector behaviour w.r.t. a reference Simulation validation –Comparison with experimental data Reconstruction –Comparison of reconstructed vs. expected distributions Physics analysis –Comparisons of experimental distributions (ATLAS vs. CMS Higgs?) –Comparison with theoretical distributions (data vs. Standard Model) The test statistics computation concerns the agreement between the two samples’ empirical distribution functions

Maria Grazia Pia, INFN Genova G.A.P Cirrone, S. Donadio, S. Guatelli, A. Mantero, B. Mascialino, S. Parlati, M.G. Pia, A. Pfeiffer, A. Ribon, P. Viarengo “A Goodness-of-Fit Statistical Toolkit” IEEE- Transactions on Nuclear Science (2004), 51 (5): 2056-2063. More information on the project web site

Maria Grazia Pia, INFN Genova Vision of the project software process Rigorous software process vision Basic vision –General purpose tool –Toolkit approach (choice open to users) –Open source product –Independent from specific analysis tools –Easily usable in analysis and other tools architecture Build on a solid architecture Clearly define scope objectives scope, objectives Flexible, extensible, maintainable Flexible, extensible, maintainable system quality Software quality

Maria Grazia Pia, INFN Genova Software process guidelines Adopt a process –the key to software quality... Unified Processtailored Unified Process, specifically tailored to the project RUP –practical guidance and tools from the RUP –both rigorous and lightweight –mapping onto ISO 15504 (and CMM) Incremental and iterative life-cycle 1 st cycle: 2-sample GoF tests –1-sample GoF in preparation

Maria Grazia Pia, INFN Genova Architectural guidelines architectural The project adopts a solid architectural approach functionalityquality –to offer the functionality and the quality needed by the users maintainable –to be maintainable over a large time scale extensible –to be extensible, to accommodate future evolutions of the requirements Component-based architecture –to facilitate re-use and integration in diverse frameworks –layer architecture pattern –core component for statistical computation –independent components for interface to user analysis environmentsDependencies –no dependence on any specific analysis tool –can be used by any analysis tools, or together with any analysis tools –offer various specific user layers for usage with AIDA, ROOT, etc.

Maria Grazia Pia, INFN Genova

Simple user layer Shields the user from the complexity of the underlying algorithms and design analysis objects comparison algorithm Only deal with the user’s analysis objects and choice of comparison algorithm User Layer

Maria Grazia Pia, INFN Genova GoF algorithms (currently implemented) Algorithms for binned distributions – Anderson-Darling test – Chi-squared test – Fisz-Cramer-von Mises test – Tiku test (Cramer-von Mises test in chi-squared approximation) Algorithms for unbinned distributions – Anderson-Darling test – Cramer-von Mises test – Goodman test (Kolmogorov-Smirnov test in chi-squared approximation) – Kolmogorov-Smirnov test – Kuiper test – Tiku test (Cramer-von Mises test in chi-squared approximation)

Maria Grazia Pia, INFN Genova Recent extensions: algorithms Fisz-Cramer-von Mises test and Anderson-Darling test –exact asymptotic distribution (earlier: critical values) Tiku test –Cramer-von Mises test in a chi-squared approximation New tests: weighted Kolmogorov-Smirnov, weighted Cramer-von Mises –various weighting functions available in literature In preparation –Watson test (can be applied in case of cyclic observations, like Kuiper test) –Girone test It is the most complete software for the comparison of two distributions, even among commercial/professional statistics tools –goal: provide all 2-sample GoF algorithms existing in statistics literature Publication in preparation to describe the new algorithms

Maria Grazia Pia, INFN Genova Recent extensions: user layer First release: user layer for AIDA analysis objects –LCG Architecture Blueprint, Geant4 requirement July 2005: added user layer for ROOT histograms –in response to user requirements Other user layer implementations foreseen –easy to add –sound architecture decouples the mathematical component and the user’s representation of analysis objects –different requirements from various user communities: satisfy them without introducing dependencies on any analysis tools

Maria Grazia Pia, INFN Genova Software release Releases are publicly downloadable from the web –code, documentation etc. For the convenience of LCG users, releases are also distributed with LCG AA software as “external contributions” Also ported to Java, distributed with JAS Release with new algorithms planned in autumn –+ publication on recent extensions Releases include extensive user documentation –statistics algorithms –how to use the software The project is systematically accompanied by publications on refereed journals to document the recognition of its scientific value

Maria Grazia Pia, INFN Genova Examples of Usage Geant4 physics validation –rigorous approach: quantitative evaluation of Geant4 physics models with respect to established reference data –see for instance: K. Amako et al., Comparison of Geant4 electromagnetic physics models against the NIST reference data IEEE Trans. Nucl. Sci. 52- 4 (2005) 910-918 LCG Simulation Validation project –see for instance: A. Ribon, Testing Geant4 with a simplified calorimeter setup, http://www.ge.infn.it/geant4/events/july2005 CMS –validation of “new” histograms w.r.t. “reference” ones in OSCAR Validation Suite Usage also in space science, medicine etc.

Maria Grazia Pia, INFN Genova Power of GoF tests Do we really need such a wide collection of GoF tests? Why? most appropriate Which is the most appropriate test to compare two distributions? good How “good” is a test at recognizing real equivalent distributions and rejecting fake ones? Which test to use?

Maria Grazia Pia, INFN Genova Systematic study of GoF tests No comprehensive study of the relative power of GoF tests exists in literature –novel research in statistics –novel research in statistics (not only in physics data analysis!) Systematicall Systematic study of all existing GoF tests in progress –made possible by the extensive collection of tests in the Statistical Toolkit guidance to the users Provide guidance to the users based on sound quantitative arguments Preliminary results available Publication in preparation

Maria Grazia Pia, INFN Genova Method for the evaluation of power N=1000 Monte Carlo replicas Pseudoexperiment: a random drawing of two samples from two parent distributions For each test, the p-value computed by the GoF Toolkit derives from the analytical calculation of the asymptotic distribution, often depending on the samples sizes Confidence Level = 0.05 Parent distribution 1 Sample 1 n Sample 2 m GoF test Parent distribution 2 Power = # pseudoexperiments with p-value < (1-CL) # pseudoexperiments

Maria Grazia Pia, INFN Genova Parent distributions Uniform Gaussian Double exponential Cauchy Exponential Contaminated Normal Distribution 2Contaminated Normal Distribution 1 Also Breit-Wigner, other distributions being considered

Maria Grazia Pia, INFN Genova Characterization of distributions Parent distribution ST f 1 (x) Uniform 11.267 f 2 (x) Gaussian 11.704 f 3 (x) Double exponential 12.161 f 4 (x) Cauchy 15.263 f 5 (x) Exponential 4.4861.883 f 6 (x) Contamined normal 1 11.991 f 7 (x) Contamined normal 2 1.7691.693 SkewnessTailweight

Maria Grazia Pia, INFN Genova The power increases with the sample size (analytical calculation of the asymptotic distribution) N sample Power Kolmogorov-Smirnov test CL = 0.05 The “location-scale problem” Case: Parent1 = Parent 2 Uniform Normal Exponential Double Exponential Contaminated Normal 1 Contaminated Normal 2 Cauchy small size samples moderate size samples Preliminary

Maria Grazia Pia, INFN Genova The “general shape problem” Distribution1 – Distribution 2KSCVMAD CN2 - Normal55.6±1.815.2±1.186.1±1.1 CN2 - CN124.9±1.425.2±1.144.8±1.6 CN2 - Double Exponential37.6±1.540.2±1.651.6±1.6 T2T2 Case: Parent1 ≠ Parent 2 (S 1 = S 2 = 1) Power Tailweight Distribution 2 CL = 0.05 Kolmogorov-Smirnov Cramér-von Mises Anderson-Darling Distribution 1 and Double Exponential (T 1 = 2.161) A) Symmetric distributions B) Skewed versus symmetric distributionsKSKSCVMCVMADAD ~< For long tailed distributions:KSKSCVMCVMADAD ~~ For short/medium tailed distributions: Preliminary

Maria Grazia Pia, INFN Genova Comparative evaluation of tests Short (T<1.5) Medium (1.5 < T < 2) Long(T>2) S~1S~1S~1S~1KSKS – CVMCVM - AD S>1.5 KS - ADADCVM - AD Skewness Tailweight Preliminary

Maria Grazia Pia, INFN Genova Preliminary results clear winner No clear winner for all the considered distributions in general –the performance of a test depends on its intrinsic features as well as on the features of the distributions to be compared Practical recommendations 1)first classify the type of the distributions in terms of skewness and tailweight 2)choose the most appropriate test given the type of distributions Systematic study of the power in progress –for both binned and unbinned distributions Topic still subject to research activity in the domain of statistics Publication in preparation

Maria Grazia Pia, INFN Genova Outlook 1-sample GoF tests (comparison w.r.t. a function) Comparison of two/multi-dimensional distributions Systematic study of the power of GoF tests Goal to provide an extensive set of algorithms so far published in statistics literature, with a critical evaluation of their relative strengths and applicability Treatment of errors, filtering New release coming soon New papers in preparation Other components beyond GoF? Suggestions are welcome…

Maria Grazia Pia, INFN Genova Conclusions A novel, complete software toolkit for statistical analysis is being developed –rich set of algorithms –sound architectural design –rigorous software process A systematic study of the power of GoF tests is in progress –unexplored area of research Application in various domains –Geant4, HEP, space science, medicine… Feedback and suggestions are very much appreciated The project is open to developers interested in statistical methods

Maria Grazia Pia, INFN Genova Update on the Goodness of Fit Toolkit M.G. Pia B. Mascialino, A. Pfeiffer, M.G. Pia, A. Ribon, P. Viarengo

Similar presentations

Presentation on theme: "Maria Grazia Pia, INFN Genova Update on the Goodness of Fit Toolkit M.G. Pia B. Mascialino, A. Pfeiffer, M.G. Pia, A. Ribon, P. Viarengo"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Maria Grazia Pia, INFN Genova Update on the Goodness of Fit Toolkit M.G. Pia B. Mascialino, A. Pfeiffer, M.G. Pia, A. Ribon, P. Viarengo

Similar presentations

Presentation on theme: "Maria Grazia Pia, INFN Genova Update on the Goodness of Fit Toolkit M.G. Pia B. Mascialino, A. Pfeiffer, M.G. Pia, A. Ribon, P. Viarengo"— Presentation transcript:

Similar presentations

About project

Feedback