Statistical Toolkit: Recent updates
Maria Grazia Pia, INFN Genova
B. Mascialino, A. Pfeiffer, M.G. Pia, A. Ribon, P. Viarengo
http://www.ge.infn.it/geant4/analysis/HEPstatistics
CMS Software Meeting, 25 July 2005

Goodness-of-Fit component
B. Mascialino, A. Pfeiffer, M.G. Pia, A. Ribon, P. Viarengo (CERN, INFN Genova, IST Genova)
G.A.P. Cirrone et al., A Goodness-of-Fit Statistical Toolkit, IEEE Transactions on Nuclear Science, Vol. 51, Issue 5, Oct. 2004, pp. 2056-2063
Component for modeling multi-parametric fit problems
F. Fabozzi, L. Lista (INFN Napoli)
F. Fabozzi and L. Lista, A generic toolkit for multivariate fitting designed with template metaprogramming, to be published in IEEE Transactions on Nuclear Science, 2005

Vision: the basics
Rigorous software process
Have a vision for the project
– General purpose tool for statistical analysis
– Toolkit approach (choice open to users)
– Open source product
Build on a solid architecture
Clearly define scope, objectives
Flexible, extensible, maintainable system
Software quality

Architectural guidelines
The project adopts a solid architectural approach
– to offer the functionality and the quality needed by the users
– to be maintainable over a long time scale
– to be extensible, to accommodate future evolutions of the requirements
Component-based architecture
– to facilitate re-use and integration in diverse frameworks
Dependencies
– no dependence on any specific analysis tool
– the core mathematical component is independent from the user’s representation of his/her analysis objects
– the user layer bridges the core statistical component and the user’s analysis
– multiple implementations of the user layer (AIDA, ROOT, FITS, GSL etc.)
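The separation between the core mathematical component and the user layer can be illustrated with a minimal C++ sketch. Every name in it (`BinnedData`, `squaredDistance`, `MyHistogram`, `toBinnedData`, `compareObjects`) is hypothetical and chosen for illustration only; none of them is taken from the toolkit. The "algorithm" is a placeholder distance, not one of the GoF tests.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Core component: knows only a neutral representation of a binned
// distribution and has no dependence on any analysis tool.
struct BinnedData {
    std::vector<double> contents;  // one entry per bin
};

// Placeholder core algorithm: sum of squared bin-by-bin differences.
double squaredDistance(const BinnedData& a, const BinnedData& b) {
    assert(a.contents.size() == b.contents.size());
    double d = 0.0;
    for (std::size_t i = 0; i < a.contents.size(); ++i) {
        const double diff = a.contents[i] - b.contents[i];
        d += diff * diff;
    }
    return d;
}

// A user-side analysis object (stand-in for e.g. an AIDA or ROOT histogram).
struct MyHistogram {
    int nBins;
    std::vector<double> heights;
};

// User layer: converts the user's object into the core representation.
// Supporting another analysis tool means adding another overload here,
// without touching the core.
BinnedData toBinnedData(const MyHistogram& h) {
    return BinnedData{h.heights};
}

// Entry point seen by the user: works for any type with a toBinnedData overload.
template <typename UserObject>
double compareObjects(const UserObject& a, const UserObject& b) {
    return squaredDistance(toBinnedData(a), toBinnedData(b));
}
```

With this layout, `compareObjects(h1, h2)` is the only call a user needs, and the core function never sees `MyHistogram`.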

Simple user layer
Shields the user from the complexity of the underlying algorithms and design
The user only deals with his/her analysis objects and the choice of comparison algorithm

Software process
Unified Software Development Process, specifically tailored to the project
– practical guidance and tools from the RUP
– both rigorous and lightweight
– mapping onto ISO 15504
– significant experience gained in the group from other projects
Incremental and iterative life-cycle model

GoF algorithms (currently implemented)
Algorithms for binned distributions
– Anderson-Darling test
– Chi-squared test
– Fisz-Cramer-von Mises test
– Tiku test (Cramer-von Mises test in chi-squared approximation)
Algorithms for unbinned distributions
– Anderson-Darling test
– Cramer-von Mises test
– Goodman test (Kolmogorov-Smirnov test in chi-squared approximation)
– Kolmogorov-Smirnov test
– Kuiper test
– Tiku test (Cramer-von Mises test in chi-squared approximation)
The most complete statistics software for the comparison of two distributions (also among commercial/professional statistics tools)
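As an illustration of what an unbinned two-sample test computes, the following is a minimal sketch of the Kolmogorov-Smirnov distance, the largest absolute difference between the two empirical cumulative distribution functions. The function name `ksDistance` is hypothetical and this is not the toolkit's implementation; only the distance is computed, not the p-value.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Two-sample Kolmogorov-Smirnov distance: max over x of |F_a(x) - F_b(x)|,
// where F_a and F_b are the empirical CDFs of the two samples.
double ksDistance(std::vector<double> a, std::vector<double> b) {
    assert(!a.empty() && !b.empty());
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    const double na = static_cast<double>(a.size());
    const double nb = static_cast<double>(b.size());
    std::size_t i = 0, j = 0;
    double d = 0.0;
    // Sweep over the pooled data in increasing order; at each distinct value x,
    // advance both samples past x (this handles ties) and compare the ECDFs.
    while (i < a.size() && j < b.size()) {
        const double x = std::min(a[i], b[j]);
        while (i < a.size() && a[i] <= x) ++i;
        while (j < b.size() && b[j] <= x) ++j;
        d = std::max(d, std::fabs(i / na - j / nb));
    }
    // Once one sample is exhausted its ECDF equals 1 and the other can only
    // approach it, so the maximum has already been found.
    return d;
}
```

Two samples with no overlap give the maximum distance of 1, and identical samples give 0.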

Recent extensions: algorithms
Fisz-Cramer-von Mises test
– exact asymptotic distribution (earlier: critical values)
Anderson-Darling test
– exact formulation (exists for unbinned distributions only; earlier: approximated formulation for both binned and unbinned distributions)
Tiku test
– Cramer-von Mises test in a chi-squared approximation
New tests: weighted Kolmogorov-Smirnov, weighted Cramer-von Mises
– various weighting functions available in literature
In preparation: Watson test
– can be applied in case of cyclic observations (like Kuiper test)
It is already the most complete software for the comparison of two distributions (even among commercial/professional statistics tools)
– goal: provide all 2-sample GoF algorithms currently existing in statistics literature
Publication in preparation to describe the new algorithms
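The suitability of the Kuiper test for cyclic observations comes from its statistic, which adds the largest positive and the largest negative deviation between the two empirical CDFs, V = D+ + D−, instead of taking the larger of the two as Kolmogorov-Smirnov does. A minimal sketch, with the hypothetical function name `kuiperDistance` (not the toolkit's implementation, and without the p-value calculation):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Two-sample Kuiper statistic V = D+ + D-, where
// D+ = max(F_a - F_b) and D- = max(F_b - F_a) over the pooled data.
double kuiperDistance(std::vector<double> a, std::vector<double> b) {
    assert(!a.empty() && !b.empty());
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    const double na = static_cast<double>(a.size());
    const double nb = static_cast<double>(b.size());
    std::size_t i = 0, j = 0;
    double dPlus = 0.0, dMinus = 0.0;
    while (i < a.size() && j < b.size()) {
        const double x = std::min(a[i], b[j]);
        while (i < a.size() && a[i] <= x) ++i;  // advance past ties
        while (j < b.size() && b[j] <= x) ++j;
        const double diff = i / na - j / nb;    // F_a(x) - F_b(x)
        dPlus = std::max(dPlus, diff);
        dMinus = std::max(dMinus, -diff);
    }
    return dPlus + dMinus;
}
```

For the samples {1, 4} and {2, 3} the deviation reaches +0.5 and −0.5, so V = 1, whereas the Kolmogorov-Smirnov distance for the same data is 0.5.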

Recent extensions: user layer
First release: user layer for AIDA analysis objects
– LCG Architecture Blueprint, Geant4 requirement
July 2005: added user layer for ROOT histograms
– CMS requirement (requested by Pedro Arce)
Other user layer implementations foreseen
– easy to add
– sound architecture decouples the mathematical component and the user’s representation of analysis objects
– different requirements from various user communities: satisfy all of them without introducing dependencies on any analysis tools
More in Andreas’ talk

Release
Releases are publicly downloadable from the web
– code, documentation etc.
For the convenience of LCG users, releases are also distributed with LCG AA software as “external contributions”
New release with algorithm and user layer extension in the next days
– thanks to Pedro Arce for β-testing!
– new naming of packages for additional user layers
Another new release with new algorithms planned in autumn
– plus a publication on recent extensions
Releases include extensive user documentation
– statistics algorithms
– how to use the software
The project is systematically accompanied by publications in refereed journals to document the recognition of its scientific value

Power of GoF tests
No comprehensive study of the relative power of GoF tests exists in literature
– novel research in statistics (not only in physics data analysis!)
Compare the various GoF tests w.r.t. a set of “typical” functional distributions
Provide guidance to the users based on sound quantitative arguments
Preliminary results available, publication in preparation
Do we really need such a wide collection of GoF tests? Why?
Which is the most appropriate test to compare two distributions?
How “good” is it at recognizing truly equivalent distributions and rejecting fake ones?
Which test to use?

Usage
Geant4 physics validation
– rigorous approach: quantitative evaluation of Geant4 physics models with respect to established reference data
– see for instance: K. Amako et al., Comparison of Geant4 electromagnetic physics models against the NIST reference data, to be published in IEEE Transactions on Nuclear Science
LCG Simulation Validation project
– see for instance: A. Ribon, Testing Geant4 with a simplified calorimeter setup, talk at Geant4 Physics Validation Workshop, Genova, July 2005, http://www.ge.infn.it/geant4/events/july2005
CMS
– validation of “new” histograms w.r.t. “reference” ones in OSCAR Validation Suite
Geant4 regression testing
– prototype developed at Gran Sasso Lab, inclusion in Geant4 regular testing in preparation (discussed at Geant4 Physics Validation Workshop last week)
Usage also in space science, medicine etc.

Outlook
Treatment of errors
1-sample GoF tests (comparison w.r.t. a function)
Comparison of two/multi-dimensional distributions
In all cases: the goal is to provide an extensive set of the algorithms published so far in statistics literature, with a critical evaluation of their relative strengths and applicability
Your feedback is very much appreciated
Please let us know your requirements
– we do our best to satisfy them, compatibly with available resources
The project is open to developers interested in statistical methods
– it is advanced R&D in statistics, not only a useful tool for physics analysis
New release coming soon
New publications in preparation

http://www.ge.infn.it/geant4/analysis/HEPstatistics/
Will be moved to http://www.ge.infn.it/statisticaltoolkit

Acknowledgments
Work supported and partially funded by the European Space Agency (ESA) under Contract No. 16339/02/NL/FM
Fred James (CERN) and Louis Lyons (Oxford)
– many useful suggestions, discussions, encouragement...
Thanks to our statistician (Paolo Viarengo, IST – National Institute for Cancer Research) for his help with the mathematical calculations and the guidance through statistics literature

Comparing two ROOT histograms
    #include <iostream>
    #include <cstdlib>   // drand48
    #include "TH1.h"
    #include "StatisticsTesting/StatisticsComparator.h"
    #include "ComparisonResult.h"
    #include "Chi2ComparisonAlgorithm.h"

    using namespace StatisticsTesting;

    int main (int, char **)
    {
      // Create and fill two 1-dimensional histograms with random numbers.
      // TH1 is ROOT's histogram base class and cannot be instantiated directly;
      // TH1D is a concrete type (name, title, number of bins, axis range).
      TH1D hA("A", "A", 10, 0.0, 1.0);
      TH1D hB("B", "B", 10, 0.0, 1.0);
      for (int i = 0; i < 1000; i++ ) {
        hA.Fill( drand48() );
        hB.Fill( drand48() );
      }

      // Do the (binned) statistical test between the two histograms.
      // The template argument selecting the algorithm was lost in the transcript;
      // Chi2ComparisonAlgorithm is assumed here, matching the included header.
      StatisticsComparator<Chi2ComparisonAlgorithm> comparator;
      ComparisonResult result = comparator.compare(hA, hB);

      std::cout << "Result of the Chi2 Statistical Test: " << std::endl
                << "distance = " << result.distance() << std::endl
                << "NDF = " << result.ndf() << std::endl
                << "p-value = " << result.quality() << std::endl;
    }

Make and run:
    $ source setup.sh
    $ make
    $ .objs/exampleChi2
    Result of the Chi2 Statistical Test:
    distance = 4.82038
    ndf (number of degree of freedom) = 9
    p-value = 0.849677
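To make the "distance" and "ndf" values in such an output more concrete, here is a minimal sketch of a textbook chi-squared comparison of two histograms with identical binning. It is not the toolkit's implementation: the names `Chi2Result` and `chi2TwoHistograms` are hypothetical, and taking ndf as the number of non-empty bins minus one is an assumption chosen to agree with the 10-bin, ndf = 9 example above. Converting the statistic into a p-value requires the chi-squared cumulative distribution, which is left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Chi2Result {
    double chi2;  // test statistic (the "distance")
    int ndf;      // degrees of freedom
};

// Chi-squared comparison of two histograms given as bin contents.
// The square-root factors rescale the contents so that histograms with
// different total counts can be compared; for equal totals both are 1.
Chi2Result chi2TwoHistograms(const std::vector<double>& a,
                             const std::vector<double>& b) {
    assert(a.size() == b.size());
    double totalA = 0.0, totalB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        totalA += a[i];
        totalB += b[i];
    }
    assert(totalA > 0.0 && totalB > 0.0);
    const double scaleA = std::sqrt(totalB / totalA);
    const double scaleB = std::sqrt(totalA / totalB);

    Chi2Result r{0.0, -1};  // ndf starts at -1 (assumed: non-empty bins minus one)
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double sum = a[i] + b[i];
        if (sum == 0.0) continue;  // bins empty in both histograms are skipped
        const double diff = scaleA * a[i] - scaleB * b[i];
        r.chi2 += diff * diff / sum;
        ++r.ndf;
    }
    return r;
}
```

Two histograms with the same shape but different totals, for example {10, 10} and {20, 20}, give a statistic of zero, because the rescaling removes the difference in normalization.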
