Data analysis in HEP: a statistical toolkit

S. Donadio, S. Guatelli, B. Mascialino, A. Pfeiffer, M.G. Pia, A. Ribon, P. Viarengo
1st Workshop on Italy-Japan Collaboration on Geant4 Medical Application

Data analysis in HEP
The goal is to provide tools for the statistical comparison of distributions: equivalent reference distributions, experimental measurements, data from reference sources, and functions derived from theoretical calculations or fits. Typical use cases are detector monitoring (checking whether the detector behavior stays constant over more than one run), simulation validation, reconstruction vs. expectation, regression testing and physics analysis.

GoF statistical toolkit
A project to develop a statistical comparison system, supporting both qualitative and quantitative evaluation: comparison of distributions and goodness-of-fit (GoF) testing. A typical use is detector monitoring, to check whether the behavior is constant over more than one run.

Software process guidelines
The project follows the Unified Software Development Process, specifically tailored to the project: practical guidance and tools from the Rational Unified Process (RUP), both rigorous and lightweight, with a mapping onto ISO 15504 and guidance from that standard. The life-cycle model is incremental and iterative (spiral approach).

Architectural guidelines
The project adopts a solid architectural approach in order to offer the functionality and quality needed by the users, to be maintainable over a long time scale, and to be extensible to accommodate future evolution of the requirements. A component-based approach facilitates re-use and integration in different frameworks. Adopting AIDA (Abstract Interfaces for Data Analysis), a HEP standard, avoids any dependence on a specific analysis tool.

Algorithms specialised by kind of distribution (binned/unbinned)
Every algorithm has been rigorously tested. Documentation is available.

Chi-squared test
Applies to binned distributions. It can also be useful for unbinned distributions, but the data must first be grouped into classes. It cannot be applied if the expected (theoretical) frequency in any class is less than 5. In that case one can try to merge contiguous classes until the minimum expected frequency is reached; otherwise one can use Yates' continuity correction.
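The class-merging rule can be sketched as follows. This is a minimal illustration, not the toolkit's implementation: it assumes plain Python lists of observed and expected counts, merges greedily from left to right, folds any under-filled tail into the last class, and computes degrees of freedom as (classes - 1), which holds only if the expected counts are normalised to the observed total and no parameters were fitted. The function names are hypothetical.

```python
def merge_classes(observed, expected, min_expected=5.0):
    """Merge contiguous classes, left to right, until each expected count >= min_expected.

    Classes left over at the end that never reach the threshold are folded
    into the last accepted class, so no data are discarded.
    """
    obs_out, exp_out = [], []
    acc_o = acc_e = 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= min_expected:
            obs_out.append(acc_o)
            exp_out.append(acc_e)
            acc_o = acc_e = 0.0
    if acc_o or acc_e:  # under-filled tail
        if exp_out:
            obs_out[-1] += acc_o
            exp_out[-1] += acc_e
        else:  # the whole histogram is below threshold: keep a single class
            obs_out.append(acc_o)
            exp_out.append(acc_e)
    return obs_out, exp_out


def chi_squared(observed, expected, yates=False):
    """Pearson statistic sum((O - E)^2 / E); with yates=True, sum((|O - E| - 0.5)^2 / E).

    Degrees of freedom are returned as (number of classes - 1).
    """
    stat = 0.0
    for o, e in zip(observed, expected):
        diff = abs(o - e) - 0.5 if yates else o - e
        stat += diff * diff / e
    return stat, len(observed) - 1


obs, exp = merge_classes([2, 3, 10, 12, 8, 1, 2], [1.5, 3.5, 9, 11, 9, 2, 2])
stat, ndf = chi_squared(obs, exp)
```

With these made-up counts the seven original classes become four, each with an expected frequency of at least 5, and the statistic is then computed with 3 degrees of freedom.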

More sophisticated algorithms
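For unbinned distributions: the Kolmogorov-Smirnov test, the Goodman approximation of the KS test, and the Kuiper test. These are supremum statistics: they compare the empirical distribution functions of the original distributions through their maximum distance, D_mn.
[Figure: empirical distribution functions of the two original distributions and the supremum distance D_mn]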
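A minimal sketch of the supremum statistics for two unbinned samples, assuming plain Python lists of floats; function names are hypothetical. The Goodman approximation converts D_mn into a chi-squared variable with 2 degrees of freedom; it is usually quoted for the one-sided statistic, so the p-value it gives for the two-sided D_mn should be treated as approximate.

```python
import bisect
import math


def ecdf(sorted_sample, x):
    """Empirical distribution function: fraction of sample points <= x."""
    return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_and_kuiper(sample1, sample2):
    """Return (D_mn, V) for two unbinned samples.

    D_mn = sup_x |F_m(x) - G_n(x)|                  (Kolmogorov-Smirnov)
    V    = sup_x (F_m - G_n) + sup_x (G_n - F_m)    (Kuiper)
    Both EDFs are step functions that only jump at data points, so scanning
    the pooled sample visits every value the difference can take.
    """
    a, b = sorted(sample1), sorted(sample2)
    d_plus = d_minus = 0.0
    for x in a + b:
        diff = ecdf(a, x) - ecdf(b, x)
        d_plus = max(d_plus, diff)
        d_minus = max(d_minus, -diff)
    return max(d_plus, d_minus), d_plus + d_minus


def goodman_chi2(d_mn, m, n):
    """Goodman approximation: chi2 = 4 D^2 m n / (m + n), ~ chi-squared with 2 d.o.f.

    For 2 d.o.f. the chi-squared survival function is exp(-chi2 / 2).
    """
    chi2 = 4.0 * d_mn ** 2 * m * n / (m + n)
    return chi2, math.exp(-chi2 / 2.0)


d, v = ks_and_kuiper([1.0, 5.0], [2.0, 3.0])
```

In this toy example the two EDFs cross, so the Kuiper statistic (V = 1.0) picks up deviations in both directions while D_mn = 0.5 only records the largest one.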

More powerful algorithms
For unbinned distributions: the Cramer-von Mises test and the Anderson-Darling test, which are tests containing a weighting function. These algorithms are powerful enough that their equivalents for binned distributions have also been implemented: the Fisz-Cramer-von Mises test and the k-sample Anderson-Darling test.
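A minimal sketch of the two unbinned statistics, assuming plain Python lists and, for Anderson-Darling, a fully specified continuous CDF with no fitted parameters. It computes only the statistics (no critical values or p-values), and the binned Fisz-Cramer-von Mises and k-sample Anderson-Darling variants are not shown. Function names are hypothetical.

```python
import bisect
import math


def cramer_von_mises_two_sample(sample1, sample2):
    """Two-sample Cramer-von Mises statistic in Anderson's (1962) form:
    T = m n / (m + n)^2 * sum over the pooled sample z of (F_m(z) - G_n(z))^2
    """
    a, b = sorted(sample1), sorted(sample2)
    m, n = len(a), len(b)
    total = 0.0
    for z in a + b:
        diff = bisect.bisect_right(a, z) / m - bisect.bisect_right(b, z) / n
        total += diff * diff
    return m * n / (m + n) ** 2 * total


def anderson_darling_one_sample(sample, cdf):
    """A^2 against a fully specified continuous CDF:
    A^2 = -n - (1/n) * sum_{i=1..n} (2i - 1) [ln F(x_(i)) + ln(1 - F(x_(n+1-i)))]
    This form follows from the weight 1 / [F (1 - F)], which emphasises the tails.
    Requires 0 < cdf(x) < 1 for every sample point.
    """
    x = sorted(sample)
    n = len(x)
    s = 0.0
    for i in range(1, n + 1):
        s += (2 * i - 1) * (math.log(cdf(x[i - 1])) + math.log(1.0 - cdf(x[n - i])))
    return -n - s / n


def uniform_cdf(u):
    """CDF of the uniform distribution on [0, 1], used here as a test hypothesis."""
    return min(max(u, 0.0), 1.0)
```

Unlike the supremum statistics, both sums accumulate the discrepancy at every data point, which is why these tests look along the whole range of x.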

2 Is 2 the most powerful algorithm? In terms of power:
The power of a test is the probability of rejecting the null hypothesis correctly In terms of power: 2 Supremum statistics tests Tests containing a weight function < 2 loses information in a test for unbinned distribution by grouping the data into cells Kac, Kiefer and Wolfowitz (1955) showed that Kolmogorov-Smirnov test requires n4/5 observations compared to n observations for 2 to attain the same power Cramer-von Mises and Anderson-Darling statistics are expected to be superior to Kolmogorov-Smirnov’s, since they make a comparison of the two distributions all along the range of x, rather than looking for a marked difference at one point
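To put the Kac-Kiefer-Wolfowitz scaling into numbers, here is a worked example with an arbitrarily chosen n = 1000 (not a figure from the slides). The result is an asymptotic statement about orders of growth, so the number only illustrates the scaling:

```latex
n_{\mathrm{KS}} \approx n_{\chi^2}^{4/5}, \qquad
n_{\chi^2} = 1000 \;\Rightarrow\; n_{\mathrm{KS}} \approx 1000^{4/5} = 10^{12/5} = 10^{2.4} \approx 251
```

That is, roughly a quarter of the observations would suffice for the Kolmogorov-Smirnov test in this illustration.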

User's point of view
A simple user layer: the user only deals with AIDA objects and the choice of the comparison algorithm, and is completely shielded from both statistical and computing complexity. Choosing the algorithm and obtaining the statistical result from the toolkit takes a single line of code.

Examples of practical applications

Microscopic validation of physics: Geant4 vs. NIST
Geant4 simulations (Standard and LowE models) are statistically comparable with reference data from the NIST database (chi-squared test). N-S = NIST vs. Geant4 Standard; N-L = NIST vs. Geant4 LowE.
(a) χ²(N-S) = …, ν = 28, p = 1; χ²(N-L) = …, ν = 28, p = 1
(b) χ²(N-S) = …, ν = 28, p = 1; χ²(N-L) = 1.928, ν = 28, p = 1
(c) χ²(N-S) = 0.373, ν = 28, p = 1; χ²(N-L) = …, ν = 28, p = 1

X-ray fluorescence spectrum of Icelandic basalt
Test beam at BESSY for the BepiColombo mission: X-ray fluorescence spectrum of Icelandic basalt (E_in = 6.5 keV).
[Figure: counts vs. energy (keV)]
The distributions are very complex, and χ² is not appropriate: some bins have fewer than 5 entries, and physical information would be lost by rebinning. Anderson-Darling test, critical value A_c (95%) = 0.752: the experimental measurements are comparable with the Geant4 simulations.

Medical applications: hadron therapy
Kolmogorov-Smirnov test: D(EXP-GEANT4) = 0.11, p = n.s.
Goodman approximation of the Kolmogorov-Smirnov test: χ²(EXP-GEANT4) = 3.8, ν = 2, p = n.s.
The experimental measurements are comparable with the Geant4 simulations.
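For ν = 2 the chi-squared survival function has the closed form e^{-x/2}, so the Goodman value quoted above can be checked by hand. Assuming the usual 5% significance level (the slide does not state it), the result is consistent with "p = n.s.":

```latex
p = P\left(\chi^2_{\nu=2} \ge 3.8\right) = e^{-3.8/2} = e^{-1.9} \approx 0.15 > 0.05
```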

Conclusions
This is a new, up-to-date, easy-to-handle and powerful tool for statistical comparison in particle physics. It is the first tool to supply such a variety of sophisticated and powerful statistical tests in HEP. Its AIDA interfaces allow it to be integrated into other data analysis tools. Applications in: HEP, astrophysics, medical physics, …
