Data analysis in HEP: a statistical toolkit

Presentation transcript:

Data analysis in HEP: a statistical toolkit. S. Donadio, S. Guatelli, B. Mascialino, A. Pfeiffer, M.G. Pia, A. Ribon, P. Viarengo. 1st Workshop on Italy-Japan Collaboration on Geant4 Medical Application.

Data analysis in HEP. Goal: provide tools for the statistical comparison of distributions: equivalent reference distributions, experimental measurements, data from reference sources, and functions deriving from theoretical calculations or fits. Use cases: detector monitoring (to check whether the behavior is constant over more than one run), simulation validation, reconstruction vs. expectation, regression testing, physics analysis.

GoF statistical toolkit. A project to develop a statistical comparison system for the comparison of distributions and goodness-of-fit testing, supporting both qualitative and quantitative evaluation. Example: detector monitoring, to check whether the behavior is constant over more than one run.

Software process guidelines. Unified Software Development Process, specifically tailored to the project: practical guidance and tools from the RUP; both rigorous and lightweight; mapping onto ISO 15504, with guidance from ISO 15504. Incremental and iterative life-cycle model (spiral approach).

Architectural guidelines. The project adopts a solid architectural approach: to offer the functionality and the quality needed by the users; to be maintainable over a large time scale; to be extensible, to accommodate future evolutions of the requirements. Component-based approach, to facilitate re-use and integration in different frameworks. AIDA: adopt a (HEP) standard, with no dependence on any specific analysis tool.

The algorithms are specialised according to the kind of distribution (binned/unbinned). Every algorithm has been rigorously tested. Documentation available: http://www.ge.infn.it/geant4/analysis/HEPstatistics/

Chi-squared test. Applies to binned distributions. It can also be useful for unbinned distributions, but the data must first be grouped into classes. It cannot be applied if the theoretical (expected) frequency in any class is < 5. When that happens, one can try to merge contiguous classes until the minimum theoretical frequency is reached; otherwise one can use Yates' formula.
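The merging of contiguous classes described on this slide can be sketched as follows. This is only an illustration in plain Python, not the toolkit's code: the function names, the left-to-right merging order and the example counts are all assumptions made for the sketch.

```python
def merge_low_bins(observed, expected, min_expected=5.0):
    """Merge contiguous bins, left to right, until each merged bin
    has an expected frequency of at least min_expected."""
    merged_obs, merged_exp = [], []
    acc_obs = acc_exp = 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= min_expected:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
            acc_obs = acc_exp = 0.0
    # A leftover tail that never reached the threshold is folded
    # into the last merged bin (or kept alone if nothing was merged).
    if acc_obs > 0.0 or acc_exp > 0.0:
        if merged_exp:
            merged_obs[-1] += acc_obs
            merged_exp[-1] += acc_exp
        else:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
    return merged_obs, merged_exp


def chi_squared(observed, expected):
    """Pearson chi-squared statistic: sum of (O - E)^2 / E over bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Illustrative counts (invented for this sketch): the outer bins
# have expected frequencies below 5 and get merged with neighbours.
observed = [1, 3, 12, 20, 9, 4, 1]
expected = [2, 4, 10, 18, 10, 4, 2]
obs_m, exp_m = merge_low_bins(observed, expected)
statistic = chi_squared(obs_m, exp_m)
```

With the expected counts normalised to the observed total, the statistic would be referred to a chi-squared distribution with (number of merged bins - 1) degrees of freedom.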

More sophisticated algorithms, for unbinned distributions: Kolmogorov-Smirnov test, Goodman approximation of the KS test, Kuiper test. These are supremum statistics: the empirical distribution functions of the original distributions are compared through their maximum distance Dmn.
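As a rough illustration of what a supremum statistic computes, the sketch below evaluates the two-sample Kolmogorov-Smirnov distance (the largest absolute gap between the two empirical distribution functions) and the Kuiper statistic (the largest positive gap plus the largest negative gap). The function names are hypothetical and the code is independent of the toolkit.

```python
from bisect import bisect_right


def ecdf_differences(sample_a, sample_b):
    """F_a(x) - F_b(x) evaluated at every distinct observed value x."""
    a, b = sorted(sample_a), sorted(sample_b)
    m, n = len(a), len(b)
    return [
        bisect_right(a, x) / m - bisect_right(b, x) / n
        for x in sorted(set(a) | set(b))
    ]


def kolmogorov_smirnov(sample_a, sample_b):
    """Two-sample KS distance: sup |F_a - F_b|."""
    return max(abs(d) for d in ecdf_differences(sample_a, sample_b))


def kuiper(sample_a, sample_b):
    """Two-sample Kuiper statistic: sup (F_a - F_b) + sup (F_b - F_a)."""
    diffs = ecdf_differences(sample_a, sample_b)
    return max(diffs) + max(-d for d in diffs)


# Two small invented samples whose empirical distribution functions cross.
a = [1, 2, 7, 8]
b = [3, 4, 5, 6]
d_mn = kolmogorov_smirnov(a, b)
v_mn = kuiper(a, b)
```

For these samples the gap reaches +0.5 on one side of the crossing and -0.5 on the other, so the KS distance only sees one of them while the Kuiper statistic adds both.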

More powerful algorithms, for unbinned distributions: Cramer-von Mises test, Anderson-Darling test (tests containing a weighting function). These algorithms are so powerful that we decided to implement their equivalents for binned distributions: Fisz-Cramer-von Mises test, k-sample Anderson-Darling test.
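To make the role of the weighting function concrete, here is a minimal sketch of the usual two-sample forms for unbinned data, assuming no tied values. With z_1..z_N the pooled ordered sample (N = m + n) and H = i/N, the Cramer-von Mises statistic sums the squared gap with unit weight, T = (mn/N^2) * sum [F_m(z_i) - G_n(z_i)]^2, while Anderson-Darling divides each term by H(1 - H), which emphasises the tails. The function names are hypothetical and the toolkit's own implementation may differ.

```python
def pooled_ecdf_gaps(sample_a, sample_b):
    """F_a(z_i) - F_b(z_i) along the pooled ordered sample z_1..z_N
    (assumes no ties between or within the samples)."""
    m, n = len(sample_a), len(sample_b)
    pooled = sorted([(x, 0) for x in sample_a] + [(x, 1) for x in sample_b])
    count_a = count_b = 0
    gaps = []
    for _, label in pooled:
        if label == 0:
            count_a += 1
        else:
            count_b += 1
        gaps.append(count_a / m - count_b / n)
    return gaps


def cramer_von_mises(sample_a, sample_b):
    """Two-sample Cramer-von Mises statistic (unit weight)."""
    m, n = len(sample_a), len(sample_b)
    total = m + n
    gaps = pooled_ecdf_gaps(sample_a, sample_b)
    return m * n / total ** 2 * sum(g * g for g in gaps)


def anderson_darling(sample_a, sample_b):
    """Two-sample Anderson-Darling statistic (weight 1 / (H (1 - H)))."""
    m, n = len(sample_a), len(sample_b)
    total = m + n
    gaps = pooled_ecdf_gaps(sample_a, sample_b)
    weighted = 0.0
    # The last pooled point is skipped: there H = 1 and the gap is zero.
    for i, g in enumerate(gaps[:-1], start=1):
        h = i / total
        weighted += g * g / (h * (1.0 - h))
    return m * n / total ** 2 * weighted


t_stat = cramer_von_mises([1, 2, 7, 8], [3, 4, 5, 6])
a2_stat = anderson_darling([1, 2, 7, 8], [3, 4, 5, 6])
```

Unlike the supremum statistics, both sums use the gap at every pooled point, not only the largest one.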

2 Is 2 the most powerful algorithm? In terms of power: The power of a test is the probability of rejecting the null hypothesis correctly In terms of power: 2 Supremum statistics tests Tests containing a weight function < 2 loses information in a test for unbinned distribution by grouping the data into cells Kac, Kiefer and Wolfowitz (1955) showed that Kolmogorov-Smirnov test requires n4/5 observations compared to n observations for 2 to attain the same power Cramer-von Mises and Anderson-Darling statistics are expected to be superior to Kolmogorov-Smirnov’s, since they make a comparison of the two distributions all along the range of x, rather than looking for a marked difference at one point

User's point of view. Simple user layer: the user only deals with AIDA objects and the choice of comparison algorithm, and is completely shielded from both statistical and computing complexity. The user extracts the algorithm by writing one line of code; the toolkit returns the statistical result.
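The idea of a thin user layer, where the choice of algorithm is a single line, can be illustrated with a toy example. Everything below is hypothetical: `compare`, `ComparisonResult` and the `ALGORITHMS` registry are names invented for this sketch, and plain Python lists stand in for AIDA objects. It is not the toolkit's actual interface.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class ComparisonResult:
    """Hypothetical result object returned to the user."""
    algorithm: str
    distance: float


def _ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov distance (hidden from the user)."""
    a, b = sorted(sample_a), sorted(sample_b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


# Hypothetical registry: further algorithms would be added here
# without changing the user-facing call.
ALGORITHMS = {"kolmogorov-smirnov": _ks_distance}


def compare(algorithm, sample_a, sample_b):
    """User-facing entry point: pick an algorithm by name, pass two
    distributions, get a result object back."""
    return ComparisonResult(algorithm, ALGORITHMS[algorithm](sample_a, sample_b))


# The one line the user writes:
result = compare("kolmogorov-smirnov", [1, 2, 7, 8], [3, 4, 5, 6])
```

Keeping the algorithms behind a single entry point is one way to let new tests be added without touching user code.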

Examples of practical applications

Microscopic validation of physics. Geant4 simulations (Geant4 Standard and Geant4 LowE) are statistically comparable with reference data (NIST database, http://www.nist.gov). Chi-squared test results, with N-S = NIST vs. Geant4 Standard and N-L = NIST vs. Geant4 LowE: χ²(N-S) = 0.267, ν = 28, p = 1 and χ²(N-L) = 1.315, ν = 28, p = 1; χ²(N-S) = 0.532, ν = 28, p = 1 and χ²(N-L) = 1.928, ν = 28, p = 1; χ²(N-S) = 0.373, ν = 28, p = 1 and χ²(N-L) = 5.882, ν = 28, p = 1.
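The p-values on this slide can be checked by hand. For an even number of degrees of freedom ν = 2k, the upper-tail probability of the chi-squared distribution has the closed form exp(-x/2) * sum over i < k of (x/2)^i / i!. The helper below is a small sketch of that formula; its name is an assumption and it is not taken from the toolkit.

```python
import math


def chi2_p_value_even_dof(chi2, dof):
    """Upper-tail probability P(X >= chi2) for a chi-squared
    distribution with an even number of degrees of freedom."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form needs a positive even dof")
    half = chi2 / 2.0
    term = 1.0   # (x/2)^i / i! for i = 0
    total = 1.0
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Largest and smallest statistics quoted on the slide, both with 28
# degrees of freedom.
p_small = chi2_p_value_even_dof(0.267, 28)
p_large = chi2_p_value_even_dof(5.882, 28)
```

Both values round to 1, in line with the p = 1 quoted for every comparison.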

X-ray fluorescence spectrum in Iceland basalt (E_IN = 6.5 keV). Test beam at Bessy, for the Bepi-Colombo mission. [Plot: counts vs. energy (keV).] Very complex distributions: χ² is not appropriate (< 5 entries in some bins; physical information would be lost if rebinned). Experimental measurements are comparable with Geant4 simulations. Anderson-Darling: Ac (95%) = 0.752.

Medical applications: hadron therapy. Experimental measurements are comparable with Geant4 simulations. Kolmogorov-Smirnov: D(EXP-GEANT4) = 0.11, p = n.s. (not significant). Kolmogorov-Smirnov, Goodman approximation: χ²(EXP-GEANT4) = 3.8, ν = 2, p = n.s.
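The Goodman approximation converts the Kolmogorov-Smirnov distance D between samples of sizes m and n into χ² = 4 D² mn / (m + n), which is referred to a chi-squared distribution with 2 degrees of freedom; for ν = 2 the upper-tail probability is simply exp(-χ²/2). The sketch below applies this to the χ² = 3.8 quoted on the slide. The function names are assumptions, and the sample sizes in the last line are invented, since the slide does not state them.

```python
import math


def goodman_chi2(d, m, n):
    """Goodman approximation: chi-squared equivalent of a KS distance d
    between two samples of sizes m and n."""
    return 4.0 * d * d * m * n / (m + n)


def chi2_p_value_2dof(chi2):
    """Upper-tail probability for a chi-squared with 2 degrees of freedom."""
    return math.exp(-chi2 / 2.0)


# p-value for the statistic quoted on the slide (chi2 = 3.8, 2 dof).
p_slide = chi2_p_value_2dof(3.8)

# Purely illustrative sizes and distance (not from the slide).
chi2_example = goodman_chi2(0.2, 100, 100)
```

The p-value for 3.8 comes out near 0.15, above the usual 0.05 threshold, which agrees with the "n.s." on the slide.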

Conclusions. This is a new, up-to-date, easy-to-handle and powerful tool for statistical comparison in particle physics. It is the first tool supplying such a variety of sophisticated and powerful statistical tests in HEP. AIDA interfaces allow its integration in any other data analysis tool. Applications in: HEP, astrophysics, medical physics, …