Download presentation

Presentation is loading. Please wait.

Published byHayden Connolly Modified over 4 years ago

1
**Statistical Toolkit Power of Goodness-of-Fit tests**

B. Mascialino1, A. Pfeiffer2, M.G. Pia1, A. Ribon2, P. Viarengo3 1INFN Genova, Italy 2CERN, Geneva, Switzerland 3IST – National Institute for Cancer Research, Genova, Italy Fluorescence spectrum of Icelandic Basalt 8.3 keV beam Counts Energy (keV) CHEP 2006 Mumbai, February 2006

2
**Historical background…**

Validation of Geant4 physics models through comparison of simulation vs. experimental data or reference databases Some use cases The test statistics computation concerns the agreement between the two samples’ empirical distribution functions Regression testing Throughout the software life-cycle Online DAQ Monitoring detector behaviour w.r.t. a reference Simulation validation Comparison with experimental data Reconstruction Comparison of reconstructed vs. expected distributions Physics analysis Comparisons of experimental distributions (ATLAS vs. CMS Higgs?) Comparison with theoretical distributions (data vs. Standard Model)

3
**“A Goodness-of-Fit Statistical Toolkit”**

Releases are publicly downloadable from the web code, documentation etc. Releases are also distributed with LCG Mathematical Libraries Also ported to Java, distributed with JAS G.A.P Cirrone, S. Donadio, S. Guatelli, A. Mantero, B. Mascialino, S. Parlati, M.G. Pia, A. Pfeiffer, A. Ribon, P. Viarengo “A Goodness-of-Fit Statistical Toolkit” IEEE- Transactions on Nuclear Science (2004), 51 (5):

4
**Flexible, extensible, maintainable system**

Vision of the project Basic vision General purpose tool Toolkit approach (choice open to users) Open source product Independent from specific analysis tools Easily usable in analysis and other tools Clearly define scope, objectives Rigorous software process Software quality Flexible, extensible, maintainable system Build on a solid architecture

6
**GoF algorithms (latest public release)**

Algorithms for binned distributions Anderson-Darling test Chi-squared test Fisz-Cramer-von Mises test Algorithms for unbinned distributions Cramer-von Mises test Goodman test (Kolmogorov-Smirnov test in chi-squared approximation) Kolmogorov-Smirnov test Kuiper test

7
**Recent extensions: algorithms**

Improved tests Fisz-Cramer-von Mises test and Anderson-Darling test exact asymptotic distribution (earlier: critical values) Tiku test Cramer-von Mises test in a chi-squared approximation New tests Weighted Kolmogorov-Smirnov, weighted Cramer-von Mises various weighting functions available in literature Watson test can be applied in case of cyclic observations, like the Kuiper test In preparation Girone test It is the most complete software for the comparison of two distributions, even among commercial/professional statistics tools goal: provide all 2-sample GoF algorithms existing in statistics literature Publication in preparation to describe the new algorithms Software release: March 2006

8
**User Layer Simple user layer**

Shields the user from the complexity of the underlying algorithms and design Only deal with the user’s analysis objects and choice of comparison algorithm First release: user layer for AIDA analysis objects LCG Architecture Blueprint, Geant4 requirement July 2005: added user layer for ROOT histograms in response to user requirements Other user layer implementations foreseen easy to add sound architecture decouples the mathematical component and the user’s representation of analysis objects

9
Power of GoF tests Do we really need such a wide collection of GoF tests? Why? Which is the most appropriate test to compare two distributions? How “good” is a test at recognizing real equivalent distributions and rejecting fake ones? Which test to use? No comprehensive study of the relative power of GoF tests exists in literature novel research in statistics (not only in physics data analysis!) Systematic study of all existing GoF tests in progress made possible by the extensive collection of tests in the Statistical Toolkit

10
**two parent distributions**

Method for the evaluation of power Confidence Level = 0.05 Parent distribution 1 Sample 1 n Sample 2 m GoF test Parent distribution 2 Pseudoexperiment: a random drawing of two samples from two parent distributions N=1000 Monte Carlo replicas Power = # pseudoexperiments with p-value < (1-CL) # pseudoexperiments For each test, the p-value computed by the GoF Toolkit derives from the analytical calculation of the asymptotic distribution, often depending on the samples sizes

11
**Parent distributions Uniform Gaussian Exponential Double exponential**

Cauchy Contaminated Normal Distribution 1 Also Breit-Wigner, other distributions being considered Contaminated Normal Distribution 2

12
**Characterization of distributions**

Skewness Tailweight Parent distribution S T f1(x) Uniform 1 1.267 f2(x) Gaussian 1.704 f3(x) Double exponential 2.161 f4(x) Cauchy 5.263 f5(x) Exponential 4.486 1.883 f6(x) Contamined normal 1 1.991 f7(x) Contamined normal 2 1.769 1.693

13
**General alternative Compare different distributions**

Unbinned distributions General alternative Compare different distributions Parent1 ≠ Parent2

14
**The power increases as a function of the sample size**

FLAT vs EXPONENTIAL Sample size Empirical power (%) Symmetric skewed Short tailed Medium tailed Skewed AD K W KS CvM DOUBLE EXPONENTIAL CN1 Simmetric No clear winner CN2

15
**The power increases as a function of the sample size**

GAUSSIAN vs DOUBLE EXPONENTIAL Samples size Empirical power (%) Very similar distributions Simmetric Medium tailed Long tailed Sample size Empirical power (%) CAUCHY vs EXPONENTIAL Long tailed Medium tailed Symmetric Asymmetric AD GAUSSIAN DOUBLE EXPONENTIAL CvM CN2 skewed KS K W CN1

16
**The power varies as a function of the parent distributions’ characteristics**

Samples size = 15 Tailweight 2ND distribution Empirical power (%) EXPONENTIAL vs OTHER DISTRIBUTIONS Samples size = 5 EXPONENTIAL vs OTHER DISTRIBUTIONS Tailweight 2ND distribution Empirical power (%) Sample size = 15 Tailweight 2ND distribution Empirical power (%) EXPONENTIAL vs OTHER DISTRIBUTIONS Sample size = 5 EXPONENTIAL vs OTHER DISTRIBUTIONS Tailweight 2ND distribution Empirical power (%) Distribution1 asymmetric KS CvM AD K W Samples size = 15 Samples size = 5 Empirical power (%) Tailweight 2ND distribution Empirical power (%) Tailweight 2ND distribution Empirical power (%) Tailweight 2ND distribution Empirical power (%) Tailweight 2ND distribution FLAT vs OTHER DISTRIBUTIONS Distribution1 symmetric FLAT vs OTHER DISTRIBUTIONS FLAT vs OTHER DISTRIBUTIONS FLAT vs OTHER DISTRIBUTIONS

17
**The power varies as a function of parent distributions’ characteristics**

Sample size = 15 Tailweight 2ND distribution Empirical power (%) EXPONENTIAL vs OTHER DISTRIBUTIONS Sample size = 5 EXPONENTIAL vs OTHER DISTRIBUTIONS Tailweight 2ND distribution Empirical power (%) Distribution1 asymmetric KS CvM AD K W Sample size = 15 Sample size = 5 Empirical power (%) Tailweight 2ND distribution Empirical power (%) Tailweight 2ND distribution FLAT vs OTHER DISTRIBUTIONS Distribution1 symmetric FLAT vs OTHER DISTRIBUTIONS

18
**Comparative evaluation of tests**

Preliminary Tailweight Short (T<1.5) Medium (1.5 < T < 2) Long (T>2) S~1 KS KS – CVM CVM - AD S>1.5 KS - AD CVM-AD Skewness

19
**Location-scale alternative Same distribution, shifted or scaled**

Parent1(x) = Parent2 ((x-θ)/τ)

20
**Power increases as a function of sample size**

Empirical power (%) Sample size EXPONENTIAL θ =0.5, τ = 0.5 Empirical power (%) Sample size θ =0.5, τ = 1.5 EXPONENTIAL CvM KS CvM AD K W KS K W Empirical power (%) Sample size EXPONENTIAL θ =1.0, τ = 0.5 Sample size Empirical power (%) EXPONENTIAL θ =1.0, τ = 1.5 CvM No clear winner KS

21
**Power increases as a function of sample size**

θ =0.5, τ = 0.5 θ =0.5, τ = 1.5 CN2 CN2 W K Empirical power (%) Empirical power (%) KS CvM AD K W Sample size Sample size θ =1.0, τ = 0.5 θ =1.0, τ = 1.5 CN2 CN2 Empirical power (%) No clear winner Empirical power (%) Sample size Sample size

22
**Power decreases as a function of tailweight**

Empirical power (%) Tailweight θ =0.5, τ = 1.5 N=10 Tailweight Empirical power (%) θ =0.5, τ = 0.5 N=10 No clear winner KS CvM AD K W Tailweight Empirical power (%) θ =1.0, τ = 0.5 N=10 Tailweight Empirical power (%) θ =1.0, τ = 1.5 N=10 No clear winner

23
**General alternative Compare different distributions**

Binned distributions General alternative Compare different distributions Parent1 ≠ Parent2

24
**Chi-squared test: POWER %**

Norm DoubleExp Cauchy CN1 20 63 35 21 31 16 CN2 100 99 55 86 Samples size = 500 Number of bins = 20

25
Preliminary results No clear winner for all the considered distributions in general the performance of a test depends on its intrinsic features as well as on the features of the distributions to be compared Practical recommendations first classify the type of the distributions in terms of skewness and tailweight choose the most appropriate test given the type of distributions Systematic study of the power in progress for both binned and unbinned distributions Topic still subject to research activity in the domain of statistics Publication in preparation

26
**Surprise… Flat Gaussian Exponential KS KS CvM CvM AD AD K K W W KS**

Inefficiency Flat Inefficiency KS CvM AD K W KS CvM AD K W Gaussian KS CvM AD K W Inefficiency Exponential General alternative, same distributions CL = 95%, expect 5% inefficiency Anderson-Darling exhibits an unexpected inefficiency at low numerosity Not documented anywhere in literature! Limitation of applicability?

27
**Outlook 1-sample GoF tests (comparison w.r.t. a function)**

Comparison of two/multi-dimensional distributions Systematic study of the power of GoF tests Goal to provide an extensive set of algorithms so far published in statistics literature, with a critical evaluation of their relative strengths and applicability Treatment of errors, filtering New release coming soon New papers in preparation Other components beyond GoF? Suggestions are welcome…

28
Conclusions A novel, complete software toolkit for statistical analysis is being developed rich set of algorithms sound architectural design rigorous software process A systematic study of the power of GoF tests is in progress unexplored area of research Application in various domains Geant4, HEP, space science, medicine… Feedback and suggestions are very much appreciated The project is open to developers interested in statistical methods

29
**IEEE Transactions on Nuclear Science http://ieeexplore. ieee**

Prime journal on technology in particle/nuclear physics Review process reorganized about one year ago Associate Editor dedicated to computing papers Various papers associated to CHEP 2004 published on IEEE TNS Papers associated to CHEP 2006 are welcome Manuscript submission: Papers submitted for publication will be subject to the regular review process Publications on refereed journals are beneficial not only to authors, but to the whole community of computing-oriented physicists Our “hardware colleagues” have better established publication habits… Further info:

Similar presentations

OK

1 COMPARISON BETWEEN PLATO ISODOSE DISTRIBUTION OF A 192 IR SOURCE AND THOSE SIMULATED WITH GEANT4 TOOLKIT F. Foppiano 1, S. Agostinelli 1, S. Garelli.

1 COMPARISON BETWEEN PLATO ISODOSE DISTRIBUTION OF A 192 IR SOURCE AND THOSE SIMULATED WITH GEANT4 TOOLKIT F. Foppiano 1, S. Agostinelli 1, S. Garelli.

© 2018 SlidePlayer.com Inc.

All rights reserved.

To make this website work, we log user data and share it with processors. To use this website, you must agree to our Privacy Policy, including cookie policy.

Ads by Google