Statistical Toolkit Power of Goodness-of-Fit tests

Statistical Toolkit Power of Goodness-of-Fit tests
B. Mascialino1, A. Pfeiffer2, M.G. Pia1, A. Ribon2, P. Viarengo3 1INFN Genova, Italy 2CERN, Geneva, Switzerland 3IST – National Institute for Cancer Research, Genova, Italy Fluorescence spectrum of Icelandic Basalt 8.3 keV beam Counts Energy (keV) CHEP 2006 Mumbai, February 2006

Historical background…
Validation of Geant4 physics models through comparison of simulation vs. experimental data or reference databases Some use cases The test statistics computation concerns the agreement between the two samples’ empirical distribution functions Regression testing Throughout the software life-cycle Online DAQ Monitoring detector behaviour w.r.t. a reference Simulation validation Comparison with experimental data Reconstruction Comparison of reconstructed vs. expected distributions Physics analysis Comparisons of experimental distributions (ATLAS vs. CMS Higgs?) Comparison with theoretical distributions (data vs. Standard Model)

“A Goodness-of-Fit Statistical Toolkit”
Releases are publicly downloadable from the web code, documentation etc. Releases are also distributed with LCG Mathematical Libraries Also ported to Java, distributed with JAS G.A.P Cirrone, S. Donadio, S. Guatelli, A. Mantero, B. Mascialino, S. Parlati, M.G. Pia, A. Pfeiffer, A. Ribon, P. Viarengo “A Goodness-of-Fit Statistical Toolkit” IEEE- Transactions on Nuclear Science (2004), 51 (5):

Flexible, extensible, maintainable system
Vision of the project Basic vision General purpose tool Toolkit approach (choice open to users) Open source product Independent from specific analysis tools Easily usable in analysis and other tools Clearly define scope, objectives Rigorous software process Software quality Flexible, extensible, maintainable system Build on a solid architecture

GoF algorithms (latest public release)
Algorithms for binned distributions Anderson-Darling test Chi-squared test Fisz-Cramer-von Mises test Algorithms for unbinned distributions Cramer-von Mises test Goodman test (Kolmogorov-Smirnov test in chi-squared approximation) Kolmogorov-Smirnov test Kuiper test

Recent extensions: algorithms
Improved tests Fisz-Cramer-von Mises test and Anderson-Darling test exact asymptotic distribution (earlier: critical values) Tiku test Cramer-von Mises test in a chi-squared approximation New tests Weighted Kolmogorov-Smirnov, weighted Cramer-von Mises various weighting functions available in literature Watson test can be applied in case of cyclic observations, like the Kuiper test In preparation Girone test It is the most complete software for the comparison of two distributions, even among commercial/professional statistics tools goal: provide all 2-sample GoF algorithms existing in statistics literature Publication in preparation to describe the new algorithms Software release: March 2006

User Layer Simple user layer
Shields the user from the complexity of the underlying algorithms and design Only deal with the user’s analysis objects and choice of comparison algorithm First release: user layer for AIDA analysis objects LCG Architecture Blueprint, Geant4 requirement July 2005: added user layer for ROOT histograms in response to user requirements Other user layer implementations foreseen easy to add sound architecture decouples the mathematical component and the user’s representation of analysis objects

Power of GoF tests Do we really need such a wide collection of GoF tests? Why? Which is the most appropriate test to compare two distributions? How “good” is a test at recognizing real equivalent distributions and rejecting fake ones? Which test to use? No comprehensive study of the relative power of GoF tests exists in literature novel research in statistics (not only in physics data analysis!) Systematic study of all existing GoF tests in progress made possible by the extensive collection of tests in the Statistical Toolkit

two parent distributions
Method for the evaluation of power Confidence Level = 0.05 Parent distribution 1 Sample 1 n Sample 2 m GoF test Parent distribution 2 Pseudoexperiment: a random drawing of two samples from two parent distributions N=1000 Monte Carlo replicas Power = # pseudoexperiments with p-value < (1-CL) # pseudoexperiments For each test, the p-value computed by the GoF Toolkit derives from the analytical calculation of the asymptotic distribution, often depending on the samples sizes

Parent distributions Uniform Gaussian Exponential Double exponential
Cauchy Contaminated Normal Distribution 1 Also Breit-Wigner, other distributions being considered Contaminated Normal Distribution 2

Characterization of distributions
Skewness Tailweight Parent distribution S T f1(x) Uniform 1 1.267 f2(x) Gaussian 1.704 f3(x) Double exponential 2.161 f4(x) Cauchy 5.263 f5(x) Exponential 4.486 1.883 f6(x) Contamined normal 1 1.991 f7(x) Contamined normal 2 1.769 1.693

General alternative Compare different distributions
Unbinned distributions General alternative Compare different distributions Parent1 ≠ Parent2

The power increases as a function of the sample size
FLAT vs EXPONENTIAL Sample size Empirical power (%) Symmetric skewed Short tailed Medium tailed Skewed AD K W KS CvM DOUBLE EXPONENTIAL CN1 Simmetric No clear winner CN2

The power increases as a function of the sample size
GAUSSIAN vs DOUBLE EXPONENTIAL Samples size Empirical power (%) Very similar distributions Simmetric Medium tailed Long tailed Sample size Empirical power (%) CAUCHY vs EXPONENTIAL Long tailed Medium tailed Symmetric Asymmetric AD GAUSSIAN DOUBLE EXPONENTIAL CvM CN2 skewed KS K W CN1

The power varies as a function of the parent distributions’ characteristics
Samples size = 15 Tailweight 2ND distribution Empirical power (%) EXPONENTIAL vs OTHER DISTRIBUTIONS Samples size = 5 EXPONENTIAL vs OTHER DISTRIBUTIONS Tailweight 2ND distribution Empirical power (%) Sample size = 15 Tailweight 2ND distribution Empirical power (%) EXPONENTIAL vs OTHER DISTRIBUTIONS Sample size = 5 EXPONENTIAL vs OTHER DISTRIBUTIONS Tailweight 2ND distribution Empirical power (%) Distribution1 asymmetric KS CvM AD K W Samples size = 15 Samples size = 5 Empirical power (%) Tailweight 2ND distribution Empirical power (%) Tailweight 2ND distribution Empirical power (%) Tailweight 2ND distribution Empirical power (%) Tailweight 2ND distribution FLAT vs OTHER DISTRIBUTIONS Distribution1 symmetric FLAT vs OTHER DISTRIBUTIONS FLAT vs OTHER DISTRIBUTIONS FLAT vs OTHER DISTRIBUTIONS

The power varies as a function of parent distributions’ characteristics
Sample size = 15 Tailweight 2ND distribution Empirical power (%) EXPONENTIAL vs OTHER DISTRIBUTIONS Sample size = 5 EXPONENTIAL vs OTHER DISTRIBUTIONS Tailweight 2ND distribution Empirical power (%) Distribution1 asymmetric KS CvM AD K W Sample size = 15 Sample size = 5 Empirical power (%) Tailweight 2ND distribution Empirical power (%) Tailweight 2ND distribution FLAT vs OTHER DISTRIBUTIONS Distribution1 symmetric FLAT vs OTHER DISTRIBUTIONS

Comparative evaluation of tests
Preliminary Tailweight Short (T<1.5) Medium (1.5 < T < 2) Long (T>2) S~1 KS KS – CVM CVM - AD S>1.5 KS - AD CVM-AD Skewness

Location-scale alternative Same distribution, shifted or scaled
Parent1(x) = Parent2 ((x-θ)/τ)

Power increases as a function of sample size
Empirical power (%) Sample size EXPONENTIAL θ =0.5, τ = 0.5 Empirical power (%) Sample size θ =0.5, τ = 1.5 EXPONENTIAL CvM KS CvM AD K W KS K W Empirical power (%) Sample size EXPONENTIAL θ =1.0, τ = 0.5 Sample size Empirical power (%) EXPONENTIAL θ =1.0, τ = 1.5 CvM No clear winner KS

Power increases as a function of sample size
θ =0.5, τ = 0.5 θ =0.5, τ = 1.5 CN2 CN2 W K Empirical power (%) Empirical power (%) KS CvM AD K W Sample size Sample size θ =1.0, τ = 0.5 θ =1.0, τ = 1.5 CN2 CN2 Empirical power (%) No clear winner Empirical power (%) Sample size Sample size

Power decreases as a function of tailweight
Empirical power (%) Tailweight θ =0.5, τ = 1.5 N=10 Tailweight Empirical power (%) θ =0.5, τ = 0.5 N=10 No clear winner KS CvM AD K W Tailweight Empirical power (%) θ =1.0, τ = 0.5 N=10 Tailweight Empirical power (%) θ =1.0, τ = 1.5 N=10 No clear winner

General alternative Compare different distributions
Binned distributions General alternative Compare different distributions Parent1 ≠ Parent2

Chi-squared test: POWER %
Norm DoubleExp Cauchy CN1 20 63 35 21 31 16 CN2 100 99 55 86 Samples size = 500 Number of bins = 20

Preliminary results No clear winner for all the considered distributions in general the performance of a test depends on its intrinsic features as well as on the features of the distributions to be compared Practical recommendations first classify the type of the distributions in terms of skewness and tailweight choose the most appropriate test given the type of distributions Systematic study of the power in progress for both binned and unbinned distributions Topic still subject to research activity in the domain of statistics Publication in preparation

Surprise… Flat Gaussian Exponential KS KS CvM CvM AD AD K K W W KS
Inefficiency Flat Inefficiency KS CvM AD K W KS CvM AD K W Gaussian KS CvM AD K W Inefficiency Exponential General alternative, same distributions CL = 95%, expect 5% inefficiency Anderson-Darling exhibits an unexpected inefficiency at low numerosity Not documented anywhere in literature! Limitation of applicability?

Outlook 1-sample GoF tests (comparison w.r.t. a function)
Comparison of two/multi-dimensional distributions Systematic study of the power of GoF tests Goal to provide an extensive set of algorithms so far published in statistics literature, with a critical evaluation of their relative strengths and applicability Treatment of errors, filtering New release coming soon New papers in preparation Other components beyond GoF? Suggestions are welcome…

Conclusions A novel, complete software toolkit for statistical analysis is being developed rich set of algorithms sound architectural design rigorous software process A systematic study of the power of GoF tests is in progress unexplored area of research Application in various domains Geant4, HEP, space science, medicine… Feedback and suggestions are very much appreciated The project is open to developers interested in statistical methods

IEEE Transactions on Nuclear Science http://ieeexplore. ieee
Prime journal on technology in particle/nuclear physics Review process reorganized about one year ago Associate Editor dedicated to computing papers Various papers associated to CHEP 2004 published on IEEE TNS Papers associated to CHEP 2006 are welcome Manuscript submission: Papers submitted for publication will be subject to the regular review process Publications on refereed journals are beneficial not only to authors, but to the whole community of computing-oriented physicists Our “hardware colleagues” have better established publication habits… Further info:

Statistical Toolkit Power of Goodness-of-Fit tests

Similar presentations

Presentation on theme: "Statistical Toolkit Power of Goodness-of-Fit tests"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Statistical Toolkit Power of Goodness-of-Fit tests

Similar presentations

Presentation on theme: "Statistical Toolkit Power of Goodness-of-Fit tests"— Presentation transcript:

Similar presentations

About project

Feedback