Presentation is loading. Please wait.

Presentation is loading. Please wait.

Maria Grazia Pia, INFN Genova Statistical Toolkit Power of Goodness-of-Fit tests M.G. Pia 1 B. Mascialino 1, A. Pfeiffer 2, M.G. Pia 1, A. Ribon 2, P.

Similar presentations


Presentation on theme: "Maria Grazia Pia, INFN Genova Statistical Toolkit Power of Goodness-of-Fit tests M.G. Pia 1 B. Mascialino 1, A. Pfeiffer 2, M.G. Pia 1, A. Ribon 2, P."— Presentation transcript:

1 Maria Grazia Pia, INFN Genova Statistical Toolkit Power of Goodness-of-Fit tests M.G. Pia 1 B. Mascialino 1, A. Pfeiffer 2, M.G. Pia 1, A. Ribon 2, P. Viarengo 3 1 INFN Genova, Italy 2 CERN, Geneva, Switzerland 3 IST – National Institute for Cancer Research, Genova, Italy CHEP 2006 Mumbai, February 2006 Fluorescence spectrum of Icelandic Basalt 8.3 keV beam Counts Energy (keV)

2 Maria Grazia Pia, INFN Genova Some use cases Regression testing –Throughout the software life-cycle Online DAQ –Monitoring detector behaviour w.r.t. a reference Simulation validation –Comparison with experimental data Reconstruction –Comparison of reconstructed vs. expected distributions Physics analysis –Comparisons of experimental distributions (ATLAS vs. CMS Higgs?) –Comparison with theoretical distributions (data vs. Standard Model) The test statistics computation concerns the agreement between the two samples empirical distribution functions Historical background… Validation of Geant4 physics models through comparison of simulation vs. experimental data or reference databases

3 Maria Grazia Pia, INFN Genova G.A.P Cirrone, S. Donadio, S. Guatelli, A. Mantero, B. Mascialino, S. Parlati, M.G. Pia, A. Pfeiffer, A. Ribon, P. Viarengo A Goodness-of-Fit Statistical Toolkit IEEE- Transactions on Nuclear Science (2004), 51 (5): Releases are publicly downloadable from the web code, documentation etc. Releases are also distributed with LCG Mathematical Libraries Also ported to Java, distributed with JAS

4 Maria Grazia Pia, INFN Genova Vision of the project software process Rigorous software process vision Basic vision –General purpose tool –Toolkit approach (choice open to users) –Open source product –Independent from specific analysis tools –Easily usable in analysis and other tools architecture Build on a solid architecture Clearly define scope objectives scope, objectives Flexible, extensible, maintainable Flexible, extensible, maintainable system quality Software quality

5 Maria Grazia Pia, INFN Genova

6 GoF algorithms (latest public release) Algorithms for binned distributions – Anderson-Darling test – Chi-squared test – Fisz-Cramer-von Mises test – Algorithms for unbinned distributions – Anderson-Darling test – Cramer-von Mises test – Goodman test (Kolmogorov-Smirnov test in chi-squared approximation) – Kolmogorov-Smirnov test – Kuiper test

7 Maria Grazia Pia, INFN Genova Recent extensions: algorithms It is the most complete software for the comparison of two distributions, even among commercial/professional statistics tools –goal: provide all 2-sample GoF algorithms existing in statistics literature Publication in preparation to describe the new algorithms Software release: March 2006 Fisz-Cramer-von Mises test and Anderson-Darling test exact asymptotic distribution (earlier: critical values) Tiku test Cramer-von Mises test in a chi-squared approximation Improved tests New tests Weighted Kolmogorov-Smirnov, weighted Cramer-von Mises various weighting functions available in literature Watson test can be applied in case of cyclic observations, like the Kuiper test In preparation Girone test

8 Maria Grazia Pia, INFN Genova Simple user layer –Shields the user from the complexity of the underlying algorithms and design analysis objectscomparison algorithm –Only deal with the users analysis objects and choice of comparison algorithm User Layer First release: user layer for AIDA analysis objects –LCG Architecture Blueprint, Geant4 requirement July 2005: added user layer for ROOT histograms –in response to user requirements Other user layer implementations foreseen –easy to add –sound architecture decouples the mathematical component and the users representation of analysis objects

9 Maria Grazia Pia, INFN Genova Power of GoF tests Do we really need such a wide collection of GoF tests? Why? most appropriate Which is the most appropriate test to compare two distributions? good How good is a test at recognizing real equivalent distributions and rejecting fake ones? Which test to use? No comprehensive study of the relative power of GoF tests exists in literature –novel research in statistics –novel research in statistics (not only in physics data analysis!) Systematicall Systematic study of all existing GoF tests in progress –made possible by the extensive collection of tests in the Statistical Toolkit

10 Maria Grazia Pia, INFN Genova Method for the evaluation of power N=1000 Monte Carlo replicas Pseudoexperiment: a random drawing of two samples from two parent distributions For each test, the p-value computed by the GoF Toolkit derives from the analytical calculation of the asymptotic distribution, often depending on the samples sizes Confidence Level = 0.05 Parent distribution 1 Sample 1 n Sample 2 m GoF test Parent distribution 2 Power = # pseudoexperiments with p-value < (1-CL) # pseudoexperiments

11 Maria Grazia Pia, INFN Genova Parent distributions Uniform Gaussian Double exponential Cauchy Exponential Contaminated Normal Distribution 2Contaminated Normal Distribution 1 Also Breit-Wigner, other distributions being considered

12 Maria Grazia Pia, INFN Genova Characterization of distributions Parent distribution ST f 1 (x) Uniform f 2 (x) Gaussian f 3 (x) Double exponential f 4 (x) Cauchy f 5 (x) Exponential f 6 (x) Contamined normal f 7 (x) Contamined normal SkewnessTailweight

13 Maria Grazia Pia, INFN Genova General alternative Compare different distributions Parent1 Parent2 Unbinned distributions

14 Maria Grazia Pia, INFN Genova The power increases as a function of the sample size FLAT vs EXPONENTIAL Sample size Empirical power (%) Symmetric vs skewed Short tailed vs Medium tailed Sample size Empirical power (%) Symmetric vs Skewed Medium tailed vs Medium tailed ADKW KS CvM AD K W DOUBLE EXPONENTIAL vs CN1 Sample size Empirical power (%) Medium tailed vs Medium tailed Symmetric vs Symmetric DOUBLE EXPONENTIAL vs EXPONENTIAL Sample size Empirical power (%) Medium tailed vs Medium tailed Symmetric vs Simmetric No clear winner CN1 vs CN2

15 Maria Grazia Pia, INFN Genova GAUSSIAN vs DOUBLE EXPONENTIAL Samples size Empirical power (%) Very similar distributions Simmetric vs Simmetric Medium tailed vs Long tailed The power increases as a function of the sample size Sample size Empirical power (%) CAUCHY vs EXPONENTIAL Long tailed vs Medium tailed Symmetric vs Asymmetric AD GAUSSIAN vs DOUBLE EXPONENTIAL Sample size Empirical power (%) Symmetric vs Symmetric Medium tailed vs Long tailed CvM GAUSSIAN vs CN2 Sample size Empirical power (%) Symmetric vs skewed Medium tailed vs Medium tailed CvM AD KS CvM AD K W GAUSSIAN vs CN1 Sample size Empirical power (%) Symmetric vs Symmetric Medium tailed vs Medium tailed CvM KS

16 Maria Grazia Pia, INFN Genova The power varies as a function of the parent distributions characteristics Samples size = 5 EXPONENTIAL vs OTHER DISTRIBUTIONS Tailweight 2 ND distribution Empirical power (%) Samples size = 15 Tailweight 2 ND distribution Empirical power (%) EXPONENTIAL vs OTHER DISTRIBUTIONS Empirical power (%) Tailweight 2 ND distribution Empirical power (%) Tailweight 2 ND distribution Samples size = 5 Samples size = 15 FLAT vs OTHER DISTRIBUTIONS FLAT vs OTHER DISTRIBUTIONS KS CvM AD K W Sample size = 5 EXPONENTIAL vs OTHER DISTRIBUTIONS Tailweight 2 ND distribution Empirical power (%) Sample size = 15 Tailweight 2 ND distribution Empirical power (%) EXPONENTIAL vs OTHER DISTRIBUTIONS Empirical power (%) Tailweight 2 ND distribution Empirical power (%) Tailweight 2 ND distribution Distribution1 asymmetric Distribution1 symmetric FLAT vs OTHER DISTRIBUTIONS FLAT vs OTHER DISTRIBUTIONS

17 Maria Grazia Pia, INFN Genova The power varies as a function of parent distributions characteristics Sample size = 5 EXPONENTIAL vs OTHER DISTRIBUTIONS Tailweight 2 ND distribution Empirical power (%) Sample size = 15 Tailweight 2 ND distribution Empirical power (%) EXPONENTIAL vs OTHER DISTRIBUTIONS Empirical power (%) Tailweight 2 ND distribution Empirical power (%) Tailweight 2 ND distribution Sample size = 5 Sample size = 15 FLAT vs OTHER DISTRIBUTIONS FLAT vs OTHER DISTRIBUTIONS KS CvM AD K W Distribution1 asymmetric Distribution1 symmetric

18 Maria Grazia Pia, INFN Genova Comparative evaluation of tests Short (T<1.5) Medium (1.5 < T < 2) Long(T>2) S~1S~1S~1S~1KSKS – CVMCVM - AD S>1.5 KS - ADCVM-AD Skewness Tailweight Preliminary

19 Maria Grazia Pia, INFN Genova Location-scale alternative Same distribution, shifted or scaled Parent1(x) = Parent2 ((x-θ )/ τ )

20 Maria Grazia Pia, INFN Genova Power increases as a function of sample size KS CvM AD K W Empirical power (%) Sample size EXPONENTIAL θ =0.5, τ = 0.5 Empirical power (%) Sample size θ =0.5, τ = 1.5 EXPONENTIAL KW CvM KS Empirical power (%) Sample size EXPONENTIAL θ =1.0, τ = 0.5 No clear winner Sample size Empirical power (%) EXPONENTIAL θ =1.0, τ = 1.5 CvM KS

21 Maria Grazia Pia, INFN Genova Power increases as a function of sample size Sample size Empirical power (%) Sample size KS CvM AD K W θ =0.5, τ = 0.5 θ =1.0, τ = 0.5 θ =0.5, τ = 1.5 θ =1.0, τ = 1.5 CN2 No clear winner WK

22 Maria Grazia Pia, INFN Genova Power decreases as a function of tailweight No clear winner KS CvM AD K W Tailweight Empirical power (%) θ =1.0, τ = 0.5 N=10 Tailweight Empirical power (%) θ =1.0, τ = 1.5 N=10 Tailweight Empirical power (%) θ =0.5, τ = 0.5 N=10 Empirical power (%) Tailweight θ =0.5, τ = 1.5 N=10 No clear winner

23 Maria Grazia Pia, INFN Genova Binned distributions General alternative Compare different distributions Parent1 Parent2

24 Maria Grazia Pia, INFN Genova Chi-squared test: POWER % NormDoubleExpCauchyCN1 DoubleExp20 Cauchy6335 CN CN Samples size = 500 Number of bins = 20

25 Maria Grazia Pia, INFN Genova Preliminary results clear winner No clear winner for all the considered distributions in general –the performance of a test depends on its intrinsic features as well as on the features of the distributions to be compared Practical recommendations 1)first classify the type of the distributions in terms of skewness and tailweight 2)choose the most appropriate test given the type of distributions Systematic study of the power in progress –for both binned and unbinned distributions Topic still subject to research activity in the domain of statistics Publication in preparation

26 Maria Grazia Pia, INFN Genova Surprise… KS CvM AD K W KS CvM AD K W KS CvM AD K W Flat Gaussian Exponential InefficiencyInefficiency InefficiencyInefficiency InefficiencyInefficiency General alternative, same distributions CL = 95%, expect 5% inefficiency Anderson-Darling inefficiency Anderson-Darling exhibits an unexpected inefficiency at low numerosity Not documented anywhere in literature! Limitation of applicability?

27 Maria Grazia Pia, INFN Genova Outlook 1-sample GoF tests (comparison w.r.t. a function) Comparison of two/multi-dimensional distributions Systematic study of the power of GoF tests Goal to provide an extensive set of algorithms so far published in statistics literature, with a critical evaluation of their relative strengths and applicability Treatment of errors, filtering New release coming soon New papers in preparation Other components beyond GoF? Suggestions are welcome…

28 Maria Grazia Pia, INFN Genova Conclusions A novel, complete software toolkit for statistical analysis is being developed –rich set of algorithms –sound architectural design –rigorous software process A systematic study of the power of GoF tests is in progress –unexplored area of research Application in various domains –Geant4, HEP, space science, medicine… Feedback and suggestions are very much appreciated The project is open to developers interested in statistical methods

29 Maria Grazia Pia, INFN Genova IEEE Transactions on Nuclear Science Prime journal on technology in particle/nuclear physics Review process reorganized about one year ago Associate Editor dedicated to computing papers Various papers associated to CHEP 2004 published on IEEE TNS Papers associated to CHEP 2006 are welcome Manuscript submission: Papers submitted for publication will be subject to the regular review process Publications on refereed journals are beneficial not only to authors, but to the whole community of computing-oriented physicists Our hardware colleagues have better established publication habits… Further info:


Download ppt "Maria Grazia Pia, INFN Genova Statistical Toolkit Power of Goodness-of-Fit tests M.G. Pia 1 B. Mascialino 1, A. Pfeiffer 2, M.G. Pia 1, A. Ribon 2, P."

Similar presentations


Ads by Google