1 N.D. Gagunashvili, University of Akureyri, Iceland
Pearson's $\chi^2$ Test Modifications for Comparison of Unweighted and Weighted Histograms and Two Weighted Histograms
Nikolai Gagunashvili, School of Computing, University of Akureyri, Iceland, nikolai@unak.is

2 Contents
Introduction
$\chi^2$ test for comparison of two (unweighted) histograms
Unweighted and weighted histograms comparison
Two weighted histograms comparison
Numerical example and experiments
Conclusions
References

3 Introduction
A frequently used technique in data analysis is the comparison of histograms. First suggested by Pearson in 1904, the $\chi^2$ test of homogeneity is widely used for comparing usual (unweighted) histograms. A modification of the $\chi^2$ test for the comparison of weighted and unweighted histograms was proposed in 2005 (see Proceedings of PHYSTAT2005, Oxford 2005). This report develops the ideas presented at the PHYSTAT2005 conference.

4 $\chi^2$ test for comparison of two (unweighted) histograms
Let us consider two histograms with the same binning and the number of bins equal to $r$. Let us denote:
the number of events in the $i$th bin of the first histogram by $n_i$;
the number of events in the $i$th bin of the second histogram by $m_i$.
The total numbers of events are $N = \sum_{i=1}^{r} n_i$ for the first histogram and $M = \sum_{i=1}^{r} m_i$ for the second histogram.

5 $\chi^2$ test for comparison of two (unweighted) histograms
The hypothesis of homogeneity: the two histograms represent random values with identical distributions. Equivalently: there exist $r$ constants $p_1, \dots, p_r$, such that $\sum_{i=1}^{r} p_i = 1$, and the probability of belonging to the $i$th bin for some measured value in both experiments is equal to $p_i$.

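6 $\chi^2$ test for comparison of two (unweighted) histograms
The number of events in the $i$th bin is a random variable with a distribution approximated by a Poisson probability distribution, $e^{-Np_i}(Np_i)^{n_i}/n_i!$ for the first histogram and $e^{-Mp_i}(Mp_i)^{m_i}/m_i!$ for the second histogram. If the hypothesis of homogeneity is valid, then the maximum likelihood estimator of $p_i$, $i = 1, \dots, r$, is

7 $\chi^2$ test for comparison of two (unweighted) histograms
$$\hat{p}_i = \frac{n_i + m_i}{N + M},$$
and then
$$X^2 = \sum_{i=1}^{r} \frac{(n_i - N\hat{p}_i)^2}{N\hat{p}_i} + \sum_{i=1}^{r} \frac{(m_i - M\hat{p}_i)^2}{M\hat{p}_i} = \frac{1}{MN} \sum_{i=1}^{r} \frac{(M n_i - N m_i)^2}{n_i + m_i}$$
has approximately a $\chi^2_{r-1}$ distribution.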
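The statistic for two unweighted histograms can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard homogeneity form $X^2 = \frac{1}{MN}\sum_i (M n_i - N m_i)^2/(n_i + m_i)$ with $N = \sum n_i$ and $M = \sum m_i$; the function name `chi2_two_unweighted` and the choice to skip bins that are empty in both histograms (reducing the degrees of freedom accordingly) are assumptions made for the example, not part of the presentation.

```python
def chi2_two_unweighted(n, m):
    """Return (X2, dof) for two unweighted histograms with the same binning.

    n, m: sequences of bin contents (event counts).
    Assumption: bins empty in both histograms are skipped and do not
    count towards the degrees of freedom.
    """
    if len(n) != len(m):
        raise ValueError("histograms must have the same binning")
    N, M = sum(n), sum(m)
    x2, used = 0.0, 0
    for ni, mi in zip(n, m):
        if ni + mi == 0:
            continue  # no information in this bin
        x2 += (M * ni - N * mi) ** 2 / (ni + mi)
        used += 1
    return x2 / (M * N), used - 1


# Usage: two small hypothetical histograms with two bins each.
x2, dof = chi2_two_unweighted([10, 20], [20, 10])
print(x2, dof)
```

The second form of the statistic is used here because it avoids computing $\hat{p}_i$ explicitly; both forms give the same value.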

8 $\chi^2$ test for comparison of two (unweighted) histograms
The comparison procedure can include an analysis of the residuals, which is often helpful in identifying the bins of the histograms responsible for a significant overall $X^2$ value. Most convenient for analysis are the normalized residuals
$$r_i = \frac{n_i - N\hat{p}_i}{\sqrt{N\hat{p}_i \left(1 - \frac{N}{N+M}\right)\left(1 - \frac{n_i + m_i}{N+M}\right)}}.$$
If the hypothesis of homogeneity is valid, then the residuals $r_i$ are approximately independent and identically distributed random variables having the standard normal distribution $\mathcal{N}(0, 1)$.
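A short sketch of the residual analysis for two unweighted histograms follows. It assumes the usual normalized (adjusted) residual $r_i = (n_i - N\hat{p}_i)/\sqrt{N\hat{p}_i\,(1 - N/(N+M))\,(1 - (n_i+m_i)/(N+M))}$ with $\hat{p}_i = (n_i + m_i)/(N + M)$; the function name is hypothetical, and the code assumes every bin is populated in at least one histogram and that no single bin holds all events.

```python
import math


def residuals_two_unweighted(n, m):
    """Normalized residuals of the first histogram against the pooled estimate.

    Assumption: 0 < (n_i + m_i) < N + M for every bin, so the variance
    under the square root is strictly positive.
    """
    N, M = sum(n), sum(m)
    residuals = []
    for ni, mi in zip(n, m):
        p_hat = (ni + mi) / (N + M)  # pooled bin probability
        variance = N * p_hat * (1.0 - N / (N + M)) * (1.0 - p_hat)
        residuals.append((ni - N * p_hat) / math.sqrt(variance))
    return residuals


# Usage: bins with |r_i| well above 2 deserve a closer look.
for i, r in enumerate(residuals_two_unweighted([10, 20], [20, 10])):
    print(i, round(r, 3))
```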

9 $\chi^2$ test for comparison of two (unweighted) histograms
The application of the $\chi^2$ test has restrictions related to the value of the expected frequencies $Np_i$, $Mp_i$, $i = 1, \dots, r$. A conservative rule is that all the expectations must be 1 or greater for both histograms. In practical cases, when the expected frequencies are not known, the estimated expected frequencies can be used.

10 Unweighted and weighted histograms comparison
A simple modification of the ideas described above can be used for the comparison of the usual (unweighted) and weighted histograms. Let us denote:
the number of events in the $i$th bin of the unweighted histogram by $n_i$;
the weight of events in the $i$th bin of the weighted histogram by $w_i$.
The number of events in the unweighted histogram is equal to $N = \sum_{i=1}^{r} n_i$. The total weight of events in the weighted histogram is equal to $W = \sum_{i=1}^{r} w_i$.

11 Unweighted and weighted histograms comparison
The hypothesis of identity of an unweighted histogram to a weighted histogram: there exist $r$ constants $p_1, \dots, p_r$, such that $\sum_{i=1}^{r} p_i = 1$, and the probability of belonging to the $i$th bin for some measured value is equal to $p_i$ for the unweighted histogram, while the expectation values of the weights $w_i$ are equal to $Wp_i$ for the weighted histogram.

12 Unweighted and weighted histograms comparison
The number of events in the $i$th bin of the unweighted histogram is a random variable with a distribution approximated by the Poisson probability distribution $e^{-Np_i}(Np_i)^{n_i}/n_i!$. The weight $w_i$ is a random variable with a distribution approximated by the normal probability distribution $\mathcal{N}(Wp_i, \sigma_i^2)$, where $\sigma_i^2$ is the variance of the weight $w_i$.

13 Unweighted and weighted histograms comparison
If we replace the variance $\sigma_i^2$ with the estimate $s_i^2$ (the sum of squares of the weights of events in the $i$th bin) and the hypothesis of identity is valid, then the maximum likelihood estimator of $p_i$, $i = 1, \dots, r$, is
$$\hat{p}_i = \frac{W w_i - N s_i^2 + \sqrt{(W w_i - N s_i^2)^2 + 4 W^2 s_i^2 n_i}}{2 W^2}.$$

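14 Unweighted and weighted histograms comparison
We may then use the test statistic
$$X^2 = \sum_{i=1}^{r} \frac{(n_i - N\hat{p}_i)^2}{N\hat{p}_i} + \sum_{i=1}^{r} \frac{(w_i - W\hat{p}_i)^2}{s_i^2},$$
where $\hat{p}_i$ is the maximum likelihood estimator of $p_i$, and it is plausible that this has approximately a $\chi^2_{r-1}$ distribution.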
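The comparison of an unweighted and a weighted histogram can be sketched as follows. The estimator used in the code, $\hat{p}_i = \big(Ww_i - Ns_i^2 + \sqrt{(Ww_i - Ns_i^2)^2 + 4W^2 s_i^2 n_i}\big)/(2W^2)$, is what one obtains by maximizing the Poisson-times-normal likelihood described in the slides, and the statistic is $X^2 = \sum_i (n_i - N\hat{p}_i)^2/(N\hat{p}_i) + \sum_i (w_i - W\hat{p}_i)^2/s_i^2$. The function name and argument layout are assumptions for the example; the code also assumes every bin has $s_i^2 > 0$ and $\hat{p}_i > 0$.

```python
import math


def chi2_unweighted_weighted(n, w, s2):
    """Return (X2, dof) for an unweighted histogram n and a weighted one.

    n:  event counts per bin of the unweighted histogram
    w:  sums of weights per bin of the weighted histogram
    s2: sums of squared weights per bin (estimate of the variance of w_i)
    Assumption: s2[i] > 0 and the estimated p_i is > 0 in every bin.
    """
    N, W = sum(n), sum(w)
    x2 = 0.0
    for ni, wi, s2i in zip(n, w, s2):
        b = W * wi - N * s2i
        # positive root of W^2 p^2 - b p - n_i s_i^2 = 0
        p_hat = (b + math.sqrt(b * b + 4.0 * W * W * s2i * ni)) / (2.0 * W * W)
        x2 += (ni - N * p_hat) ** 2 / (N * p_hat)
        x2 += (wi - W * p_hat) ** 2 / s2i
    return x2, len(n) - 1


# Usage: hypothetical weighted histogram built from events of weight 0.5,
# so w_i = 0.5 * count and s2_i = 0.25 * count.
print(chi2_unweighted_weighted([10, 30], [5.0, 15.0], [2.5, 7.5]))
```

When the two histograms have exactly the same shape, as in the usage line, each term vanishes and the statistic is zero.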

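15 Unweighted and weighted histograms comparison
The variance $z_i^2$ of the difference between the weight $w_i$ and the estimated expectation value of the weight, $W\hat{p}_i$, is approximately equal to:
$$z_i^2 \approx N\hat{p}_i(1 - \hat{p}_i) \left( \frac{W s_i^2}{\sqrt{(N s_i^2 - w_i W)^2 + 4 W^2 s_i^2 n_i}} \right)^2 + \frac{s_i^2}{4} \left( 1 + \frac{N s_i^2 - w_i W}{\sqrt{(N s_i^2 - w_i W)^2 + 4 W^2 s_i^2 n_i}} \right)^2.$$
The residuals
$$r_i = \frac{w_i - W\hat{p}_i}{z_i}$$
have approximately a normal distribution with mean equal to 0 and standard deviation equal to 1.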
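The residuals for the unweighted versus weighted case can be sketched as below. The variance used in the code comes from a first-order error propagation of $w_i - W\hat{p}_i$ in $n_i$ and $w_i$, taking $\mathrm{Var}(n_i) \approx N\hat{p}_i(1 - \hat{p}_i)$ and $\mathrm{Var}(w_i) \approx s_i^2$; that multinomial variance factor, like the function name, is an assumption of this sketch rather than something stated in the transcript.

```python
import math


def residuals_unweighted_weighted(n, w, s2):
    """Normalized residuals r_i = (w_i - W p_hat_i) / z_i.

    Assumption: Var(n_i) ~ N p_hat (1 - p_hat) and Var(w_i) ~ s2_i,
    propagated to first order through the estimator p_hat_i.
    """
    N, W = sum(n), sum(w)
    residuals = []
    for ni, wi, s2i in zip(n, w, s2):
        b = W * wi - N * s2i
        root = math.sqrt(b * b + 4.0 * W * W * s2i * ni)
        p_hat = (b + root) / (2.0 * W * W)
        t1 = W * s2i / root                    # sensitivity to n_i
        t2 = 1.0 + (N * s2i - wi * W) / root   # twice the sensitivity to w_i
        z2 = N * p_hat * (1.0 - p_hat) * t1 ** 2 + s2i * t2 ** 2 / 4.0
        residuals.append((wi - W * p_hat) / math.sqrt(z2))
    return residuals


# Usage: a hypothetical weighted histogram with an excess in the first bin.
print(residuals_unweighted_weighted([10, 30], [8.0, 12.0], [4.0, 6.0]))
```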

16 Unweighted and weighted histograms comparison
Restrictions
The minimal expected frequency for the unweighted histogram must be 1. The recommended expected frequency for the weighted histogram is more than 25.

17 Two weighted histograms comparison
Let us consider two histograms with the same binning and the number of bins equal to $r$. Let us denote:
the weight of events in the $i$th bin of the first histogram by $w_{1i}$;
the weight of events in the $i$th bin of the second histogram by $w_{2i}$.
The total weight of events in the first histogram is equal to $W_1 = \sum_{i=1}^{r} w_{1i}$. The total weight of events in the second histogram is equal to $W_2 = \sum_{i=1}^{r} w_{2i}$.

18 Two weighted histograms comparison
The hypothesis of identity of two weighted histograms: there exist $r$ constants $p_1, \dots, p_r$, such that $\sum_{i=1}^{r} p_i = 1$, and the expectation values of the weights $w_{1i}$ are equal to $W_1 p_i$ for the first histogram and the expectation values of the weights $w_{2i}$ are equal to $W_2 p_i$ for the second histogram.

19 Two weighted histograms comparison
The weights in both histograms are random variables with distributions that can be approximated by the normal probability distribution $\mathcal{N}(W_1 p_i, \sigma_{1i}^2)$ for the first histogram and by the normal probability distribution $\mathcal{N}(W_2 p_i, \sigma_{2i}^2)$ for the second histogram. Here $\sigma_{1i}^2$ and $\sigma_{2i}^2$ are the variances of $w_{1i}$ and $w_{2i}$, with estimators $s_{1i}^2$ and $s_{2i}^2$ respectively.

20 Two weighted histograms comparison
If the hypothesis of identity is valid, then the maximum likelihood and least squares estimator of $p_i$, $i = 1, \dots, r$, is
$$\hat{p}_i = \frac{w_{1i} W_1 / s_{1i}^2 + w_{2i} W_2 / s_{2i}^2}{W_1^2 / s_{1i}^2 + W_2^2 / s_{2i}^2}.$$

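21 Two weighted histograms comparison
We may then use the test statistic
$$X^2 = \sum_{i=1}^{r} \frac{(w_{1i} - W_1\hat{p}_i)^2}{s_{1i}^2} + \sum_{i=1}^{r} \frac{(w_{2i} - W_2\hat{p}_i)^2}{s_{2i}^2} = \sum_{i=1}^{r} \frac{(W_1 w_{2i} - W_2 w_{1i})^2}{W_1^2 s_{2i}^2 + W_2^2 s_{1i}^2},$$
and it is plausible that this has approximately a $\chi^2_{r-1}$ distribution.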
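For two weighted histograms, the statistic reduces to a single sum once the weighted least squares estimate of $p_i$ is substituted. The sketch below assumes the closed form $X^2 = \sum_i (W_1 w_{2i} - W_2 w_{1i})^2 / (W_1^2 s_{2i}^2 + W_2^2 s_{1i}^2)$; the function name and argument order are hypothetical.

```python
def chi2_two_weighted(w1, s1sq, w2, s2sq):
    """Return (X2, dof) for two weighted histograms with the same binning.

    w1, w2:     sums of weights per bin
    s1sq, s2sq: sums of squared weights per bin (variance estimates)
    Assumption: the variance estimates are positive in every bin.
    """
    W1, W2 = sum(w1), sum(w2)
    x2 = 0.0
    for a, va, b, vb in zip(w1, s1sq, w2, s2sq):
        x2 += (W1 * b - W2 * a) ** 2 / (W1 ** 2 * vb + W2 ** 2 * va)
    return x2, len(w1) - 1


# Usage: two hypothetical two-bin weighted histograms with opposite shapes.
print(chi2_two_weighted([5.0, 15.0], [5.0, 15.0], [15.0, 5.0], [15.0, 5.0]))
```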

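22 Two weighted histograms comparison
The normalized residuals
$$r_i = \frac{w_{1i} - W_1\hat{p}_i}{s_{1i} \sqrt{1 - \dfrac{1}{1 + W_2^2 s_{1i}^2 / (W_1^2 s_{2i}^2)}}}$$
have approximately a normal distribution with mean equal to 0 and standard deviation equal to 1, where $\hat{p}_i$ is the estimator of $p_i$ under the hypothesis of identity.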
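The residuals for two weighted histograms can be sketched in the same way. The code assumes the weighted least squares estimate $\hat{p}_i = (w_{1i}W_1/s_{1i}^2 + w_{2i}W_2/s_{2i}^2)/(W_1^2/s_{1i}^2 + W_2^2/s_{2i}^2)$ and the residual variance $s_{1i}^2\,\big(1 - 1/(1 + W_2^2 s_{1i}^2/(W_1^2 s_{2i}^2))\big)$, which follows from propagating the variances of $w_{1i}$ and $w_{2i}$ through $w_{1i} - W_1\hat{p}_i$. The function name is hypothetical.

```python
import math


def residuals_two_weighted(w1, s1sq, w2, s2sq):
    """Normalized residuals of the first weighted histogram.

    Assumption: variance estimates s1sq, s2sq are positive in every bin.
    """
    W1, W2 = sum(w1), sum(w2)
    residuals = []
    for a, va, b, vb in zip(w1, s1sq, w2, s2sq):
        # weighted least squares estimate of the bin probability
        p_hat = (a * W1 / va + b * W2 / vb) / (W1 ** 2 / va + W2 ** 2 / vb)
        shrink = 1.0 - 1.0 / (1.0 + (W2 ** 2 * va) / (W1 ** 2 * vb))
        residuals.append((a - W1 * p_hat) / math.sqrt(va * shrink))
    return residuals


# Usage with the same hypothetical histograms as a consistency check:
res = residuals_two_weighted([5.0, 15.0], [5.0, 15.0], [15.0, 5.0], [15.0, 5.0])
print(res, sum(r * r for r in res))
```

In this formulation the squared residuals sum to the $X^2$ statistic, which gives a convenient cross-check between the two computations.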

23 Two weighted histograms comparison
Restriction
A recommended minimal expected frequency is equal to 25 for the proposed test.

24 Numerical example and experiments
The method described herein is now illustrated with an example. We take a distribution defined on the interval [4, 16]. Events distributed according to the formula are simulated to create the unweighted histogram. Uniformly distributed events are simulated for the weighted histogram with weights calculated by formula. Each histogram has the same number of bins: 20.

25 Figure: An example of comparison of the unweighted histogram with 200 events and the weighted histogram with 500 events. Panels: unweighted histogram, weighted histogram, residuals, Q-Q plot.

26 Numerical example and experiments
The value of the test statistic $X^2$ is equal to 21.09 with a p-value equal to 0.33; therefore the hypothesis of identity of the two histograms can be accepted. The behavior of the normalized residuals plot and of the normal Q-Q plot of residuals is regular, and we cannot identify outliers or bins with a big influence on $X^2$.
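The quoted p-value can be checked with a small standard-library sketch. It assumes the reference distribution is $\chi^2$ with $r - 1 = 19$ degrees of freedom (20 bins) and evaluates the upper tail probability through the power series of the regularized lower incomplete gamma function; the helper name `chi2_sf` is hypothetical, and the series is only meant for moderate values of $X^2$ such as this one.

```python
import math


def chi2_sf(x, dof):
    """Upper tail probability P(chi2_dof > x).

    Uses the series gamma(a, z) = z^a e^-z * sum_k z^k / (a (a+1) ... (a+k))
    with a = dof / 2 and z = x / 2. Adequate for moderate x; not tuned
    for extreme tails.
    """
    if x <= 0.0:
        return 1.0
    a, z = dof / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 0
    while abs(term) > 1e-15 * abs(total):
        k += 1
        term *= z / (a + k)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower


# Usage: X^2 = 21.09 with 20 bins, hence 19 degrees of freedom.
print(round(chi2_sf(21.09, 19), 2))
```

A library routine for the $\chi^2$ survival function would normally be preferable; the explicit series is used here only to keep the example free of dependencies.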

27 Figure: Chi-square Q-Q plots of $X^2$ statistics for two unweighted histograms with different minimal expected frequencies.

28 Figure: Chi-square Q-Q plots of $X^2$ statistics for unweighted and weighted histograms with different minimal expected frequencies.

29 Figure: Chi-square Q-Q plots of $X^2$ statistics for two weighted histograms with different minimal expected frequencies.

30 Conclusions
A test for comparing the usual (unweighted) histogram and the weighted histogram was proposed.
A test for comparing two weighted histograms was proposed.
In both cases, formulas for normalized residuals were presented that can be useful for the identification of bins that are outliers, or bins that have a big influence on $X^2$.
The approach proposed in this paper can be generalized to a comparison of several unweighted and weighted histograms, or of weighted histograms only. The test statistic has approximately a $\chi^2_{(r-1)(s-1)}$ distribution for $s$ histograms with $r$ bins.

31 References
[7] Gagunashvili, N., Comparison of weighted and unweighted histograms, arXiv:physics/0605123, 2006

