1
Estimation of the parameters of a truncated Pareto distribution when the sample is contaminated by measurement errors
Lwando Kondlo
Supervisor: Prof. Chris Koen
University of the Western Cape
SKA SA Postgraduate Bursary Conference, 12/3/2008

2
Introduction
The Pareto distribution is a simple model for positive data. Its truncated version has a wide range of applications in data analysis across several fields [1]. In astronomy and in many physical and social sciences, the parameters of the truncated Pareto distribution are estimated in order to draw inferences about the processes underlying the observed phenomena, for example:
1. to scale up local observations to global patterns;
2. to test theoretical models.

3
Introduction (Cont'd)
It is therefore essential that these parameters be estimated accurately. Unfortunately, the binning-based methods traditionally used in astronomy and other fields perform quite poorly [2]. In this presentation, we discuss a more sophisticated method for estimating these parameters, based on maximum likelihood estimation (MLE).

4
Measurement error model
The model for a variable measured with error is
$Y = X + \varepsilon$,
where the measurement error ε is assumed to be independent of X. X is the true value, but it is not observed directly; Y is observed instead.

5
Measurement error model (Cont'd)
The PDF or the parameters of X are of interest when the objective is to estimate characteristics of the population free of the additional variability introduced by measurement. Estimating the PDF or its parameters in the presence of measurement error is also known as deconvolution.

6
Objectives
- Develop a numerical methodology for deconvolution when the distribution of X is of Pareto form.
- Apply the methodology to real data: masses of giant molecular clouds (GMCs) in various galaxies.

7
Convolution
If X has PDF g(·) and the measurement error ε has PDF h(·), then, because X and ε are independent, the sum Y = X + ε has the PDF given by the convolution integral
$f(y) = \int_{-\infty}^{\infty} g(x)\,h(y - x)\,dx$.
In this case, the forms of the densities g(·) and h(·) are assumed to be known.
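The convolution integral can be evaluated numerically once g(·) and h(·) are fixed. The sketch below uses composite Simpson's rule over the support [L, U] of X; the function names `convolve_density` and `normal_pdf`, and the choice of Simpson's rule, are illustrative assumptions rather than the method used in this work.

```python
import math


def normal_pdf(e, sigma):
    """Zero-mean normal density h(e) with standard deviation sigma."""
    return math.exp(-0.5 * (e / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def convolve_density(g, h, lower, upper, y, m=200):
    """Approximate f(y) = integral over [lower, upper] of g(x) h(y - x) dx.

    g is the density of X (assumed zero outside [lower, upper]) and h is the
    error density. Uses composite Simpson's rule with m sub-intervals (m even).
    """
    if m % 2:
        raise ValueError("m must be even for Simpson's rule")
    step = (upper - lower) / m
    total = 0.0
    for k in range(m + 1):
        x = lower + k * step
        weight = 1 if k in (0, m) else (4 if k % 2 else 2)  # Simpson weights 1,4,2,...,4,1
        total += weight * g(x) * h(y - x)
    return total * step / 3.0


if __name__ == "__main__":
    # Sanity check with X uniform on [0, 1], where the exact answer is
    # f(y) = Phi(y / s) - Phi((y - 1) / s) for error standard deviation s.
    s = 0.3
    phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    approx = convolve_density(lambda x: 1.0, lambda e: normal_pdf(e, s), 0.0, 1.0, 0.4)
    print(approx, phi(0.4 / s) - phi((0.4 - 1.0) / s))
```

The uniform case is used only because its convolution with a normal density has a closed form, which makes it a convenient check on the quadrature.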

8
Convolution (Cont'd)
g(·) has a power-law form (i.e., X follows a truncated Pareto distribution with lower and upper limits L and U).
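The slide's formula is not reproduced in this transcript. For reference, one common parameterization of the truncated Pareto density on [L, U] with shape parameter a, assumed here for illustration, together with its CDF:

```latex
g(x) = \frac{a L^{a} x^{-(a+1)}}{1 - (L/U)^{a}}, \qquad 0 < L \le x \le U,\ a > 0,

G(x) = \frac{1 - (L/x)^{a}}{1 - (L/U)^{a}}, \qquad L \le x \le U.
```

Letting U → ∞ recovers the ordinary (untruncated) Pareto density a L^a x^{-(a+1)}.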

9
Convolution (Cont'd)
h(·) is a normal density with zero mean.
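Written out, the zero-mean normal error density is as follows; σ, the standard deviation of the measurement error, is an assumed symbol since the slide's formula is not reproduced:

```latex
h(e) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{e^{2}}{2\sigma^{2}}\right), \qquad -\infty < e < \infty,\ \sigma > 0.
```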

10
Error-contaminated distribution

11
Non-contaminated distribution

12
Convolution
Since g(·) is zero outside [L, U], the density f(·) is given by
$f(y) = \int_{L}^{U} g(x)\,h(y - x)\,dx$.
This is called an error-contaminated truncated Pareto density function. The only unknowns are the specific parameter values: L, U, a and the error standard deviation σ.
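Substituting the two densities gives an explicit form of f(·). This sketch assumes the parameterization g(x) = a L^a x^{-(a+1)} / [1 - (L/U)^a] on [L, U] and an N(0, σ²) error density; the slide's own formula is not reproduced here:

```latex
f(y) = \frac{a L^{a}}{\left[1 - (L/U)^{a}\right] \sigma \sqrt{2\pi}}
       \int_{L}^{U} x^{-(a+1)} \exp\!\left[-\frac{(y - x)^{2}}{2\sigma^{2}}\right] dx,
       \qquad -\infty < y < \infty.
```

The remaining integral is generally evaluated numerically.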

13
Parameter Estimation
MLE is the preferred method for estimating the parameter values. It determines the parameter values that maximise the likelihood of the observed data given the model. Specifically, MLE finds the values of L, U, a and σ that maximise the product of the densities of the observed values $y_1, \dots, y_n$:
$\mathcal{L}(L, U, a, \sigma) = \prod_{i=1}^{n} f(y_i)$.

14
Parameter Estimation (Cont'd)
The log-likelihood is
$\ell(L, U, a, \sigma) = \sum_{i=1}^{n} \ln f(y_i)$.
The best values of the parameters are obtained by maximising ℓ. The optimisation is carried out with an iterative numerical procedure.
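A minimal sketch of the estimation step, under stated assumptions: the truncated Pareto parameterization g(x) = a L^a x^{-(a+1)} / [1 - (L/U)^a], Simpson's rule for the convolution integral, and a hand-written Nelder-Mead simplex search as the iterative optimiser. The talk does not say which optimiser was used; these choices and all function names (`contaminated_pdf`, `log_likelihood`, `nelder_mead`) are hypothetical.

```python
import math
import random


def contaminated_pdf(y, lower, upper, a, sigma, m=64):
    """f(y) = integral over [lower, upper] of g(x) h(y - x) dx, by Simpson's rule (m even).

    Assumes g(x) = a lower**a x**(-a-1) / (1 - (lower/upper)**a) and h = N(0, sigma^2);
    only [y - 8 sigma, y + 8 sigma] is integrated, where h is non-negligible.
    """
    lo, hi = max(lower, y - 8.0 * sigma), min(upper, y + 8.0 * sigma)
    if lo >= hi:
        return 0.0
    norm = a * lower ** a / ((1.0 - (lower / upper) ** a) * sigma * math.sqrt(2.0 * math.pi))
    step = (hi - lo) / m
    total = 0.0
    for k in range(m + 1):
        x = lo + k * step
        weight = 1 if k in (0, m) else (4 if k % 2 else 2)
        total += weight * x ** (-a - 1.0) * math.exp(-0.5 * ((y - x) / sigma) ** 2)
    return norm * total * step / 3.0


def log_likelihood(theta, ys):
    """sum_i ln f(y_i); -inf outside the valid region 0 < L < U, a > 0, sigma > 0."""
    lower, upper, a, sigma = theta
    if not (0.0 < lower < upper and a > 0.0 and sigma > 0.0):
        return -math.inf
    total = 0.0
    for y in ys:
        f = contaminated_pdf(y, lower, upper, a, sigma)
        if f <= 0.0:
            return -math.inf
        total += math.log(f)
    return total


def nelder_mead(func, x0, rel_step=0.1, iters=200):
    """Minimise func starting from x0 with a basic Nelder-Mead simplex search."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += rel_step * (abs(vertex[i]) or 1.0)
        simplex.append(vertex)
    values = [func(v) for v in simplex]
    for _ in range(iters):
        order = sorted(range(n + 1), key=values.__getitem__)
        simplex, values = [simplex[i] for i in order], [values[i] for i in order]
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        # Reflect the worst vertex through the centroid of the others.
        refl = [c + (c - w) for c, w in zip(centroid, worst)]
        f_refl = func(refl)
        if f_refl < values[0]:  # new best: also try expanding further
            expd = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            f_expd = func(expd)
            simplex[-1], values[-1] = (expd, f_expd) if f_expd < f_refl else (refl, f_refl)
        elif f_refl < values[-2]:  # accept the reflected point
            simplex[-1], values[-1] = refl, f_refl
        else:  # contract towards the centroid, else shrink towards the best vertex
            contr = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            f_contr = func(contr)
            if f_contr < values[-1]:
                simplex[-1], values[-1] = contr, f_contr
            else:
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (v - b) for b, v in zip(best, s)] for s in simplex[1:]]
                values = [values[0]] + [func(s) for s in simplex[1:]]
    i_best = min(range(n + 1), key=values.__getitem__)
    return simplex[i_best], values[i_best]


if __name__ == "__main__":
    # Simulated data with L = 3, U = 7, a = 1.5, sigma = 0.4 (the simulation's true values).
    rng = random.Random(1)
    c = 1.0 - (3.0 / 7.0) ** 1.5
    ys = [3.0 * (1.0 - rng.random() * c) ** (-1.0 / 1.5) + rng.gauss(0.0, 0.4) for _ in range(200)]
    start = [min(ys), max(ys), 1.0, 0.5]  # crude starting values (L, U, a, sigma)
    theta_hat, _ = nelder_mead(lambda t: -log_likelihood(t, ys), start)
    print("L, U, a, sigma =", [round(v, 3) for v in theta_hat])
```

Maximising ℓ is done by minimising -ℓ. In practice a library optimiser (for example SciPy's `scipy.optimize.minimize`) would normally replace the hand-written search; it is written out here only to keep the sketch dependency-free.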

15
Simulation
To validate our method, we generated data sets of n = 200, 400 and 600 random points drawn from a truncated Pareto distribution, with added normal errors to simulate the effects of measurement error.
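Data sets of this kind can be generated by inverse-CDF sampling of X followed by adding normal noise. This is a sketch assuming the truncated Pareto CDF G(x) = [1 - (L/x)^a] / [1 - (L/U)^a]; the true values L = 3, U = 7, a = 1.5, σ = 0.4 are taken from the simulation results table, while the seed and function names are arbitrary and hypothetical.

```python
import random


def sample_truncated_pareto(n, lower, upper, a, rng):
    """Draw n values from a truncated Pareto on [lower, upper] by inverting its CDF.

    Assumes G(x) = (1 - (lower/x)**a) / (1 - (lower/upper)**a); solving G(x) = u
    gives x = lower * (1 - u * (1 - (lower/upper)**a)) ** (-1/a).
    """
    c = 1.0 - (lower / upper) ** a
    return [lower * (1.0 - rng.random() * c) ** (-1.0 / a) for _ in range(n)]


def contaminated_sample(n, lower, upper, a, sigma, rng):
    """Return (x, y): true values x and observed values y = x + e, e ~ N(0, sigma^2)."""
    x = sample_truncated_pareto(n, lower, upper, a, rng)
    y = [xi + rng.gauss(0.0, sigma) for xi in x]
    return x, y


rng = random.Random(2008)  # fixed seed so the data sets are reproducible
datasets = {n: contaminated_sample(n, 3.0, 7.0, 1.5, 0.4, rng) for n in (200, 400, 600)}
```

Only the contaminated values y would be passed to the estimator; the true values x are kept here solely for checking.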

16
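Simulation Results
True values: L = 3, U = 7, a = 1.5, σ = 0.4

| n | Quantity | L | U | a | σ |
|---|---|---|---|---|---|
| 200 | Estimate | 2.9138 | 6.8763 | 1.1388 | 0.3000 |
| 200 | Bias error | 2.87% | 1.77% | 24.08% | 25.00% |
| 400 | Estimate | 2.9895 | 7.0372 | 1.8255 | 0.2862 |
| 400 | Bias error | 0.35% | 0.53% | 21.70% | 28.45% |
| 600 | Estimate | 3.0283 | 7.0615 | 1.8381 | 0.4208 |
| 600 | Bias error | 0.94% | 0.88% | 22.54% | 5.20% |

Bias error = |estimate - true value| / true value.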
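The bias-error percentages on this slide are absolute relative errors. A short check that recomputes them from the estimates and true values transcribed from the slide (the dictionary and function names are illustrative):

```python
# Values transcribed from the Simulation Results slide.
TRUE = {"L": 3.0, "U": 7.0, "a": 1.5, "sigma": 0.4}
ESTIMATES = {
    200: {"L": 2.9138, "U": 6.8763, "a": 1.1388, "sigma": 0.3000},
    400: {"L": 2.9895, "U": 7.0372, "a": 1.8255, "sigma": 0.2862},
    600: {"L": 3.0283, "U": 7.0615, "a": 1.8381, "sigma": 0.4208},
}


def bias_error_pct(estimate, true_value):
    """Absolute relative error |estimate - true| / true, in percent."""
    return 100.0 * abs(estimate - true_value) / true_value


for n, est in ESTIMATES.items():
    row = {p: round(bias_error_pct(est[p], TRUE[p]), 2) for p in TRUE}
    print(n, row)
```

Each row is a single simulated data set, so these figures mix bias with sampling variability; averaging over many replications per n would be needed to separate the two.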

17
Application
We apply the method to the statistical analysis of cloud masses from a survey of various galaxies obtained with a radio telescope. Cloud masses are known to follow a power-law (Pareto) distribution, but the methods used to measure cloud masses are subject to measurement error [5], arising from instrumental error as well as from factors such as chemical evolution and temperature.

18
Second survey of molecular clouds (Fukui et al., 2008)
The figure shows the M33 region.

19
Frequency Distribution

20
Results

| | L | U | a | σ |
|---|---|---|---|---|
| MLE | 6.9379 | 77.7196 | 1.3335 | 3.4753 |
| Std errors (information matrix) | 0.6484 | 4.9685 | 0.2624 | 0.5656 |
| Std errors (jackknife) | 0.7328 | 2.2605 | 0.2701 | 0.5554 |

- The estimated lower mass limit L for GMCs in M33 is 6.9379.
- The estimated upper mass limit U for GMCs in M33 is 77.7196.
- The estimated power-law exponent a of the GMC mass distribution in M33 is 1.3335.
- The standard errors indicate the size of the uncertainty in each estimate; their formal use is to construct confidence intervals.
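Two small helpers illustrate how such standard errors are obtained and used. This is a sketch assuming approximate normality of the MLE, so that an approximate 95% interval is estimate ± 1.96 SE, plus a generic leave-one-out jackknife. For the MLE, `estimator` would refit the whole model on each leave-one-out sample, which is costly; the function names are hypothetical.

```python
import math


def jackknife_se(data, estimator):
    """Leave-one-out jackknife standard error of estimator(data)."""
    n = len(data)
    # Re-estimate with each observation left out in turn.
    loo = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    mean_loo = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((t - mean_loo) ** 2 for t in loo))


def normal_ci(estimate, se, z=1.96):
    """Approximate two-sided interval estimate +/- z * se (z = 1.96 for about 95%)."""
    return estimate - z * se, estimate + z * se


# Example: exponent a for M33 with its information-matrix standard error.
low, high = normal_ci(1.3335, 0.2624)
print(f"approx. 95% CI for a: ({low:.4f}, {high:.4f})")  # -> (0.8192, 1.8478)
```

For the sample mean, the jackknife standard error reduces exactly to s/√n, which makes a convenient check of the helper.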

21
Assessing quality of fit
The cumulative distribution function (CDF) of Y is given as
$F(y) = \int_{-\infty}^{y} f(t)\,dt = \int_{L}^{U} g(x)\,\Phi\!\left(\frac{y - x}{\sigma}\right)dx$,
where Φ is the standard normal CDF.
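The CDF can be evaluated numerically in the same way as the density. This sketch assumes the parameterization g(x) = a L^a x^{-(a+1)} / [1 - (L/U)^a], computes Φ with `math.erf`, and uses Simpson's rule; the function names are illustrative.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF Phi(z) via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def contaminated_cdf(y, lower, upper, a, sigma, m=200):
    """F(y) = integral over [lower, upper] of g(x) Phi((y - x) / sigma) dx.

    Assumes g(x) = a lower**a x**(-a-1) / (1 - (lower/upper)**a) (truncated Pareto)
    and evaluates the integral with composite Simpson's rule (m even).
    """
    norm = a * lower ** a / (1.0 - (lower / upper) ** a)
    step = (upper - lower) / m
    total = 0.0
    for k in range(m + 1):
        x = lower + k * step
        weight = 1 if k in (0, m) else (4 if k % 2 else 2)
        total += weight * norm * x ** (-a - 1.0) * std_normal_cdf((y - x) / sigma)
    return total * step / 3.0
```

Evaluating `contaminated_cdf` at the fitted parameters and the sorted observations gives the theoretical probabilities needed for both the P-P plot and the K-S statistic.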

22
Assessing quality of fit
- Graphical assessment: Probability-Probability (P-P) plots
- Goodness-of-fit test: Kolmogorov-Smirnov (K-S) test

23
P-P Plots
A P-P plot compares the theoretical and empirical CDFs in terms of their probabilities. Each point on a P-P plot pairs the value of the fitted CDF at an ordered observation with the corresponding empirical cumulative probability.
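A small sketch of how the P-P coordinates can be computed, assuming the common plotting position p_i = (i - 0.5)/n for the i-th order statistic (other conventions, such as i/(n + 1), are also used). `pp_points` and the uniform CDF in the toy example are hypothetical stand-ins; in this application `cdf` would be the fitted CDF of Y.

```python
def pp_points(sample, cdf):
    """Return P-P plot coordinates (p_i, F(y_(i))) for a sample and a fitted CDF.

    p_i = (i - 0.5) / n is the empirical plotting position of the i-th
    order statistic y_(i); F(y_(i)) is the fitted CDF evaluated there.
    """
    ys = sorted(sample)
    n = len(ys)
    return [((i - 0.5) / n, cdf(y)) for i, y in enumerate(ys, start=1)]


def uniform_cdf(t):
    """CDF of Uniform(0, 1), used here only as a toy stand-in for the fitted F."""
    return min(max(t, 0.0), 1.0)


print(pp_points([0.9, 0.1, 0.5], uniform_cdf))
```

If the model fits, the points lie close to the line from (0, 0) to (1, 1).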

24
K-S GoF test
The K-S test statistic is based on the maximum distance between the theoretical CDF and the empirical CDF. The K-S statistic is
$D_n = \sup_{y} \left| F_n(y) - F(y) \right|$,
where $F_n$ is the empirical CDF of the n observations.
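For a continuous F, the supremum is attained at one of the order statistics, so D_n can be computed from 2n differences. A minimal sketch (the function names and the uniform toy CDF are illustrative):

```python
def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance D_n = sup_y |F_n(y) - F(y)| for a continuous cdf.

    At each order statistic y_(i) the empirical CDF jumps from (i - 1)/n to i/n,
    so it suffices to compare F(y_(i)) with both levels.
    """
    ys = sorted(sample)
    n = len(ys)
    d = 0.0
    for i, y in enumerate(ys, start=1):
        f = cdf(y)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def uniform_cdf(t):
    """CDF of Uniform(0, 1), a toy stand-in for the fitted F."""
    return min(max(t, 0.0), 1.0)


print(ks_statistic([0.1, 0.5, 0.9], uniform_cdf))
```

When the parameters of F are estimated from the same data, the standard K-S critical values no longer apply exactly (the test becomes conservative), so p-values are often obtained by simulation instead.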

25
K-S GoF Test

26
Conclusion
- The deconvolution method recovers the truncation limits L and U of the truncated Pareto distribution with little bias (relative errors below 3% in the simulations); the estimates of a and σ show larger relative errors (about 5% to 28%).
- It produces reasonable standard-error estimates from both the inverse Fisher information matrix and the jackknife.
- The probability plot is approximately linear, indicating that the data are consistent with the postulated distribution.

27
Future objectives
- Other distributions
- Truncated data
- Comparison with results from other methods

28
Acknowledgments
Thanks to everyone who contributed to the work presented here, and to the funding source, SKA SA.

29
References
[1] Zaninetti, L. and Ferraro, M. (2008). On the truncated Pareto distribution with applications. Central European Journal of Physics, 6(1), 1-6.
[2] White, E. P., et al. (2008). On estimating the exponent of power-law frequency distributions. Ecology, 89, 905-912.
[3] Engargiola, G., et al. (2003). Giant molecular clouds in M33. I. BIMA all-disk survey. ApJS, 149, 343-363.
[4] Cordy, C. B. and Thomas, D. (1997). Deconvolution of a distribution function. Journal of the American Statistical Association, 92, 1456-1465.
[5] Rosolowsky, E. (2005). The mass spectra of giant molecular clouds in the Local Group. Publications of the Astronomical Society of the Pacific, 117, 1403-1410.

30
Thank you
