# 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [ Research.

## Presentation on theme: "1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [ Research."— Presentation transcript:

1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research support from NIH and NSF. Talk at Cornell University, 3/13/02

2 Practical Problem

3 Product-Limit Estimator and Best-Fitting Exponential Survivor Function Question: Is the underlying survivor function modeled by a family of exponential distributions? a Weibull distribution?

4 A Car Tire Data Set Times to withdrawal (in hours) of 171 car tires, with withdrawal either due to failure or right-censoring. Reference: Davis and Lawrance, in Scand. J. Statist., 1989. Pneumatic tires subjected to laboratory testing by rotating each tire against a steel drum until either failure (several modes) or removal (right-censoring).

5 Product-Limit Estimator, Best-Fitting Exponential and Weibull Survivor Functions Exp PLE Weibull Question: Is the Weibull family a good model for this data?

6 Goodness-of-Fit Problem T 1, T 2, …, T n are IID from an unknown distribution function F Case 1: F is a continuous df Case 2: F is discrete df with known jump points Case 3: F is a mixed distribution with known jump points C 1, C 2, …, C n are (nuisance) variables that right-censor the T i ’s Data: (Z 1,  1 ), (Z 2,  2 ), …, (Z n,  n ), where Z i = min(T i, C i )and  i = I(T i < C i )

7 Simple GOF Problem: For a pre-specified F 0, to test the null hypothesis that H 0 : F = F 0 versus H 1 : F  F 0. On the basis of the data (Z 1,  1 ), (Z 2,  2 ), …, (Z n,  n ): Composite GOF Problem: For a pre-specified family of dfs F = {F 0 (.;  }, to test the hypotheses that H 0 : F  F versus H 1 : F  F. Statement of the GOF Problem

8 Generalizing Pearson With complete data, the famous Pearson test statistics are: Simple Case:Composite Case: where O i is the # of observations in the i th interval; E i is the expected number of observations in the i th interval; and is the estimated expected number of observations in the i th interval under the null model.

9 Obstacles with Censored Data With right-censored data, determining the exact values of the O j ’s is not possible. Need to estimate them using the product-limit estimator (Hollander and Pena, ‘92; Li and Doss, ‘93), Nelson-Aalen estimator (Akritas, ‘88; Hjort, ‘90), or by self-consistency arguments. Hard to examine the power or optimality properties of the resulting Pearson generalizations because of the ad hoc nature of their derivations.

10 In Hazards View: Continuous Case For T an abs cont +rv, the hazard rate function (t) is: Cumulative hazard function  (t) is: Survivor function in terms of  :

11 Two Common Examples Exponential : Two-parameter Weibull:

12 Counting Processes and Martingales {M(t): 0 < t <  is a square-integrable zero-mean martingale with predictable quadratic variation (PQV) process

13 Idea in Continuous Case For testing H 0 : (.)  C ={ 0 (.;  ): , if H 0 holds, then there is some   such that the true hazard   is such   0 (.;   ) K Let Basis Set for K : Expansion: Truncation:p is smoothing order,

14 Hazard Embedding and Approach From this truncation, we obtain the approximation Embedding Class CpCp Note: H 0  C p obtains by taking  = 0. GOF Tests: Score tests for H 0 :  = 0 versus H 1 :   0. Note that  is a nuisance parameter in this testing problem.

15 Class of Statistics Estimating equation for the nuisance  Quadratic Statistic: is an estimator of the limiting covariance of

16 Asymptotics and Test Under regularity conditions, Estimator of  obtained from the matrix: Test: Reject H 0 if

17 A Choice of  Generalizing Pearson Partition [0,  ] into 0 = a 1 < a 2 < … < a p =  and let Then are dynamic expected frequencies

18 Special Case: Testing Exponentiality C = {   t  } with Exponential Hazards: Test Statistic (“generalized Pearson”): where

19 A Polynomial-Type Choice of  Components of where Resulting test based on the ‘generalized’ residuals. The framework allows correcting for the estimation of nuisance .

20 Simulated Levels (Polynomial Specification, K = p)

21 Simulated Powers Legend: Solid: p=2; Dots: p=3; Short Dashes: p = 4; Long Dashes: p=5

22 Back to Lung Cancer Data Test for Exponentiality Test for Weibull S 4 and S 5 also both indicate rejection of Weibull family.

23 Back to Davis & Lawrance Car Tire Data Exp PLE Weibull

24 Test of Exponentiality Conclusion: Exponentiality does not hold as in graph!

25 Test of Weibull Family Conclusion: Cannot reject Weibull family of distributions.

26 Simple GOF Problem: Discrete Data T i ’s are discrete +rvs with jump points {a 1, a 2, a 3, …}. Hazards: Problem: To test the hypotheses based on the right-censored data (Z 1,  1 ), …, (Z n,  n ).

27 True and hypothesized hazard odds: For p a pre-specified order, let be a p x J (possibly random) matrix, with its p rows linearly independent, and with [0, a J ] being the maximum observation period for all n units.

28 Embedding Idea To embed the hypothesized hazard odds into Equivalent to assuming that the log hazard odds ratios satisfy Class of tests are the score tests of H 0 :  vs. H 1 :  as p and  are varied.

29 Class of Test Statistics Quadratic Score Statistic: Under H 0, this converges in distribution to a chi-square rv. p

30 A Pearson-Type Choice of  Partition {1,2,…,J}:

31 A Polynomial-Type Choice quadratic form from the above matrices.

32 Hyde’s Test: A Special Case When p = 1 with polynomial specification, we obtain: Resulting test coincides with Hyde’s (‘77, Btka) test.

33 Adaptive Choice of Smoothing Order = partial likelihood of = associated observed information matrix = partial MLE of Adjusted Schwarz (‘78, Ann. Stat.) Bayesian Information Criterion *

34 Simulation Results for Simple Discrete Case Note: Based on polynomial-type specification. Performances of Pearson type tests were not as good as for the polynomial type.

35 Concluding Remarks Framework is general enough so as to cover both continuous and discrete cases. Mixed case dealt with via hazard decomposition. Since tests are score tests, they possess local optimality properties. Enables automatic adjustment of effects due to estimation of nuisance parameters. Basic approach extends Neyman’s 1937 idea by embedding hazards instead of densities. More studies needed for adaptive procedures.

Download ppt "1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [ Research."

Similar presentations