1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [ Research.

Slides:



Advertisements
Similar presentations
Point Estimation Notes of STAT 6205 by Dr. Fan.
Advertisements

Chapter 7 Title and Outline 1 7 Sampling Distributions and Point Estimation of Parameters 7-1 Point Estimation 7-2 Sampling Distributions and the Central.
6-1 Introduction To Empirical Models 6-1 Introduction To Empirical Models.
The General Linear Model. The Simple Linear Model Linear Regression.
Goodness of Fit of a Joint Model for Event Time and Nonignorable Missing Longitudinal Quality of Life Data – A Study by Sneh Gulati* *with Jean-Francois.
Nonparametric Estimation with Recurrent Event Data Edsel A. Pena Department of Statistics University of South Carolina Research.
1 A General Class of Models for Recurrent Events Edsel A. Pena University of South Carolina at Columbia [ Research support from.
Maximum likelihood (ML) and likelihood ratio (LR) test
Nonparametric Estimation with Recurrent Event Data Edsel A. Pena Department of Statistics University of South Carolina Research.
Nonparametric Estimation with Recurrent Event Data Edsel A. Pena Department of Statistics University of South Carolina Research.
Hypothesis testing Some general concepts: Null hypothesisH 0 A statement we “wish” to refute Alternative hypotesisH 1 The whole or part of the complement.
Chapter 10 Simple Regression.
G. Cowan Lectures on Statistical Data Analysis 1 Statistical Data Analysis: Lecture 10 1Probability, Bayes’ theorem, random variables, pdfs 2Functions.
Maximum likelihood Conditional distribution and likelihood Maximum likelihood estimations Information in the data and likelihood Observed and Fisher’s.
Nonparametric Estimation with Recurrent Event Data Edsel A. Pena Department of Statistics, USC Research supported by NIH and NSF Grants Based on joint.
Maximum likelihood (ML) and likelihood ratio (LR) test
Maximum-Likelihood estimation Consider as usual a random sample x = x 1, …, x n from a distribution with p.d.f. f (x;  ) (and c.d.f. F(x;  ) ) The maximum.
Log-linear and logistic models
Chapter 11 Multiple Regression.
G. Cowan 2011 CERN Summer Student Lectures on Statistics / Lecture 41 Introduction to Statistics − Day 4 Lecture 1 Probability Random variables, probability.
Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc. Chapter 14 Goodness-of-Fit Tests and Categorical Data Analysis.
Linear and generalised linear models
STATISTICAL INFERENCE PART VI
July 3, Department of Computer and Information Science (IDA) Linköpings universitet, Sweden Minimal sufficient statistic.
Inferences About Process Quality
Linear and generalised linear models
Review for Exam 2 Some important themes from Chapters 6-9 Chap. 6. Significance Tests Chap. 7: Comparing Two Groups Chap. 8: Contingency Tables (Categorical.
Survival Analysis A Brief Introduction Survival Function, Hazard Function In many medical studies, the primary endpoint is time until an event.
The Neymann-Pearson Lemma Suppose that the data x 1, …, x n has joint density function f(x 1, …, x n ;  ) where  is either  1 or  2. Let g(x 1, …,
On Model Validation Techniques Alex Karagrigoriou University of Cyprus "Quality - Theory and Practice”, ORT Braude College of Engineering, Karmiel, May.
Statistical Decision Theory
G. Cowan 2009 CERN Summer Student Lectures on Statistics1 Introduction to Statistics − Day 4 Lecture 1 Probability Random variables, probability densities,
G. Cowan Lectures on Statistical Data Analysis Lecture 3 page 1 Lecture 3 1 Probability (90 min.) Definition, Bayes’ theorem, probability densities and.
Random Sampling, Point Estimation and Maximum Likelihood.
Section 9.2 Testing the Mean  9.2 / 1. Testing the Mean  When  is Known Let x be the appropriate random variable. Obtain a simple random sample (of.
Extrapolation Models for Convergence Acceleration and Function ’ s Extension David Levin Tel-Aviv University MAIA Erice 2013.
An Empirical Likelihood Ratio Based Goodness-of-Fit Test for Two-parameter Weibull Distributions Presented by: Ms. Ratchadaporn Meksena Student ID:
Multinomial Distribution
Maximum Likelihood Estimator of Proportion Let {s 1,s 2,…,s n } be a set of independent outcomes from a Bernoulli experiment with unknown probability.
ECE 8443 – Pattern Recognition ECE 8423 – Adaptive Signal Processing Objectives: Deterministic vs. Random Maximum A Posteriori Maximum Likelihood Minimum.
April 6 Logistic Regression –Estimating probability based on logistic model –Testing differences among multiple groups –Assumptions for model.
Chapter 7 Point Estimation
Borgan and Henderson:. Event History Methodology
April 4 Logistic Regression –Lee Chapter 9 –Cody and Smith 9:F.
Slide 26-1 Copyright © 2004 Pearson Education, Inc.
EMIS 7300 SYSTEMS ANALYSIS METHODS FALL 2005 Dr. John Lipp Copyright © Dr. John Lipp.
Chapter 14 Repeated Measures and Two Factor Analysis of Variance
Statistical Decision Theory Bayes’ theorem: For discrete events For probability density functions.
Lecture 3: Statistics Review I Date: 9/3/02  Distributions  Likelihood  Hypothesis tests.
Empirical Likelihood for Right Censored and Left Truncated data Jingyu (Julia) Luan University of Kentucky, Johns Hopkins University March 30, 2004.
Brief Review Probability and Statistics. Probability distributions Continuous distributions.
Lecture 4: Likelihoods and Inference Likelihood function for censored data.
1 Introduction to Statistics − Day 4 Glen Cowan Lecture 1 Probability Random variables, probability densities, etc. Lecture 2 Brief catalogue of probability.
Love does not come by demanding from others, but it is a self initiation. Survival Analysis.
Goodness-of-fit (GOF) Tests Testing for the distribution of the underlying population H 0 : The sample data came from a specified distribution. H 1 : The.
Statistics Sampling Distributions and Point Estimation of Parameters Contents, figures, and exercises come from the textbook: Applied Statistics and Probability.
Stats Term Test 4 Solutions. c) d) An alternative solution is to use the probability mass function and.
Comparing Counts Chapter 26. Goodness-of-Fit A test of whether the distribution of counts in one categorical variable matches the distribution predicted.
Review of Statistical Inference Prepared by Vera Tabakova, East Carolina University.
G. Cowan Lectures on Statistical Data Analysis Lecture 9 page 1 Statistical Data Analysis: Lecture 9 1Probability, Bayes’ theorem 2Random variables and.
Jump to first page Inferring Sample Findings to the Population and Testing for Differences.
Chapter 9: Introduction to the t statistic. The t Statistic The t statistic allows researchers to use sample data to test hypotheses about an unknown.
Two Types of Empirical Likelihood Zheng, Yan Department of Biostatistics University of California, Los Angeles.
Discrete Event Simulation - 4
Parametric Survival Models (ch. 7)
EC 331 The Theory of and applications of Maximum Likelihood Method
Lecture 4: Econometric Foundations
Love does not come by demanding from others, but it is a self initiation. Survival Analysis.
Lecture 4: Likelihoods and Inference
Lecture 4: Likelihoods and Inference
Presentation transcript:

1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [ Research support from NIH and NSF. Talk at Cornell University, 3/13/02

2 Practical Problem

3 Product-Limit Estimator and Best-Fitting Exponential Survivor Function Question: Is the underlying survivor function modeled by a family of exponential distributions? a Weibull distribution?

4 A Car Tire Data Set Times to withdrawal (in hours) of 171 car tires, with withdrawal either due to failure or right-censoring. Reference: Davis and Lawrance, in Scand. J. Statist., Pneumatic tires subjected to laboratory testing by rotating each tire against a steel drum until either failure (several modes) or removal (right-censoring).

5 Product-Limit Estimator, Best-Fitting Exponential and Weibull Survivor Functions Exp PLE Weibull Question: Is the Weibull family a good model for this data?

6 Goodness-of-Fit Problem T 1, T 2, …, T n are IID from an unknown distribution function F Case 1: F is a continuous df Case 2: F is discrete df with known jump points Case 3: F is a mixed distribution with known jump points C 1, C 2, …, C n are (nuisance) variables that right-censor the T i ’s Data: (Z 1,  1 ), (Z 2,  2 ), …, (Z n,  n ), where Z i = min(T i, C i )and  i = I(T i < C i )

7 Simple GOF Problem: For a pre-specified F 0, to test the null hypothesis that H 0 : F = F 0 versus H 1 : F  F 0. On the basis of the data (Z 1,  1 ), (Z 2,  2 ), …, (Z n,  n ): Composite GOF Problem: For a pre-specified family of dfs F = {F 0 (.;  }, to test the hypotheses that H 0 : F  F versus H 1 : F  F. Statement of the GOF Problem

8 Generalizing Pearson With complete data, the famous Pearson test statistics are: Simple Case:Composite Case: where O i is the # of observations in the i th interval; E i is the expected number of observations in the i th interval; and is the estimated expected number of observations in the i th interval under the null model.

9 Obstacles with Censored Data With right-censored data, determining the exact values of the O j ’s is not possible. Need to estimate them using the product-limit estimator (Hollander and Pena, ‘92; Li and Doss, ‘93), Nelson-Aalen estimator (Akritas, ‘88; Hjort, ‘90), or by self-consistency arguments. Hard to examine the power or optimality properties of the resulting Pearson generalizations because of the ad hoc nature of their derivations.

10 In Hazards View: Continuous Case For T an abs cont +rv, the hazard rate function (t) is: Cumulative hazard function  (t) is: Survivor function in terms of  :

11 Two Common Examples Exponential : Two-parameter Weibull:

12 Counting Processes and Martingales {M(t): 0 < t <  is a square-integrable zero-mean martingale with predictable quadratic variation (PQV) process

13 Idea in Continuous Case For testing H 0 : (.)  C ={ 0 (.;  ): , if H 0 holds, then there is some   such that the true hazard   is such   0 (.;   ) K Let Basis Set for K : Expansion: Truncation:p is smoothing order,

14 Hazard Embedding and Approach From this truncation, we obtain the approximation Embedding Class CpCp Note: H 0  C p obtains by taking  = 0. GOF Tests: Score tests for H 0 :  = 0 versus H 1 :   0. Note that  is a nuisance parameter in this testing problem.

15 Class of Statistics Estimating equation for the nuisance  Quadratic Statistic: is an estimator of the limiting covariance of

16 Asymptotics and Test Under regularity conditions, Estimator of  obtained from the matrix: Test: Reject H 0 if

17 A Choice of  Generalizing Pearson Partition [0,  ] into 0 = a 1 < a 2 < … < a p =  and let Then are dynamic expected frequencies

18 Special Case: Testing Exponentiality C = {   t  } with Exponential Hazards: Test Statistic (“generalized Pearson”): where

19 A Polynomial-Type Choice of  Components of where Resulting test based on the ‘generalized’ residuals. The framework allows correcting for the estimation of nuisance .

20 Simulated Levels (Polynomial Specification, K = p)

21 Simulated Powers Legend: Solid: p=2; Dots: p=3; Short Dashes: p = 4; Long Dashes: p=5

22 Back to Lung Cancer Data Test for Exponentiality Test for Weibull S 4 and S 5 also both indicate rejection of Weibull family.

23 Back to Davis & Lawrance Car Tire Data Exp PLE Weibull

24 Test of Exponentiality Conclusion: Exponentiality does not hold as in graph!

25 Test of Weibull Family Conclusion: Cannot reject Weibull family of distributions.

26 Simple GOF Problem: Discrete Data T i ’s are discrete +rvs with jump points {a 1, a 2, a 3, …}. Hazards: Problem: To test the hypotheses based on the right-censored data (Z 1,  1 ), …, (Z n,  n ).

27 True and hypothesized hazard odds: For p a pre-specified order, let be a p x J (possibly random) matrix, with its p rows linearly independent, and with [0, a J ] being the maximum observation period for all n units.

28 Embedding Idea To embed the hypothesized hazard odds into Equivalent to assuming that the log hazard odds ratios satisfy Class of tests are the score tests of H 0 :  vs. H 1 :  as p and  are varied.

29 Class of Test Statistics Quadratic Score Statistic: Under H 0, this converges in distribution to a chi-square rv. p

30 A Pearson-Type Choice of  Partition {1,2,…,J}:

31 A Polynomial-Type Choice quadratic form from the above matrices.

32 Hyde’s Test: A Special Case When p = 1 with polynomial specification, we obtain: Resulting test coincides with Hyde’s (‘77, Btka) test.

33 Adaptive Choice of Smoothing Order = partial likelihood of = associated observed information matrix = partial MLE of Adjusted Schwarz (‘78, Ann. Stat.) Bayesian Information Criterion *

34 Simulation Results for Simple Discrete Case Note: Based on polynomial-type specification. Performances of Pearson type tests were not as good as for the polynomial type.

35 Concluding Remarks Framework is general enough so as to cover both continuous and discrete cases. Mixed case dealt with via hazard decomposition. Since tests are score tests, they possess local optimality properties. Enables automatic adjustment of effects due to estimation of nuisance parameters. Basic approach extends Neyman’s 1937 idea by embedding hazards instead of densities. More studies needed for adaptive procedures.