Nonparametric Estimation with Recurrent Event Data Edsel A. Pena Department of Statistics, USC Research supported by NIH and NSF Grants Based on joint.

Slides:



Advertisements
Similar presentations
Random Processes Introduction (2)
Advertisements

COMPUTER INTENSIVE AND RE-RANDOMIZATION TESTS IN CLINICAL TRIALS Thomas Hammerstrom, Ph.D. USFDA, Division of Biometrics The opinions expressed are those.
T.C ATILIM UNIVERSITY MODES ADVANCED SYSTEM SIMULATION MODES 650
1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [ Research.
STAT 497 APPLIED TIME SERIES ANALYSIS
Cox Model With Intermitten and Error-Prone Covariate Observation Yury Gubman PhD thesis in Statistics Supervisors: Prof. David Zucker, Prof. Orly Manor.
Goodness of Fit of a Joint Model for Event Time and Nonignorable Missing Longitudinal Quality of Life Data – A Study by Sneh Gulati* *with Jean-Francois.
Survival analysis 1 The greatest blessing in life is in giving and not taking.
Nonparametric Estimation with Recurrent Event Data Edsel A. Pena Department of Statistics University of South Carolina Research.
1 A General Class of Models for Recurrent Events Edsel A. Pena University of South Carolina at Columbia [ Research support from.
Nonparametric Estimation with Recurrent Event Data Edsel A. Pena Department of Statistics University of South Carolina Research.
Nonparametric Estimation with Recurrent Event Data Edsel A. Pena Department of Statistics University of South Carolina Research.

Analysis of Variance. Experimental Design u Investigator controls one or more independent variables –Called treatment variables or factors –Contain two.
Statistics Lecture 20. Last Day…completed 5.1 Today Parts of Section 5.3 and 5.4.
Probability By Zhichun Li.
2008 Chingchun 1 Bootstrap Chingchun Huang ( 黃敬群 ) Vision Lab, NCTU.
Violations of Assumptions In Least Squares Regression.
Modeling clustered survival data The different approaches.
1 2. Reliability measures Objectives: Learn how to quantify reliability of a system Understand and learn how to compute the following measures –Reliability.
Bootstrap spatobotp ttaoospbr Hesterberger & Moore, chapter 16 1.
Analysis of Complex Survey Data
Choosing Statistical Procedures
A Review of Probability Models
1 A General Class of Models for Recurrent Events Edsel A. Pena University of South Carolina at Columbia [ Research support from.
AM Recitation 2/10/11.
Simulation Output Analysis
Stephen Mildenhall September 2001
Applications of bootstrap method to finance Chin-Ping King.
Statistical approaches to analyse interval-censored data in a confirmatory trial Margareta Puu, AstraZeneca Mölndal 26 April 2006.
Random Sampling, Point Estimation and Maximum Likelihood.
Estimating parameters in a statistical model Likelihood and Maximum likelihood estimation Bayesian point estimates Maximum a posteriori point.
1 Sampling Distributions Lecture 9. 2 Background  We want to learn about the feature of a population (parameter)  In many situations, it is impossible.
Using Resampling Techniques to Measure the Effectiveness of Providers in Workers’ Compensation Insurance David Speights Senior Research Statistician HNC.
0 K. Salah 2. Review of Probability and Statistics Refs: Law & Kelton, Chapter 4.
Hung X. Nguyen and Matthew Roughan The University of Adelaide, Australia SAIL: Statistically Accurate Internet Loss Measurements.
INTRODUCTION TO SURVIVAL ANALYSIS
Borgan and Henderson:. Event History Methodology
Lecture 2 Forestry 3218 Lecture 2 Statistical Methods Avery and Burkhart, Chapter 2 Forest Mensuration II Avery and Burkhart, Chapter 2.
Pro gradu –thesis Tuija Hevonkorpi.  Basic of survival analysis  Weibull model  Frailty models  Accelerated failure time model  Case study.
Stefano Vezzoli, CROS NT
Limits to Statistical Theory Bootstrap analysis ESM April 2006.
Week 21 Stochastic Process - Introduction Stochastic processes are processes that proceed randomly in time. Rather than consider fixed random variables.
Lecture 3: Statistics Review I Date: 9/3/02  Distributions  Likelihood  Hypothesis tests.
: An alternative representation of level of significance. - normal distribution applies. - α level of significance (e.g. 5% in two tails) determines the.
Empirical Likelihood for Right Censored and Left Truncated data Jingyu (Julia) Luan University of Kentucky, Johns Hopkins University March 30, 2004.
The generalization of Bayes for continuous densities is that we have some density f(y|  ) where y and  are vectors of data and parameters with  being.
01/20151 EPI 5344: Survival Analysis in Epidemiology Actuarial and Kaplan-Meier methods February 24, 2015 Dr. N. Birkett, School of Epidemiology, Public.
Chapter 20 Classification and Estimation Classification – Feature selection Good feature have four characteristics: –Discrimination. Features.
Lecture 4: Likelihoods and Inference Likelihood function for censored data.
Statistics Sampling Distributions and Point Estimation of Parameters Contents, figures, and exercises come from the textbook: Applied Statistics and Probability.
1 Chapter 8: Model Inference and Averaging Presented by Hui Fang.
Lecture 4 Confidence Intervals. Lecture Summary Last lecture, we talked about summary statistics and how “good” they were in estimating the parameters.
1 Borgan and Henderson: Event History Methodology Lancaster, September 2006 Session 6.1: Recurrent event data Intensity processes and rate functions Robust.
Stats Term Test 4 Solutions. c) d) An alternative solution is to use the probability mass function and.
1 Double the confidence region S. Eguchi, ISM & GUAS This talk is a part of co-work with J. Copas, University of Warwick.
Week 21 Order Statistics The order statistics of a set of random variables X 1, X 2,…, X n are the same random variables arranged in increasing order.
Topic 19: Survival Analysis T = Time until an event occurs. Events are, e.g., death, disease recurrence or relapse, infection, pregnancy.
1 Ka-fu Wong University of Hong Kong A Brief Review of Probability, Statistics, and Regression for Forecasting.
Week 21 Statistical Model A statistical model for some data is a set of distributions, one of which corresponds to the true unknown distribution that produced.
Bias-Variance Analysis in Regression  True function is y = f(x) +  where  is normally distributed with zero mean and standard deviation .  Given a.
DURATION ANALYSIS Eva Hromádková, Applied Econometrics JEM007, IES Lecture 9.
Multiple Random Variables and Joint Distributions
Comparing Cox Model with a Surviving Fraction with regular Cox model
Ch3: Model Building through Regression
Generalized Spatial Dirichlet Process Models
Where did we stop? The Bayes decision rule guarantees an optimal classification… … But it requires the knowledge of P(ci|x) (or p(x|ci) and P(ci)) We.
Parametric Empirical Bayes Methods for Microarrays
Presentation transcript:

Nonparametric Estimation with Recurrent Event Data Edsel A. Pena Department of Statistics, USC Research supported by NIH and NSF Grants Based on joint works with R. Strawderman (Cornell) and M. Hollander (Florida State) Talk at USC Epid/Biostat 03/04/02

2 A Real Recurrent Event Data (Source: Aalen and Husebye (‘91), Statistics in Medicine) Variable: Migrating motor complex (MMC) periods, in minutes, for 19 individuals in a gastroenterology study concerning small bowel motility during fasting state.

3 Pictorial Representation of Data for a Subject Consider subject #3. K = 3 Inter-Event Times, T j : 284, 59, 186 Censored Time,  -S K : 4 Calendar Times, S j : 284, 343, 529 Limit of Observation Period:  0  =533 S 1 =284 S 2 =343 S 3 =529 T 1 =284 T 2 =59 T 3 =186  -S 3 =4 Calendar Scale

4 Features of Data Set Random observation period per subject (administrative, time, resource constraints). Length of period:  Event of interest is recurrent. # of events (K) informative about MMC period distribution (F). Last MMC period right-censored by which is informative about F. Calendar times: S 1, S 2, …, S K.

5 Assumptions and Problem Aalen and Husebye: “Consecutive MMC periods for each individual appear (to be) approximate renewal processes.” Translation: The inter-event times T ij ’s are assumed stochastically independent. Problem: Under this IID or renewal assumption, and taking into account the informativeness of K and the right-censoring mechanism, how to estimate the inter-event distribution, F and its parameters (e.g., median).

6 General Form of Data Accrual Calendar Times of Event Occurrences S i0 =0 and S ij = T i1 + T i2 + … + T ij Number of Events in Observation Period K i = max{j: S ij <  i } Limits of observation periods,  i ’s, may be fixed, or assumed IID with unknown distribution G.

7 Observables Problem Obtain a nonparametric estimator of the inter-event time distribution, F, and determine its properties.

8 Relevance and Applicability Recurrent phenomena occur in public health/medical settings. –Outbreak of a disease. –Hospitalization of a patient. –Tumor occurrence. –Epileptic seizures. In other settings. –Non-life insurance claims. –Stock index (e.g., Dow Jones) decreases by at least 6% in a day. –Labor strikes. –Reliability and engineering.

9 Existing Methods and Their Limitations Consider only the first, possibly right-censored, observation per unit and use the product-limit estimator (PLE). –Loss of information –Inefficient Ignore the right-censored last observation, and use empirical distribution function (EDF). –Leads to bias (“biased sampling”) –Estimator inconsistent

10 Review: Complete Data T 1, T 2, …, T n IID F(t) = P(T < t) Asymptotics of EDF Empirical Survivor Func. (EDF)

11 In Hazards View Hazard rate function Cumulative hazard function Product representation

12 Alternative Representations Another representation of the variance EDF in Product Form

13 Review: Right-Censored Data Failure times: T 1, T 2, …, T n IID F Censoring times: C 1, C 2, …, C n IID G Right-censored data (Z 1,  1 ), (Z 2,  2 ), …, (Z n,  n ) Z i = min(T i, C i )  i = I{T i < C i } Product-Limit Estimator

14 PLE Properties Asymptotics of PLE If G(w) = 0, so no censoring,

15 Recurrent Event Setting N i (s,t) = number of events for the ith unit in calendar period [0,s] with inter-event times at most t. Y i (s,t) = number of events for the ith unit which are known during the calendar period [0,s] to have inter-event times at least t. K i (s) = number of events for the ith unit that occurred in [0,s]

16 s t N 3 (s=400,t=100) = 1 Y 3 (s=400,t=100) = 1 K 3 (s=400) = 2 Inter-Event Time Calendar Time  s=400 t=100 For MMC Unit #3 t=50 s=550 K 3 (s=550) = 3 N 3 (s=550,t=50) = 0 Y 3 (s=550,t=50) = 3

17 Aggregated Processes Limit Processes as s Increases

18 Estimator of F in Recurrent Event Setting Estimator is called the GPLE; generalizes the EDF and the PLE discussed earlier.

19 Properties of GPLE

20 Asymptotic Distribution of GPLE For large n and regularity conditions, with variance function

21 Variance Functions: Comparisons PLE: GPLE (recurrent event): If in stationary state, R(t) = t/  F, so  G (w) = mean residual life of  at w. EDF:

22 Wang-Chang Estimator (JASA, ‘99) Can handle correlated inter-event times. Comparison with GPLE may be unfair to WC estimator under the IID setting!

23 Frailty (“Random Effects”) Model U 1, U 2, …, U n are IID unobserved Gamma( ,  ) random variables, called frailties. Given U i = u, (T i1, T i2, T i3, …) are independent inter-event times with Marginal survivor function of T ij:

24 Estimator under Frailty Model Frailty parameter,  : dependence parameter. Small (Large)  : Strong (Weak) dependence. EM algorithm needed to obtain the estimator (FRMLE). Unobserved frailties viewed as missing. GPLE needed in EM algorithm. Under frailty model: GPLE not consistent when frailty parameter is finite.

25 Comparisons of Estimators Under gamma frailty model. F = EXP(  ):  = 6 G = EXP(  ):  = 1 n = 50 # of Replications = 1000 Frailty parameter  took values: {  (IID Model), 6, 2} Computer programs: S-Plus and Fortran routines.

26 IID Simulated Comparison of the Three Estimators for Varying Frailty Parameter Black=GPLE; Blue=WCPLE; Red=FRMLE

27 Effect of the Frailty Parameter (  ) for Each of the Three Estimators (Black=Infty; Blue=6; Red=2)

28 The Three Estimates of MMC Period Survivor Function IID assumption seems acceptable. Estimate of frailty parameter  is 10.2.