On a dynamic approach to the analysis of multivariate failure time data Odd Aalen Section of Medical Statistics, University of Oslo, Norway.



2 Coworkers
Ørnulf Borgan: Institute of Mathematics, University of Oslo
Johan Fosen: Section of Medical Statistics, University of Oslo
Harald Fekjaer: The Norwegian Cancer Registry

3 What are multivariate survival data?
1. Repeated events over time.
2. Each individual may have several “units on test”.
3. The complication (as always) is censoring.
4. The long-term ambition is to tackle complex event histories with many different types of events.

4 Example: Small bowel motility
Study of the cyclic pattern of motility (spontaneous movements) of the small bowel in humans.
Focus on migrating motor complexes (MMCs), which occur at irregular intervals (lasting from minutes to several hours).
Motility is very important from a clinical point of view.
The data were analysed with frailty models in Aalen & Husebye (1991). Dynamic models are applied here instead.

5 Small bowel motility data (Husebye)

6 Other examples
Duration of amalgam fillings
• Each patient contributes a number of fillings
Repeated tumors in animal experiments
Continuous registration of sleep, moving into and out of various sleep states

7 How are such data analysed?
By marginal models. Dependence in the data is generally ignored, except for estimating standard errors.
By frailty models. One assumes that each individual has a separate risk of the event occurring.
• Frailty models are random effect models.
• Censoring is handled well by frailty models.
• Excellent state-of-the-art book: Hougaard (2000).
An alternative method is proposed here, based on regression on dynamic covariates.

8 Cox 1972: Two major papers
The famous one
• Cox models for ordinary survival data
• JRSSB, 1972, 34
The ignored one
• Cox models for point processes
• Introducing dynamic covariates, e.g. “time since last event”
• In: P.A.W. Lewis, “Stochastic point processes: Statistical analysis, theory and applications”, Wiley, 1972
It is time to take up the challenge of Cox’s second paper.

9 Counting process framework
What is a counting process? Observing events occurring over time.
Examples of events:
• waking up during the night
• an amalgam filling failing
• detecting a tumor
Counting the number of events as they come along yields a counting process.
The counting process is denoted N(t), where t is time.
The process is constant between events and jumps one unit at each event.
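The definition of N(t) can be illustrated with a minimal Python sketch. The function name `counting_process` and the event times are hypothetical and chosen only for illustration; they are not from the talk.

```python
import numpy as np

def counting_process(event_times, t):
    """Return N(t): the number of events observed at or before time t."""
    event_times = np.sort(np.asarray(event_times, dtype=float))
    # side="right" counts events with time <= t, so N(t) jumps by one
    # exactly at each event time and is constant in between.
    return np.searchsorted(event_times, t, side="right")

event_times = [0.7, 1.9, 2.4, 5.1]            # hypothetical event times
grid = np.array([0.0, 0.7, 1.0, 2.4, 6.0])    # time points at which to evaluate N
values = counting_process(event_times, grid)
print(values)   # [0 1 1 3 4]
```

Evaluating the function on a fine grid and drawing the result as a step function gives the kind of picture shown on the next slide.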

10 Illustration of a counting process [figure; horizontal axis: Time]

11 The intensity process of a counting process
Definition of the intensity process: [equation missing from transcript]
Extremely fruitful reformulation of the definition: [equation missing from transcript]
The fact that a counting process N(t) has an intensity process λ(t) can be made precise by the following mathematical statement: [equation missing from transcript]
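The formulas on this slide did not survive in the transcript. In the standard counting process theory (e.g. Andersen, Borgan, Gill and Keiding, 1993) the three statements are usually written as below. This is a reconstruction from the textbook formulation, not a copy of the slide; the notation \mathcal{F}_{t-} for the history just before time t is an assumption.

```latex
% Definition: conditional probability of an event in [t, t + dt) given the past
\lambda(t)\,dt = P\bigl(dN(t) = 1 \mid \mathcal{F}_{t-}\bigr)
               = E\bigl(dN(t) \mid \mathcal{F}_{t-}\bigr)

% Reformulation: observation = signal + noise
dN(t) = \lambda(t)\,dt + dM(t)

% Precise statement: the compensated counting process is a martingale
M(t) = N(t) - \int_0^t \lambda(s)\,ds
```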

12 Martingale for simulated Poisson process with rate 1
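A figure of this kind can be reproduced with a short simulation. The sketch below assumes the standard compensated process M(t) = N(t) - t for a Poisson process with rate 1; the time horizon, seed, and number of replications are arbitrary choices, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, only for reproducibility

def simulate_poisson_times(rate, horizon, rng):
    """Event times of a homogeneous Poisson process on [0, horizon]."""
    n_events = rng.poisson(rate * horizon)
    # Given the number of events, the times are uniform on the interval
    return np.sort(rng.uniform(0.0, horizon, size=n_events))

def martingale_path(event_times, rate, grid):
    """M(t) = N(t) - rate * t evaluated on a time grid."""
    n_t = np.searchsorted(event_times, grid, side="right")
    return n_t - rate * grid

horizon = 10.0                            # assumed follow-up time
grid = np.linspace(0.0, horizon, 101)
paths = np.array([
    martingale_path(simulate_poisson_times(1.0, horizon, rng), 1.0, grid)
    for _ in range(2000)
])

# A martingale started at 0 has mean 0 at every time point; for the
# Poisson case the variance of M(t) equals rate * t.
mean_end = paths[:, -1].mean()
var_end = paths[:, -1].var()
print(mean_end, var_end)
```

Plotting a single row of `paths` against `grid` shows one path fluctuating around zero with no drift.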

13 Stochastic integrals
If M(t) is a martingale, then the following is also a martingale: $\int_0^t H(s)\,dM(s)$, where H(t) is (essentially) any stochastic process dependent on the past, and with left-continuous sample paths.
• Useful properties of stochastic integrals include explicit formulas for variances and central limit theorems.

14 Following up on Cox 1972b: Dynamic models
Dynamic models incorporate past observations in the analysis.
Example: Frailty induces dependence over time.
• e.g. several previous events increase the likelihood of a new event
• hence frailty models can be analyzed as dynamic models
• this connection can be made mathematically precise
In addition, there are real dynamic effects (which may be difficult to distinguish from frailty effects).

15 Regression on dynamic covariates
Dynamic covariates may be defined as:
• number of previous events
• time since last occurrence
• or in fact any function of the past (due to martingale theory)
Dynamic covariates are continuously updated.
Dynamic regression may be more flexible than a frailty approach, since effects may change over time.
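The two dynamic covariates named above can be computed from the event history of a single process. In this sketch the function name `dynamic_covariates` is hypothetical, and measuring time from 0 before the first event is an assumption; the slides do not say how that case is handled.

```python
import numpy as np

def dynamic_covariates(event_times, t):
    """Number of previous events and time since the last event, just before t.

    Only events strictly before t are used, so the covariates depend on the
    past alone (they are predictable).
    """
    event_times = np.sort(np.asarray(event_times, dtype=float))
    n_prev = int(np.searchsorted(event_times, t, side="left"))  # events with time < t
    # Assumption: before the first event, time is measured from the start (time 0)
    last = event_times[n_prev - 1] if n_prev > 0 else 0.0
    return n_prev, float(t - last)

events = [1.0, 2.5, 4.0]                    # hypothetical event times
print(dynamic_covariates(events, 0.5))      # (0, 0.5)
print(dynamic_covariates(events, 2.5))      # (1, 1.5): the event at 2.5 is not yet "past"
print(dynamic_covariates(events, 5.0))      # (3, 1.0)
```

Using strict inequality is the design choice that matters here: a covariate evaluated at an event time must not already include that event.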

16 Types of regression
The data are a number of counting processes, with many possible events in each, observed in parallel.
Can use
• Cox regression
• additive regression
Will focus on additive regression
• This is a local approach, as opposed to Cox regression.
• Basic idea: Whenever an event occurs, a linear model is estimated whose dependent variable is a vector of 0’s, except for a 1 in the process where the event happened.
• Individually, these estimates are not informative, but summing them up over time yields something sensible.

17 Additive intensity regression
For each individual i in the risk set, the intensity process is defined as a linear function of the covariates: [equation missing from transcript] where K_i(t) is the number of units at risk (e.g. amalgam fillings) for the individual.
The regression functions (β’s) are arbitrary functions, while the covariates (Z’s) are arbitrary predictable processes (e.g. adapted with left-continuous sample paths).
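The model equation is missing from the transcript. Based on the description on this slide (a linear function of the covariates, scaled by the number of units at risk), it most likely has the usual form of the additive intensity model shown below. The number of covariates p and the exact layout are assumptions.

```latex
\lambda_i(t) = K_i(t)\,\bigl\{\beta_0(t) + \beta_1(t)\,Z_{i1}(t) + \dots + \beta_p(t)\,Z_{ip}(t)\bigr\}
```

Here \beta_0(t) plays the role of a baseline intensity per unit at risk, and each \beta_j(t) is the excess intensity at time t per unit increase of the covariate Z_{ij}(t).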

18 Why additivity (linearity)?
Seems unnatural, since the intensity should be positive. However:
Additivity yields complete flexibility as to how the effects of covariates change over time.
Also: Additivity yields exact martingales in several settings, which is technically convenient. The Cox model never yields exact martingales.
In practice, effects are not always (in fact, usually not?) proportional.
The additive model can be connected with other linear models, e.g. for the covariates, and combined into path analysis.

19 Additive model: Local least squares estimation
Easiest to estimate cumulative regression functions: $B_j(t) = \int_0^t \beta_j(s)\,ds$. The slope of these gives information about the regression functions.
Estimate defined as a stochastic integral of the counting process, for a suitable design matrix Y(t): $\hat{B}(t) = \int_0^t Y(s)^{-}\,dN(s)$, where $Y(t)^{-}$ is a generalized inverse of Y(t).
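The local least squares idea can be sketched in a few lines of Python. This is a minimal illustration, not the Addreg program mentioned later in the talk: it assumes no censoring, one unit at risk per process, and uses the Moore-Penrose pseudoinverse as the generalized inverse. The function name, the `covariates` callback, and the simulated data are all hypothetical.

```python
import numpy as np

def additive_regression(event_times, covariates, horizon):
    """Least-squares estimate of the cumulative regression functions.

    event_times : list of 1-D arrays, one array of event times per process
    covariates  : hypothetical callback; covariates(i, t) gives the covariate
                  vector of process i just before time t
    Assumes no censoring: every process is at risk on [0, horizon].
    """
    n = len(event_times)
    # Pool all jumps as (time, process index) pairs, ordered in time
    jumps = sorted((float(t), i) for i, times in enumerate(event_times)
                   for t in times if t <= horizon)
    times, increments = [], []
    for t, i in jumps:
        # Design matrix Y(t): one row per process, intercept plus covariates
        Y = np.array([np.concatenate(([1.0], np.atleast_1d(covariates(j, t))))
                      for j in range(n)])
        dN = np.zeros(n)
        dN[i] = 1.0                                 # a 1 only for the process that jumped
        increments.append(np.linalg.pinv(Y) @ dN)   # increment Y(t)^- dN(t)
        times.append(t)
    # Summing the increments over the jump times gives the estimate of B(t)
    return np.array(times), np.cumsum(increments, axis=0)

# Illustrative data (all values are assumptions): 100 Poisson processes,
# rate 1 when z = 0 and rate 3 when z = 1, followed for 5 time units.
rng = np.random.default_rng(0)
z = np.repeat([0.0, 1.0], 50)
horizon = 5.0
event_times = [
    np.sort(rng.uniform(0.0, horizon, rng.poisson((1.0 + 2.0 * zi) * horizon)))
    for zi in z
]
times, B = additive_regression(event_times, lambda i, t: z[i], horizon)
print(B[-1])   # estimates of B_0(5) and B_1(5); the true values are 5 and 10
```

Each single increment is a crude regression on a 0/1 response, which matches the remark that the individual estimates are uninformative; only the cumulative sums, plotted against `times`, are interpretable, and their slopes estimate the regression functions.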

20 Residual processes
Martingale residual processes are defined as follows: [equation missing from transcript]
• These are exact martingales.
• For judging the influence of outliers, one may look at the sum of the hat matrices over jump times: [equation missing from transcript]
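Both formulas on this slide are missing from the transcript. With the design matrix Y(t) and its generalized inverse Y(t)^- from the estimation slide, the natural candidates are the following. This is a hedged reconstruction in the usual least squares notation; the symbols M_res, H, and the jump times T_k are assumptions.

```latex
% Residuals: observed jumps minus fitted jumps, accumulated over time
M_{\mathrm{res}}(t) = \int_0^t \bigl(I - Y(s)\,Y(s)^{-}\bigr)\,dN(s)

% Hat matrix at time t, and its sum over the jump times T_1 < T_2 < \dots
H(t) = Y(t)\,Y(t)^{-}, \qquad \sum_{T_k \le t} H(T_k)
```

The diagonal of the summed hat matrices gives one cumulative leverage value per process, which is presumably what the influence plot later in the talk displays.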

21 Theory for additive model
There exists much theory, including:
• asymptotics, testing, residuals, density type estimation of regression functions, ridge regression, optimal procedures
• estimating covariate effects on transition probabilities in Markov chains
Most theory is based on stochastic integrals and martingales.
See e.g. the book by Andersen, Borgan, Gill and Keiding (1993), and Aalen et al., Biometrics, 2001.

22 Dynamic analysis of frailty: Simulation
Simulating 40 independent Poisson processes.
The rate of each process is simulated from an exponential distribution with expectation 1.
The rate serves as a frailty variable.
For each counting process we define a dynamic covariate to be the number of previous events in the process.
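The simulation setup can be sketched as follows. The number of processes and the exponential frailty distribution follow the slide; the follow-up time, the seed, and the split into an early and a late period are assumptions made only to show why the number of previous events carries information about the frailty.

```python
import numpy as np

rng = np.random.default_rng(1)      # fixed seed, only for reproducibility
n_processes = 40
horizon = 10.0                      # assumed follow-up time, not given on the slide

# Frailty: each process gets its own rate, exponential with expectation 1
rates = rng.exponential(scale=1.0, size=n_processes)

# Given its rate, each process is a homogeneous Poisson process on [0, horizon]
event_times = [
    np.sort(rng.uniform(0.0, horizon, rng.poisson(r * horizon)))
    for r in rates
]

# Dynamic covariate at mid-follow-up: number of previous events
half = horizon / 2
early = np.array([(t <= half).sum() for t in event_times])
# Events still to come after that time point
late = np.array([(t > half).sum() for t in event_times])

# Within a process the two counts are independent given the rate, but the
# shared unobserved rate makes them positively correlated across processes.
corr = np.corrcoef(early, late)[0, 1]
print(corr)
```

A clearly positive correlation is what lets a dynamic covariate stand in for the unobserved frailty in the regression.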

23 Cumulative regression function for dynamic covariate
• Cumulative regression function with 95% confidence limits

24 Standardized residual processes
• Cumulative residual processes shown left and kernel estimated processes shown right.
• Upper panels: No covariate
• Lower panels: Dynamic covariate

25 Small bowel motility
Dynamic covariates
• Number of previous events
• Time since last event (cut point at 50 minutes)
Cox regression can be applied. Hazard ratios with 95% confidence intervals:
• Number of previous events: 0.98 (0.76, 1.27)
• Time since last event: 4.66 (2.36, 9.19)
Illustrated by the additive model on the next slides, which show
• Number of previous occurrences has no effect
• Time since last occurrence does have an effect

26 Dynamic covariate I: number of previous occurrences (upper and lower curves give pointwise 95% confidence intervals)
There is clearly no effect of the covariate.

27 Dynamic covariate II: time since last occurrence (above or below 50 minutes)

28 Example: Repeated tumors
Gail, Santner and Brown (1980).
Carcinogen injected at day 0 in 76 female rats, which were then treated with retinyl acetate for 60 days.
The 48 animals that were still tumor free were randomised to continued retinoid prophylaxis or to control.
The animals were followed until 182 days after the initial injection.
Several tumors were observed in most animals, and the time of each tumor was recorded.

29 Data on repeated mammary tumors. Additive model with one fixed and three dynamic covariates.
Panels: Cumulative baseline intensity; Treatment effect; Number of previous occurrences; Time since previous event

30 Residual plots I
Fitting all covariates (treatment and dynamic ones). Left panel shows standardized residual. Right panel shows mean and standard deviation of standardized residuals.

31 Residual plots II Using only treatment as covariate. Left panel shows standardized residual. Right panel shows mean and standard deviation of standardized residuals.

32 Influence plot
Cumulative sum of hat matrices.
Straight line marks the limit for influential processes.

33 Conclusion for mammary tumors
Findings
• Constant rate over time.
• Effect of treatment.
• Effect of number of previous occurrences.
• No effect of time since last occurrence.
Type of process:
• Markovian. Several Poisson processes with varying rate.

34 Example of more complex event history: Sleep data
Data collected at the Max Planck Institute in Munich concerning sleep patterns.
• Analysing tendency to fall asleep, wake up, have REM periods etc.
• Clearly, many occurrences of events each night.
Example of counting process: number of times you wake up
Dynamic covariates
• measurement of cortisol (stress hormone)
• number of previous events divided by elapsed time
• log duration of ongoing sleep period

35 Kernel estimates of regression functions

36 Interpretation
Strong dynamic effects. Could be:
• Frailty
• Real causal effects
Only additional information will tell us which is which.
The approach is purely empirical, in contrast to the latent-variable thinking of frailty models.

37 Warning
Dynamic covariates may “steal” from the effect of fixed covariates.
One therefore has to be careful, using orthogonalization or path analysis type methods. This is presently being developed.
The “locality” of the additive approach makes this easy to handle.

38 General event histories
Many events of many different types. Examples:
• individual histories of sick leave, part-time work, full-time work, with many transitions between different states
• individual histories of illness
There are no good tools for handling complex event histories.
The present approach is one attempt that we will develop further.

39 Additive regression in practice An S-Plus computer program named Addreg may be found in: Information on research in event history analysis in Oslo may be found on the web page:

40 References
Hougaard, P. (2000). Analysis of Multivariate Survival Data. Springer-Verlag, New York.
Andersen, P.K., Borgan, Ø., Gill, R.D. and Keiding, N. (1993). Statistical Models Based on Counting Processes. Springer-Verlag, New York.
Aalen, O.O., Borgan, Ø. and Fekjær, H. (2001). Covariate adjustment of event histories estimated from Markov chains: The additive approach. Biometrics, 57: