EVENT PROJECTION Minzhao Liu, 2018

Presentation transcript:

Today I am going to talk about the event projection (event prediction) model I built for one of our studies. The study is still ongoing; these slides were prepared early this year.

Background
Subjects: 538
Goal: 340 PFS events
Survival model states: death, disease progression (PD), lost to follow-up (LTFU), ongoing

The motivation comes from studies whose deliverables are event-driven; for example, a 50% interim analysis is triggered once a certain number of PFS or OS events has occurred. What we want to predict is when, and with what probability, the study will reach the target number of events. The benefit is a better estimate of the timing of the final report, for both business and marketing purposes, and we can use that estimate to allocate programming and statistics resources. In the example here, the study has 538 subjects, and the final analysis takes place once we reach 340 PFS events. It is an oncology study in which a survival endpoint is of primary interest. A subject may have died or had disease progression (PD), either of which counts as a PFS event; may be lost to follow-up (LTFU), which is permanent censoring; or may still be ongoing. Our main task is to predict the future event time for the ongoing subjects.

Results Collage

Before moving to the details of the process, here are the aggregated results of our projections, based on about 10 data cuts. The dots and lines show the cumulative number of PFS events over time; the red dotted line is the target count of 340 (once we reach it, we run the final analysis). The original question is when we will reach it. The right side of the plot shows our predictions: for example, based on the June 2016 cut, there is a 50% probability that we will reach 340 PFS events by March 2018. This is how we present the results.

Intuition
Subject categories: Event, Censor, Ongoing
Steps: 1. Model Fitting 2. Prediction

#     E/C   Date
1     E     2014/4/21
2           2014/6/11
      C     2014/6/15
3           2014/6/27
…
340         2018/1/1

Back to our method: the intuition is straightforward. We put subjects into three categories: PFS event, permanently censored (PC), and ongoing. Event means the subject has already had an event. PC means the subject is permanently censored and will not have an event in the future (for example LTFU, or starting another ATX). Ongoing means the subject is still ongoing at the time of the data cut and may have either a PFS event or PC (LTFU) in the future. Our idea is to predict outcomes for that group of ongoing subjects. After predicting for each ongoing subject, we stack all events, observed and projected, in chronological order and find the 340th event; its date is our projected date for reaching the goal. With that strategy in place, the next question is how to predict the ongoing subjects more accurately. The answer is to use the observed information, which leads to two steps: 1. fit a model to the observed data to capture its pattern; 2. use the fitted parameters to predict future event times.

1. Model Fitting
Survival model, parametric: generalized gamma, Weibull, exponential

Model fitting is straightforward. We use a parametric survival model and fit it under three distributional assumptions: generalized gamma, Weibull, and exponential.

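Model Settings

    proc lifereg data = pfs outest = pe1;          /* save parameter estimates to pe1 */
      class trt;
      model aval * cnsr(1) = trt / dist = gamma;   /* cnsr = 1 marks censored times */
      output out = b xbeta = lp cdf = p;           /* linear predictor and fitted CDF */
    run;

Here is the code we used. Changing the dist= option in the MODEL statement switches the distributional assumption.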

Goodness of Fit

Model         -2 log likelihood (smaller is better)
Gamma         785.303
Exponential   819.12
Weibull       806.211

To decide which model to use, we check the goodness of fit of each one. A simple comparison of the log likelihoods leads us to pick the generalized gamma as our model.
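
Because the exponential and Weibull models are special cases of the generalized gamma, the generalized gamma can never have a larger -2 log likelihood than either of them, so a penalized criterion is a useful cross-check. The sketch below computes AIC and the likelihood-ratio statistics from the values on the slide. The parameter counts are an assumption (intercept plus one two-level treatment effect, with a scale parameter added for Weibull and scale plus shape for gamma); they are not stated in the talk.

```python
# -2 log likelihood values as reported on the slide.
neg2_loglik = {"Gamma": 785.303, "Exponential": 819.12, "Weibull": 806.211}

# ASSUMPTION (not from the talk): intercept + one treatment coefficient,
# plus a scale parameter for Weibull and scale + shape for gamma.
n_params = {"Exponential": 2, "Weibull": 3, "Gamma": 4}

# AIC = -2 log L + 2k; smaller is better.
aic = {m: neg2_loglik[m] + 2 * n_params[m] for m in neg2_loglik}
best = min(aic, key=aic.get)

# Likelihood-ratio statistics for the nested comparisons against gamma.
lr_weibull_vs_gamma = neg2_loglik["Weibull"] - neg2_loglik["Gamma"]
lr_exp_vs_gamma = neg2_loglik["Exponential"] - neg2_loglik["Gamma"]

for model in sorted(aic, key=aic.get):
    print(f"{model:12s} AIC = {aic[model]:.3f}")
print("best by AIC:", best)
print("LR statistic, Weibull vs gamma:", round(lr_weibull_vs_gamma, 3))
print("LR statistic, exponential vs gamma:", round(lr_exp_vs_gamma, 3))
```

Under these assumed parameter counts the ranking is the same as the one obtained from the raw log likelihoods.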

2. Prediction (Mean Residual Life)

Definition (MRL): $MRL(x) = E[X - x \mid X > x]$

For continuous $X$:
$$MRL(x) = \frac{\int_x^\infty (t - x) f(t)\,dt}{S(x)} = \frac{\int_x^\infty S(t)\,dt}{S(x)}$$

Can also show:
$$V(X) = 2\int_0^\infty t\,S(t)\,dt - \left(\int_0^\infty S(t)\,dt\right)^2$$

Here is the second part. Once we have the estimated model parameters, the next step is to predict the event time for each ongoing subject. In theory there are closed-form expressions for the mean and variance of the residual life, where the mean residual life is the expected remaining lifetime given that the subject has survived to the observed time x.
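
The survival-function form of the MRL is easy to evaluate numerically. The following is a minimal sketch, not the talk's implementation: it integrates S(t) with the trapezoid rule up to a finite upper limit. The Weibull and exponential parameters are made-up illustration values.

```python
import math

def mean_residual_life(surv, x, upper, steps=20_000):
    """Approximate MRL(x) = integral_x^upper S(t) dt / S(x) (trapezoid rule).

    `upper` must be large enough that S(upper) is negligible.
    """
    h = (upper - x) / steps
    total = 0.5 * (surv(x) + surv(upper))
    for i in range(1, steps):
        total += surv(x + i * h)
    return total * h / surv(x)

# Hypothetical survival functions, for illustration only.
def exp_surv(t, mean=10.0):
    return math.exp(-t / mean)

def weibull_surv(t, scale=10.0, shape=1.5):
    return math.exp(-((t / scale) ** shape))

mrl_exp = mean_residual_life(exp_surv, x=5.0, upper=505.0)
mrl_weib_0 = mean_residual_life(weibull_surv, x=0.0, upper=500.0)
mrl_weib_5 = mean_residual_life(weibull_surv, x=5.0, upper=505.0)

print("exponential, x = 5:", mrl_exp)
print("Weibull, x = 0:", mrl_weib_0)
print("Weibull, x = 5:", mrl_weib_5)
```

For the exponential distribution the MRL equals the mean at every x (memorylessness), and at x = 0 the MRL is simply the mean of the distribution, which gives two quick sanity checks on the integration.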

2. Prediction (Individual Sample Simulation)

Simulate generalized gamma distributed data for PROC LIFEREG:

    a = 1/(Shape**2);
    c = Scale/Shape;
    b = exp(xb)/(a**c);
    r = rangam(0, a);
    X = b*(r**c);

where X is the generated survival time and xb can be the linear predictor.

Special cases:
Weibull (Shape = 1)
Exponential (Shape = 1, Scale = 1)
Lognormal (Shape = 0, as the limiting case)

More practically, we use simulation to generate an event time for each ongoing subject. This is more straightforward, and it shows how the group as a whole accumulates events over time. The formula on the slide is general: it also covers the Weibull, exponential, and lognormal distributions, since they are special cases of the generalized gamma distribution. The random variable X is the generated survival time.
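
The SAS data-step formula translates almost line for line into other languages. Below is a hedged Python sketch using only the standard library; the function name `sim_gengamma` and the parameter values in the usage lines are illustrative assumptions, and the Shape = 0 branch uses the lognormal limit because the formula itself would divide by zero there.

```python
import math
import random

def sim_gengamma(xb, scale, shape, rng):
    """Draw one survival time under the LIFEREG generalized gamma parameterization."""
    if shape == 0:
        # Limiting case: lognormal, log(X) = xb + scale * Z with Z standard normal.
        return math.exp(xb + scale * rng.gauss(0.0, 1.0))
    a = 1.0 / shape ** 2
    c = scale / shape
    b = math.exp(xb) / a ** c
    r = rng.gammavariate(a, 1.0)  # gamma with shape a and unit scale, like rangam
    return b * r ** c

rng = random.Random(2018)

# Shape = 1, Scale = 1, xb = 0 reduces to a unit exponential (mean 1).
exp_draws = [sim_gengamma(0.0, 1.0, 1.0, rng) for _ in range(20_000)]
exp_mean = sum(exp_draws) / len(exp_draws)

# Shape = 1 with a hypothetical xb and Scale gives a Weibull:
# X = exp(xb) * E**Scale, whose mean is exp(xb) * Gamma(1 + Scale).
weib_draws = [sim_gengamma(math.log(10.0), 0.5, 1.0, rng) for _ in range(20_000)]
weib_mean = sum(weib_draws) / len(weib_draws)

print("exponential sample mean:", exp_mean)
print("Weibull sample mean:", weib_mean, "theory:", 10.0 * math.gamma(1.5))
```

Comparing the sample means with the known means of the special cases is a cheap way to confirm that the parameterization has been carried over correctly.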

2. Prediction (Individual Sample Simulation)

Truncated sampling for ongoing subjects:
$X_1$: the generated survival time
Generate $X_1 \mid X_1 > x$: sample until the generated survival time exceeds the current observed time $x$
Model time to permanent censoring (PC) independently in the same way and generate $X_2$, the time to PC

However, we are not done yet. Because an ongoing subject has already been observed up to a certain time, the future event time must fall beyond that observed time; the distribution is therefore truncated, that is, conditional on the observed time. This is easy to implement in simulation: we simply resample until we draw an event time later than the current observed time. We independently do the same projection for time to loss to follow-up. Then, for each subject, we compare the projected event time with the projected LTFU time, and whichever comes first is the simulated outcome for that subject.
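
A minimal sketch of this step, assuming hypothetical Weibull models for both the event time and the time to permanent censoring (the talk's fitted model is generalized gamma; Weibull is swapped in here only because the standard library provides a sampler for it). The names `draw_beyond` and `simulate_subject` are illustrative.

```python
import random

def draw_beyond(sampler, observed, max_tries=100_000):
    """Rejection sampling: redraw until the time exceeds the observed time."""
    for _ in range(max_tries):
        t = sampler()
        if t > observed:
            return t
    raise RuntimeError("no draw exceeded the observed time")

def simulate_subject(observed, event_sampler, pc_sampler):
    """Return ('E', time) or ('C', time), whichever outcome comes first."""
    x1 = draw_beyond(event_sampler, observed)  # time to PFS event
    x2 = draw_beyond(pc_sampler, observed)     # time to permanent censoring
    return ("E", x1) if x1 <= x2 else ("C", x2)

rng = random.Random(1)
# Hypothetical fitted distributions (scale, shape), in months.
event_sampler = lambda: rng.weibullvariate(20.0, 1.2)
pc_sampler = lambda: rng.weibullvariate(60.0, 1.0)

observed_time = 12.0  # follow-up already accumulated by this ongoing subject
results = [simulate_subject(observed_time, event_sampler, pc_sampler)
           for _ in range(1000)]
share_events = sum(kind == "E" for kind, _ in results) / len(results)
print("share of simulations ending in a PFS event:", share_events)
```

Rejection sampling becomes slow when the observed time lies far in the tail of the fitted distribution; sampling from the conditional distribution through the inverse CDF is a common alternative in that case.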

2. Prediction (Individual Sample Simulation)

Stack simulated results

Simulation #1
#     E/C   Date
1     E     2014/4/21
2           2014/6/11
      C     2014/6/15
3           2014/6/27
…
340         2018/1/1

Once we have a simulated result for each subject, we stack together all subjects who have an event, whether observed or predicted. The date of the 340th event is the projected date for that simulation iteration.
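
The stacking step amounts to a sort followed by an index lookup. Here is a small sketch; the three observed dates are taken from the slide, while the simulated outcomes and the target of 4 events are made up so that the example stays short.

```python
from datetime import date

def projected_target_date(observed_event_dates, simulated, target):
    """Return the date of the target-th event, or None if it is never reached.

    `simulated` holds one (kind, date) pair per ongoing subject, where kind
    is 'E' for a projected event and 'C' for projected permanent censoring.
    """
    events = list(observed_event_dates)
    events += [d for kind, d in simulated if kind == "E"]  # censored rows drop out
    events.sort()
    return events[target - 1] if len(events) >= target else None

# Observed event dates from the slide.
observed = [date(2014, 4, 21), date(2014, 6, 11), date(2014, 6, 27)]

# Hypothetical simulated outcomes for three ongoing subjects.
simulated = [
    ("E", date(2015, 1, 10)),
    ("C", date(2014, 12, 1)),
    ("E", date(2014, 9, 5)),
]

hit = projected_target_date(observed, simulated, target=4)
miss = projected_target_date(observed, simulated, target=6)
print("4th event on:", hit)
print("6th event on:", miss)
```

Returning `None` when the target is not reached matters in practice: if too many ongoing subjects are projected to be censored, a given iteration may never reach the target count.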

2. Prediction (Individual Sample Simulation)

Repeat simulation 10,000 times
[Figure: histogram of the projection date from 10,000 simulations]

We repeat the simulation 10,000 times, which gives 10,000 projected dates for reaching the target. From these we can compute any summary we want, such as the median, the mean, or the 25th and 75th percentiles, or draw a histogram.
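
The replication loop and its summaries can be sketched as follows. Every number here is a made-up assumption (300 observed events, 100 ongoing subjects, exponential times with means of 300 and 900 days); exponential models are used only because they are memoryless, so the truncated-sampling step can be skipped in this toy version.

```python
import random
import statistics

def one_replicate(rng, n_observed_events, n_ongoing, target):
    """Days after the data cut until the target-th event, or None if not reached."""
    needed = target - n_observed_events
    if needed <= 0:
        return 0.0
    event_days = []
    for _ in range(n_ongoing):
        to_event = rng.expovariate(1 / 300.0)  # hypothetical mean of 300 days
        to_pc = rng.expovariate(1 / 900.0)     # hypothetical mean of 900 days
        if to_event <= to_pc:                  # the event precedes censoring
            event_days.append(to_event)
    event_days.sort()
    return event_days[needed - 1] if len(event_days) >= needed else None

rng = random.Random(2018)
n_sims = 10_000
results = [one_replicate(rng, n_observed_events=300, n_ongoing=100, target=340)
           for _ in range(n_sims)]

reached = [d for d in results if d is not None]
q25, q50, q75 = statistics.quantiles(reached, n=4)

print("iterations reaching the target:", len(reached), "of", n_sims)
print("days to target: 25th pct", round(q25), "median", round(q50),
      "75th pct", round(q75))
```

Adding the projected number of days to the data-cut date turns these quantiles into calendar dates, and the fraction of iterations that reach the target is itself a useful diagnostic.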

Restrictions and Limitations
Inaccurate censoring model at an early stage
Data delay
Data reversal

Limitation: at an early stage there are only a few data points for the LTFU model, so the overall projection is over-optimistic: the model tends to predict that a subject will have PD or death before LTFU. Once more LTFU data accumulate, the model changes, and the prediction becomes more likely to show ongoing subjects being lost to follow-up before they have PD or death.

Data delay and data changes: it often happens that a subject has already had a PFS event, but because of data delay the subject still appears as ongoing when we do the modeling. It has also happened that the number of events decreased after a new data cut. This is not a subject coming back to life: the subject had previously been recorded with PD, but was later found to have had an earlier ATX and was therefore permanently censored, so the PD no longer counts as an event.

Summary
Fit the data using parametric survival models (generalized gamma / exponential / Weibull)
    On both the PFS and the permanent censoring (PC) outcomes
    Use goodness of fit to pick the best model
Simulate the remaining ongoing subjects using the estimated parameters
    Truncated sampling
    Consider both the event and the PC outcome
Stack the simulated results and obtain the predicted date of reaching the target
Replicate the simulations