HSRP 734: Advanced Statistical Methods July 10, 2008.


Objectives
- Describe the Kaplan-Meier estimated survival curve
- Describe the log-rank test
- Use SAS to implement them

Kaplan-Meier Estimate of Survival Function S(t)
- The Kaplan-Meier estimate is a simple, useful and popular estimate of the survival function.
- This estimate incorporates both censored and noncensored observations.
- It breaks the estimation problem down into small pieces.

Kaplan-Meier Estimate of the Survival Function S(t)
- For grouped survival data, the survival function is estimated as a product of conditional survival probabilities over the intervals j up to time t: $\hat{S}(t) = \prod_{j} \frac{n_j - y_j}{n_j}$, where n_j persons enter interval j and y_j of them have an event in it.
- Let the interval lengths L_j become very small, all of length L = Δt, and let t_1, t_2, … be the times of events (survival times).

Kaplan-Meier Estimate of the Survival Function S(t)
- There are 2 cases to consider in the previous equation.
- Case 1. No event in a bin (interval): y_j = 0, so the factor (n_j - y_j) / n_j equals 1 and $\hat{S}(t)$ does not change, which means that we can ignore bins with no events.

Kaplan-Meier Estimate of the Survival Function S(t)
- Case 2. y_j events occur in a bin (interval): the estimate is multiplied by (n_j - y_j) / n_j.
- Also: n_j persons enter the bin, and we assume that any censored times that occur in the bin occur at the end of the bin.

Kaplan-Meier Estimate of the Survival Function S(t)
- So, as Δt → 0, we get the Kaplan-Meier estimate of the survival function S(t): $\hat{S}(t) = \prod_{j:\, t_j \le t} \frac{n_j - y_j}{n_j}$
- Also called the “product-limit estimate” of the survival function S(t)
- Note: each conditional probability estimate, (n_j - y_j) / n_j, is obtained from the observed number at risk for an event (n_j) and the observed number of events (y_j).

Kaplan-Meier Estimate of Survival Function S(t)
We begin by:
1. Rank ordering the survival times (including the censored survival times)
2. Defining each interval as starting at an observed time and ending just before the next ordered time
3. Identifying the number at risk within each interval
4. Identifying the number of events within each interval
5. Calculating the probability of surviving within that interval
6. Calculating the survival function for that interval as the probability of surviving that interval times the probability of surviving to the start of that interval

Example - AML

Group                        Weeks in remission (i.e., time to relapse)
Maintenance chemo (X=1)      9, 13, 13+, 18, 23, 28+, 31, 34, 45+, 48, 161+
No maintenance chemo (X=0)   5, 5, 8, 8, 12, 16+, 23, 27, 30+, 33, 43, 45

+ indicates a censored time to relapse; e.g., 13+ = more than 13 weeks to relapse

Example – AML
Calculation of Kaplan-Meier estimates in the “not maintained on chemotherapy” group:

Time t_j   At risk n_j   Events y_j   Estimated S(t_j)
5          12            2            1.000 x ((12-2)/12) = 0.833
8          10            2            0.833 x ((10-2)/10) = 0.667
12         8             1            0.667 x ((8-1)/8) = 0.583
23         6             1            0.583 x ((6-1)/6) = 0.486
27         5             1            0.486 x ((5-1)/5) = 0.389
33         3             1            0.389 x ((3-1)/3) = 0.259
43         2             1            0.259 x ((2-1)/2) = 0.130
45         1             1            0.130 x ((1-1)/1) = 0

Example – AML (cont’d)
In the “maintained on chemotherapy” group:

Time t_j   At risk n_j   Events y_j   Estimated S(t_j)
9          11            1            1.000 x ((11-1)/11) = 0.909
13         10            1            0.909 x ((10-1)/10) = 0.818
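The hand calculation for the two AML groups can be checked with a short script. This is a minimal Python sketch, not the SAS code used in the course; the function name `kaplan_meier` and the 0/1 event coding are illustrative assumptions. A censored time tied with an event time is kept in the risk set, matching the convention that censoring happens slightly after the event.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return (t_j, n_j, y_j, S_hat) at each distinct event time.

    times  -- observed times (event or censoring)
    events -- 1 if the time is an event, 0 if it is censored
    """
    event_counts = Counter(t for t, e in zip(times, events) if e == 1)
    rows, s_hat = [], 1.0
    for t_j in sorted(event_counts):
        # Everyone with an observed time >= t_j is still at risk, so a
        # censored time tied with an event time stays in the risk set.
        n_j = sum(1 for t in times if t >= t_j)
        y_j = event_counts[t_j]
        s_hat *= (n_j - y_j) / n_j
        rows.append((t_j, n_j, y_j, s_hat))
    return rows


# AML data from the example; 0 marks the "+" (censored) times.
no_maint_times = [5, 5, 8, 8, 12, 16, 23, 27, 30, 33, 43, 45]
no_maint_events = [1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
maint_times = [9, 13, 13, 18, 23, 28, 31, 34, 45, 48, 161]
maint_events = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0]

no_maint_km = kaplan_meier(no_maint_times, no_maint_events)
maint_km = kaplan_meier(maint_times, maint_events)

for label, rows in (("No maintenance", no_maint_km), ("Maintenance", maint_km)):
    print(label)
    for t_j, n_j, y_j, s_hat in rows:
        print(f"  t={t_j:>3}  n={n_j:>2}  y={y_j}  S={s_hat:.3f}")
```

Only event times produce rows; censored times change the estimate indirectly, by shrinking the number at risk at later event times.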

Example – AML (cont’d)
- The “Kaplan-Meier curve” plots the estimated survival function vs. time, with separate curves for each group.

Example – AML (cont’d)
Notes
- Can count the total number of events by counting the number of steps (times)
- If feasible, picture the censoring times on the graph as shown above.

Kaplan-Meier Estimate Using SAS

Comments on the Kaplan-Meier Estimate
- If the event and censoring times are tied, we assume that the censoring time is slightly larger than the death time.
- If the largest observation is an event, the Kaplan-Meier estimate is 0.
- If the largest observation is censored, the Kaplan-Meier estimate remains constant forever.

Comments on the Kaplan-Meier Estimate
- If we plot the empirical survival estimates, we observe a step function. If there are no ties and no censoring, the step function drops by 1/n at each event.
- With every censored observation the size of the subsequent steps increases.
- When does the number of intervals equal the number of deaths in the sample?
- When does the number of intervals equal n?
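The two claims about step sizes can be illustrated numerically. The sketch below uses a small hypothetical data set of five distinct times (not from the course) and an illustrative helper, `km_step_sizes`, that assumes there are no tied times.

```python
def km_step_sizes(times, events):
    """Sizes of the drops in the Kaplan-Meier curve, assuming no tied times."""
    ordered = sorted(zip(times, events))
    n = len(ordered)
    s_hat, drops = 1.0, []
    for k, (_, event) in enumerate(ordered):
        if event == 1:
            n_at_risk = n - k  # subjects whose time is >= the current time
            new_s = s_hat * (n_at_risk - 1) / n_at_risk
            drops.append(s_hat - new_s)
            s_hat = new_s
    return drops


times = [3, 7, 1, 9, 5]  # hypothetical data, n = 5

# No censoring: every drop equals 1/n.
uncensored_drops = km_step_sizes(times, [1, 1, 1, 1, 1])

# Censor the observation at time 3: the drops after it become larger.
censored_drops = km_step_sizes(times, [0, 1, 1, 1, 1])

print(uncensored_drops)
print(censored_drops)
```

In the censored case the mass that the censored subject would have contributed is redistributed over the later event times, which is why the later steps grow.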

Comments on the Kaplan-Meier Estimate
- The Kaplan-Meier estimate is a consistent estimate of the true S(t). That means that as the sample size gets large, the KM estimate converges to the true value.
- The Kaplan-Meier estimate can be used to empirically estimate any cumulative distribution function.

Comments on the Kaplan-Meier Estimate
- The step function in the K-M curve really looks like this:
- If you have a failure at t_1, then you want to say that survivorship at t_1 should be less than 1.
- For small data sets it matters, but for large data sets it does not matter.

Confidence Interval for S(t) – Greenwood’s Formula
- Greenwood’s formula for the variance of $\hat{S}(t)$: $\widehat{\mathrm{Var}}[\hat{S}(t)] = \hat{S}(t)^2 \sum_{j:\, t_j \le t} \frac{y_j}{n_j (n_j - y_j)}$
- Using Greenwood’s formula, an approximate 95% CI for S(t) is $\hat{S}(t) \pm 1.96 \sqrt{\widehat{\mathrm{Var}}[\hat{S}(t)]}$
- There is a “problem”: the 95% CI is not constrained to lie within the interval (0,1)
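As a rough illustration of Greenwood’s formula and of the “problem” with the plain interval, the following Python sketch uses the at-risk and event counts of the “no maintenance” AML group. The function name `greenwood_ci` and the tuple layout of `risk_table` are assumptions made for this example.

```python
import math

# (t_j, n_j, y_j) for the "no maintenance" AML group, taken from the example.
risk_table = [(5, 12, 2), (8, 10, 2), (12, 8, 1), (23, 6, 1),
              (27, 5, 1), (33, 3, 1), (43, 2, 1), (45, 1, 1)]


def greenwood_ci(risk_table, t, z=1.96):
    """Kaplan-Meier estimate at time t, its Greenwood SE, and the plain CI."""
    s_hat, var_sum = 1.0, 0.0
    for t_j, n_j, y_j in risk_table:
        if t_j > t:
            break
        s_hat *= (n_j - y_j) / n_j
        if n_j > y_j:  # the term is undefined once the estimate reaches 0
            var_sum += y_j / (n_j * (n_j - y_j))
    se = s_hat * math.sqrt(var_sum)
    return s_hat, se, (s_hat - z * se, s_hat + z * se)


for t in (5, 12):
    s_hat, se, (lower, upper) = greenwood_ci(risk_table, t)
    print(f"t={t}: S={s_hat:.3f}, SE={se:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

At t = 5 the upper limit exceeds 1, which is the behaviour that motivates a transformed interval.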

Confidence Interval for S(t) – Alternative Formula
- Based on log(-log(S(t))), which ranges from -∞ to ∞
- Find the standard error of the above, find the CI of the above, then transform the CI to one for S(t)
- This CI will lie within the interval [0,1]
- This is the default in SAS
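A sketch of the log(-log) interval in Python follows. The slide does not give the standard error; the delta-method form used here, sqrt(Σ y_j / (n_j (n_j - y_j))) / |log Ŝ(t)|, is the usual textbook choice and is an assumption of this example, as are the names `loglog_ci` and `risk_table`.

```python
import math

# (t_j, n_j, y_j) for the "no maintenance" AML group, taken from the example.
risk_table = [(5, 12, 2), (8, 10, 2), (12, 8, 1), (23, 6, 1),
              (27, 5, 1), (33, 3, 1), (43, 2, 1), (45, 1, 1)]


def loglog_ci(risk_table, t, z=1.96):
    """Kaplan-Meier estimate at time t with a log(-log)-based CI."""
    s_hat, var_sum = 1.0, 0.0
    for t_j, n_j, y_j in risk_table:
        if t_j > t:
            break
        if n_j == y_j:
            raise ValueError("estimate is 0; log(-log) is undefined")
        s_hat *= (n_j - y_j) / n_j
        var_sum += y_j / (n_j * (n_j - y_j))
    if s_hat == 1.0:
        raise ValueError("no events yet; log(-log) is undefined")
    # Delta-method SE of log(-log(S_hat)) (assumed form, see lead-in).
    se_loglog = math.sqrt(var_sum) / abs(math.log(s_hat))
    # Back-transforming the limits gives S_hat raised to exp(+/- z * SE).
    lower = s_hat ** math.exp(z * se_loglog)
    upper = s_hat ** math.exp(-z * se_loglog)
    return s_hat, (lower, upper)


s_hat, (lower, upper) = loglog_ci(risk_table, 5)
print(f"S={s_hat:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

Because the limits are powers of an estimate strictly between 0 and 1, they cannot leave that range, unlike the plain Greenwood interval.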

Log-rank test for comparing survivor curves
- Are two survivor curves the same?
- Use the times of events: t_1, t_2, ... (do not include censoring times)
- Treat each event and its “set of persons still at risk” (i.e., risk set) at each time t_j as an independent table
- Make a 2×2 table at each t_j

          Event   No Event      Total
Group A   a_j     n_jA - a_j    n_jA
Group B   c_j     n_jB - c_j    n_jB
Total     d_j     n_j - d_j     n_j

Log-rank test for comparing survivor curves
- At each event time t_j, under the assumption of equal survival (i.e., S_A(t) = S_B(t)), the expected number of events in Group A out of the total events (d_j = a_j + c_j) is in proportion to the number at risk in Group A relative to the total at risk at time t_j: E a_j = d_j x n_jA / n_j
- Differences between a_j and E a_j represent evidence against the null hypothesis of equal survival in the two groups

Log-rank test for comparing survivor curves
- Use the Cochran-Mantel-Haenszel idea of pooling over events j to get the log-rank chi-squared statistic with one degree of freedom: $\chi^2 = \frac{\left[\sum_j (a_j - E a_j)\right]^2}{\sum_j \mathrm{Var}(a_j)}$, where $\mathrm{Var}(a_j) = \frac{d_j \, n_{jA} \, n_{jB} \, (n_j - d_j)}{n_j^2 (n_j - 1)}$

Log-rank test for comparing survivor curves
Idea summary:
- Create a 2x2 table at each uncensored failure time
- The construction of each 2x2 table is based on the corresponding risk set
- Combine information from all the tables
- The null hypothesis is S_A(t) = S_B(t) for all times t.
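The idea summarized above can be sketched in Python for the AML data (Group A = maintenance, Group B = no maintenance). This is an illustrative implementation rather than the SAS procedure; the function name `log_rank` is an assumption, and the per-table variance is the standard hypergeometric variance d_j n_jA n_jB (n_j - d_j) / (n_j^2 (n_j - 1)).

```python
import math

# (weeks, event) pairs from the AML example; event = 0 marks a "+" time.
group_a = [(9, 1), (13, 1), (13, 0), (18, 1), (23, 1), (28, 0),
           (31, 1), (34, 1), (45, 0), (48, 1), (161, 0)]
group_b = [(5, 1), (5, 1), (8, 1), (8, 1), (12, 1), (16, 0),
           (23, 1), (27, 1), (30, 0), (33, 1), (43, 1), (45, 1)]


def log_rank(group_a, group_b):
    """Return (observed A, expected A, chi-squared, p-value) for two groups."""
    pooled = group_a + group_b
    event_times = sorted({t for t, e in pooled if e == 1})
    observed_a = expected_a = variance = 0.0
    for t_j in event_times:
        # Build the 2x2 table from the risk set at t_j.
        n_a = sum(1 for t, _ in group_a if t >= t_j)
        n_b = sum(1 for t, _ in group_b if t >= t_j)
        a_j = sum(1 for t, e in group_a if t == t_j and e == 1)
        c_j = sum(1 for t, e in group_b if t == t_j and e == 1)
        n_j, d_j = n_a + n_b, a_j + c_j
        observed_a += a_j
        expected_a += d_j * n_a / n_j
        if n_j > 1:  # a single subject at risk contributes no variance
            variance += d_j * n_a * n_b * (n_j - d_j) / (n_j ** 2 * (n_j - 1))
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return observed_a, expected_a, chi2, p_value


observed_a, expected_a, chi2, p_value = log_rank(group_a, group_b)
print(f"O_A={observed_a:.0f}, E_A={expected_a:.2f}, "
      f"chi2={chi2:.2f}, p={p_value:.3f}")
```

Censored times never generate a table of their own; they only remove subjects from later risk sets.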

Comparisons across Groups
- Extensions of the log-rank test to several groups require knowledge of matrix algebra. In general, these tests are well approximated by a chi-squared distribution with G-1 degrees of freedom.
- Alternative tests:
  - Wilcoxon family of tests (including Peto test)
  - Likelihood ratio test (SAS)

Comparison between Log-Rank and Wilcoxon Tests
- The log-rank test weights each failure time equally. No parametric model is assumed for failure times within a stratum.
- The Wilcoxon test weights each failure time by a function of the number at risk. Thus, more weight tends to be given to early failure times. As in the log-rank test, no parametric model is assumed for failure times within a stratum.
- Between these two tests, the Wilcoxon test will tend to be better at picking up early departures from the null hypothesis and the log-rank test will tend to be more sensitive to departures in the tail.
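The difference in weighting can be made concrete with one routine that accepts a weight function. In the sketch below a constant weight gives the log-rank statistic, and the weight w_j = n_j gives the Gehan form of the Wilcoxon test; choosing that particular member of the Wilcoxon family is an assumption of this example, as are the names `weighted_chi2`, `group_a` and `group_b`.

```python
# (weeks, event) pairs from the AML example; event = 0 marks a "+" time.
group_a = [(9, 1), (13, 1), (13, 0), (18, 1), (23, 1), (28, 0),
           (31, 1), (34, 1), (45, 0), (48, 1), (161, 0)]
group_b = [(5, 1), (5, 1), (8, 1), (8, 1), (12, 1), (16, 0),
           (23, 1), (27, 1), (30, 0), (33, 1), (43, 1), (45, 1)]


def weighted_chi2(group_a, group_b, weight):
    """Weighted two-group test; weight maps the number at risk n_j to w_j."""
    event_times = sorted({t for t, e in group_a + group_b if e == 1})
    numerator = denominator = 0.0
    for t_j in event_times:
        n_a = sum(1 for t, _ in group_a if t >= t_j)
        n_b = sum(1 for t, _ in group_b if t >= t_j)
        a_j = sum(1 for t, e in group_a if t == t_j and e == 1)
        c_j = sum(1 for t, e in group_b if t == t_j and e == 1)
        n_j, d_j = n_a + n_b, a_j + c_j
        w_j = weight(n_j)
        numerator += w_j * (a_j - d_j * n_a / n_j)
        if n_j > 1:
            var_j = d_j * n_a * n_b * (n_j - d_j) / (n_j ** 2 * (n_j - 1))
            denominator += w_j ** 2 * var_j
    return numerator ** 2 / denominator


log_rank_chi2 = weighted_chi2(group_a, group_b, lambda n_j: 1.0)
gehan_chi2 = weighted_chi2(group_a, group_b, lambda n_j: float(n_j))
print(f"log-rank chi2 = {log_rank_chi2:.3f}")
print(f"Gehan-Wilcoxon chi2 = {gehan_chi2:.3f}")
```

Because n_j is largest at the earliest event times, the second statistic is dominated by early differences between the groups, while the equal-weight version gives late event times the same say.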

Comparison with Likelihood Ratio Test in SAS
- The likelihood ratio test employed in SAS assumes that the data within the various strata are exponentially distributed and that censoring is non-informative. Thus, this is a parametric method that smooths across the entire curve.