Published by Felix Lee. Modified over 9 years ago.
1
Survival analysis
2
First example of the day: small cell lung cancer. Median survival time: 8-10 months. 2-year survival is 10%. A new treatment showed a median survival of 13.2 months.
3
Progressively censored observations. Current life table. Completed dataset. Cohort life table. Analysis “on the fly”.
4
Problem: do patients survive longer after treatment 1 than after treatment 2? Possible solutions: ANOVA on mean survival time? ANOVA on median survival time? 100 person-years of observation says nothing about how long the average person has been in the study: it could be 10 persons observed for 10 years or 100 persons observed for 1 year.
5
Life table analysis: a subset of 13 patients undergoing the same treatment.
6
Life table analysis. Time interval chosen to be 3 months. $n_i$: number of patients starting a given period.
7
Life table analysis. $d_i$: number of terminal events (in this example, progression/response). $w_i$: number of patients that have not yet been in the study long enough to finish this period.
8
Life table analysis. Number exposed to risk: $n_i - w_i/2$, assuming that patients withdraw in the middle of the period on average.
9
Life table analysis. $q_i = d_i / (n_i - w_i/2)$: proportion of patients terminating in the period.
10
Life table analysis. $p_i = 1 - q_i$: proportion of patients surviving the period.
11
Life table analysis. $S_i = p_i \, p_{i-1} \cdots p_{i-N}$: cumulative proportion surviving, a product of conditional probabilities over this period and the $N$ periods before it.
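The life table columns defined on the preceding slides can be computed in a few lines. This is only a sketch: the `Interval` class, the `life_table` function and the three example intervals are hypothetical and are not the 13-patient data shown on the slides.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    n: int  # n_i: patients starting the period
    d: int  # d_i: terminal events in the period
    w: int  # w_i: patients not followed long enough to finish the period


def life_table(intervals):
    """Return one (q_i, p_i, S_i) tuple per interval."""
    rows = []
    cumulative = 1.0
    for iv in intervals:
        exposed = iv.n - iv.w / 2   # number exposed to risk
        q = iv.d / exposed          # proportion terminating in the period
        p = 1 - q                   # proportion surviving the period
        cumulative *= p             # running product of the p values
        rows.append((q, p, cumulative))
    return rows


# Hypothetical 3-month intervals (NOT the data from the slides).
example = [Interval(13, 2, 1), Interval(10, 3, 0), Interval(7, 1, 2)]
rows = life_table(example)
for i, (q, p, s) in enumerate(rows, start=1):
    print(f"period {i}: q={q:.3f}  p={p:.3f}  S={s:.3f}")
```

Each interval starts with the patients left over from the previous one (n minus d minus w), which is why the example goes 13, 10, 7.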
12
Survival curves. How long will a lung cancer patient keep having cancer on this particular treatment?
13
Kaplan-Meier. Simple example with only 2 “terminal events”.
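A minimal sketch of the Kaplan-Meier estimate may help here. The function name `kaplan_meier` and the follow-up times are made up for illustration (the slide's own example is not preserved in this transcript); like the slide, the toy data contains only two terminal events.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  -- follow-up time for each patient
    events -- True if a terminal event was observed, False if censored
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    survival = 1.0
    steps = []
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - n_events / at_risk  # curve drops only at event times
        steps.append((t, survival))
    return steps


# Hypothetical follow-up in months; False marks a censored observation.
times = [5, 8, 12, 12, 20, 32]
events = [False, True, False, False, True, False]
km_steps = kaplan_meier(times, events)
print(km_steps)
```

Censored patients never cause a drop in the curve, but they do leave the risk set, so later events produce larger steps.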
14
Confidence interval of the Kaplan-Meier method, e.g. after 32 months.
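The formula on this slide is not preserved in the transcript. One common choice, assumed here, is Greenwood's formula for the standard error of the Kaplan-Meier estimate combined with a normal approximation. The function name and the input numbers below are hypothetical.

```python
import math


def greenwood_ci(steps, z=1.96):
    """Approximate 95% confidence interval for a Kaplan-Meier estimate.

    steps -- list of (at_risk, events) pairs, one per event time, in time
             order; assumes events < at_risk at every step
    Returns (survival, lower, upper) after the last step.
    """
    survival = 1.0
    var_sum = 0.0
    for n, d in steps:
        survival *= 1 - d / n
        var_sum += d / (n * (n - d))          # Greenwood's sum
    se = survival * math.sqrt(var_sum)
    lower = max(0.0, survival - z * se)       # clip to the [0, 1] range
    upper = min(1.0, survival + z * se)
    return survival, lower, upper


# Hypothetical: 5 at risk with 1 event, later 2 at risk with 1 event.
s, lo, hi = greenwood_ci([(5, 1), (2, 1)])
print(f"S = {s:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

With so few patients the interval is very wide, which is typical for the tail of a survival curve.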
15
Confidence interval of the Kaplan-Meier method. Survival plot for all data on treatment 1. Are there differences between the treatments?
16
Comparing Two Survival Curves. One could use the confidence intervals… But what if the confidence intervals fail to overlap only at some points? Alternatives: the logrank statistic, the hazard ratio, and Mantel-Haenszel methods.
17
Comparing Two Survival Curves. The logrank statistic, a.k.a. the Mantel-logrank statistic, a.k.a. the Cox-Mantel-logrank statistic.
18
Comparing Two Survival Curves
Five steps to the logrank statistics table:
1. Divide the data into intervals (e.g. 10 months).
2. Count the number of patients at risk in the groups and in total.
3. Count the number of terminal events in the groups and in total.
4. Calculate the expected numbers of terminal events. E.g. in interval 31-40 there are 44 patients in group 1 and 46 in group 2, with 4 terminal events in total; the expected terminal events are 4 × (44/90) and 4 × (46/90).
5. Calculate the total.
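Step 4 above is a simple proportional split of the events. As a small sketch (the function name `expected_events` is made up), using the numbers quoted for the 31-40 interval:

```python
def expected_events(at_risk_1, at_risk_2, events_total):
    """Split the interval's events in proportion to the numbers at risk."""
    total = at_risk_1 + at_risk_2
    e1 = events_total * at_risk_1 / total
    e2 = events_total * at_risk_2 / total
    return e1, e2


# Interval 31-40 from the slide: 44 and 46 at risk, 4 terminal events.
e1, e2 = expected_events(44, 46, 4)
print(f"expected in group 1: {e1:.3f}, in group 2: {e2:.3f}")
```

The two expected counts always add up to the observed number of events in the interval.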
19
Comparing Two Survival Curves. Smells like a chi-square statistic.
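The slide's formula is missing from the transcript. The approximate logrank test is commonly written as the sum of (O - E)²/E over the two groups, compared with a chi-square distribution with 1 degree of freedom; that form is assumed in this sketch. The observed and expected totals are hypothetical.

```python
def logrank_chi_square(observed, expected):
    """Sum of (O - E)^2 / E over the groups (approximate logrank test)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical column totals from a logrank table (not the slide's data).
observed = [22, 10]
expected = [15.0, 17.0]

chi2 = logrank_chi_square(observed, expected)
CRITICAL_DF1_ALPHA_05 = 3.841  # chi-square critical value, df = 1, alpha = 0.05
significant = chi2 > CRITICAL_DF1_ALPHA_05
print(f"chi-square = {chi2:.3f}, significant at 5%: {significant}")
```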
20
Comparing Two Survival Curves Hazard ratio
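The hazard ratio formula itself did not survive extraction. A common estimate built from the logrank table, assumed here, is the ratio of observed to expected events in one group divided by the same ratio in the other group. The function name and the numbers are hypothetical.

```python
def hazard_ratio(observed_1, expected_1, observed_2, expected_2):
    """Estimate the hazard ratio of group 1 relative to group 2."""
    return (observed_1 / expected_1) / (observed_2 / expected_2)


# Hypothetical totals from a logrank table (not the slide's data).
hr = hazard_ratio(22, 15.0, 10, 17.0)
print(f"hazard ratio = {hr:.2f}")
```

A value above 1 means terminal events occur at a higher rate in group 1 than in group 2.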
21
Comparing Two Survival Curves: Mantel-Haenszel test. Is the OR significantly different from 1? Look at cell (1,1). Estimated value: $E(a_i) = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$. Variance: $V(a_i)$.
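A sketch of the Mantel-Haenszel statistic over a set of 2×2 tables, one per time interval. The variance formula is not shown in the transcript; the usual hypergeometric variance is assumed, and no continuity correction is applied. The function name and the split of the 4 events into 3 and 1 are hypothetical; only the 44 and 46 at risk come from the earlier slide.

```python
def mantel_haenszel_chi2(tables):
    """Mantel-Haenszel chi-square (df = 1) without continuity correction.

    tables -- iterable of (a, b, c, d) cells, one 2x2 table per interval:
              row 1 = group 1, row 2 = group 2,
              column 1 = terminal event, column 2 = no event
    """
    sum_a = sum_e = sum_v = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        sum_a += a                                   # observed cell (1,1)
        sum_e += row1 * col1 / n                     # expected value E(a_i)
        sum_v += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))  # V(a_i)
    return (sum_a - sum_e) ** 2 / sum_v


# One interval: 44 and 46 at risk, 4 events, hypothetically split 3 and 1.
mh = mantel_haenszel_chi2([(3, 41, 1, 45)])
print(f"Mantel-Haenszel chi-square = {mh:.3f}")
```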
22
Comparing Two Survival Curves: Mantel-Haenszel test. df = 1; p > 0.05.
23
Hazard function. $d$ is the number of terminal events, $f$ is the sum of failure times, and $c$ is the sum of censored times.
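The formula that combines these three quantities is missing from the transcript. The usual constant-hazard estimate is d / (f + c), events per unit of observed time, and that is assumed in this sketch. The function name and the follow-up times are hypothetical.

```python
def hazard_rate(failure_times, censored_times):
    """Constant-hazard estimate: events per unit of observed time."""
    d = len(failure_times)     # number of terminal events
    f = sum(failure_times)     # sum of failure times
    c = sum(censored_times)    # sum of censored times
    return d / (f + c)


# Hypothetical follow-up in months (not the slide's data).
h = hazard_rate(failure_times=[8, 20], censored_times=[5, 12, 12, 32])
print(f"hazard = {h:.4f} events per patient-month")
```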
24
Logistic regression: who survived the Titanic?
25
The sinking of Titanic. Titanic sank on the night of April 14th to 15th 1912 with 2228 souls on board; 705 survived. A dataset covering 1309 passengers has been preserved. Who survived?
26
The data. Sibsp is the number of siblings and/or spouses accompanying. Parch is the number of parents and/or children accompanying. Some values are missing. Can we predict who will survive Titanic II?
pclass | survived | name | sex | age | sibsp | parch
1 | 1 | Allen, Miss. Elisabeth Walton | female | 29 | 0 | 0
1 | 1 | Allison, Master. Hudson Trevor | male | 0.9167 | 1 | 2
1 | 0 | Allison, Miss. Helen Loraine | female | 2 | 1 | 2
1 | 0 | Allison, Mr. Hudson Joshua Creighton | male | 30 | 1 | 2
1 | 0 | Allison, Mrs. Hudson J C (Bessie Waldo Daniels) | female | 25 | 1 | 2
1 | 1 | Anderson, Mr. Harry | male | 48 | 0 | 0
1 | 1 | Andrews, Miss. Kornelia Theodosia | female | 63 | 1 | 0
1 | 0 | Andrews, Mr. Thomas Jr | male | 39 | 0 | 0
1 | 1 | Appleton, Mrs. Edward Dale (Charlotte Lamson) | female | 53 | 2 | 0
27
Analyzing the data in a (too) simple manner. Associations between factors without considering interactions.
30
Could we use multiple linear regression to predict survival?
Multiple linear regression: response variable is defined between -inf and +inf; normally distributed.
Logistic regression: response variable is defined between 0 and 1; Bernoulli distributed.
31
The logit transformation is modeled linearly. The logistic function.
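The formulas on this slide were lost in extraction. In their standard form, the logit of a probability p is ln(p / (1 - p)), and the logistic function 1 / (1 + e^(-z)) is its inverse. A short sketch, with function names chosen for illustration:

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))


def logistic(z):
    """Inverse of the logit: maps any real z to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))


# The linear predictor z lives on the logit scale; the logistic function
# turns it back into a probability.
for z in (-2.0, 0.0, 2.0):
    print(f"z = {z:+.1f}  ->  p = {logistic(z):.3f}")
```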
32
The sigmoidal curve
33
The sigmoidal curve. The intercept basically just shifts the curve along the input variable.
34
The sigmoidal curve. The intercept basically just shifts the curve along the input variable. Large regression coefficient → risk factor strongly influences the probability.
35
The sigmoidal curve. The intercept basically just shifts the curve along the input variable. Large regression coefficient → risk factor strongly influences the probability. Positive regression coefficient → risk factor increases the probability.
36
Logistic regression of the Titanic data
37
Logistic regression of the Titanic data – passenger class
1. Summary of data
2. Coding of the dependent variable
3. Coding of the categorical explanatory variable: first class: 1; second class: 2; third class: reference
38
Logistic regression of the Titanic data – passenger class. A fit of the null model, basically just the intercept. Usually not interesting. The total probability of survival is 500/1309 = 0.382. The cutoff is 0.5, so all are classified as non-survivors. Basically tests if the null model is sufficient. It almost certainly is not. Shows that survival is related to pclass (which is not in the null model).
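To see why the null model classifies everyone as a non-survivor, the numbers on this slide can be worked through by hand. The intercept value is not shown in the transcript; the sketch below only states what an intercept-only logistic fit yields in principle (the log-odds of the overall survival proportion).

```python
import math

survivors, total = 500, 1309          # counts quoted on the slide

p_survive = survivors / total         # overall probability of survival
intercept = math.log(p_survive / (1 - p_survive))  # log-odds of survival

CUTOFF = 0.5
predicted_survivor = p_survive >= CUTOFF   # same prediction for everyone
accuracy = (total - survivors) / total     # share of true non-survivors

print(f"p = {p_survive:.3f}, intercept = {intercept:.3f}")
print(f"predicted survivor: {predicted_survivor}, accuracy = {accuracy:.3f}")
```

Because the predicted probability is below the cutoff for every passenger, the null model is right exactly as often as passengers did not survive.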
39
Logistic regression of the Titanic data – passenger class
1. Omnibus test: uses the likelihood ratio (LR) to test whether adding the pclass variable makes the model better. It did! But only better than the null model, so no surprise.
2. Model summary: other measures of the goodness of fit.
3. Classification table: by including pclass, 67.7% of passengers were correctly categorized.
4. Variables in the equation: the first line repeats that pclass has a significant effect on survival. B is the fitted logistic parameter. Exp(B) is the odds ratio, so the odds of survival for first-class passengers are 4.7 (3.6-6.3) times those of passengers in third class (the reference class).
40
Logistic regression of the Titanic data – Adding age to the model. Oops… Some data points are missing, and the null model is poorer.
41
Logistic regression of the Titanic data – Adding age to the model. Cox and Snell’s R-square increased from 0.093 to 0.141, indicating a better model. With this model we can correctly classify 69.1% of passengers; passenger class alone classified 67.7%.
42
Logistic regression of the Titanic data – Adding age to the model. Age has a significant influence on survival. The odds ratio of age is 0.963, so the odds of survival for a 31-year-old are 0.963 times the odds for a 30-year-old. Equivalently, the odds for a 30-year-old are 1/0.963 = 1.038 times those of a 31-year-old.
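Because the odds ratio applies per year of age, larger age differences multiply up. A small sketch using the 0.963 quoted above (the function name and the 10-year comparison are illustrative additions):

```python
ODDS_RATIO_PER_YEAR = 0.963  # Exp(B) for age, as quoted on the slide


def odds_multiplier(age_difference_years, odds_ratio=ODDS_RATIO_PER_YEAR):
    """Factor by which the odds of survival change for an age difference."""
    return odds_ratio ** age_difference_years


one_year_older = odds_multiplier(1)      # 31 vs 30 years old
one_year_younger = odds_multiplier(-1)   # 30 vs 31 years old
ten_years_older = odds_multiplier(10)    # e.g. 40 vs 30 years old

print(f"+1 year:   x{one_year_older:.3f}")
print(f"-1 year:   x{one_year_younger:.3f}")
print(f"+10 years: x{ten_years_older:.3f}")
```

Note that these factors act on the odds, not on the probability of survival itself.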
43
Logistic regression of the Titanic data – Age alone. The model is extremely poor. Consequently, age alone appears to be insignificant in estimating survival.
44
Logistic regression of the Titanic data – Adding family and sex. The model is becoming better.
45
Logistic regression of the Titanic data – Using the model to predict. What is the probability that a 25-year-old woman, accompanied only by her husband and holding a second-class ticket, would survive Titanic? z = -2.703 - 0.041*25 + 2.552 + 1.718 + 0.925 = 1.4670
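The slide stops at z; the probability itself follows by passing z through the logistic function. A short sketch using the terms exactly as written on the slide (which coefficient belongs to which variable, apart from age, is not stated in the transcript, so the terms are left unlabeled):

```python
import math

# Terms of the linear predictor as given on the slide: intercept, age term
# for a 25-year-old, then three further coefficients for this passenger.
terms = [-2.703, -0.041 * 25, 2.552, 1.718, 0.925]

z = sum(terms)
probability = 1 / (1 + math.exp(-z))   # logistic function

print(f"z = {z:.4f}, estimated probability of survival = {probability:.3f}")
```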
46
Using the model to predict survival. What is the probability that a 25-year-old woman, accompanied only by her husband and holding a second-class ticket, would survive Titanic? z = -3.929 - 0.589*(-5)/14.41 + 1.718 + 2.552 + 0.926 = 1.4714
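In this version the age term appears to use a standardized age, (age - mean) / standard deviation. The sketch below assumes a mean age of 30 (inferred from the -5 for a 25-year-old) and a standard deviation of 14.41; both readings are assumptions, since the transcript does not label the numbers.

```python
import math

AGE_MEAN = 30.0   # assumption: inferred from (-5) = 25 - 30
AGE_SD = 14.41    # assumption: read as the standard deviation of age


def standardize_age(age):
    """Express age in standard deviations from the assumed mean."""
    return (age - AGE_MEAN) / AGE_SD


# Coefficients as written on the slide; -0.589 multiplies standardized age.
z = -3.929 - 0.589 * standardize_age(25) + 1.718 + 2.552 + 0.926
probability = 1 / (1 + math.exp(-z))

print(f"z = {z:.4f}, estimated probability of survival = {probability:.3f}")
```

Standardizing a predictor changes its coefficient and the intercept, but not the fitted probabilities, apart from rounding.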
47
Is it realistic that Leonardo survives and the chick dies?