Survival Analysis: An Introductory Course Scott Harris October 2009.


1 Survival Analysis: An Introductory Course Scott Harris October 2009

2 Learning outcomes
By the end of this session you should:
– know when to apply survival methods;
– understand how to use the survival techniques in SPSS and the differences between them;
– be able to produce and interpret life tables;
– be able to produce and interpret Kaplan-Meier curves.

3 Contents
Introduction
– When/why use survival analysis.
– Types of survival/time to event data.
Life table analysis
– Producing life tables by hand.
– Producing life tables in SPSS.
Kaplan-Meier
– Producing Kaplan-Meier plots in SPSS.
– Comparison of Kaplan-Meier survival curves (log-rank test) in SPSS.

4 Dataset 1: Typical survival dataset
Not all survival times are known (limited follow-up).
Cancer group  Time to death  Death status
Pancreatic    39             Deceased
Breast        ? (>45)        Alive
Breast        ? (>68)        Alive
Pancreatic    94             Deceased
Pancreatic    67             Deceased
Breast        352            Deceased

5 Survival analysis?
Potentially missing values
– Lost to follow-up
– Withdrew from study
Limited duration of follow-up
– Some patients still alive – yet to experience the event of interest (death)
Comparative analysis
– Survival analysis methods

6 Dataset 2: Beijing, 100m final
Athlete           Country  Time (s)  Status
Usain Bolt        JAM      9.69      Finished
Richard Thompson  TRI      9.89      Finished
Walter Dix        USA      9.91      Finished
Churandy Martina  AHO      9.93      Finished
Asafa Powell      JAM      9.95      Finished
Michael Frater    JAM      9.97      Finished
Marc Burns        TRI      10.01     Finished
Darvis Patton     USA      10.03     Finished

7 Survival analysis? (Time to event)
Potentially missing values
– Disqualified
– Injured
Short duration of follow-up
– Everyone who finishes will have a time
Comparative analysis (JAM vs. Other)
– Survival methods possible but...
– Independent samples t test (Normal)
– Mann-Whitney test (Non parametric)

8 When to use survival methods?
Time to event data
– Duration between treatment and death
– Time from admission to successful discharge from hospital
– Time from starting a diet to losing 10 lbs.
– Time from release to watching the new Harry Potter film
The event may or may not happen:
– Ever (some people will die in hospital before being discharged)
– In the time period concerned (limited follow-up)

9 Censoring
Censoring occurs when we have missing information about the exact time to event.
Left censoring: Unclear on exact start of monitoring
– Missing date of birth
– Unknown date of starting treatment
– Experiences event before inclusion in study
Right censoring: Some individuals may not be observed for the full time to event
– Loss to follow-up
– Drop out
– Termination of study / follow-up

10 Right censoring
[Figure: follow-up timeline for each subject, from study start to end of follow-up. Axes: Time (days) against Subject. Key: Event; Censored time. Annotations: "Earliest event"; "No event but no more follow-up"; "Actual event may occur here but, having stopped follow-up earlier, this would be missed."]

11 Right censoring – Staggered start
[Figure: follow-up timeline for each subject with staggered entry, from study start to end of follow-up. Axes: Time (days) against Subject. Key: Event; Censored time. Annotations: "Quickest event"; "No event but no more follow-up"; "Actual event may occur here but, having stopped follow-up earlier, this would be missed."]

12 The Example Dataset

13 SPSS – Survival time data
In SPSS (as with other packages) we require the following two variables when dealing with survival time data:
– A continuous time variable that measures the time until either the event or the individual's withdrawal (censoring).
– A categorical variable that indicates whether the subject experienced the event of interest or did not and was censored.

14 Example dataset
Time to event data for two groups:
– Group: Group A and Group B, coded 1 and 2 respectively.
– Time: time in days until event or until end of follow-up.
– Status: whether the individual has had the event of interest (‘No event’ and ‘Event’), coded 0 and 1 respectively.
– Age: the age of the individual at the start of the study.

15 Example dataset
Group  Time  Status    Age
A      9     Event     65
A      12    No event  61
A      14    Event     57
A      14    Event     55
A      16    No event  50
A      18    Event     52
A      24    Event     51
A      30    No event  50
B      3     Event     70
B      7     Event     64
B      9     No event  64
B      11    Event     61
B      12    Event     53
B      15    Event     51
B      19    Event     50
B      21    Event     48

16 SPSS – Example dataset

17 SPSS – Example dataset: Labelled

18 SPSS – Calculating the Time
Transform → Compute Variable…
* Calculating the Time in days.
COMPUTE Time = DATEDIFF(LastDate,StartDate,"Days").
EXECUTE.

19 Info: Creating new variables in SPSS
1) From the menus select ‘Transform’ → ‘Compute…’.
2) Enter the name of the new variable that you want to create into the ‘Target Variable:’ box.
3) Enter the formula for the new variable into the ‘Numeric Expression’ box.
● In this case we just want to create the difference between two date variables. To do this we need to make use of the date functions. Select ‘Date Arithmetic’ and then ‘Datediff’ from the boxes on the right. Then replace the question marks with the relevant information, as indicated by the function help in the middle of the window. In this case ‘DATEDIFF(LastDate,StartDate,"Days")’ was entered in the ‘Numeric Expression’ box.
4) Finally click ‘OK’ to produce the new variable or ‘Paste’ to add the syntax for this into your syntax file.
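For readers who want to check the date arithmetic outside SPSS, the same calculation (last date minus start date, in whole days) can be sketched in Python. The function name `days_between` and the two dates below are hypothetical and are not taken from the course dataset.

```python
from datetime import date


def days_between(last_date, start_date):
    """Whole days from start_date to last_date (last minus start)."""
    return (last_date - start_date).days


if __name__ == "__main__":
    # Hypothetical dates for one subject, for illustration only.
    start = date(2009, 1, 5)
    last = date(2009, 1, 14)
    print(days_between(last, start))
```

The argument order mirrors the SPSS call in the slide, where the later date is given first.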

20 SPSS – Example dataset: Complete

21 Practical Questions Survival Analysis Question 1

22 Practical: Download & Setup
From the course webpage download the two SPSS datasets that will be used for the practicals by clicking the right mouse button on the file name and selecting Save Target As. The two datasets are:
– Survival_Ex1.sav (the example dataset used in the slides)
– BC_Survival.sav (a dataset on breast cancer survival; data are from the Mayo Clinic)
Open both of the datasets in SPSS.
1) Calculate the Time variable for the Survival_Ex1.sav dataset.

23 Life Table Analysis

24 Life table analysis
The simplest form of survival analysis
– Generally the quickest to do by hand
– Split the time variable into X categories
– One set of calculations for each time category
– Most easily done in a table structure, hence the name

25 Theory: Life table analysis
For each time category:
– No. entering: Subjects entering ($NE$)
– No. withdrawing: Subjects withdrawing ($NW$)
– At risk: $AR = NE - \frac{NW}{2}$
– Events: Number of events (number of failures)

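26 Theory: Life table analysis
For each time category:
– Proportion failing: $\frac{\text{No. failures}}{AR}$
– Proportion surviving: $1 - \text{proportion failing}$
– Cumulative survival: $S_i = p_i \times S_{i-1}$, where $p_i$ is the proportion surviving at time point $i$, $S_i$ is the cumulative proportion at time point $i$ (current) and $S_{i-1}$ is the cumulative proportion at time point $i-1$ (previous).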

27 Theory: Life table analysis
Group A life table
Interval   Entering  Withdrew  At risk  Events  Failing  Surviving  Cum. Survival
0 to <10   8         0         8        1       0.125    0.875      0.875
10 to <20  7         2         6        3       0.5      0.5        0.438
20 to <30  2         0         2        1       0.5      0.5        0.219
30 to <40  1         1         0.5      0       0        1          0.219
Worked calculations:
– At risk: 8 – 0/2 = 8 (first interval); 7 – 2/2 = 6 (second interval)
– Failing: 1/8 = 0.125; 3/6 = 0.5
– Surviving: 1 – 0.125 = 0.875; 1 – 0.5 = 0.5
– Cum. Survival: 0.5 x 0.875 = 0.438

28 Life table analysis
Group A life table
Interval   Entering  Withdrew  At risk  Events  Failing  Surviving  Cum. Survival
0 to <10   8         0         8        1       0.125    0.875      0.875
10 to <20  7         2         6        3       0.5      0.5        0.438
20 to <30  2         0         2        1       0.5      0.5        0.219
30 to <40  1         1         0.5      0       0        1          0.219
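The hand calculation behind the Group A table can be sketched in a few lines of Python. This is a minimal illustration under the slides' conventions (withdrawals count as half a subject at risk, intervals run from a start up to but not including the end); the function name `life_table` and its parameters are illustrative and not an SPSS interface.

```python
def life_table(times, events, width, upper):
    """Life table with one row per interval [start, start + width).

    times  : follow-up time for each subject
    events : 1 if the subject had the event, 0 if censored (withdrew)
    width  : interval width; upper : end of the last interval
    """
    rows = []
    entering = len(times)
    cumulative = 1.0
    for start in range(0, upper, width):
        end = start + width
        in_interval = [e for t, e in zip(times, events) if start <= t < end]
        failures = sum(in_interval)
        withdrew = len(in_interval) - failures
        at_risk = entering - withdrew / 2  # withdrawals count as half at risk
        failing = failures / at_risk if at_risk > 0 else 0.0
        surviving = 1 - failing
        cumulative *= surviving  # cumulative survival up to this interval
        rows.append({
            "interval": (start, end), "entering": entering,
            "withdrew": withdrew, "at_risk": at_risk, "events": failures,
            "failing": failing, "surviving": surviving,
            "cum_survival": cumulative,
        })
        entering -= len(in_interval)
    return rows


if __name__ == "__main__":
    # Group A from the example dataset.
    group_a_times = [9, 12, 14, 14, 16, 18, 24, 30]
    group_a_events = [1, 0, 1, 1, 0, 1, 1, 0]
    for row in life_table(group_a_times, group_a_events, width=10, upper=40):
        print(row["interval"], row["at_risk"], round(row["cum_survival"], 3))
```

Values are kept unrounded inside the table and rounded only for display, so the cumulative product does not accumulate rounding error.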

29 SPSS – Life table analysis
Analyze → Survival → Life Tables…

30 SPSS – Life table analysis
* Calculating the life table.
SURVIVAL TABLE=Time BY Group(1 2)
  /INTERVAL=THRU 40 BY 10
  /STATUS=Status(1)
  /PRINT=TABLE.

31 Info: Life table analysis in SPSS
1) From the menus select ‘Analyze’ → ‘Survival’ → ‘Life Tables…’.
2) Put the variable containing the time into the ‘Time:’ box. Decide on the period of time to group together and put this into the ‘by’ box of the ‘Display Time Intervals’ box. The first value to go into the ‘Display Time Intervals’ box has to be a multiple of the value in the ‘by’ box as well as being greater than the longest time recorded in your dataset.
3) Put the categorical variable that indicates whether a case had the event of interest or not into the ‘Status:’ box. Then click the ‘Define Event…’ button and enter the single value or range of values that all indicate that the event occurred. Click ‘Continue’.
4) If you want separate results for each level of a categorical variable then put this variable into the ‘Factor:’ box. Click the ‘Define Range…’ button and then enter the numeric codes for the minimum and maximum of the groups that you want to compare. Click ‘Continue’.
5) Finally click ‘OK’ to produce the test results or ‘Paste’ to add the syntax for this into your syntax file.
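The rule in step 2 (the first value must be a multiple of the ‘by’ value and greater than the longest recorded time) can be checked with a one-line calculation. A small Python sketch, assuming whole-number times; the helper name `interval_upper_bound` is hypothetical.

```python
def interval_upper_bound(longest_time, width):
    """Smallest multiple of width that is strictly greater than longest_time."""
    return (longest_time // width + 1) * width


if __name__ == "__main__":
    # The longest time in the example dataset is 30 days, grouped by 10 days.
    print(interval_upper_bound(30, 10))
```

For the example dataset this gives 40, which agrees with the THRU 40 BY 10 setting used in the life table syntax.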

32 SPSS – Life table analysis: Output
The same values as were calculated by hand.

33 Practical Questions Survival Analysis Questions 2 and 3

34 Practical Questions
2) Calculate the life table values for Group B from the example dataset by hand, using the skeleton table below:
Interval   Entering  Withdrew  At risk  Events  Failing  Surviving  Cum. Survival
0 to <10
10 to <20
20 to <30

35 Practical Questions
The file BC_Survival.sav contains data on 1207 women who were diagnosed with breast cancer.
3) Produce a life table for this data, separating those women for whom the cancer had infected the lymph nodes from those for whom it had not (ln_yesno). Split the survival time into yearly periods.

36 Practical Solutions
2. The life table for Group B should look like this:
Interval   Entering  Withdrew  At risk  Events  Failing  Surviving  Cum. Survival
0 to <10   8         1         7.5      2       0.267    0.733      0.733
10 to <20  5         0         5        4       0.8      0.2        0.147
20 to <30  1         0         1        1       1.0      0          0

37 Practical Solutions: Instructions
3. To produce the life table you will need syntax similar to the following:
* Producing the Life table.
SURVIVAL TABLE=time BY ln_yesno(0 1)
  /INTERVAL=THRU 144 BY 12
  /STATUS=status(1)
  /PRINT=TABLE.

38 Practical Solutions: Output

39 Practical Solutions: Output

40 Kaplan-Meier

41 Kaplan-Meier
Rather than categorising, we can estimate the survival function directly from the continuous survival times. Imagine creating a life table so that each time interval contains exactly one case. Multiplying these survival probabilities across the intervals gives what is known as the Kaplan-Meier product limit estimator.
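The product-limit idea can be sketched directly: at each distinct event time, multiply the running survival estimate by the proportion of those still at risk who did not have the event. The Python below is an illustrative sketch rather than the SPSS implementation; the name `kaplan_meier` is an assumption, and tied event times (such as the two events at day 14 in Group A) are handled together in one step.

```python
def kaplan_meier(times, events):
    """Return [(event_time, survival)] from the product-limit estimator.

    times  : follow-up time for each subject
    events : 1 if the subject had the event, 0 if censored
    """
    curve = []
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        # Subjects still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Events occurring exactly at time t.
        failures = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1 - failures / at_risk
        curve.append((t, survival))
    return curve


if __name__ == "__main__":
    # Group A from the example dataset.
    group_a_times = [9, 12, 14, 14, 16, 18, 24, 30]
    group_a_events = [1, 0, 1, 1, 0, 1, 1, 0]
    for t, s in kaplan_meier(group_a_times, group_a_events):
        print(t, round(s, 3))
```

Censored subjects never trigger a step in the curve; they only reduce the number at risk for later event times.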

42 SPSS – Kaplan-Meier
Analyze → Survival → Kaplan-Meier… (Just looking at Group A)
There is a filter in place to limit the results to those from Group A alone.

43 SPSS – Kaplan-Meier
* KM plot for just Group A.
KM Time
  /STATUS=Status(1)
  /PRINT TABLE MEAN
  /PLOT SURVIVAL.

44 Info: Kaplan-Meier in SPSS
1) From the menus select ‘Analyze’ → ‘Survival’ → ‘Kaplan-Meier…’.
2) Put the variable containing the time into the ‘Time:’ box.
3) Put the categorical variable that indicates whether a case had the event of interest or not into the ‘Status:’ box. Then click the ‘Define Event…’ button and enter the single value or range of values that all indicate that the event occurred. Click ‘Continue’.
4) If you want separate curves and results for each level of a categorical variable then put this variable into the ‘Factor:’ box.
5) Click the ‘Options’ button and tick the ‘Survival’ option in the ‘Plots’ box. Click ‘Continue’.
6) Finally click ‘OK’ to produce the test results or ‘Paste’ to add the syntax for this into your syntax file.

45 SPSS – Kaplan-Meier: Output - values
– Calculated proportion with no event at each time point
– Information for each individual subject in order of length of follow-up
– Total number of events

46 SPSS – Kaplan-Meier: Output - plot

47 SPSS – Kaplan-Meier: Output - plot
Can also mark where censored observations occur (not advisable for large datasets).

48 SPSS – Kaplan-Meier
The last few plots are not from SPSS but come from another statistical package: Stata. The default KM plot from SPSS (shown here) is acceptable but generally needs some tidying up within the SPSS graph editor. As you can see, the plot does not automatically start from the top left corner (100% survival at time 0); it starts from the time of the first event, which is not ideal. You may also notice that the time axis (x axis) does not start from 0, although this is easily altered.

49 Log-rank test
– Allows for comparison between groups.
– Possible to compute by hand (based on Chi-square).
– ‘Just another option’ when using a statistics package.
– Other options for comparison include the Breslow and Tarone-Ware tests.
$H_0$: No difference between the groups.
$H_1$: The groups are different.

50 SPSS – Log-rank test
Having removed the filter, but leaving the other options the same as in the previous KM setup, you only need to add a Factor variable and then select the option for the Log rank test.
* Comparative KM plot with log-rank test.
KM Time BY Group
  /STATUS=Status(1)
  /PRINT TABLE MEAN
  /PLOT SURVIVAL
  /TEST LOGRANK
  /COMPARE OVERALL POOLED.

51 Info: K-M and log rank tests in SPSS
1) Follow the information sheet on producing a Kaplan-Meier curve, but stop after point 5.
2) The log rank test will compare the levels of the categorical variable that is put into the ‘Factor:’ box. As such it is unavailable when no such variable has been specified.
3) Once a variable is in the ‘Factor:’ box, click on the ‘Compare Factor…’ button. Tick the option for the ‘Log Rank’ test in the ‘Test Statistics’ box. Click ‘Continue’.
4) Finally click ‘OK’ to produce the test results or ‘Paste’ to add the syntax for this into your syntax file.

52 SPSS – Log-rank test: Output
– The KM plot is now split into each of the levels of the categorical variable (2 groups in this case).
– The log rank test here shows no significant difference between the groups (p=0.119).
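The reported result can be checked by hand with the standard two-group log-rank calculation: at each event time, compare the observed events in one group with the number expected if both groups shared the same risk, then refer the squared total difference divided by its variance to a Chi-square distribution on 1 degree of freedom. The Python below is a sketch of that textbook calculation, not SPSS's own code; the function name `log_rank` is illustrative.

```python
import math


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) on 1 df."""
    pooled = list(zip(times_a, events_a)) + list(zip(times_b, events_b))
    event_times = sorted({t for t, e in pooled if e == 1})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for u in times_a if u >= t)  # at risk in group A
        n_b = sum(1 for u in times_b if u >= t)  # at risk in group B
        n = n_a + n_b
        d_a = sum(1 for u, e in zip(times_a, events_a) if u == t and e == 1)
        d = sum(1 for u, e in pooled if u == t and e == 1)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # hypergeometric variance; zero when one subject remains
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a Chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


if __name__ == "__main__":
    # Groups A and B from the example dataset.
    chi_square, p_value = log_rank(
        [9, 12, 14, 14, 16, 18, 24, 30], [1, 0, 1, 1, 0, 1, 1, 0],
        [3, 7, 9, 11, 12, 15, 19, 21], [1, 1, 0, 1, 1, 1, 1, 1],
    )
    print(round(chi_square, 2), round(p_value, 3))
```

On the example dataset this gives a Chi-square statistic of about 2.43 and a p-value of about 0.119, in line with the value quoted from the SPSS output.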

53 SPSS – Kaplan-Meier: Presentation

54 SPSS – Kaplan-Meier: Presentation

55 Practical Questions Survival Analysis Question 4

56 Practical Questions
The file BC_Survival.sav contains data on 1207 women who were diagnosed with breast cancer.
4) Produce a Kaplan-Meier curve for this data, separating those women for whom the cancer had infected the lymph nodes from those for whom it had not. Conduct a log-rank test to see if the survival of the two groups is significantly different. Edit the KM plot so that it would be able to ‘stand alone’ in a publication and comment on all of your results.

57 Practical Solutions: Instructions
4. To produce a Kaplan-Meier curve and the log-rank test you will need syntax similar to the following (you will then need to customise the plot itself with the graph editor afterwards):
* Producing the KM plot.
KM time BY ln_yesno
  /STATUS=status(1)
  /PRINT TABLE MEAN
  /PLOT SURVIVAL
  /TEST LOGRANK
  /COMPARE OVERALL POOLED.
There is clearly a significant difference between the two categories, with survival being better in the group without lymph node involvement (p<0.001).

58 Practical Solutions: Output

59 Practical Solutions: Output
It can be seen that the mean survival times are:
– 124.92 (95% CI: 122.18 to 127.66) months for no involvement,
– 111.33 (95% CI: 105.44 to 117.23) months for nodal involvement.
There are no median survival estimates because at no point during follow-up have 50% of the subjects in either group experienced an event.
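Why a median can be missing is easier to see in code: the median survival time is read off the Kaplan-Meier curve as the earliest time at which estimated survival falls to 50% or below, so it is undefined if the curve never gets that low. The sketch below uses the small example dataset rather than BC_Survival.sav; the function names and the final high-survival group are hypothetical illustrations.

```python
def km_curve(times, events):
    """Kaplan-Meier estimate as [(event_time, survival)]."""
    curve, survival = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        failures = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1 - failures / at_risk
        curve.append((t, survival))
    return curve


def median_survival(times, events):
    """Earliest event time with estimated survival <= 0.5, else None."""
    for t, survival in km_curve(times, events):
        if survival <= 0.5:
            return t
    return None  # the curve never drops to 50%, so no median estimate


if __name__ == "__main__":
    # Groups A and B from the example dataset.
    print(median_survival([9, 12, 14, 14, 16, 18, 24, 30],
                          [1, 0, 1, 1, 0, 1, 1, 0]))
    print(median_survival([3, 7, 9, 11, 12, 15, 19, 21],
                          [1, 1, 0, 1, 1, 1, 1, 1]))
    # Hypothetical group in which most subjects are censored.
    print(median_survival([10, 20, 30, 40], [1, 0, 0, 0]))
```

In the hypothetical last group only one of four subjects has the event, so estimated survival stays at 75% and no median is returned, which is the same situation as in the breast cancer output.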

60 Summary
You should now:
– know when to apply survival methods;
– understand how to use the survival techniques in SPSS and the differences between them;
– be able to interpret life tables;
– be able to interpret Kaplan-Meier curves.

61 References
– Practical Statistics for Medical Research, D Altman: Chapter 13.
– Medical Statistics, B Kirkwood, J Sterne: Chapter 26.
– An Introduction to Medical Statistics, M Bland: Chapter 15.6.
Survival analysis specific texts
– Kleinbaum D. G., Klein M., Survival Analysis: A Self-Learning Text, Springer-Verlag Publishers, 2005.
– Parmar M. K. B., Machin D., Survival Analysis: A Practical Approach, Wiley, 1995.

