Survival Analysis: An Introductory Course
Scott Harris, October 2009

2 Learning outcomes
By the end of this session you should:
– know when to apply survival methods;
– understand how to use the survival techniques in SPSS and the differences between them;
– be able to produce and interpret life tables;
– be able to produce and interpret Kaplan-Meier curves.

3 Contents
Introduction
– When/why use survival analysis.
– Types of survival/time-to-event data.
Life table analysis
– Producing life tables by hand.
– Producing life tables in SPSS.
Kaplan-Meier
– Producing Kaplan-Meier plots in SPSS.
– Comparison of Kaplan-Meier survival curves (log-rank test) in SPSS.

4 Dataset 1: Typical survival dataset
Not all survival times are known (limited follow-up).

Cancer group   Time to death   Death status
Pancreatic     39              Deceased
Breast         ? (>45)         Alive
Breast         ? (>68)         Alive
Pancreatic     94              Deceased
Pancreatic     67              Deceased
Breast         352             Deceased

5 Survival analysis?
Potentially missing values
– Lost to follow-up
– Withdrew from study
Limited duration of follow-up
– Some patients are still alive and have yet to experience the event of interest (death)
Comparative analysis
– Survival analysis methods

6 Dataset 2: Beijing, 100m final

Athlete            Country   Time (s)   Status
Usain Bolt         JAM       9.69       Finished
Richard Thompson   TRI       9.89       Finished
Walter Dix         USA       9.91       Finished
Churandy Martina   AHO       9.93       Finished
Asafa Powell       JAM       9.95       Finished
Michael Frater     JAM       9.97       Finished
Marc Burns         TRI       10.01      Finished
Darvis Patton      USA       10.03      Finished

7 Survival analysis? (Time to event)
Potentially missing values
– Disqualified
– Injured
Short duration of follow-up
– Everyone who finishes will have a time
Comparative analysis (JAM vs. Other)
– Survival methods are possible, but...
– Independent samples t test (Normal data)
– Mann-Whitney test (non-parametric)
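Because every finisher in Dataset 2 has a recorded time, an ordinary two-sample comparison is enough. The sketch below (the helper name `mann_whitney_u` is hypothetical, and only the eight finishing times above are used) computes the Mann-Whitney U statistic for JAM vs. Other by counting the pairs in which the Jamaican athlete was faster; tied pairs would count one half. With groups this small, the p-value would then come from exact tables or a statistics package.

```python
def mann_whitney_u(group_x, group_y):
    """Count pairs (x, y) with x < y; tied pairs count 0.5."""
    u = 0.0
    for x in group_x:
        for y in group_y:
            if x < y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Finishing times in seconds from the Beijing 100m final (Dataset 2)
jam = [9.69, 9.95, 9.97]                   # Bolt, Powell, Frater
other = [9.89, 9.91, 9.93, 10.01, 10.03]   # Thompson, Dix, Martina, Burns, Patton

u_jam = mann_whitney_u(jam, other)
u_other = len(jam) * len(other) - u_jam    # the two U values sum to n1 * n2
print(u_jam, u_other)
```

Here the JAM athlete is faster in 9 of the 15 possible pairs.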

8 When to use survival methods?
Time to event data, for example:
– Duration between treatment and death
– Time from admission to successful discharge from hospital
– Time from starting a diet to losing 10 lbs
– Time from release to watching the new Harry Potter film
The event may or may not happen:
– Ever (some people will die in hospital before being discharged)
– In the time period concerned (limited follow-up)

9 Censoring
Censoring occurs when we have incomplete information about an individual's time to event.
Left censoring: unclear on the exact start of monitoring
– Missing date of birth
– Unknown date of starting treatment
– Experiences the event before inclusion in the study
Right censoring: some individuals may not be observed for the full time to event
– Loss to follow-up
– Drop out
– Termination of study / follow-up

10 Right censoring
[Figure: each subject's follow-up plotted against time (days) from study start to end of follow-up; key: event, censored time. Annotations: ‘Earliest event’; ‘No event but no more follow-up’; ‘Actual event may occur here but, having stopped follow-up earlier, this would be missed.’]

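11 Right censoring – Staggered start
[Figure: each subject's follow-up plotted against time (days), with subjects entering at different times between study start and end of follow-up; key: event, censored time. Annotations: ‘Quickest event’; ‘No event but no more follow-up’; ‘Actual event may occur here but, having stopped follow-up earlier, this would be missed.’]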
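With a staggered start, each subject's follow-up runs from their own entry date, and anyone without an observed event by the end of follow-up (or by their last contact) is right-censored at that point. A minimal sketch of how the time and status values are derived, assuming hypothetical dates and a hypothetical helper `follow_up`:

```python
from datetime import date


def follow_up(entry, event, study_end, last_contact=None):
    """Return (time_in_days, status) for one subject with staggered entry.

    status is 1 if the event was observed before follow-up stopped, otherwise 0
    (right-censored). Follow-up stops at study_end, or earlier at last_contact
    for subjects lost to follow-up.
    """
    stop = study_end if last_contact is None else min(last_contact, study_end)
    if event is not None and event <= stop:
        return (event - entry).days, 1
    return (stop - entry).days, 0


study_end = date(2009, 6, 30)  # hypothetical end of follow-up

# Hypothetical subjects entering the study on different dates
print(follow_up(date(2009, 1, 1), date(2009, 3, 1), study_end))  # event observed
print(follow_up(date(2009, 4, 1), None, study_end))              # no event by study end
print(follow_up(date(2009, 5, 1), date(2009, 8, 1), study_end))  # event after end is missed
print(follow_up(date(2009, 1, 1), None, study_end,
                last_contact=date(2009, 2, 1)))                   # lost to follow-up
```

The third subject's event happens after follow-up stops, so they are recorded as censored at 60 days: exactly the ‘actual event may occur here’ case on the slide.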

The Example Dataset

13 SPSS – Survival time data
In SPSS (as with other packages) we require the following two variables when dealing with survival time data:
– A continuous time variable that measures the time until either the event or the individual's withdrawal (censoring).
– A categorical variable that indicates whether the subject experienced the event of interest or did not and was censored.

14 Example dataset
Time to event data for two groups (Group A and Group B), coded 1 and 2 respectively. The variables are:
– Time in days until the event or until the end of follow-up.
– Whether the individual has had the event of interest (‘No event’ and ‘Event’), coded 0 and 1 respectively.
– The age of the individual at the start of the study.

15 Example dataset

Group   Time   Status     Age
A       9      Event      65
A       12     No event   61
A       14     Event      57
A       14     Event      55
A       16     No event   50
A       18     Event      52
A       24     Event      51
A       30     No event   50

Group   Time   Status     Age
B       3      Event      70
B       7      Event      64
B       9      No event   64
B       11     Event      61
B       12     Event      53
B       15     Event      51
B       19     Event      50
B       21     Event      48
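The two variables required for survival data (a time and an event indicator) map directly onto this table. As a sketch (the list name `example` and the helper `summarise` are hypothetical), the dataset can be held as (group, time, status, age) records with status coded 0 = no event and 1 = event, which makes it easy to count events and censored observations per group:

```python
# Example dataset from the slides: (group, time in days, status, age)
# status: 1 = event, 0 = no event (censored)
example = [
    ("A", 9, 1, 65), ("A", 12, 0, 61), ("A", 14, 1, 57), ("A", 14, 1, 55),
    ("A", 16, 0, 50), ("A", 18, 1, 52), ("A", 24, 1, 51), ("A", 30, 0, 50),
    ("B", 3, 1, 70), ("B", 7, 1, 64), ("B", 9, 0, 64), ("B", 11, 1, 61),
    ("B", 12, 1, 53), ("B", 15, 1, 51), ("B", 19, 1, 50), ("B", 21, 1, 48),
]


def summarise(rows, group):
    """Return (subjects, events, censored) for one group."""
    statuses = [status for g, _, status, _ in rows if g == group]
    events = sum(statuses)
    return len(statuses), events, len(statuses) - events


for group in ("A", "B"):
    print(group, summarise(example, group))
```

Group A has 5 events and 3 censored times; Group B has 7 events and 1 censored time.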

16 SPSS – Example dataset

17 SPSS – Example dataset: Labelled

18 SPSS – Calculating the Time
Transform → Compute Variable…
Calculating the time in days:
    COMPUTE Time = DATEDIFF(LastDate, StartDate, "Days").
    EXECUTE.

19 Info: Creating new variables in SPSS
1) From the menus select ‘Transform’ → ‘Compute Variable…’.
2) Enter the name of the new variable that you want to create into the ‘Target Variable:’ box.
3) Enter the formula for the new variable into the ‘Numeric Expression:’ box.
● In this case we just want the difference between two date variables, which requires the date functions. Select ‘Date Arithmetic’ and then ‘Datediff’ from the boxes on the right, then replace the question marks with the relevant information as indicated by the function help in the middle of the window. Here, ‘DATEDIFF(LastDate,StartDate,"Days")’ was entered in the ‘Numeric Expression’ box.
4) Finally click ‘OK’ to produce the new variable, or ‘Paste’ to add the syntax for this to your syntax file.

20 SPSS – Example dataset: Complete

Practical Questions Survival Analysis Question 1

22 Practical: Download & Setup
From the course webpage, download the two SPSS datasets that will be used for the practicals by right-clicking the file name and selecting ‘Save Target As’. The two datasets are:
– Survival_Ex1.sav (the example dataset used in the slides)
– BC_Survival.sav (a dataset on breast cancer survival; the data are from the Mayo Clinic)
Open both datasets in SPSS.
1) Calculate the Time variable for the Survival_Ex1.sav dataset.

Life Table Analysis

24 Life table analysis
The simplest form of survival analysis:
– Generally the quickest to do by hand
– Split the time variable into X categories
– One set of calculations for each time category
– Most easily done in a table structure, hence the name

25 Theory: Life table analysis
For each time category:
– No. entering: subjects entering ($NE$)
– No. withdrawing: subjects withdrawing ($NW$)
– At risk: $AR = NE - \frac{NW}{2}$
– Events: number of events (number of failures)

26 Theory: Life table analysis
For each time category:
– Proportion failing: $q_i = \frac{\text{No. failures}}{AR}$
– Proportion surviving: $p_i = 1 - q_i$
– Cumulative survival: $S_i = p_i \times S_{i-1}$
where $p_i$ is the proportion surviving at time point $i$, $S_i$ is the cumulative proportion at time point $i$ (current) and $S_{i-1}$ is the cumulative proportion at time point $i-1$ (previous).
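The life table calculations can be sketched in Python. This is an illustration under stated assumptions (equal-width intervals starting at 0, with times at or beyond `upper` simply ignored; the function name `life_table` is hypothetical), not SPSS's implementation. It uses $AR = NE - NW/2$ and multiplies the proportions surviving to get the cumulative survival. Applied to Group A it reproduces the hand calculation (cumulative survival 0.875, 0.438, 0.219, 0.219).

```python
def life_table(times, statuses, width, upper):
    """Actuarial life table over intervals [0, width), [width, 2*width), ... < upper.

    Each row is (start, entering, withdrew, at_risk, events,
                 p_fail, p_survive, cum_survival).
    """
    rows = []
    cum_survival = 1.0
    entering = len(times)
    for start in range(0, upper, width):
        in_interval = [s for t, s in zip(times, statuses) if start <= t < start + width]
        events = sum(1 for s in in_interval if s == 1)
        withdrew = len(in_interval) - events
        at_risk = entering - withdrew / 2              # AR = NE - NW/2
        p_fail = events / at_risk if at_risk > 0 else 0.0
        p_survive = 1.0 - p_fail
        cum_survival *= p_survive                      # S_i = p_i * S_(i-1)
        rows.append((start, entering, withdrew, at_risk, events,
                     p_fail, p_survive, cum_survival))
        entering -= len(in_interval)                   # the rest enter the next interval
    return rows


# Group A from the example dataset (status 1 = event, 0 = censored)
times_a = [9, 12, 14, 14, 16, 18, 24, 30]
status_a = [1, 0, 1, 1, 0, 1, 1, 0]

for start, ne, nw, ar, d, q, p, s in life_table(times_a, status_a, width=10, upper=40):
    print(f"{start:>2} to <{start + 10:<3} NE={ne} NW={nw} AR={ar:.1f} "
          f"events={d} fail={q:.3f} survive={p:.3f} cum={s:.3f}")
```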

27 Theory: Life table analysis
Group A life table (first two intervals worked by hand):

Interval    Entering   Withdrew   At risk        Events   Failing       Surviving           Cum. survival
0 to <10    8          0          8 - 0/2 = 8    1        1/8 = 0.125   1 - 0.125 = 0.875   0.875
10 to <20   7          2          7 - 2/2 = 6    3        3/6 = 0.5     1 - 0.5 = 0.5       0.875 × 0.5 = 0.438

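28 Life table analysis
Group A life table:

Interval    Entering   Withdrew   At risk   Events   Failing   Surviving   Cum. survival
0 to <10    8          0          8.0       1        0.125     0.875       0.875
10 to <20   7          2          6.0       3        0.500     0.500       0.438
20 to <30   2          0          2.0       1        0.500     0.500       0.219
30 to <40   1          1          0.5       0        0.000     1.000       0.219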

29 SPSS – Life table analysis
Analyze → Survival → Life Tables…

30 SPSS – Life table analysis
    * Calculating the life table.
    SURVIVAL TABLE=Time BY Group(1 2)
      /INTERVAL=THRU 40 BY 10
      /STATUS=Status(1)
      /PRINT=TABLE.

31 Info: Life table analysis in SPSS
1) From the menus select ‘Analyze’ → ‘Survival’ → ‘Life Tables…’.
2) Put the variable containing the time into the ‘Time:’ box. Decide on the period of time to group together and put this into the ‘by’ box of the ‘Display Time Intervals’ box. The first value in the ‘Display Time Intervals’ box has to be a multiple of the value in the ‘by’ box and greater than the longest time recorded in your dataset.
3) Put the categorical variable that indicates whether a case had the event of interest into the ‘Status:’ box. Then click the ‘Define Event…’ button, enter the single value or range of values that indicate that the event occurred, and click ‘Continue’.
4) If you want separate results for each level of a categorical variable, put this variable into the ‘Factor:’ box. Click the ‘Define Range…’ button, enter the numeric codes for the minimum and maximum of the groups that you want to compare, and click ‘Continue’.
5) Finally click ‘OK’ to produce the results, or ‘Paste’ to add the syntax for this to your syntax file.
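The interval rule in step 2 can be expressed as a small check. This is only a sketch: `display_intervals` is a hypothetical helper that encodes the rule as stated in step 2 and lists the interval start points for a setting of 0 through `thru` by `by`, shown here with the example dataset's settings (0 through 40 by 10, longest time 30 days).

```python
def display_intervals(thru, by, max_time):
    """Interval start points for 'Display Time Intervals' of 0 through `thru` by `by`.

    Encodes the rule from step 2: `thru` must be a multiple of `by` and
    greater than the longest recorded time.
    """
    if thru % by != 0:
        raise ValueError(f"{thru} is not a multiple of {by}")
    if thru <= max_time:
        raise ValueError(f"{thru} does not exceed the longest time ({max_time})")
    return list(range(0, thru, by))


print(display_intervals(40, 10, max_time=30))  # example dataset settings
```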

32 SPSS – Life table analysis: Output
The output shows the same values as were calculated by hand.

Practical Questions Survival Analysis Questions 2 and 3

34 Practical Questions
2) Calculate the life table values for Group B from the example dataset by hand, using the skeleton table below:

Interval    Entering   Withdrew   At risk   Events   Failing   Surviving   Cum. survival
0 to <10
10 to <20
20 to <30

35 Practical Questions
The file BC_Survival.sav contains data on 1207 women who were diagnosed with breast cancer.
3) Produce a life table for these data, separating the women whose cancer had spread to the lymph nodes from those whose cancer had not (ln_yesno). Split the survival time into yearly periods.

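36 Practical Solutions
2. The life table for Group B should look like this:

Interval    Entering   Withdrew   At risk   Events   Failing   Surviving   Cum. survival
0 to <10    8          1          7.5       2        0.267     0.733       0.733
10 to <20   5          0          5.0       4        0.800     0.200       0.147
20 to <30   1          0          1.0       1        1.000     0.000       0.000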

37 Practical Solutions: Instructions
3. To produce the life table you will need syntax similar to the following:
    * Producing the life table.
    SURVIVAL TABLE=time BY ln_yesno(0 1)
      /INTERVAL=THRU 144 BY 12
      /STATUS=status(1)
      /PRINT=TABLE.

38 Practical Solutions: Output

39 Practical Solutions: Output

Kaplan-Meier

41 Kaplan-Meier
Rather than categorising time, we can estimate the survival function directly from the continuous survival times. Imagine creating a life table in which each time interval contains exactly one distinct observed time. Multiplying the proportions surviving across these intervals gives what is known as the Kaplan-Meier product-limit estimator.
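In symbols, if $n_j$ subjects are still at risk just before the $j$-th distinct event time $t_j$ and $d_j$ events occur at $t_j$, the estimator is $\hat S(t) = \prod_{t_j \le t} (1 - d_j / n_j)$. The sketch below (the function name `kaplan_meier` is hypothetical) computes it for Group A of the example dataset, following the usual convention that a subject censored at $t_j$ still counts as at risk at $t_j$.

```python
from collections import Counter


def kaplan_meier(times, statuses):
    """Kaplan-Meier product-limit estimate.

    Returns (event_time, at_risk, events, survival) for each distinct time at
    which at least one event occurred.
    """
    event_counts = Counter(t for t, s in zip(times, statuses) if s == 1)
    survival = 1.0
    steps = []
    for t in sorted(event_counts):
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        d = event_counts[t]
        survival *= 1.0 - d / at_risk              # multiply by proportion surviving
        steps.append((t, at_risk, d, survival))
    return steps


# Group A from the example dataset (status 1 = event, 0 = censored)
times_a = [9, 12, 14, 14, 16, 18, 24, 30]
status_a = [1, 0, 1, 1, 0, 1, 1, 0]

for t, n, d, s in kaplan_meier(times_a, status_a):
    print(f"t={t:>2}  at risk={n}  events={d}  S(t)={s:.3f}")
```

For Group A this gives survival of 0.875 after day 9, 0.583 after day 14, 0.389 after day 18 and 0.194 after day 24; the censored times (12, 16 and 30 days) only reduce the numbers at risk.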

42 SPSS – Kaplan-Meier
Analyze → Survival → Kaplan-Meier… (just looking at Group A)
A filter is in place to limit the results to Group A alone.

43 SPSS – Kaplan-Meier
    * KM plot for just Group A.
    KM Time
      /STATUS=Status(1)
      /PRINT TABLE MEAN
      /PLOT SURVIVAL.

44 Info: Kaplan-Meier in SPSS
1) From the menus select ‘Analyze’ → ‘Survival’ → ‘Kaplan-Meier…’.
2) Put the variable containing the time into the ‘Time:’ box.
3) Put the categorical variable that indicates whether a case had the event of interest into the ‘Status:’ box. Then click the ‘Define Event…’ button, enter the single value or range of values that indicate that the event occurred, and click ‘Continue’.
4) If you want separate curves and results for each level of a categorical variable, put this variable into the ‘Factor:’ box.
5) Click the ‘Options’ button and tick the ‘Survival’ option in the ‘Plots’ box. Click ‘Continue’.
6) Finally click ‘OK’ to produce the results, or ‘Paste’ to add the syntax for this to your syntax file.

45 SPSS – Kaplan-Meier: Output - values
[Output table annotations: information for each individual subject in order of length of follow-up; the calculated proportion with no event at each time point; the total number of events.]

46 SPSS – Kaplan-Meier: Output - plot

47 SPSS – Kaplan-Meier: Output - plot
Censored observations can also be marked on the plot (not advisable for large datasets).

48 SPSS – Kaplan-Meier
The last few plots are not from SPSS but from another statistical package, Stata. The default KM plot from SPSS (shown here) is adequate but generally needs some tidying up in the SPSS graph editor. The plot does not automatically start from the top left corner (100% survival at time 0); instead it starts from the time of the first event, which is not ideal. The time axis (x axis) also does not start from 0, although this is easily altered.
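One way to avoid the starting-point problem described above is to build the plotted step function yourself so that it begins at (0, 1). A sketch, assuming a hypothetical helper `km_step_coordinates`: it returns the corner points of the Kaplan-Meier step curve starting from 100% survival at time 0, plus the points where censored observations could be marked. The resulting lists could then be passed to a plotting tool, for example matplotlib's `plt.plot(xs, ys)`.

```python
def km_step_coordinates(times, statuses):
    """Corner points of a KM step curve that starts at (0, 1.0).

    Returns (xs, ys, censor_marks); censor_marks holds (time, survival)
    pairs where censored observations could be marked on the plot.
    """
    survival = 1.0
    xs, ys = [0.0], [1.0]
    censor_marks = []
    for t in sorted(set(times)):
        deaths = sum(1 for u, s in zip(times, statuses) if u == t and s == 1)
        censored = sum(1 for u, s in zip(times, statuses) if u == t and s == 0)
        if deaths:
            at_risk = sum(1 for u in times if u >= t)
            xs.append(t)
            ys.append(survival)               # horizontal segment up to t
            survival *= 1.0 - deaths / at_risk
            xs.append(t)
            ys.append(survival)               # vertical drop at t
        if censored:
            censor_marks.append((t, survival))
    xs.append(max(times))
    ys.append(survival)                       # flat tail to the last follow-up time
    return xs, ys, censor_marks


# Group A from the example dataset (status 1 = event, 0 = censored)
times_a = [9, 12, 14, 14, 16, 18, 24, 30]
status_a = [1, 0, 1, 1, 0, 1, 1, 0]
xs, ys, marks = km_step_coordinates(times_a, status_a)
print(list(zip(xs, [round(y, 3) for y in ys])))
print(marks)
```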

49 Log-rank test
– Allows for comparison between groups.
– Possible to compute by hand: the observed and expected numbers of events are compared and the test statistic follows a chi-square distribution.
– ‘Just another option’ when using a statistics package.
– Other options for comparison include the Breslow and Tarone-Ware tests.
$H_0$: there is no difference in survival between the groups.
$H_1$: survival differs between the groups.
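For two groups, the log-rank (Mantel-Cox) test compares, at each distinct event time, the observed number of events in one group with the number expected if both groups shared the same survival, and sums a variance term across event times; the statistic is referred to a chi-square distribution with 1 degree of freedom. A minimal sketch (the function name `log_rank` is hypothetical) applied to the example dataset:

```python
import math


def log_rank(times_1, status_1, times_2, status_2):
    """Two-group log-rank (Mantel-Cox) test.

    Returns (observed_1, expected_1, chi_square, p_value), 1 degree of freedom.
    """
    data = [(t, s, 1) for t, s in zip(times_1, status_1)]
    data += [(t, s, 2) for t, s in zip(times_2, status_2)]
    event_times = sorted({t for t, s, _ in data if s == 1})
    observed = expected = variance = 0.0
    for t in event_times:
        n1 = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk, group 1
        n = sum(1 for u, _, _ in data if u >= t)               # at risk, both groups
        d1 = sum(1 for u, s, g in data if u == t and s == 1 and g == 1)
        d = sum(1 for u, s, _ in data if u == t and s == 1)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi_square = (observed - expected) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi_square / 2))  # upper tail of chi-square, 1 df
    return observed, expected, chi_square, p_value


# Example dataset (status 1 = event, 0 = censored)
times_a, status_a = [9, 12, 14, 14, 16, 18, 24, 30], [1, 0, 1, 1, 0, 1, 1, 0]
times_b, status_b = [3, 7, 9, 11, 12, 15, 19, 21], [1, 1, 0, 1, 1, 1, 1, 1]

o, e, chi2, p = log_rank(times_a, status_a, times_b, status_b)
print(f"O={o:.0f} E={e:.2f} chi2={chi2:.2f} p={p:.3f}")
```

For Groups A and B this gives 5 observed vs. about 7.50 expected events in Group A, a chi-square of about 2.43 and p ≈ 0.119, in line with the SPSS log-rank output on slide 52.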

50 SPSS – Log-rank test
Having removed the filter, and leaving the other options the same as in the previous KM setup, you only need to add a factor variable and select the option for the log-rank test.
    * Comparative KM plot with log-rank test.
    KM Time BY Group
      /STATUS=Status(1)
      /PRINT TABLE MEAN
      /PLOT SURVIVAL
      /TEST LOGRANK
      /COMPARE OVERALL POOLED.

51 Info: K-M and log-rank tests in SPSS
1) Follow the information sheet on producing a Kaplan-Meier curve, but stop after point 5.
2) The log-rank test compares the levels of the categorical variable in the ‘Factor:’ box, so it is unavailable when no such variable has been specified.
3) Once a variable is in the ‘Factor:’ box, click the ‘Compare Factor…’ button. Tick the ‘Log rank’ option in the ‘Test Statistics’ box and click ‘Continue’.
4) Finally click ‘OK’ to produce the test results, or ‘Paste’ to add the syntax for this to your syntax file.

52 SPSS – Log-rank test: Output
The KM plot is now split by the levels of the categorical variable (2 groups in this case). The log-rank test shows no significant difference between the groups (p = 0.119).

53 SPSS – Kaplan-Meier: Presentation

54 SPSS – Kaplan-Meier: Presentation

Practical Questions Survival Analysis Question 4

56 Practical Questions
The file BC_Survival.sav contains data on 1207 women who were diagnosed with breast cancer.
4) Produce a Kaplan-Meier curve for these data, separating the women whose cancer had spread to the lymph nodes from those whose cancer had not. Conduct a log-rank test to see whether the survival of the two groups differs significantly. Edit the KM plot so that it could ‘stand alone’ in a publication, and comment on all of your results.

57 Practical Solutions: Instructions
4. To produce a Kaplan-Meier curve and the log-rank test you will need syntax similar to the following (you will then need to customise the plot itself in the graph editor):
    * Producing the KM plot.
    KM time BY ln_yesno
      /STATUS=status(1)
      /PRINT TABLE MEAN
      /PLOT SURVIVAL
      /TEST LOGRANK
      /COMPARE OVERALL POOLED.
There is clearly a significant difference between the two categories, with survival being better in the group without lymph node involvement (p < 0.001).

58 Practical Solutions: Output

59 Practical Solutions: Output
It can be seen that the mean survival times are: (95% CI: to ) months for no involvement and (95% CI: to ) months for nodal involvement. There are no median survival estimates because in neither group does the estimated survival fall to 50% over the duration of follow-up.
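A sketch of how both summaries can be obtained from a KM curve, assuming a hypothetical helper `km_summary`: the mean is the area under the curve up to the largest observed time (SPSS notes that estimation is limited to the largest survival time if it is censored), and the median is the first time at which estimated survival falls to 0.5 or below, which does not exist when the curve never gets that low. It is shown on the example dataset because the breast cancer data are not reproduced here.

```python
def km_summary(times, statuses):
    """Restricted mean (area under the KM curve up to the largest observed time)
    and median (first time S(t) <= 0.5, or None if never reached)."""
    survival, last_t, mean, median = 1.0, 0.0, 0.0, None
    for t in sorted({u for u, s in zip(times, statuses) if s == 1}):
        mean += survival * (t - last_t)          # area of the flat step before t
        at_risk = sum(1 for u in times if u >= t)
        d = sum(1 for u, s in zip(times, statuses) if u == t and s == 1)
        survival *= 1.0 - d / at_risk
        last_t = t
        if median is None and survival <= 0.5:
            median = t
    mean += survival * (max(times) - last_t)      # final step up to the largest time
    return mean, median


# Example dataset (status 1 = event, 0 = censored)
times_a, status_a = [9, 12, 14, 14, 16, 18, 24, 30], [1, 0, 1, 1, 0, 1, 1, 0]
times_b, status_b = [3, 7, 9, 11, 12, 15, 19, 21], [1, 1, 0, 1, 1, 1, 1, 1]

print(km_summary(times_a, status_a))
print(km_summary(times_b, status_b))
```

For Group A this gives a mean of about 19.2 days and a median of 18 days; for Group B, a mean of about 12.95 days and a median of 12 days.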

60 Summary
You should now:
– know when to apply survival methods;
– understand how to use the survival techniques in SPSS and the differences between them;
– be able to interpret life tables;
– be able to interpret Kaplan-Meier curves.
