1. Using Survival Analysis to analyze degree completion
Janice Love
University of California, Los Angeles
Office of Academic Planning & Budget
CAIR 2014
2. Agenda
- Survival Analysis History & Background
- Overview
- Survival Analysis example using SPSS
- Results of Survival Analysis
3. Survival Analysis Background
Definition
- A statistical method for studying the time to an event. The term "survival" suggests that the event of interest is death, but the technique is useful for other types of events.
Alternative terminology
- Event analysis, time-to-event analysis
- Survival analysis: studies involving time to death (biomedical sciences)
- Reliability theory / reliability analysis (engineering)
- Duration analysis / duration modeling (economics)
- Event history analysis (sociology)
Uses
- Clinical trials
- Cohort studies
7. Survival Analysis History
- Origin unknown; the approach has been around for a few hundred years
- Techniques developed in the medical / biological sciences
- World War II: military vehicles (reliability and failure time analysis)
- The Kaplan-Meier estimator was introduced with the publication of "Nonparametric Estimation from Incomplete Observations" (E. L. Kaplan and Paul Meier, 1958)
- Cited 34,000 times as of 2011
8. Survival analysis - Overview
- A set of statistical methods where the outcome variable is the time until the occurrence of an event of interest
- Follows a cohort over a specified time period with a focus on an event
- Useful when the rate of occurrence of the event varies over time
- Differs from other statistical methods: handles censored data (the withdrawal of individuals from the study)
Censored observations:
- Individuals who have not experienced "the event" by the end of the study
- Right censoring: the study participant
  - can't be located
  - or lives beyond the end of the study
  - or drops out before the study is completed
  - or is still enrolled
- An observation with incomplete information
- Don't have to handle these individuals as "missing"
- Do have to follow rules with respect to censored data
  - # of censored should be small relative to non-censored
  - Censored and non-censored populations should be similar (Kaplan-Meier)
10. Survival analysis - Censoring
Consequences of mishandling or ignoring censored data
Example:
- Student cohort, N = 50, event of interest = graduation
- Still enrolled at the end of the study, N = 6
- No longer enrolled but did not graduate, N = 4
Options:
- Code all 10 as missing
- Code 4 as missing, 6 as graduated as of study end
Consequences:
- Mean time to degree is over- or understated
- Risk of selection bias
- Ignoring censored records completely or arbitrarily assigning event dates introduces bias into the results
- Inclusion of the censored data produces less bias (Newell & Hyun, 2011)
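To make the two coding options concrete, here is a minimal Python sketch (not SPSS) that computes mean time to degree under each one. The graduation times and the 15-term study length are invented for illustration; only the group sizes (40 graduates, 6 still enrolled, 4 who left) follow the example above.

```python
from statistics import mean

STUDY_END = 15  # hypothetical: the study ends after 15 terms

# Hypothetical terms-to-degree for the 40 students who graduated
graduated = [11] * 5 + [12] * 25 + [13] * 7 + [14] * 3

# The 6 students still enrolled at study end; their true time to degree
# is unknown, only that it is at least STUDY_END terms
still_enrolled = [STUDY_END] * 6

# Option 1: code all 10 censored students as missing
mean_drop_all = mean(graduated)

# Option 2: code the 4 leavers as missing and treat the 6 still-enrolled
# students as if they graduated at study end
mean_assign_end = mean(graduated + still_enrolled)

print(round(mean_drop_all, 2), round(mean_assign_end, 2))
```

The two options give different means from the same cohort, and neither uses what is actually known about the censored students, namely that they had not graduated by a known number of terms.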
11. Survival analysis – handling censored data
Two methods to produce the cumulative probability of survival that the survival graph is based upon:
- SPSS Life Table: in each time period, the effective size of the cohort is reduced by ½ of the censored group
- Kaplan-Meier Survival Table: the survival probability estimate for each time period, except the first, is a compound conditional probability
12. Survival analysis - Overview
Data required for analysis:
- Clearly defined event: death, onset of illness, recovery from illness, marriage, birth, mechanical failure, success, job loss, employment, graduation
- Terminal event
- Event status (1 = event occurred, 0 = event did not occur)
- Time variable = time measured from the entry of a subject into the study until the defined event, in months, terms, days, years, or seconds
Covariates:
- To determine if different groups have different survival times
- Gender, age, ethnicity, GPA, treatment, intervention
- Regression models
13. Survival analysis – SPSS Data layout
- Basic student data
- Time variable – terms enrolled
- Event status – graduation status
- Binary or dummy variables
- Censored indicator
- Group into categories
14. Cohort Description
- Undergraduates, one division
- Fall 2006, Fall 2007 entering freshmen, N = 884
- Respondents to 2008 UCUES* survey
- Freshmen admits (transfers excluded)
- 1st term GPA >= 3.0
- Censored = 10 or 1.1%
- Explanatory variables available: gender, URM status, domestic-foreign status, Pell Grant recipient status, hours worked (survey), double/triple major
* UCUES = University of California Undergraduate Experience Survey
16. Sample Data – Working in SPSS
Analyze > Survival > Life Tables
17. Survival Analysis – Life Table produced by SPSS
Primary output of the survival analysis procedure
- Intervals = terms, counted from the admit term
- Count of students still enrolled at start of term
18. Survival Analysis – Life Table produced by SPSS
Primary output of the survival analysis procedure
- # exposed to risk = # entering interval minus ½ censored
- # terminal events = # graduated
- Probability density = estimated probability of graduating in interval
- # withdrawing during interval = censored
- Proportion terminating = # terminal events ÷ # exposed to risk; example, Term 10: 38 ÷ … = .05
- Hazard rate = instantaneous failure rate: % chance of graduating given not having graduated at start of interval
- Cumulative proportion surviving = cumulative % of those surviving at end of interval: (…) ÷ 884 = 0.90
- Proportion surviving = 1 – proportion terminating
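The column definitions above can be sketched as a short Python function (not SPSS). The interval counts are invented, and the function name `life_table` is hypothetical. The cumulative column here is computed as the running product of the per-interval proportions surviving, which is an assumption about how the table accumulates rather than something stated on the slide.

```python
def life_table(rows):
    """Build a simple actuarial life table.

    rows: (entering, withdrawn, events) for each interval, in time order.
    """
    table = []
    cum_surviving = 1.0
    for entering, withdrawn, events in rows:
        # Effective cohort size: entering minus half of the censored group
        exposed = entering - withdrawn / 2
        prop_terminating = events / exposed
        prop_surviving = 1 - prop_terminating
        # Assumption: cumulative survival is the running product
        cum_surviving *= prop_surviving
        table.append({
            "exposed": exposed,
            "prop_terminating": prop_terminating,
            "prop_surviving": prop_surviving,
            "cum_surviving": cum_surviving,
        })
    return table


# Hypothetical counts: (entering, withdrawn, graduated) for three terms
rows = [(100, 0, 0), (100, 2, 10), (88, 4, 43)]
table = life_table(rows)
for term, row in enumerate(table, start=1):
    print(term, {name: round(value, 3) for name, value in row.items()})
```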
19. Survival Function Graph Produced by SPSS
- The proportion of the cohort that has survived (is still enrolled) at any term
- Each step of the curve represents an event
- There is a 90% probability of surviving to the end of the 10th term
- Surviving = remaining enrolled!
20. Function & One minus a function
- y = x^2
- y = 1 - x^2
- y = x + 1
- y = 1 - (x + 1)
21. One Minus survival function
- There is a 10% probability of not surviving to the end of the 10th term
- Not surviving = graduating!!
22. Survival Analysis: SPSS, with Covariate
Factor = Gender
Analyze > Survival > Life Tables
SURVIVAL TABLE=Terms_enrolled BY Gender(1 2)
  /INTERVAL=THRU 15 BY 1
  /STATUS=graduated(1)
  /PRINT=TABLE
  /PLOTS (SURVIVAL OMS)=Terms_enrolled BY Gender.
23. Survival Analysis – SPSS, Life Table by gender
- Hazard rate = instantaneous failure rate: % chance of graduating given not having graduated at start of interval
- Median survival time = time at which 50% of the original cohort has not survived (graduated)
24. Survival Analysis: Hazard Ratio
- Hazard ratio = ratio of the hazard rates
- At 12th term, hazard ratio = … / 1.41 = 1.16: females are 16% more likely to graduate in the 12th term than males
- At 13th term, hazard ratio = .41 / .62 = .66: females are 34% less likely to graduate in the 13th term than males
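The 13th-term arithmetic can be checked with a few lines of Python (not SPSS). The helper name `hazard_ratio` is hypothetical, and the sketch assumes that .41 is the female hazard rate and .62 the male hazard rate, as the female-versus-male wording suggests.

```python
def hazard_ratio(hazard_group, hazard_reference):
    """Ratio of two interval hazard rates."""
    return hazard_group / hazard_reference


# 13th term, assuming .41 is the female rate and .62 the male rate
hr_13 = hazard_ratio(0.41, 0.62)

# Percent difference relative to the comparison group:
# negative means less likely, positive means more likely
pct_diff_13 = (hr_13 - 1) * 100

print(round(hr_13, 2), round(pct_diff_13))
```

A ratio below 1 translates to "less likely" and a ratio above 1 to "more likely", which is how the 16% and 34% figures are read off the ratios.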
25. Survival functions - SPSS
Factor = gender
- Survival pattern: SPSS will produce a different colored line for each of the factor's values
26. Survival Analysis: Kaplan-Meier Method
Assumptions
- Censored individual: a student who has not experienced the event (graduated) by the end of the study, e.g. they are no longer enrolled
- Check for differences between censored and non-censored groups
- Cohorts should behave similarly: groups entering at different times should be similar
- Avoid "selection bias" in data
27. Survival functions – SPSS, Kaplan-Meier
Factor = gender
KM Terms_enrolled BY Gender
  /STATUS=graduated(1)
  /PRINT TABLE MEAN
  /PLOT SURVIVAL
  /TEST LOGRANK BRESLOW TARONE
  /COMPARE OVERALL POOLED.
28. Kaplan-Meier Survival Table
This is an example of the survival table produced by the Kaplan-Meier procedure.
Kaplan-Meier survival probability estimate, calculation example:
- Proportion surviving an interval = # remaining ÷ # at risk = [# at start of interval − (# censored + # of events)] ÷ [# at start of interval − # censored]
- Interval 4: cumulative proportion surviving = [46 − (2 + 1)] ÷ (46 − 2) = 43 ÷ 44 = 0.977
- Interval 5: proportion surviving the interval = [43 − (2 + 2)] ÷ (43 − 2) = 39 ÷ 41 = 0.951; cumulative proportion surviving = (39 ÷ 41) × (43 ÷ 44) ≈ 0.930
Kaplan-Meier Survival Table: the survival probability estimate for each time period, except the first, is a compound conditional probability
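The compound conditional probability in this example can be reproduced with a short Python sketch (not SPSS). The helper name `km_step` is hypothetical. The sketch assumes that censored students leave the risk set before the interval's graduations are counted, and that nobody graduated before interval 4, so cumulative survival entering interval 4 is 1.

```python
def km_step(at_start, censored, events):
    """Conditional probability of surviving one interval."""
    at_risk = at_start - censored      # censored students leave the risk set
    return (at_risk - events) / at_risk


s4 = km_step(at_start=46, censored=2, events=1)   # 43 / 44
s5 = km_step(at_start=43, censored=2, events=2)   # 39 / 41

# Cumulative survival is the product of the conditional probabilities
cum_4 = 1.0 * s4
cum_5 = cum_4 * s5

print(round(s4, 3), round(s5, 3), round(cum_5, 3))
```

Each interval only asks "of those still at risk, what share survived?", and the cumulative curve is built by multiplying those shares together.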
29. "In this way the fudging is kept conceptual, systematic, and automatic." (Kaplan & Meier, 1958)
31. Kaplan-Meier output
- Log Rank weights all graduations equally
- Breslow gives more weight to earlier graduations
- Tarone-Ware is a mixture of the two
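The difference between the three tests comes down to the weight applied at each event time. The Python sketch below (not SPSS) uses the usual textbook weights: 1 for Log Rank, the number at risk for Breslow, and the square root of the number at risk for Tarone-Ware. The function name `weighted_logrank` and the sample data are hypothetical, and the statistic is a plain two-group chi-square with 1 degree of freedom, not a reproduction of SPSS output.

```python
from math import sqrt

# Weight applied at each event time, as a function of the number at risk
WEIGHTS = {
    "logrank": lambda n: 1.0,          # all event times weighted equally
    "breslow": lambda n: float(n),     # early times (large risk sets) weigh more
    "tarone_ware": lambda n: sqrt(n),  # in between the two
}


def weighted_logrank(times, events, groups, weight="logrank"):
    """Chi-square statistic (1 df) comparing two groups coded 0 and 1."""
    w = WEIGHTS[weight]
    data = list(zip(times, events, groups))
    u = 0.0    # weighted sum of observed minus expected events in group 1
    var = 0.0  # variance of that sum
    for t in sorted({ti for ti, ei, _ in data if ei == 1}):
        n = sum(1 for ti, _, _ in data if ti >= t)
        n1 = sum(1 for ti, _, g in data if ti >= t and g == 1)
        d = sum(1 for ti, ei, _ in data if ti == t and ei == 1)
        d1 = sum(1 for ti, ei, g in data if ti == t and ei == 1 and g == 1)
        if n < 2:
            continue  # a single subject at risk contributes nothing
        expected = d * n1 / n
        v = d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        u += w(n) * (d1 - expected)
        var += w(n) ** 2 * v
    return u * u / var if var > 0 else 0.0


# Hypothetical data: terms enrolled, graduated flag, gender code
terms = [10, 11, 12, 12, 13, 11, 12, 13, 13, 14]
graduated = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
gender = [1] * 5 + [0] * 5

results = {name: weighted_logrank(terms, graduated, gender, name) for name in WEIGHTS}
print(results)
```

Because the risk set shrinks over time, weighting by the number at risk emphasizes early graduations, which is why the three tests can disagree when curves differ mainly early or mainly late.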
32. Kaplan-Meier Results – Gender
- Null hypothesis: female curve = male curve
- Curves are not significantly different at the p < .05 level
33. Cox Regression (Proportional hazards)
- Measures influence of explanatory variables
- Most used survival analysis method
- Only time-independent variables are appropriate
- Assumption: hazards are proportional
34. Cox Regression, Checking proportional hazards assumption
SPSS: Analyze > Survival > Cox Regression
Repeat for each factor!
35. Cox Regression: Use log minus log function to check Proportional Hazards Assumption
- Do not use Cox Regression if the curves cross; crossing means the hazards are not proportional.
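As a rough numeric counterpart to eyeballing the plot, the Python sketch below (not SPSS) computes ln(-ln S(t)) for two groups at the same time points and reports whether the two curves change order. The function names and survival values are hypothetical, and this only approximates the visual check; it is not a formal test of proportionality.

```python
from math import log


def log_minus_log(surv):
    """ln(-ln S) for survival proportions strictly between 0 and 1."""
    return [log(-log(s)) for s in surv]


def lml_curves_cross(surv_a, surv_b):
    """True if the two log-minus-log curves change order at any point."""
    diffs = [a - b for a, b in zip(log_minus_log(surv_a), log_minus_log(surv_b))]
    return any(d1 * d2 < 0 for d1, d2 in zip(diffs, diffs[1:]))


# Hypothetical cumulative survival at four terms for two groups
females = [0.99, 0.95, 0.60, 0.20]
males = [0.98, 0.90, 0.40, 0.05]

print(lml_curves_cross(females, males))
```

Under proportional hazards the two log-minus-log curves stay roughly parallel, so a change in their order is a warning sign.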
36. Cox Regression Model – Example, Gender
SPSS: Analyze > Survival > Cox Regression (move gender to Covariates box)
38. Interpretation of SPSS Cox Regression Results
- The reference category is female because I made that choice for this model
- The difference between the female and male survival curves is not statistically significant at p < 0.05
- Exp(B) = hazard ratio, female vs. male. The null hypothesis is that this ratio = 1.
- Hazard ratio = e^B = e^(-0.04) = 0.961
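The last line is a direct exponentiation of the coefficient, which a short Python check (not SPSS) confirms. The variable names are hypothetical; the coefficient -0.04 is the value quoted on the slide.

```python
from math import exp

b = -0.04                 # Cox regression coefficient B from the slide
hazard_ratio = exp(b)     # Exp(B)

# Percent difference in hazard relative to the comparison category
pct_diff = (hazard_ratio - 1) * 100

print(round(hazard_ratio, 3), round(pct_diff, 1))
```

A hazard ratio this close to 1 is consistent with the finding that the gender difference is not statistically significant.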
39. Cox Regression Model Results: Pell Grant Recipients vs. Non-Pell Grant Recipients
- Per Kaplan-Meier estimation, the Pell Grant students' curve is not equal to the non-Pell Grant students' curve; highly significant at p < .001
- Tip: to edit the default chart, click on the chart until the "Chart Editor" opens
40. Cox Regression Model Results: Pell Grant Recipients vs. Non-Pell Grant Recipients
Pell Grant Recipients:
1. Work more hours than non-Pell Grant recipients
2. With GPAs similar to non-Pell Grant recipients, have attempted 10 more units
41. Summary
Survival Analysis provides the following:
- Handles both censored data and a time variable
- Life table
- Graphical representation of trends
- Kaplan-Meier survival function estimator
- Survival comparison between 2 or more groups
- Regression models: relationships between variables and survival times
- A p value that indicates whether the difference between curves is significant
42. Descriptive power of survival analysis: Terms Enrolled by 1st Term GPA – Using Survival Graph (K-M) to display data
At end of 12th term:
- ~34% probability of continued enrollment
- ~9% probability of continued enrollment
43. REFERENCES
Dunn, S. (2002). Kaplan-Meier Survival Probability Estimates. Retrieved from
Harris, S. (2009). Additional Regression techniques, October 2009. Retrieved from
Newell, J. & Hyun, S. (2011). Survival Probabilities With and Without the Use of Censored Failure Times. Retrieved from https://www.uscupstate.edu/uploadedFiles/Academics/Undergraduate_Research/Reseach_Journal/2011_007_ARTICLE_NEWELL_HYUN.pdf
Singh, R. & Mukhopadhyay, K. (2011). Survival analysis in clinical trials: Basics and must know areas. Retrieved from
Wiorkowski, J., Moses, A., & Redlinger, L. (2014). The Use of Survival Analysis to Compare Student Cohort Data. Presented at the 2014 Conference of the Association for Institutional Research.
Contact Info:
Thank you!