What Is a Survival Model and Why Is It Important?

Agenda
What is a survival model?
Survival analysis challenges
Censoring: incomplete information
Survival analysis features
Non-parametric model
Parametric and semi-parametric models
Survival model example: BFSI
Q&A

Speaker: Madhukar Kumar, Principal Data Scientist
Principal Data Scientist and Big Data Analytics professional with 10+ years of advanced analytics experience, providing consultancy services across the US, Canada, UK, Europe, the Middle East and Australia for different LOBs (Auto Finance, Deposits, Credit Cards, Mortgage, Insurance)
Director – Data Science, UpGrad
Distinguished member of Leaders Excellence at Harvard Square and Advanced Analytics Expert for Experfy, Harvard Innovation Launch Lab
Prior experience with American Express, GE Money and WNS
Master's in Applied Statistics & Informatics (IIT Bombay)

What Is a Survival Model?
A survival model is a method that predicts the month-on-month probability that a particular event will occur for a customer. It is important because it tells us not only who will experience the event but also when. Other modelling techniques can answer the "who" but not the "when".
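To make the "who" versus "when" distinction concrete, here is a minimal Python sketch. It assumes we already have month-on-month event probabilities for one customer; the numbers and the helper name `survival_curve` are hypothetical and are not taken from the presentation.

```python
def survival_curve(monthly_event_probs):
    """Cumulative survival S(t) from month-on-month event probabilities.

    Each entry is the probability that the event occurs in that month,
    given that the customer has survived up to the start of the month.
    """
    curve = []
    surviving = 1.0
    for p in monthly_event_probs:
        surviving *= 1.0 - p
        curve.append(surviving)
    return curve


# Hypothetical month-on-month event probabilities for one customer.
monthly_probs = [0.01, 0.02, 0.05, 0.03]
curve = survival_curve(monthly_probs)

# Unconditional probability that the event happens in month k: S(k-1) - S(k).
event_in_month = [prev - cur for prev, cur in zip([1.0] + curve[:-1], curve)]

# The "when": the month in which the event is most likely to happen.
most_likely_month = 1 + max(range(len(event_in_month)),
                            key=event_in_month.__getitem__)
print(curve, most_likely_month)
```

A classifier would give only one number per customer (will the event happen or not); the survival curve gives a probability for every month, which is what allows timing to be ranked.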

Survival Analysis Challenges

Censoring: Incomplete information

Survival Analysis Features

Non-Parametric Approach

Parametric and Semi-Parametric Models

Cox Proportional Hazards Regression

Post-Acquisition Example: Mortgage Customer Survival Model

Project Objective
To predict the probability of survival of post-modification mortgage customers over the next 12 months.

Problem Definition
Mortgage customers who had trouble with repayment due to job loss, recession, illness, etc. were offered a modification in payment, term or rate, and the bank wanted to predict the probability of survival of those customers over the next 12 months.

Methodology
Censoring (i.e. incomplete information for an observation) and non-normality of the data pose the main challenge in using traditional statistical modelling techniques.
Dependent variable = # months survived * Censor(0), where Censor is a dummy variable (1/0) that equals 1 if the customer becomes 90+ DPD (days past due) within 12 months and 0 otherwise; the (0) indicates that Censor = 0 marks a censored observation.
Univariate analysis: for categorical variables, the log-rank test of equality (a non-parametric test) was used; for continuous variables, univariate Cox proportional hazards regression was used. Variables were picked if p < 0.2.
Stepwise Cox proportional hazards regression was run on all significant numeric variables and on dummies for the categorical variables.
A 12-month probability of survival was generated for each mortgage customer.

Benefits Delivered
The survival model was developed within a 4-month timeframe.
The survival model predicted default accurately (the overall difference between actual and predicted was 2%).
Customer attrition was reduced by 12% within 1 year, resulting in revenue protection of USD 1.2 MM.
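The dependent variable described above can be sketched in plain Python. This is only an illustration under stated assumptions: the input data are made up, the helper names `to_survival_record` and `kaplan_meier` are hypothetical, and the Kaplan-Meier estimate is used here simply as a standard non-parametric way to show how censored records enter a survival calculation. The project itself reports using the log-rank test and Cox regression, not this code.

```python
from collections import Counter

HORIZON = 12  # months of follow-up, as in the case study


def to_survival_record(months_to_90dpd):
    """Return (months_survived, censor) for one customer.

    censor = 1 if the customer became 90+ DPD within HORIZON months,
    censor = 0 otherwise (the observation is censored at HORIZON).
    """
    if months_to_90dpd is not None and months_to_90dpd <= HORIZON:
        return months_to_90dpd, 1
    return HORIZON, 0


def kaplan_meier(records):
    """Kaplan-Meier estimate {month: S(month)} at each month with an event."""
    events = Counter(t for t, censor in records if censor == 1)
    exits = Counter(t for t, _ in records)  # events plus censored exits
    at_risk = len(records)
    surviving = 1.0
    curve = {}
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            surviving *= 1.0 - d / at_risk
            curve[t] = surviving
        at_risk -= exits[t]
    return curve


# Hypothetical data: month in which each customer first became 90+ DPD
# (None means no 90+ DPD was observed).
raw = [3, None, 7, None, 3, 15, None, 10]
records = [to_survival_record(m) for m in raw]
curve = kaplan_meier(records)
print(records)
print(curve)
```

Note that the customer who defaults in month 15 is treated as censored at month 12, because the event falls outside the 12-month window; censored customers still count in the at-risk group until they leave.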

Final Survival Models

The estimates of survival are accurate on both the development and holdout samples. The average difference between actual and predicted is 0.19% in the development data and 0.15% in the holdout data.

Model Implementation

Survival equation: S(t) = S0(t) ^ exp(b0 + b1*x1 + b2*x2 + b3*x3 + ... + bn*xn)

The model output gives the final survival probability S(t) month over month for 12 months.

How to calculate the survival probability for a new population?
Pick 10 customers at random and put their variable values and survival probabilities into the above equation.
Calculate S0(t) for the 12 months for all of these customers.
You will find that S0(t) is the same for all customers at a given time point.
Pick S0(t) for months 1, 2, ..., 12.
Put these S0(t) values, the variable values and the parameter estimates into the survival equation.
You will then have 12 different survival equations, one for each of the 12 months.
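The procedure above can be sketched in Python as follows. This is a minimal sketch, not the presenter's implementation: the coefficients, baseline values and customer variables are made up, and the function names are hypothetical. It relies only on the survival equation given above, inverted as S0(t) = S(t) ^ exp(-(b0 + b1*x1 + ... + bn*xn)).

```python
import math


def linear_predictor(coefs, x, intercept=0.0):
    """b0 + b1*x1 + ... + bn*xn."""
    return intercept + sum(b * xi for b, xi in zip(coefs, x))


def survival(s0_t, coefs, x, intercept=0.0):
    """S(t) = S0(t) ^ exp(b0 + b1*x1 + ... + bn*xn)."""
    return s0_t ** math.exp(linear_predictor(coefs, x, intercept))


def baseline_from_scored(s_t, coefs, x, intercept=0.0):
    """Invert the survival equation to recover S0(t) from a scored customer."""
    return s_t ** math.exp(-linear_predictor(coefs, x, intercept))


# Hypothetical parameter estimates and a hypothetical 12-month baseline.
coefs = [0.4, -0.2]
true_baseline = [1.0 - 0.01 * month for month in range(1, 13)]

# Two already-scored customers: variable values plus model-output S(t).
scored = {
    "customer_a": [1.0, 0.5],
    "customer_b": [0.0, 2.0],
}
scored_output = {
    name: [survival(s0, coefs, x) for s0 in true_baseline]
    for name, x in scored.items()
}

# Back out S0(t) for months 1..12 from each scored customer.
recovered = {
    name: [baseline_from_scored(s, coefs, scored[name]) for s in curve]
    for name, curve in scored_output.items()
}

# Score a new customer with the recovered baseline: one equation per month.
new_customer = [0.5, 1.0]
new_curve = [survival(s0, coefs, new_customer)
             for s0 in recovered["customer_a"]]
print(new_curve)
```

Because S0(t) does not depend on the customer's variables, the values recovered from `customer_a` and `customer_b` agree month by month, which is the check described in the slide.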

Thanks
Questions & Answers