Data Analysis Case Study – Auto Claim Assignment

1 Data Analysis Case Study – Auto Claim Assignment
Ming Sun, American Family Insurance

2 About Myself
- 1999–2005: Application Development (J2EE Web App, Java Batch Processing)
- 2005–2014: Solution Architecture (Big Data Analytics, Mobile App, Application Integrations, Data Warehouse Integrations)
- 2014–present: Data Science Engineering (Repeatable Data Science Pipelines, Exploratory Data Analysis, Data Lake Design, Technology Incubation)

3 Analytical Solution Life-cycle
- Problem Definition (start here): Current State, Bottom Line CBA, Top Line Benefits, Data Sources
- Data Preparation: Data Domains, Data Quality, Data Design, Data Blend, Data Pipelines
- Model Development: Model Techniques, Model Performance, Model Pipelines
- Solution Deployment: Containerization, CI/CD, Monitor Pipelines, Model Registry

4 Problem Definition
- Scope: determine, at an early stage of an auto claim, whether a damaged vehicle should be totaled or repaired
- Current state: point-based model, accuracy < 80%
- Bottom line (cost-benefit analysis): annual savings amount, a 10% lift ≈ $500k–$2M
- Top line benefits: impact on customer satisfaction

5 Problem Definition – Data Sources
- Claim System – Old (DB2, 3rd normal form DB)
- Claims Data Warehouse (DB2)
- Claim System – New (Oracle)
- 3rd Party Data (daily files)
Data coverage labels on the diagram: Partial Data, Partial Data, No Data

6 Data Preparation – Data Domains
- Handling Assignment (6–8 tables)
- 3rd Party Loss Estimates (5 files)
- Initial Claim ( table)
- Customer Satisfaction (2 files)
- Code Description (10+ tables)
- Total Loss Workflow (2–4 tables)
- Salvage Info (2 tables)

7 Data Preparation – Grain/Quality/Blend
- Grain of the blended dataset: vehicle
- Current snapshot of all closed auto collision claims
- Identify keys to blend claims, 3rd party estimates, and customer satisfaction data
- Profile the blended dataset: record counts, missing values, column value distributions, correlations, etc.
- This is where 60% of the project time is spent
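The blend-and-profile step can be sketched in plain Python. This is a minimal illustration, not the project's pipeline: the record layouts, the `claim_id`/`vehicle_id` keys, the toy values, and the function names `blend`, `profile`, and `distribution` are all hypothetical. It also assumes the customer satisfaction score is recorded per claim and is therefore repeated for every vehicle on that claim.

```python
from collections import Counter

# Hypothetical toy inputs; field names and values are illustrative only.
claims = [
    {"claim_id": "C1", "vehicle_id": "V1", "status": "closed"},
    {"claim_id": "C1", "vehicle_id": "V2", "status": "closed"},
    {"claim_id": "C2", "vehicle_id": "V3", "status": "open"},
    {"claim_id": "C3", "vehicle_id": "V4", "status": "closed"},
]
# 3rd party loss estimates keyed by (claim, vehicle); satisfaction keyed by claim.
estimates = {("C1", "V1"): 4200.0, ("C1", "V2"): 15800.0}
satisfaction = {"C1": 4}


def blend(claims, estimates, satisfaction):
    """Return one row per vehicle (the grain) for closed claims only."""
    rows = []
    for c in claims:
        if c["status"] != "closed":
            continue  # keep only the snapshot of closed claims
        key = (c["claim_id"], c["vehicle_id"])
        rows.append({
            "claim_id": c["claim_id"],
            "vehicle_id": c["vehicle_id"],
            "estimate": estimates.get(key),           # None if no estimate matched
            "csat": satisfaction.get(c["claim_id"]),  # None if no survey matched
        })
    return rows


def profile(rows):
    """Record count plus the number of missing (None) values per column."""
    columns = list(rows[0]) if rows else []
    missing = {col: sum(r[col] is None for r in rows) for col in columns}
    return {"records": len(rows), "missing": missing}


def distribution(rows, column):
    """Value counts for one column, with None counting missing values."""
    return Counter(r[column] for r in rows)


rows = blend(claims, estimates, satisfaction)
print(profile(rows))
print(distribution(rows, "csat"))
```

Counting the `None` values per source column is a quick check on how well the blend keys line up across systems.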

8 Problem Definition Analysis
- Current process: vehicle questionnaire
- Number of questions: 17 → 12
- Questions not answered: > 80%
- Assignment accuracy ≈ 80%
  - Assigned Repairable, actually Total Loss ≈ 2x%
  - Assigned Total Loss, actually Repairable ≈ x%
- Mis-assigned claim costs
  - Assigned Repairable, actually Total Loss ≈ $3y per claim
  - Assigned Total Loss, actually Repairable ≈ $y per claim
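The slide's error rates and costs can be combined into an expected misassignment cost per claim: (2x/100)(3y) + (x/100)(y) = 7xy/100. The sketch below evaluates that expression. It additionally assumes that the two error types make up all of the roughly 20% inaccuracy (2x + x ≈ 20, so x ≈ 6.7), and the value of y is a hypothetical placeholder because the slide does not give it. Under those assumptions, the "assigned Repairable, actually Total Loss" errors account for 6/7 (about 86%) of the misassignment cost.

```python
def expected_misassignment_cost(x_pct: float, y: float) -> float:
    """Expected misassignment cost per claim, using the slide's x and y.

    x_pct: x in percent; assigned Repairable but actually Total Loss is 2x%,
           assigned Total Loss but actually Repairable is x%.
    y:     dollar unit; those two errors cost 3y and y per claim respectively.
    """
    missed_total_loss = (2 * x_pct / 100) * (3 * y)  # assigned Repairable, actual Total Loss
    missed_repairable = (x_pct / 100) * y            # assigned Total Loss, actual Repairable
    return missed_total_loss + missed_repairable     # equals 7 * x_pct * y / 100


# Assumption: the two error types account for all of the ~20% inaccuracy,
# so 2x + x = 20 and x = 20/3 percent.
x = 20 / 3
y = 1000.0  # hypothetical placeholder; the slide does not state y

cost = expected_misassignment_cost(x, y)
share_missed_total_loss = ((2 * x / 100) * (3 * y)) / cost  # 6/7 of the cost
print(f"expected cost per claim: ${cost:,.2f}")
print(f"share from missed total losses: {share_missed_total_loss:.1%}")
```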

9 Customer Satisfaction Impact Analysis
- 5 satisfaction score buckets, with 5 being the most satisfied
- False positives have the worst impact, followed by false negatives
- Customers are happy with true negatives
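One way to produce this comparison is to average the 1–5 satisfaction score within each confusion-matrix cell. The slide does not say which outcome is the positive class, so the sketch assumes "positive" means total loss; the function names and the toy records are hypothetical and do not reflect the study's data.

```python
from collections import defaultdict
from statistics import mean


def confusion_cell(predicted_total_loss: bool, actual_total_loss: bool) -> str:
    """Label a claim TP/FP/FN/TN, assuming 'positive' means total loss."""
    if predicted_total_loss:
        return "TP" if actual_total_loss else "FP"
    return "FN" if actual_total_loss else "TN"


def satisfaction_by_cell(records):
    """Mean satisfaction score (1-5, 5 = most satisfied) per confusion cell.

    records: iterable of (predicted_total_loss, actual_total_loss, score).
    """
    buckets = defaultdict(list)
    for predicted, actual, score in records:
        if score not in (1, 2, 3, 4, 5):
            raise ValueError(f"score must be one of the 5 buckets, got {score}")
        buckets[confusion_cell(predicted, actual)].append(score)
    return {cell: mean(scores) for cell, scores in buckets.items()}


# Hypothetical toy records, for illustration only.
toy = [(True, False, 1), (True, False, 2), (False, True, 3),
       (False, False, 5), (False, False, 4), (True, True, 4)]
print(satisfaction_by_cell(toy))
```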

10 Model Development – Winner: Logistic Regression
Model performance:
- Random Forest: misclassification rate 0.136, ROC AUC 0.90
- Logistic Regression: misclassification rate 0.145, ROC AUC 0.89
Comparison (category: which model is better):
- Technical performance: Random Forest
- Implementation cost: Logistic Regression (200 vs 1000 hours)
- Annual savings forecast: tie
The Random Forest model outperforms the Points Model (the logistic regression) from a model performance standpoint. The forecasted annual savings from the two models are very similar. The time to integrate the Points Model into the claim system is much shorter than for the Random Forest model. Winner: Logistic Regression.
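Both comparison metrics can be computed directly from held-out labels and model scores. The sketch below uses plain Python: misclassification rate is the share of wrong labels, and ROC AUC is computed as the probability that a randomly chosen positive scores higher than a randomly chosen negative (ties count half), which equals the area under the ROC curve. The toy labels, scores, and 0.5 threshold are hypothetical, and the pairwise loop is quadratic, so it only suits small samples.

```python
def misclassification_rate(y_true, y_pred):
    """Fraction of predictions that differ from the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC AUC via pairwise ranking: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = total loss, 0 = repairable.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # illustrative 0.5 threshold
print(misclassification_rate(y_true, y_pred), roc_auc(y_true, scores))
```

Because AUC depends only on how the scores rank the claims, it does not change when the cutoff points are moved; the misclassification rate does.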

11 Model Development – Cont’d
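Score scale, from low to high:
- Low scores, below the Repairable cutoff point: Repairable
- Between the Repairable and Total Loss cutoff points: Manual Review
- High scores, above the Total Loss cutoff point: Total Loss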
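The two-cutoff rule can be written as a small function. The cutoff values are not given on the slide, so the numbers in the usage lines are hypothetical, and treating a score exactly at a cutoff as Manual Review is an assumption.

```python
def assign(score: float, repairable_cutoff: float, total_loss_cutoff: float) -> str:
    """Map a model score to an assignment using two cutoff points.

    Scores below the repairable cutoff are Repairable, scores above the total
    loss cutoff are Total Loss, and everything in between (including scores
    exactly at either cutoff, an assumption) goes to Manual Review.
    """
    if repairable_cutoff >= total_loss_cutoff:
        raise ValueError("repairable_cutoff must be below total_loss_cutoff")
    if score < repairable_cutoff:
        return "Repairable"
    if score > total_loss_cutoff:
        return "Total Loss"
    return "Manual Review"


# Hypothetical cutoffs on a 0-100 points scale.
for s in (10, 50, 90):
    print(s, assign(s, repairable_cutoff=30, total_loss_cutoff=70))
```

Moving the two cutoffs farther apart sends more claims to manual review and leaves fewer claims to be assigned automatically.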

12 Solution Deployment
- Logistic Regression Points Assignment
- Claim System – New UI
- Simplified vehicle questionnaire: questions 12 → 8; answers Y/N → list of choices
- Got rid of the questions that cannot be answered easily
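A common way to turn a logistic regression into a points table is scorecard scaling: multiply each coefficient (a change in log-odds) by a factor chosen so that a fixed number of points doubles the odds, then round. The slide does not say how the points were derived, so this is a sketch of that general technique, not the deployed model. The questionnaire items, coefficient values, and the 20-points-to-double-the-odds setting are hypothetical. List-of-choices answers are handled by giving each choice its own coefficient.

```python
import math

# Hypothetical coefficients (log-odds of total loss), one per answer choice.
# Item names and values are illustrative only; they are not from the source.
COEFFICIENTS = {
    "airbags_deployed=Y": 1.2,
    "vehicle_drivable=N": 0.9,
    "damage_area=front": 0.4,
    "damage_area=rear": 0.1,
}


def to_points(coefs, points_to_double_odds=20):
    """Scale log-odds coefficients to integer points.

    With factor = PDO / ln(2), every PDO additional points corresponds
    (before rounding) to doubling the odds of total loss.
    """
    factor = points_to_double_odds / math.log(2)
    return {choice: round(beta * factor) for choice, beta in coefs.items()}


def score(answers, points):
    """Sum points for the given answers; choices with no points contribute 0."""
    return sum(points.get(f"{item}={answer}", 0) for item, answer in answers.items())


points = to_points(COEFFICIENTS)
claim_answers = {"airbags_deployed": "Y", "vehicle_drivable": "Y", "damage_area": "front"}
print(points, score(claim_answers, points))
```

The integer total is what would then be compared against the Repairable and Total Loss cutoff points.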

13 Takeaways
- Data analysis is critical throughout
- Keep the data scope reasonable
- Gain deep knowledge of the business process and the data
- Favor ease of implementation over model technique
- Be conservative when estimating savings
- Pilot the solution for 3–6 months first to test it
- It is a team effort (analysts, engineers, scientists)

14 Parting Thought – Data Preparation
- Most time-consuming work
- Tedious and not glamorous
- Foundational work – Data Lake
- Vulnerable to being the scapegoat

