
Distinguishing the Forest from the Trees University of Texas November 11, 2009 Richard Derrig, PhD, Opal Consulting Louise Francis, FCAS, MAAA Francis Analytics and Actuarial Data Mining, Inc.

Data Mining Data Mining, also known as Knowledge Discovery in Databases (KDD), is the process of automatically searching large volumes of data for patterns. To do this, data mining draws on computational techniques from statistics, machine learning, and pattern recognition.

A Casualty Actuary’s Perspective on Data Modeling
The Stone Age: 1914 – … Simple deterministic methods; use of blunt instruments: the analytical analog of bows and arrows; often ad hoc; slice and dice data; based on empirical data – little use of parametric models
The Pre-Industrial Age: … Fit probability distributions to model tails; simulation models and numerical methods for variability and uncertainty analysis; focus is on underwriting, not claims
The Industrial Age – 1985… Begin to use computer catastrophe models
The 20th Century – 1990… European actuaries begin to use GLMs
The Computer Age – 1996… Begin to discuss data mining at conferences; at the end of the 20th century, large consulting firms start to build data mining practices
The Current Era – A mixture of the above: in personal lines, modeling is the rule rather than the exception; often GLM based, though GLMs are evolving to GAMs; commercial lines are beginning to embrace modeling

Why Predictive Modeling?
Better use of data than traditional methods
Advanced methods for dealing with messy data are now available
Decision trees are a popular form of data mining

Real Life Insurance Application – The “Boris Gang”

Desirable Features of a Data Mining Method:
Any nonlinear relationship can be approximated
A method that works when the form of the nonlinearity is unknown
The effect of interactions can be easily determined and incorporated into the model
The method generalizes well on out-of-sample data

Nonlinear Example Data

Provider 2 Bill (Binned) | Avg Provider 2 Bill | Avg Total Paid | Percent IME
Zero                     | –                   | 9,063          | 6%
1 – 250                  | …                   | …,761          | 8%
251 – 500                | …                   | …,726          | 9%
501 – 1,000              | …                   | …,469          | 10%
1,001 – 1,500            | 1,243               | 14,998         | 13%
1,501 – 2,500            | 1,915               | 17,289         | 14%
2,501 – 5,000            | 3,300               | 23,994         | 15%
5,001 – 10,000           | 6,720               | 47,728         | 15%
10,001+                  | …,350               | 83,261         | 15%
All Claims               | 545                 | 11,224         | 8%

An Insurance Nonlinear Function: Provider Bill vs. Probability of Independent Medical Exam

The Fraud Surrogates used as Dependent Variables
Independent Medical Exam (IME) requested; IME successful
Special Investigation Unit (SIU) referral; SIU successful
Data: Detailed Auto Injury Claim Database for Massachusetts, Accident Years ( )

Predictor Variables
Claim file variables: Provider bill, Provider type, Injury
Derived from claim file variables: Attorneys per zip code, Docs per zip code
Using external data: Average household income, Households per zip

Predictor Variables Put the two tables here

Decision Trees In decision theory (for example, risk management), a decision tree is a graph of decisions and their possible consequences (including resource costs and risks), used to create a plan to reach a goal. Decision trees are constructed to help with decision making. A decision tree is a special form of tree structure.

The Classic Reference on Trees Breiman, Friedman, Olshen, and Stone, 1993

Regression Trees
Tree-based modeling for a continuous target variable is the most intuitively appropriate method for loss ratio analysis
Find the split that produces the greatest separation in ∑[y – E(y)]²
i.e., find nodes with minimal within variance, and therefore greatest between variance (like credibility theory)
Every record in a node is assigned the same expectation, so the model is a step function
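A minimal sketch of the split search described above, in Python: for a single numeric predictor it evaluates every candidate cut point and keeps the one that minimizes the within-node sum of squared deviations (equivalently, maximizes the between-node separation). The function and variable names (best_split, the toy data) are illustrative, not from the presentation or any particular package.

    import numpy as np

    def best_split(x, y):
        """Exhaustive search of cut points on one numeric predictor.

        Returns the cut that minimizes the within-node sum of squared
        deviations, sum((y - E(y))^2), over the two child nodes.
        """
        order = np.argsort(x)
        x, y = x[order], y[order]
        best_cut, best_sse = None, np.inf
        for i in range(1, len(x)):
            if x[i] == x[i - 1]:
                continue  # no valid cut between tied predictor values
            cut = (x[i] + x[i - 1]) / 2.0
            left, right = y[:i], y[i:]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best_sse:
                best_cut, best_sse = cut, sse
        return best_cut, best_sse

    # Toy illustration (made-up values, not the presentation's claim data)
    rng = np.random.default_rng(0)
    provider_bill = rng.uniform(0, 10000, size=200)
    total_paid = 5000 + 5 * provider_bill + rng.normal(0, 2000, size=200)
    print(best_split(provider_bill, total_paid))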

CART Example of Parent and Children Nodes Total Paid as a Function of Provider 2 Bill

Decision Trees Cont.
After splitting the data on the first node, go to each child node and perform the same process at each node, i.e.:
Examine the variables one at a time for the best split
Select the best variable to split on
Different variables can be split on at the different child nodes

Classification Trees: Categorical Dependent
Find the split that maximizes the difference in the probability of being in the target class (IME Requested)
Find the split that minimizes impurity, i.e., the number of records not in the dominant class for the node (too many No IME)
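For a categorical target such as IME Requested, the same split search can be run with an impurity measure in place of the sum of squares. The sketch below (illustrative names and toy data, not the presentation's) scores a candidate split by the weighted Gini impurity of the two child nodes; the misclassification rate, the share of records outside the dominant class, could be substituted in the same place.

    import numpy as np

    def gini(labels):
        """Gini impurity of one node: 1 - sum(p_k^2) over the class proportions."""
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - (p ** 2).sum()

    def split_impurity(x, y, cut):
        """Weighted impurity of the two children produced by x <= cut vs. x > cut."""
        left, right = y[x <= cut], y[x > cut]
        n = len(y)
        return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

    # Toy example: does a $1,000 cut on the provider bill separate IME requests?
    bill = np.array([50, 200, 400, 900, 1500, 3000, 6000, 12000], dtype=float)
    ime  = np.array(["No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes"])
    print(split_impurity(bill, ime, cut=1000.0))   # 0.0: a perfectly pure split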

Continue splitting to get more homogeneous groups at the terminal nodes

CART Step Function Predictions with One Numeric Predictor Total Paid as a Function of Provider 2 Bill

Recursive Partitioning: Categorical Variables

Different Kinds of Decision Trees
Single trees (CART, CHAID)
Ensemble trees, a more recent development (TREENET, RANDOM FOREST): a composite or weighted average of many trees (perhaps 100 or more)
There are many methods to fit the trees and prevent overfitting
Boosting: Iminer Ensemble and Treenet
Bagging: Random Forest
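As a rough illustration of the bagging and boosting ideas above, the sketch below fits a bagged ensemble of trees (a random forest) and a boosted ensemble with scikit-learn on made-up data. It is a generic sketch of the two approaches, not the TREENET, Iminer Ensemble, or Random Forest software runs reported later in the presentation.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

    # Toy data standing in for claim predictors and an IME-requested flag
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10000, size=(500, 3))
    y = (X[:, 0] + rng.normal(0, 2000, 500) > 5000).astype(int)

    # Bagging: each of the 100 trees sees a bootstrap sample; predictions are averaged
    bagged = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    # Boosting: trees are fit sequentially, each one to the errors of the current ensemble
    boosted = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X, y)

    print(bagged.predict_proba(X[:5])[:, 1])
    print(boosted.predict_proba(X[:5])[:, 1])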

The Methods and Software Evaluated
1) TREENET
2) Iminer Tree
3) SPLUS Tree
4) CART
5) Iminer Ensemble
6) Random Forest
7) Naïve Bayes (Baseline)
8) Logistic (Baseline)

Ensemble Prediction of Total Paid

Ensemble Prediction of IME Requested

Bayes Predicted Probability IME Requested vs. Quintile of Provider 2 Bill

Naïve Bayes Predicted IME vs. Provider 2 Bill

The Fraud Surrogates used as Dependent Variables
Independent Medical Exam (IME) requested
Special Investigation Unit (SIU) referral
IME successful
SIU successful
DATA: Detailed Auto Injury Claim Database for Massachusetts, Accident Years ( )

S-Plus Tree Distribution of Predicted Score

One Goodness of Fit Measure: Confusion Matrix

Specificity/Sensitivity Sensitivity: the proportion of true positives that are identified by the model. Specificity: the proportion of true negatives correctly identified by the model.
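A small sketch (illustrative code, not the presentation's) of how sensitivity and specificity are read off a confusion matrix once predicted scores are cut at a threshold:

    import numpy as np

    def confusion_stats(actual, score, threshold=0.5):
        """Confusion matrix plus sensitivity and specificity at a given cutoff."""
        predicted = (np.asarray(score) >= threshold).astype(int)
        actual = np.asarray(actual).astype(int)
        tp = int(((predicted == 1) & (actual == 1)).sum())
        tn = int(((predicted == 0) & (actual == 0)).sum())
        fp = int(((predicted == 1) & (actual == 0)).sum())
        fn = int(((predicted == 0) & (actual == 1)).sum())
        sensitivity = tp / (tp + fn)   # true positives identified by the model
        specificity = tn / (tn + fp)   # true negatives correctly identified
        return {"TP": tp, "TN": tn, "FP": fp, "FN": fn,
                "sensitivity": sensitivity, "specificity": specificity}

    # Toy example: scores for 8 claims, 1 = IME requested
    print(confusion_stats(actual=[1, 1, 1, 0, 0, 0, 1, 0],
                          score=[0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]))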

Results for IME Requested

Results for IME Favorable

Results for SIU Referral

Results for SIU Favorable

TREENET ROC Curve – IME AUROC = 0.701

TREENET ROC Curve – SIU AUROC = 0.677

Logistic ROC Curve – IME AUROC = 0.643

Logistic ROC Curve – SIU AUROC = 0.612
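The ROC curves and AUROC values on the preceding slides summarize sensitivity against 1 – specificity across all possible cutoffs. A minimal sketch of computing them with scikit-learn, on generic example scores rather than the IME or SIU results above:

    import numpy as np
    from sklearn.metrics import roc_curve, roc_auc_score

    # Toy scores and outcomes standing in for a model's IME/SIU predictions
    actual = np.array([1, 1, 1, 0, 0, 0, 1, 0, 0, 1])
    score  = np.array([0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1, 0.5, 0.65])

    fpr, tpr, thresholds = roc_curve(actual, score)   # 1 - specificity, sensitivity
    print("AUROC =", roc_auc_score(actual, score))    # area under the ROC curve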

Ranking of Methods/Software – IME Requested

Ranking of Methods/Software – SIU Requested

Ranking of Methods/Software – 1 st Two Surrogates

Ranking of Methods/Software – Last Two Surrogates

Plot of AUROC for SIU vs. IME Decision

Plot of AUROC for SIU vs. IME Favorable

Plot of AUROC for SIU vs. IME Decision

Plot of AUROC for SIU vs. IME Favorable – Tree Methods Only

References
Francis and Derrig, “Distinguishing the Forest From the Trees”, Variance, 2008
Francis, “Neural Networks Demystified”, CAS Forum, 2001
Francis, “Is MARS Better than Neural Networks?”, CAS Forum, 2003
All can be found at