Jennifer Lewis Priestley: Presentation of “Assessment of Evaluation Methods for Prediction and Classification of Consumer Risk in the Credit Industry” (co-authored with S. Nargundkar).

Presentation transcript:

Jennifer Lewis Priestley Presentation of “Assessment of Evaluation Methods for Prediction and Classification of Consumer Risk in the Credit Industry” co-authored with S. Nargundkar. (Accepted for publication as a chapter in "Neural Networks for Business Forecasting" by Peter Zhang, PhD (Ed))

Objectives This paper addresses two important questions: 1. Does the model development technique improve classification accuracy? 2. How will model selection vary based upon the evaluation method used?

Objectives: Discussion of Modeling Techniques; Discussion of Model Evaluation Methods; Empirical Example.

Model Development Techniques Modeling plays an increasingly important role in CRM strategies across the customer lifecycle (Customer Acquisition, Customer Management, Creating Value, Collections/Recovery): Target Marketing, Response Models, Risk Models, Customer Behavioral Models, Usage Models, Attrition Models, Activation Models, Collections/Recovery Models, Product Planning, and other models such as Segmentation Models, Bankruptcy Models, and Fraud Models.

Model Development Techniques Given that even minimal improvements in model classification accuracy can translate into significant savings or incremental revenue, many different modeling techniques are used in practice: Statistical Techniques (Linear Discriminant Analysis, Logistic Analysis, Multiple Regression Analysis) and Non-Statistical Techniques (Neural Networks, Cluster Analysis, Decision Trees).

Model Evaluation Methods But developing the model is really only half the problem. How do you then determine which model is best?

Model Evaluation Methods In the context of binary classification (one of the most common objectives in CRM modeling), one of four outcomes is possible: 1. True positive (a “good” credit risk is identified as “good”) 2. False positive (a “bad” credit risk is identified as “good”) 3. True negative (a “bad” credit risk is identified as “bad”) 4. False negative (a “good” credit risk is identified as “bad”)

Model Evaluation Methods If all of these outcomes, specifically the errors, have the same associated costs, then a simple global classification rate is a highly appropriate evaluation method. In the example shown, a 2x2 table of Predicted Good/Bad versus True Good/Bad over 1,000 cases yields a global classification rate of 75% (correct classifications divided by the 1,000 total).
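
As a minimal, hypothetical sketch (in Python, not the SAS code actually used in the study), this is how the four outcomes and the global classification rate fall out of a set of predictions; the labels and predictions below are invented for illustration:

```python
import numpy as np

# Hypothetical true labels (1 = "good" credit risk, 0 = "bad") and model predictions
y_true = np.array([1, 1, 0, 1, 0, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 1, 0, 1])

tp = np.sum((y_true == 1) & (y_pred == 1))  # true positive:  good predicted good
fp = np.sum((y_true == 0) & (y_pred == 1))  # false positive: bad predicted good
tn = np.sum((y_true == 0) & (y_pred == 0))  # true negative:  bad predicted bad
fn = np.sum((y_true == 1) & (y_pred == 0))  # false negative: good predicted bad

global_rate = (tp + tn) / len(y_true)       # correct classifications / total
print(f"TP={tp} FP={fp} TN={tn} FN={fn}, global classification rate = {global_rate:.0%}")
```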

Model Evaluation Methods The global classification method is the most commonly used, but it fails when the costs of the misclassification errors differ (Type 1 vs. Type 2 errors). Model 1 results: Global Classification Rate = 75%, False Positive Rate = 5%, False Negative Rate = 20%. Model 2 results: Global Classification Rate = 80%, False Positive Rate = 15%, False Negative Rate = 5%. What if the cost of a false positive were great and the cost of a false negative were negligible? What if it were the other way around?

Model Evaluation Methods If the misclassification error costs are understood with some certainty, a cost function can be used to select the best model: Loss = π₀f₀c₀ + π₁f₁c₁, where πᵢ is the prior probability that an element comes from class i, fᵢ is the probability that an element from class i is misclassified, and cᵢ is the cost associated with that misclassification error.
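
A small sketch of this cost function with invented numbers (the priors, error rates, and costs below are illustrative assumptions, not figures from the paper):

```python
# Sketch of the cost function Loss = pi0*f0*c0 + pi1*f1*c1.
# pi_i: prior probability of class i, f_i: misclassification rate for class i,
# c_i: cost of that misclassification (all values below are hypothetical).

def expected_loss(priors, error_rates, costs):
    """Expected misclassification cost summed across classes."""
    return sum(p * f * c for p, f, c in zip(priors, error_rates, costs))

priors = (0.67, 0.33)        # e.g. share of "goods" vs. "bads" in the population
error_rates = (0.20, 0.05)   # misclassification rate for goods, then for bads
costs = (1.0, 10.0)          # assumes misclassifying a "bad" as "good" is far costlier

print(f"Expected loss = {expected_loss(priors, error_rates, costs):.3f}")
```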

Model Evaluation Methods An evaluation method that uses the same conceptual foundation as the global classification rate is the Kolmogorov-Smirnov (K-S) test, which measures the maximum separation between the cumulative score distributions of the two classes.
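
As a rough illustration, assuming the K-S statistic here is the usual maximum separation between the cumulative score distributions of the two classes, a sketch with made-up scores might look like this:

```python
import numpy as np

# Hypothetical model scores for known "goods" and known "bads"
good_scores = np.array([0.82, 0.75, 0.66, 0.91, 0.58, 0.71])
bad_scores = np.array([0.35, 0.48, 0.52, 0.29, 0.61, 0.44])

# K-S statistic: maximum vertical distance between the two empirical CDFs
thresholds = np.sort(np.concatenate([good_scores, bad_scores]))
cdf_good = np.array([np.mean(good_scores <= t) for t in thresholds])
cdf_bad = np.array([np.mean(bad_scores <= t) for t in thresholds])

ks_stat = np.max(np.abs(cdf_good - cdf_bad))
print(f"K-S statistic = {ks_stat:.2f}")
```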

Model Evaluation Methods What if you don’t have ANY information regarding misclassification error costs…or…the costs are in the eye of the beholder?

Model Evaluation Methods The area under the ROC (Receiver Operating Characteristic) curve is an option. The curve plots Sensitivity (true positive rate) against 1 − Specificity (false positive rate); the area θ equals 0.5 for a random model and 1 for a perfect model, with useful models falling in between (0.5 < θ < 1).
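
A minimal sketch of computing the ROC curve and its area θ, using scikit-learn (chosen here only for illustration; it is not the software used in the paper) with invented labels and scores:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical labels (1 = good) and model scores
y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.65, 0.6, 0.4, 0.75, 0.3, 0.55, 0.2])

fpr, tpr, _ = roc_curve(y_true, y_score)   # 1-specificity vs. sensitivity at each cutoff
theta = roc_auc_score(y_true, y_score)     # area under the ROC curve
print(f"AUC (theta) = {theta:.2f}")        # 0.5 = random, 1.0 = perfect
```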

Empirical Example So, given this background, the guiding questions of our research were: 1. Does the model development technique impact prediction accuracy? 2. How will model selection vary with the evaluation method used?

Empirical Example We elected to evaluate these questions using a large data set from a pool of car loan applicants. The data set included 14,042 US applicants for car loans between June 1, 1998 and June 30. Of these applicants, 9,442 were considered to have been “good” and 4,600 were considered to be “bad” as of December 31. The variables were split into two groups: Transaction variables (miles on the vehicle, selling price, age of vehicle, etc.) and Applicant variables (bankruptcies, balances on other loans, number of revolving trades, etc.).

Empirical Example The LDA and Logistic models were developed using SAS 8.2, while the Neural Network models were developed using Backpack® 4.0. Because there are no accepted guidelines for the number of hidden nodes in Neural Network development, we tested a range from 5 to 50.

Empirical Example Quick Review of Linear Discriminant Analysis: General Form: Y = X₁ + X₂ + X₃ + … + Xₙ. The dependent variable (Y) is categorical (two or more categories) and the independent variables (X) are metric; the linear variate maximizes the discrimination between two pre-defined groups; the primary assumptions include normality, linearity, and non-multicollinearity among the independent variables; the discriminant weights indicate the contribution of each variable; traditionally a “hit” matrix is the output.
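
For illustration only, a sketch of fitting a two-group linear discriminant and inspecting the discriminant weights with scikit-learn (the study itself used SAS 8.2; the toy data below are invented):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy data: two metric predictors, binary group membership (1 = good, 0 = bad)
X = np.array([[25, 3.1], [40, 2.2], [33, 4.0], [51, 1.8], [29, 3.5], [47, 2.0]])
y = np.array([1, 0, 1, 0, 1, 0])

lda = LinearDiscriminantAnalysis().fit(X, y)
print("Discriminant weights:", lda.coef_)     # contribution of each variable
print("Predicted groups:", lda.predict(X))    # inputs to a 'hit' (classification) matrix
```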

Empirical Example Quick Review of Logistic Analysis: General Form: P(event)/P(non-event) = e^(B₀ + B₁X₁ + B₂X₂ + … + BₙXₙ). The technique requires a binary dependent variable; it is less sensitive to assumptions of normality; the function is S-shaped and is bounded between 0 and 1; where LDA and Regression use the least squares method of estimation, Logistic Analysis uses a maximum likelihood estimation algorithm; the weights are measures of changes in the ratio of the probabilities, or odds ratios; Proc Logistic in SAS produces a “classification” matrix that provides the sensitivity and specificity information needed to support the development of an ROC curve.
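
Similarly, an illustrative sketch of fitting a logistic model and reading the estimated weights as odds ratios, again with scikit-learn and invented toy data rather than the paper's SAS Proc Logistic output:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: two predictors and a binary outcome (1 = good, 0 = bad)
X = np.array([[25, 3.1], [40, 2.2], [33, 4.0], [51, 1.8], [29, 3.5], [47, 2.0]])
y = np.array([1, 0, 1, 0, 1, 0])

logit = LogisticRegression().fit(X, y)  # fit by (regularized) maximum likelihood
odds_ratios = np.exp(logit.coef_)       # e^B: change in the odds per unit change in X
probs = logit.predict_proba(X)[:, 1]    # S-shaped probabilities bounded by 0 and 1

print("Odds ratios:", odds_ratios)
print("Predicted probabilities:", probs.round(2))
```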

Empirical Example Quick Review of Neural Networks: A feed-forward network consists of an input layer, a hidden layer, and an output layer. At each node, a combination function combines all inputs into a single value, usually as a weighted summation, and a transfer function then calculates the node's output value from the result of the combination function.
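
As a loose stand-in for the Backpack® models (which are not reproduced here), a sketch of a feed-forward network with a single hidden layer of 5 nodes using scikit-learn's MLPClassifier, on invented toy data:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Toy data: two predictors and a binary outcome (1 = good, 0 = bad)
X = np.array([[25, 3.1], [40, 2.2], [33, 4.0], [51, 1.8], [29, 3.5], [47, 2.0]])
y = np.array([1, 0, 1, 0, 1, 0])

# Each hidden node applies a weighted summation (combination function)
# followed by a nonlinear transfer function (here, the logistic sigmoid).
nn = MLPClassifier(hidden_layer_sizes=(5,), activation="logistic",
                   max_iter=2000, random_state=0).fit(X, y)
print("Predicted classes:", nn.predict(X))
```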

Empirical Example - Results

Technique              Class Rate "Goods"   Class Rate "Bads"   Class Rate "Global"   Theta     K-S Test
LDA                    73.91%               43.40%              59.74%                68.98%    19%
Logistic               70.54%               59.64%              69.45%                68.00%    24%
NN-5 Hidden Layers     63.50%               56.50%              58.88%                63.59%    38%
NN-10 Hidden Layers    75.40%               44.50%              55.07%                64.46%    11%
NN-15 Hidden Layers    60.10%               62.10%              61.40%                65.89%    24%
NN-20 Hidden Layers    62.70%               59.00%              60.29%                65.27%    24%
NN-25 Hidden Layers    76.60%               41.90%              53.78%                63.55%    16%
NN-30 Hidden Layers    52.70%               68.50%              63.13%                65.74%    22%
NN-35 Hidden Layers    60.30%               59.00%              59.46%                63.30%    22%
NN-40 Hidden Layers    62.40%               58.30%              59.71%                64.47%    17%
NN-45 Hidden Layers    54.10%               65.20%              61.40%                64.50%    31%
NN-50 Hidden Layers    53.20%               68.50%              63.27%                65.15%    37%

Empirical Example - Conclusions What were we able to demonstrate? 1. The “best” model depends upon the evaluation method selected; 2. The appropriate evaluation method depends upon situational and data context; 3. No multivariate technique is “best” under all circumstances.