Loan Default Model
Saed Sayad, www.ismartsoft.com

Presentation transcript:


Data Mining Steps
1. Problem Definition
2. Data Preparation
3. Data Exploration
4. Modeling
5. Evaluation
6. Deployment

1. Problem Definition
Build a loan default prediction model for small businesses, using historical data to assess the likelihood that an obligor will default.

Data Mining Team
Modeler, Analyst, DBA, Domain Expert

2. Data Preparation
Number of cases: 35,500
Number of defaults: 2,500 (7%)
Number of variables: 25
Total balance for all cases: $554,000,000
Total balance for defaults: $58,000,000 (10.4%)
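The percentages on this slide follow directly from the counts and balances. A minimal sketch of that arithmetic, using only the figures quoted above (the variable names are illustrative):

```python
# Figures as stated on the slide.
n_cases = 35_500
n_defaults = 2_500
total_balance = 554_000_000
default_balance = 58_000_000

default_rate = n_defaults / n_cases              # share of cases that defaulted
balance_share = default_balance / total_balance  # share of balance in default

print(f"Default rate by count:   {default_rate:.1%}")
print(f"Default rate by balance: {balance_share:.1%}")
```

The slide's dollar totals appear to be rounded, so the computed balance share can differ from the quoted 10.4% in the last digit. Either way, defaults account for a larger share of the balance than of the case count, and the target class is a small minority of the data.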

3. Data Exploration
Univariate Analysis: Frequency, Average, Min, Max, ...; Bar, Line, Pie, ... Charts
Bivariate Analysis: Correlation, Z test, ...; Combination Charts
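As a rough illustration of the two kinds of analysis, the sketch below computes summary statistics for one variable and a Pearson correlation between that variable and the default flag. The sample data is hypothetical and exists only to make the example runnable; it is not taken from the presentation.

```python
from statistics import mean

# Hypothetical sample: (months_in_business, defaulted) pairs, for illustration only.
sample = [(3, 1), (6, 1), (12, 0), (18, 1), (24, 0), (36, 0), (48, 0), (60, 0)]
months = [m for m, _ in sample]
default = [d for _, d in sample]

# Univariate analysis: simple summary statistics for a single variable.
summary = {
    "n": len(months),
    "mean": mean(months),
    "min": min(months),
    "max": max(months),
}

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Bivariate analysis: correlation between months in business and default.
r = pearson(months, default)
print(summary)
print(f"correlation(months, default) = {r:.3f}")
```

In this made-up sample the correlation is negative, meaning that younger businesses default more often; real data may of course behave differently.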

Data Exploration - Univariate
[Chart: Months in Business]

Data Exploration - Bivariate
[Chart: Months in Business and Default; y-axis: Default%]

4. Modeling
Classification: Bayesian, Decision Tree, Logistic Regression, SVM
Regression: Linear Regression, Robust Regression, Neural Network
Clustering: Hierarchical, K-Means
Association: Apriori

Modeling - Classification
[Diagram: inputs DELQ, Age, Type -> f (Logistic Regression) -> Default (Y or N)]

Logistic Regression Model
[Chart: Default (0 to 1) versus Months in Business, showing the Linear Model and the Logistic Model]
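The contrast between the two curves can be sketched in a few lines: a linear model can predict values outside the 0 to 1 range, while a logistic model always stays inside it. The coefficients below are hypothetical and were chosen only to show the shapes; they are not the fitted values from the presentation.

```python
import math

def linear_model(months, b0, b1):
    """Straight-line prediction; not constrained to the [0, 1] range."""
    return b0 + b1 * months

def logistic_model(months, b0, b1):
    """Logistic prediction: the linear score passed through the sigmoid function."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * months)))

# Hypothetical coefficients, for illustration only.
LIN_B0, LIN_B1 = 0.6, -0.01
LOG_B0, LOG_B1 = 1.0, -0.05

for months in (0, 24, 60, 120):
    lin = linear_model(months, LIN_B0, LIN_B1)
    log = logistic_model(months, LOG_B0, LOG_B1)
    print(f"{months:>3} months  linear={lin:+.2f}  logistic={log:.2f}")
```

With these coefficients the linear prediction goes negative for long-established businesses, which cannot be read as a probability, whereas the logistic prediction only approaches 0.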

5. Evaluation
Stats: Variables Contribution, Mean Square Error, Confusion Matrix
Charts: K-S Chart, Lift Chart, Gain Chart

Evaluation – Variables Contribution

Evaluation - Confusion Matrix
[Table: Positive Cases and Negative Cases against Predicted Positive and Predicted Negative. Surviving values: 264 (3%), 313 (4%); total 8,167.]
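A confusion matrix of this kind can be built by counting the four combinations of actual and predicted labels. The sketch below uses a small set of hypothetical labels, since the slide's cell values are incomplete; the function name and the derived metrics are illustrative choices rather than the presentation's own code.

```python
def confusion_matrix(actual, predicted):
    """Count TP, FP, FN, TN for binary labels (1 = default, 0 = no default)."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1
        elif a == 1 and p == 0:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts

# Hypothetical labels, for illustration only.
actual    = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

cm = confusion_matrix(actual, predicted)
total = sum(cm.values())
accuracy = (cm["TP"] + cm["TN"]) / total
sensitivity = cm["TP"] / (cm["TP"] + cm["FN"])  # share of actual defaults caught
specificity = cm["TN"] / (cm["TN"] + cm["FP"])  # share of non-defaults cleared

print(cm)
print(f"accuracy={accuracy:.2f} sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

With a default rate near 7%, accuracy alone can look high even for a weak model, so sensitivity and specificity are usually worth reading alongside it.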

Evaluation – Gain Chart
[Chart: Default% versus Population%; marked values: Population% at 10%, 50%, 100%; Default% at 10%, 58%]
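Each point on a gain chart answers one question: if the cases are ranked by model score and only the top fraction is reviewed, what share of all defaults is captured? A minimal sketch of that calculation, with hypothetical scores and labels:

```python
def cumulative_gain(scores, labels, fraction):
    """Share of all positives found in the top `fraction` of cases ranked by score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, round(len(ranked) * fraction))
    captured = sum(label for _, label in ranked[:cutoff])
    return captured / sum(labels)

# Hypothetical model scores and actual outcomes (1 = default), for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

for fraction in (0.2, 0.5, 1.0):
    gain = cumulative_gain(scores, labels, fraction)
    print(f"top {fraction:.0%} of population captures {gain:.0%} of defaults")
```

A random ranking would capture roughly the same share of defaults as the share of population reviewed, which is the diagonal baseline a gain chart is normally compared against.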

Return On Investment
Total Number of Loans = 8,167
Total Number of Defaults = 560
Total Balance for Defaults = $12,281,589
Top 10% Random
  - Number of Defaults = 56
  - Total Balance = $1,230,000
Top 10% Model
  - Number of Defaults = 305
  - Total Balance = $7,655,772
% ROI
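The comparison between the random and model-based top 10% can be worked through from the figures on this slide. The slide's own ROI percentage and its definition are not preserved in this transcript, so the sketch below assumes one plausible reading: the relative improvement in default balance captured by the model over random selection.

```python
# Figures as stated on the slide.
total_defaults = 560
total_default_balance = 12_281_589
random_defaults, random_balance = 56, 1_230_000
model_defaults, model_balance = 305, 7_655_772

# Share of all defaults (by count and by balance) found in the model's top 10%.
capture_rate = model_defaults / total_defaults
balance_capture = model_balance / total_default_balance

# Lift: how many times better the model is than random selection.
lift_count = model_defaults / random_defaults
lift_balance = model_balance / random_balance

# Assumed ROI definition: relative gain in captured balance over random.
relative_gain_pct = (model_balance - random_balance) / random_balance * 100

print(f"defaults captured: {capture_rate:.1%}, balance captured: {balance_capture:.1%}")
print(f"lift by count: {lift_count:.2f}x, lift by balance: {lift_balance:.2f}x")
print(f"relative gain over random (assumed definition): {relative_gain_pct:.1f}%")
```

Under that assumption the model's top 10% contains more than five times as many defaults as a random 10%, and more than six times the default balance.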

6. Deployment
SQL: Batch Scoring
HTML: Web-based Scoring
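Batch scoring in SQL typically means applying the fitted model's formula to every row of a table in a single statement. The sketch below illustrates the idea with an in-memory SQLite database driven from Python. The table name, column names, and coefficients are all hypothetical, and the presentation does not say which database or scoring code was actually used.

```python
import math
import sqlite3

# Hypothetical logistic coefficients; a real deployment would use the fitted values.
B0, B1 = 1.0, -0.05

def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

conn = sqlite3.connect(":memory:")
conn.create_function("sigmoid", 1, sigmoid)  # expose the Python function to SQL

# Hypothetical table and column names.
conn.execute(
    "CREATE TABLE loans (loan_id INTEGER PRIMARY KEY, months_in_business REAL, score REAL)"
)
conn.executemany(
    "INSERT INTO loans (loan_id, months_in_business) VALUES (?, ?)",
    [(1, 6), (2, 24), (3, 60)],
)

# Batch scoring: a single UPDATE statement scores every row in the table.
conn.execute("UPDATE loans SET score = sigmoid(? + ? * months_in_business)", (B0, B1))

rows = conn.execute("SELECT loan_id, score FROM loans ORDER BY loan_id").fetchall()
for loan_id, score in rows:
    print(f"loan {loan_id}: default score {score:.3f}")
conn.close()
```

Keeping the scoring logic inside the database avoids moving the data out for each scoring run; a web-based scorer would apply the same formula to one record at a time.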

Questions?