Universidad Complutense de Madrid Máster en Ingeniería Matemática Curso 2007-2008 Modelling Week Second Edition June 16 – June 24, 2008.



Credit Scoring Modelling for Retail Banking Sector. Problem raised by Accenture. Coordinators: Ignacio Villanueva (UCM), Estela Luna (Accenture).

Team members: Elena Bartolozzi (Università di Firenze), Matthew Cornford (University of Oxford), Leticia García-Ergüín (UCM), Cristina Pascual Deocón (UCM), Oscar Iván Pascual (UCM), Francisco Javier Plaza (UCM).

Index: Introduction; Methodology and Data; Univariate and Multivariate Analysis; Model Creation; Validation; Calibration.

Introduction

Our problem concerns whom a bank should lend its money to. When a client applies for a loan, the bank would like to be sure that the client will pay back the full amount. We need effective models that allow us to predict whether a client will pay back the loan. What we have is historical data for several variables. We fit a model to this historical data so that we can estimate a probability of default.

Methodology and Data

Our data are provided by Accenture and include details of completed loan agreements. The variables included are: Age, Income, Wealth, Marital Status, Length as a Client, Amount of Loan, Maturity, Default.

Sample Selection: We split the sample into two parts, the modelling sample and the validation sample.

Modelling Sample: A random sample is selected from the data. Its size is about 2/3 of the original data. This sample is used to create the model. Validation Sample: The remaining data are used to validate the model. We test how many defaults the model predicts and which of those clients really did default.
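The split described above can be sketched as follows. This is a minimal illustration, not the team's SAS or MATLAB code: the function name `split_sample`, the fixed seed and the stand-in list of 900 records are all assumptions made for the example.

```python
import random

def split_sample(records, modelling_fraction=2 / 3, seed=0):
    """Randomly split records into a modelling and a validation sample."""
    rng = random.Random(seed)          # fixed seed only to make the sketch repeatable
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * modelling_fraction)
    return shuffled[:cut], shuffled[cut:]

# Stand-in for 900 loan records (hypothetical data).
loans = list(range(900))
modelling, validation = split_sample(loans)
print(len(modelling), len(validation))
```

Changing the seed gives a different random split, which is why repeated runs can produce slightly different fitted models.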

Univariate and Multivariate Analysis

We have a dependent variable, which is default, and some independent variables (age, income, ...). First of all, we do univariate analysis. For each variable, we calculate statistics such as the mean, standard deviation and skewness, and we plot histograms. This information can be used as a first check before applying the model. It would be better if the data were homogeneous.

Univariate Analysis: We've used SAS software to generate these statistics: output.htm

This kind of analysis is very useful to detect outliers or transcription mistakes.
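As an illustration of this first check, the sketch below computes the mean, sample standard deviation and moment skewness of one variable and flags values far from the mean. It is not the SAS output referred to above; the function names, the three-standard-deviation rule and the list of ages (with one deliberate transcription mistake, 420) are assumptions for the example.

```python
import math

def describe(values):
    """Mean, sample standard deviation and moment skewness of a variable."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    return {
        "mean": mean,
        "std": math.sqrt(m2 * n / (n - 1)),
        "skewness": m3 / m2 ** 1.5,
    }

def flag_outliers(values, k=3.0):
    """Values more than k sample standard deviations away from the mean."""
    stats = describe(values)
    return [v for v in values if abs(v - stats["mean"]) > k * stats["std"]]

# Hypothetical ages; 420 stands for a typing error.
ages = [23, 31, 35, 38, 41, 44, 47, 52, 58, 63, 29, 420]
summary = describe(ages)
print(summary, flag_outliers(ages))
```

A large positive skewness together with a flagged value is a hint to look at the raw record before fitting any model.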

Multivariate Analysis: Correlations

Chi-squared test: We try to determine which of the variables are explanatory, i.e. which variables default depends on. We use the chi-squared test for that. To begin with, we must discretize the continuous variables using percentiles. After running the chi-squared test, we look at the p-value. If p-value < 0.05, we reject independence. If p-value > 0.05, we do not reject independence.
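The procedure above can be sketched in plain Python. The helper names and the table of counts are hypothetical. Instead of computing a p-value, the sketch compares the statistic with 7.815, the standard 5% critical value of the chi-squared distribution with 3 degrees of freedom (4 quartile bins by 2 default outcomes), which gives the same decision as checking p-value < 0.05.

```python
import statistics

def quartile_bins(values):
    """Discretize a continuous variable into quartile bins 0..3."""
    cuts = statistics.quantiles(values, n=4)      # three cut points
    return [sum(v > c for c in cuts) for v in values]

def contingency_table(bins, defaults, n_bins=4):
    """Count (bin, default) pairs; columns are [no default, default]."""
    table = [[0, 0] for _ in range(n_bins)]
    for b, d in zip(bins, defaults):
        table[b][d] += 1
    return table

def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts per income quartile: [no default, default].
table = [[60, 40], [75, 25], [85, 15], [92, 8]]
stat = chi_squared_statistic(table)
CRITICAL_5PCT_DF3 = 7.815
reject_independence = stat > CRITICAL_5PCT_DF3
print(round(stat, 3), reject_independence)
```

With real data, `quartile_bins` and `contingency_table` would build the table from the income and default columns; the sketch assumes every bin and outcome is non-empty, since an expected count of zero would cause a division by zero.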

Model Creation

According to the results of the Univariate and Multivariate Analysis, the variables we include in our model are: Age, Income, Wealth, Marital Status, Maturity.

We fit a logit model using proc logistic in SAS and also glmfit in MATLAB, obtaining the same results.

[Table of estimated coefficients for Intercept, Age, Income, Wealth, Marital Status and Maturity; values not preserved in the transcript.] Some differences between runs are to be expected because we randomize the sample.

So, our model is as follows:
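The fitted coefficients are not preserved in this transcript. As a reminder of the general shape only, a logit model in the five selected variables has the standard form below, where the coefficients are placeholders and Marital Status is assumed, for simplicity, to enter as a single coded variable.

```latex
\[
PD = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \beta_1\,\mathrm{Age} + \beta_2\,\mathrm{Income}
      + \beta_3\,\mathrm{Wealth} + \beta_4\,\mathrm{MaritalStatus} + \beta_5\,\mathrm{Maturity})\bigr)}
\]
```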

Validation

Model statistics:

Powerstat is a measure of how well the model ranks clients, i.e. of its discriminatory power. The data are sorted from worst to best according to the probability of default calculated with our model. A perfect model would place all the defaults at the beginning. We plot accumulated defaults against accumulated observations. Powerstat compares the area between our model's curve and that of a random model with the area between the perfect model's curve and that of a random model.

Powerstat (Gini Index):
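One common way to compute such an index is sketched below, under the assumption that Powerstat is the usual accuracy ratio of the cumulative defaults curve: the area between the model curve and the random diagonal, divided by the area between the perfect curve and the diagonal. The function names and the four-client example are hypothetical.

```python
def curve_area(scores, defaults):
    """Area under accumulated defaults vs accumulated observations,
    with clients sorted from highest to lowest score (trapezoid rule)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n = len(order)
    total_defaults = sum(defaults)
    area, previous, captured = 0.0, 0.0, 0
    for i in order:
        captured += defaults[i]
        current = captured / total_defaults
        area += (previous + current) / 2 / n
        previous = current
    return area

def powerstat(scores, defaults):
    """Accuracy ratio: 1 for a perfect ranking, about 0 for a random one."""
    model = curve_area(scores, defaults)
    perfect = curve_area(defaults, defaults)   # defaults ranked first
    random_model = 0.5                         # area under the diagonal
    return (model - random_model) / (perfect - random_model)

# Hypothetical scores (estimated PDs) and observed defaults.
pds = [0.9, 0.7, 0.6, 0.2]
observed = [1, 0, 1, 0]
print(powerstat(pds, observed))
```

Ties between scores are broken by input order here; a production version would need an explicit rule for ties.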

Validation: Once the probability of default for each client is found, the question is how to choose the cut-off level that classifies whether a client will default or not. We use the validation data to predict with our model how many observations will default and compare the predictions with which of them really did default. Repeating the process with several random samples, the probability has very low deviation and rounds [value missing in transcript].
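The comparison between predicted and observed defaults for a given cut-off can be sketched as below. The cut-off of 0.3, the function name and the six validation clients are assumptions chosen only to show the bookkeeping, not values from the presentation.

```python
def confusion_counts(pds, defaults, cutoff):
    """Compare predicted defaults (PD >= cutoff) with observed defaults."""
    counts = {"true_default": 0, "false_default": 0,
              "true_good": 0, "false_good": 0}
    for pd_value, observed in zip(pds, defaults):
        predicted_default = pd_value >= cutoff
        if predicted_default and observed == 1:
            counts["true_default"] += 1
        elif predicted_default:
            counts["false_default"] += 1     # predicted default, client paid
        elif observed == 1:
            counts["false_good"] += 1        # missed default
        else:
            counts["true_good"] += 1
    return counts

# Hypothetical validation sample.
validation_pds = [0.05, 0.40, 0.10, 0.70, 0.20, 0.02]
validation_defaults = [0, 1, 0, 1, 1, 0]
result = confusion_counts(validation_pds, validation_defaults, cutoff=0.3)
print(result)
```

Sweeping the cut-off and repeating this count on several random validation samples is one way to see how stable a chosen level is.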

Calibration

The Expected Loss is defined as: EL = PD * EAD * LGD. PD is the probability of default, defined as the default probability calibrated for one year. EAD is the exposure at default. LGD is the loss given default, i.e. the fraction of the exposure that is lost.
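A worked instance of the formula, with made-up numbers: a one-year PD of 2%, an exposure of 10,000 and an LGD of 45% give an expected loss of about 90. The function name and all figures are illustrative assumptions.

```python
def expected_loss(pd_one_year, ead, lgd):
    """EL = PD * EAD * LGD for a single loan."""
    return pd_one_year * ead * lgd

# Hypothetical loans: (one-year PD, exposure at default, loss given default).
loans = [(0.02, 10_000.0, 0.45), (0.10, 5_000.0, 0.60)]
single = expected_loss(*loans[0])
portfolio = sum(expected_loss(*loan) for loan in loans)
print(single, portfolio)
```

Expected losses add across loans, so the portfolio figure is simply the sum of the individual ones.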

Scoring allows us to rank people by their risk of default. However, these probabilities do not take into account when the default happens. This is the reason for calibration: we want to obtain the yearly average probability of default. We need a sample of people observed over periods of years. The model is applied and the sample is sorted by score. We obtain an observed default rate: [formula missing in transcript]. Minimizing the least squares error with the MATLAB function fminsearch, we obtain the values: A = [missing], B = [missing], C = 2.7870.

The Credit Scoring Model was solved quickly and did not cause too much difficulty. We asked Accenture to bring another, related problem. We now introduce the Problem of Capital Allocation.

The Problem of Capital Allocation. Index: The Problem of Capital Allocation; Implementation; Conclusions.

The Problem of Capital Allocation

In this problem a lender has a fixed amount of money to lend, EAD, among n blocks of similar customers. How should the money be distributed among the blocks to maximize the profit? Each block i has an associated interest rate ρ_i, an a priori probability of default PD_i, a loss given default LGD_i and a number of customers N_i. If each customer in each block is independent of the rest, then we can easily compute the probability of k defaults.
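Under the independence assumption, the number of defaults in a block follows a binomial distribution, so the probability of exactly k defaults can be computed directly. The function name and the block of 10 customers with PD 5% are illustrative assumptions.

```python
from math import comb

def prob_k_defaults(n_customers, pd_block, k):
    """P(exactly k defaults) among n independent customers with equal PD."""
    return comb(n_customers, k) * pd_block ** k * (1 - pd_block) ** (n_customers - k)

# Hypothetical block: 10 customers, PD = 5%.
distribution = [prob_k_defaults(10, 0.05, k) for k in range(11)]
print(distribution[0], distribution[1])
```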

But the customers are correlated via the economy. We can use a Gaussian copula to introduce a default random variable for each customer: [formula missing in transcript]. Then, for a particular state of the economy, the probability of default of each customer, now independent of the others, is: [formula missing in transcript].
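The slide's own formulas are not preserved. A standard one-factor Gaussian copula construction, which may differ in notation from the one used in the presentation, takes a latent variable X = sqrt(c) Z + sqrt(1 - c) e for each customer, where Z is the state of the economy, e is customer-specific noise and c is an assumed correlation parameter; the customer defaults when X falls below the PD quantile of the standard normal. Conditional on Z = z, defaults are independent with the probability computed below. The names `conditional_pd` and `corr` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def conditional_pd(pd_block, corr, z):
    """Default probability given the economy factor z (one-factor model).

    corr is an assumed correlation parameter in [0, 1); low z = bad economy.
    """
    threshold = STANDARD_NORMAL.inv_cdf(pd_block)
    return STANDARD_NORMAL.cdf((threshold - sqrt(corr) * z) / sqrt(1 - corr))

# Hypothetical block with PD = 2% and correlation 0.2.
bad_economy = conditional_pd(0.02, 0.2, z=-2.0)
good_economy = conditional_pd(0.02, 0.2, z=2.0)
print(bad_economy, good_economy)
```

With zero correlation the conditional probability reduces to the a priori PD, which is a quick sanity check on the formula.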

We use the binomial distribution: [formula missing in transcript]. When N is big enough (on the order of 10^3) we can approximate this binomial with a normal random variable D_i: [formula missing in transcript].
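The usual normal approximation matches the binomial mean N p and variance N p (1 - p). The sketch below compares the two cumulative probabilities for a hypothetical block with N = 1000 and p = 5%; the function names and the continuity correction of 0.5 are choices made for this example.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

def binomial_cdf(n, p, k):
    """P(defaults <= k) for a binomial(n, p), summed in log space."""
    total = 0.0
    for j in range(k + 1):
        log_pmf = (lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
                   + j * log(p) + (n - j) * log(1 - p))
        total += exp(log_pmf)
    return total

def normal_approx_cdf(n, p, k):
    """Same probability from a normal with matching mean and variance."""
    return NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(k + 0.5)

exact = binomial_cdf(1000, 0.05, 60)
approx = normal_approx_cdf(1000, 0.05, 60)
print(exact, approx)
```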

We define the loss distribution as: [formula missing in transcript]. As L is a sum of independent normal random variables, L is itself normally distributed: [formula missing in transcript].

To measure risk we use Value at Risk (VaR) with a 99% confidence level. So the problem becomes: [formula missing in transcript], where VaR99 is the fixed level of risk the lender is willing to take.
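For a normally distributed loss, the 99% VaR is the 99% quantile, i.e. the mean plus about 2.326 standard deviations. The sketch below assumes, since the slide's loss formula is not preserved, that each default in a block loses exposure * LGD / N and that the number of defaults is approximately normal with mean N p and variance N p (1 - p), where p is the default probability for the given state of the economy. All names and figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def portfolio_loss_moments(blocks):
    """Mean and standard deviation of the total loss.

    Each block is (exposure, lgd, n_customers, pd); blocks are treated as
    independent given the state of the economy (assumption of this sketch).
    """
    mean, variance = 0.0, 0.0
    for exposure, lgd, n_customers, pd_block in blocks:
        loss_per_default = exposure * lgd / n_customers
        mean += loss_per_default * n_customers * pd_block
        variance += loss_per_default ** 2 * n_customers * pd_block * (1 - pd_block)
    return mean, sqrt(variance)

def value_at_risk(mean, std, level=0.99):
    """Quantile of a normal loss distribution at the given confidence level."""
    return NormalDist(mean, std).inv_cdf(level)

# Two hypothetical blocks.
blocks = [(400_000.0, 0.45, 1000, 0.02), (600_000.0, 0.60, 2000, 0.05)]
loss_mean, loss_std = portfolio_loss_moments(blocks)
var99 = value_at_risk(loss_mean, loss_std)
print(loss_mean, loss_std, var99)
```

Because means and variances of independent normal variables add, the total loss needs only these two numbers to determine its VaR.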

Implementation

We start with 3 blocks to make the problem easier. We have to find the αs that minimise the expected loss. We have two approaches to solve this problem.

First we fix the αs and find the VaR99 and Expected Loss for each set of αs (black dots).

Then we find the αs that minimise the Expected Loss for any fixed VaR99 (red dots) using the MATLAB function fmincon. As we can see, the two approaches agree very well, to within the order of 10^(-4).

Here we have the results for 5 blocks, which took considerably longer than with 3 blocks.

Conclusions: The analytical method outperformed the simulation of w_i, as expected. To optimise for more than 3 blocks, the choice of optimiser needs to be investigated further. Another interesting question is the relationship between the efficient border and the interest rates charged for each block.

Questions?