Data Analysis for Credit Card Fraud Detection Alejandro Correa Bahnsen Luxembourg University.

1 Data Analysis for Credit Card Fraud Detection Alejandro Correa Bahnsen Luxembourg University

2 Introduction

3

4 Simplified transaction flow
[Diagram: transaction flow through the card network, ending in the question "Fraud??"]

5 Agenda
Introduction
Database
Evaluation of algorithms
Logistic Regression
Financial measure
Cost Sensitive Logistic Regression

6 Database
A large European card processing company
2012 card-present transactions
750,000 transactions
3,500 frauds
0.467% fraud rate
148,562 EUR lost due to fraud on the test dataset
[Figure: timeline of the months Jan to Dec, split into Train and Test periods]

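7 Database: raw attributes

| TRXID | Client ID | Date        | Amount | Location | Type     | Merchant Group | Fraud |
|-------|-----------|-------------|--------|----------|----------|----------------|-------|
| 1     | 1         | 2/1/12 6:00 | 580    | Lux      | Internet | Airlines       | No    |
| 2     | 1         | 2/1/12 6:15 | 120    | Lux      | Present  | Car Renting    | No    |
| 3     | 2         | 2/1/12 8:20 | 12     | Bel      | Present  | Hotel          | Yes   |
| 4     | 1         | 3/1/12 4:15 | 60     | Lux      | ATM      |                | No    |
| 5     | 2         | 3/1/12 9:18 | 8      | Fra      | Present  | Retail         | No    |
| 6     | 1         | 3/1/12 9:55 | 1210   | Lux      | Internet | Airlines       | Yes   |

Other attributes: age, country of residence, postal code, type of card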

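8 Database: derived attributes

| ID | Num CC | Date        | Amt  | Location | Type     | Merchant Group | Fraud | No. of Trx, same client, last 6 hours | Sum, same client, last 7 days |
|----|--------|-------------|------|----------|----------|----------------|-------|---------------------------------------|-------------------------------|
| 1  | 1      | 2/1/12 6:00 | 580  | Lux      | Internet | Airlines       | No    | 0                                     | 0                             |
| 2  | 1      | 2/1/12 6:15 | 120  | Lux      | Present  | Car Renting    | No    | [lost]                                | [lost]                        |
| 3  | 2      | 2/1/12 8:20 | 12   | Bel      | Present  | Hotel          | Yes   | 0                                     | 0                             |
| 4  | 1      | 3/1/12 4:15 | 60   | Lux      | ATM      |                | No    | [lost]                                | [lost]                        |
| 5  | 2      | 3/1/12 9:18 | 8    | Fra      | Present  | Retail         | No    | [lost]                                | [lost]                        |
| 6  | 1      | 3/1/12 9:55 | 1210 | Lux      | Internet | Airlines       | Yes   | 1                                     | 760                           |

Derived attributes are a combination of the following criteria:
By: Client, Credit Card
Group: None, Transaction Type, Merchant, Merchant Category, Merchant Group 1, Merchant Group 2, Merchant Country
Last: hour, day, week, month, 3 months
Function: Count, Sum(Amount), Avg(Amount)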
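The two derived columns on this slide (number of transactions by the same client in the last 6 hours, and sum of amounts by the same client in the last 7 days) can be reproduced with a short sketch. It assumes that dates are day/month/year and that the window covers only transactions strictly before the current one; both assumptions are consistent with the values the slide shows for transaction 6 (1 and 760). The function and variable names are illustrative, not from the presentation.

```python
from datetime import datetime, timedelta

# Sample rows from the raw-attributes slide: (trx_id, client_id, date, amount).
TRANSACTIONS = [
    (1, 1, "2/1/12 6:00", 580),
    (2, 1, "2/1/12 6:15", 120),
    (3, 2, "2/1/12 8:20", 12),
    (4, 1, "3/1/12 4:15", 60),
    (5, 2, "3/1/12 9:18", 8),
    (6, 1, "3/1/12 9:55", 1210),
]


def parse(ts):
    # Assumption: dates are written day/month/two-digit year.
    return datetime.strptime(ts, "%d/%m/%y %H:%M")


def derived_features(transactions, trx_id):
    """Return (count in last 6 hours, sum in last 7 days) for the same client,
    using only transactions strictly earlier than the given one."""
    rows = {t[0]: t for t in transactions}
    _, client, ts, _ = rows[trx_id]
    now = parse(ts)
    count_6h = 0
    sum_7d = 0
    for _, other_client, other_ts, other_amt in transactions:
        if other_client != client:
            continue
        age = now - parse(other_ts)
        if age <= timedelta(0):
            continue  # skip the transaction itself and anything later
        if age <= timedelta(hours=6):
            count_6h += 1
        if age <= timedelta(days=7):
            sum_7d += other_amt
    return count_6h, sum_7d


features_trx6 = derived_features(TRANSACTIONS, 6)
```

The other combinations of criteria (by credit card, grouped by merchant, averaged over a month, and so on) would follow the same pattern with a different filter, window, and aggregate.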

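9 Evaluation
Confusion matrix:

|                      | Actual fraud | Actual legitimate |
|----------------------|--------------|-------------------|
| Predicted fraud      | TP           | FP                |
| Predicted legitimate | FN           | TN                |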
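The four confusion-matrix counts, and the F1-Score that later slides use for model selection, can be computed as in the sketch below. The label encoding (1 = fraud, 0 = legitimate) and the example predictions are assumptions made for illustration; only the fraud labels come from the six sample transactions shown earlier.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN with 1 = fraud and 0 = legitimate."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif actual == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Fraud labels of the six sample transactions, with hypothetical predictions.
y_true = [0, 0, 1, 0, 0, 1]
y_pred = [0, 1, 1, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
f1 = f1_score(tp, fp, fn)
```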

10 Agenda
Introduction
Database
Evaluation of algorithms
Logistic Regression
Financial measure
Cost Sensitive Logistic Regression

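11 Logistic Regression
Model: [equation not preserved in the transcript]
Cost function: [equation not preserved in the transcript]
Cost matrix:

|                      | Actual fraud | Actual legitimate |
|----------------------|--------------|-------------------|
| Predicted fraud      | 0            | 1                 |
| Predicted legitimate | 1            | 0                 |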
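The model and cost-function equations on this slide did not survive extraction. For reference, the standard textbook forms of logistic regression are given here; they are presumably close to what the slide showed, but the notation ($\theta$ for the parameters, $N$ for the number of transactions, $x_i$ and $y_i$ for the features and fraud label of transaction $i$) is an assumption.

```latex
h_\theta(x_i) = \frac{1}{1 + e^{-\theta^{\top} x_i}}

J(\theta) = -\frac{1}{N} \sum_{i=1}^{N}
  \Big[ y_i \log h_\theta(x_i) + (1 - y_i) \log\big(1 - h_\theta(x_i)\big) \Big]
```

This cost function treats both kinds of error symmetrically, which corresponds to the 0/1 cost matrix on the slide: correct classifications cost nothing and every misclassification costs the same.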

12 Logistic Regression: under-sampling procedure
Select all the frauds and a random sample of the legitimate transactions.
Fraud rates shown: 0.467% (original data), 1%, 5%, 10%, 20%, 50%
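The under-sampling procedure (keep every fraud, draw a random subset of legitimate transactions) can be sketched as follows. The function name, the `seed` parameter, and the way the sample size is derived from a target fraud rate are assumptions for illustration; the slide only states the selection rule and the resulting fraud percentages.

```python
import random


def undersample(labels, target_fraud_rate, seed=0):
    """Return sorted indices of all frauds plus a random sample of legitimate
    transactions, sized so that frauds make up roughly target_fraud_rate."""
    frauds = [i for i, y in enumerate(labels) if y == 1]
    legit = [i for i, y in enumerate(labels) if y == 0]
    # Legitimate rows needed so that frauds / (frauds + legit) == target rate.
    n_legit = round(len(frauds) * (1 - target_fraud_rate) / target_fraud_rate)
    n_legit = min(n_legit, len(legit))  # cannot sample more than exist
    rng = random.Random(seed)
    return sorted(frauds + rng.sample(legit, n_legit))


# Toy data with the same 0.467% fraud rate as the database: 35 frauds in 7,500.
labels = [1] * 35 + [0] * 7465
sample_10 = undersample(labels, 0.10)
sample_50 = undersample(labels, 0.50)
```

Note that at a 50% fraud rate the training set keeps only as many legitimate transactions as there are frauds, so most of the data is discarded.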

13 Logistic Regression Results

14 Financial evaluation: motivation
False positives carry a different cost than false negatives
Frauds range from a few to thousands of euros (dollars, pounds, etc.)
There is a need for a real comparison measure

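15 Financial evaluation
Cost matrix:

|                      | Actual fraud | Actual legitimate |
|----------------------|--------------|-------------------|
| Predicted fraud      | Ca           | Ca                |
| Predicted legitimate | Amt_i        | 0                 |

where:
Ca = administrative costs
Amt_i = amount of transaction i

Evaluation measure: [equation not preserved in the transcript]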
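One natural reading of this cost matrix is a total-cost measure: every transaction flagged as fraud costs the administrative amount Ca, every missed fraud costs its transaction amount, and a correctly accepted legitimate transaction costs nothing. The sketch below implements that reading; the slide's own formula was not preserved, and the function name, the example predictions, and the value `admin_cost = 5` are hypothetical.

```python
def total_cost(y_true, y_pred, amounts, admin_cost):
    """Sum the cost-matrix entries over all transactions (1 = fraud)."""
    cost = 0.0
    for actual, predicted, amount in zip(y_true, y_pred, amounts):
        if predicted == 1:
            cost += admin_cost  # TP or FP: the alert has to be handled
        elif actual == 1:
            cost += amount      # FN: the fraudulent amount is lost
        # TN: no cost
    return cost


# Amounts and fraud labels of the six sample transactions.
amounts = [580, 120, 12, 60, 8, 1210]
y_true = [0, 0, 1, 0, 0, 1]
y_pred = [0, 1, 1, 0, 0, 0]          # hypothetical model output
cost_model = total_cost(y_true, y_pred, amounts, admin_cost=5)
cost_no_model = total_cost(y_true, [0] * 6, amounts, admin_cost=5)
```

Under this reading, flagging nothing costs the sum of all fraud amounts, which gives a baseline against which any model's cost can be compared.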

16 Logistic Regression Results
Selecting the algorithm by F1-Score
Selecting the algorithm by Cost

17 Logistic Regression
The best model selected using the traditional F1-Score does not give the best results in terms of cost
The model selected by cost is trained using less than 1% of the database, meaning a lot of information is excluded
The algorithm is trained to minimize (approximately) the misclassification rate, but is then evaluated based on cost
Why not train the algorithm to minimize the cost instead?

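18 Cost Sensitive Logistic Regression
Cost matrix:

|                      | Actual fraud | Actual legitimate |
|----------------------|--------------|-------------------|
| Predicted fraud      | Ca           | Ca                |
| Predicted legitimate | Amt_i        | 0                 |

Cost function: [equation not preserved in the transcript]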
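The cost function on this slide was not preserved. A common way to make logistic regression cost sensitive is to replace the log-loss with the expected cost under the cost matrix, weighting each cell by the predicted fraud probability. The sketch below takes that approach as an assumption, not as a confirmed copy of the slide's formula; all names are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cost_sensitive_loss(theta, X, y, amounts, admin_cost):
    """Average expected cost, where p is the predicted probability of fraud:
    fraud:      p * Ca + (1 - p) * Amt_i
    legitimate: p * Ca + (1 - p) * 0
    """
    total = 0.0
    for x_i, y_i, amt_i in zip(X, y, amounts):
        p = sigmoid(sum(t * v for t, v in zip(theta, x_i)))
        if y_i == 1:
            total += p * admin_cost + (1 - p) * amt_i
        else:
            total += p * admin_cost
    return total / len(y)


# Two toy transactions with a single constant feature.
X = [[1.0], [1.0]]
y = [1, 0]
amounts = [100.0, 40.0]
loss_at_zero = cost_sensitive_loss([0.0], X, y, amounts, admin_cost=10.0)
```

With `theta = [0.0]` every transaction gets probability 0.5, so the fraud contributes 55 and the legitimate transaction 5. Minimizing this quantity over `theta` (for example with a numerical optimizer) trains the model on the same cost it is later evaluated on.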

19 Cost sensitive Logistic Regression Results

20 Cost sensitive Logistic Regression Results

21 Conclusion
Selecting models based on traditional statistics does not give the best results in terms of cost
Models should be evaluated taking into account the real financial costs of the application
Algorithms should be developed to incorporate those financial costs

22 Thank you!

23 Contact information
Alejandro Correa Bahnsen
University of Luxembourg
Luxembourg

