Published by Roderick Chisnell. Modified over 2 years ago.

1
Data Analysis for Credit Card Fraud Detection
Alejandro Correa Bahnsen
Luxembourg University

2
Introduction

4
Simplified transaction flow (diagram labels: Network, Fraud??)

5
Agenda
- Introduction
- Database
- Evaluation of algorithms
- Logistic Regression
- Financial measure
- Cost Sensitive Logistic Regression

6
Database
- Large European card processing company
- 2012 card-present transactions
- 750,000 transactions
- 3,500 frauds
- 0.467% fraud rate
- 148,562 EUR lost due to fraud on the test dataset
- Timeline: Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec, divided into Train and Test periods

7
Database: raw attributes

TRX ID  Client ID  Date         Amount  Location  Type      Merchant Group  Fraud
1       1          2/1/12 6:00  580     Lux       Internet  Airlines        No
2       1          2/1/12 6:15  120     Lux       Present   Car Renting     No
3       2          2/1/12 8:20  12      Bel       Present   Hotel           Yes
4       1          3/1/12 4:15  60      Lux       ATM                       No
5       2          3/1/12 9:18  8       Fra       Present   Retail          No
6       1          3/1/12 9:55  1210    Lux       Internet  Airlines        Yes

Other attributes: age, country of residence, postal code, type of card

8
Database: derived attributes

Combination of the following criteria:

By           Group              Last      Function
Client       None               hour      Count
Credit Card  Transaction Type   day       Sum(Amount)
             Merchant           week      Avg(Amount)
             Merchant Category  month
             Merchant Group 1   3 months
             Merchant Group 2
             Merchant Country

Example, with two derived columns (A: No. of Trx, same client, last 6 hours; B: Sum, same client, last 7 days):

ID  Num CC  Date         Amt   Location  Type      Merchant Group  Fraud  A  B
1   1       2/1/12 6:00  580   Lux       Internet  Airlines        No     0  0
2   1       2/1/12 6:15  120   Lux       Present   Car Renting     No     1  580
3   2       2/1/12 8:20  12    Bel       Present   Hotel           Yes    0  0
4   1       3/1/12 4:15  60    Lux       ATM                       No     0  700
5   2       3/1/12 9:18  8     Fra       Present   Retail          No     0  12
6   1       3/1/12 9:55  1210  Lux       Internet  Airlines        Yes    1  760
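
A minimal sketch of how the two derived columns on this slide could be computed, using only the Python standard library. It assumes the dates are day/month/year (the 7-day sums only match the slide under that reading) and that the window excludes the current transaction. The function name `derive` and the tuple layout are illustrative, not taken from the presentation.

```python
from datetime import datetime, timedelta

# Rows from the slide's example: (trx_id, client_id, timestamp, amount).
# Dates are read as day/month/year, which is an assumption.
TRANSACTIONS = [
    (1, 1, datetime(2012, 1, 2, 6, 0), 580),
    (2, 1, datetime(2012, 1, 2, 6, 15), 120),
    (3, 2, datetime(2012, 1, 2, 8, 20), 12),
    (4, 1, datetime(2012, 1, 3, 4, 15), 60),
    (5, 2, datetime(2012, 1, 3, 9, 18), 8),
    (6, 1, datetime(2012, 1, 3, 9, 55), 1210),
]


def derive(transactions, window, func):
    """For each transaction, apply `func` to the amounts of the same
    client's earlier transactions within `window` before it."""
    out = []
    for _, client, ts, _ in transactions:
        past = [
            amt
            for _, c, t, amt in transactions
            if c == client and ts - window <= t < ts
        ]
        out.append(func(past))
    return out


count_6h = derive(TRANSACTIONS, timedelta(hours=6), len)
sum_7d = derive(TRANSACTIONS, timedelta(days=7), sum)
print(count_6h)  # [0, 1, 0, 0, 0, 1]
print(sum_7d)    # [0, 580, 0, 700, 12, 760]
```

The quadratic scan is fine for six rows; on a dataset of 750,000 transactions a sorted, per-client sliding window would be the more realistic choice.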

9
Evaluation
Confusion matrix

                    Actual positive  Actual negative
Predicted positive  TP               FP
Predicted negative  FN               TN
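
The metric formulas that accompanied this confusion matrix did not survive in the transcript. As a hedged illustration, the sketch below counts the four cells and computes the F1-Score, which the deck uses later for model selection. The standard textbook definitions are assumed, with label 1 meaning fraud, and the function names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels, with 1 = fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels, made up for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)  # 2 1 1 4
```

Note that none of these counts looks at the transaction amount, which is the limitation the financial evaluation slides address.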

11
Logistic Regression
- Model
- Cost function
- Cost matrix:

                    Actual positive  Actual negative
Predicted positive  0                1
Predicted negative  1                0
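
The model and cost function formulas on this slide were lost in extraction. For reference, the textbook form of logistic regression is given below. The notation (parameters \theta, features x_i, labels y_i, N training examples) is an assumption and may differ from the slide's.

```latex
h_\theta(x_i) = \frac{1}{1 + e^{-\theta^{T} x_i}}

J(\theta) = -\frac{1}{N} \sum_{i=1}^{N}
  \Big[ y_i \log h_\theta(x_i) + (1 - y_i) \log\big(1 - h_\theta(x_i)\big) \Big]
```

With a 0/1 cost matrix, every misclassification is penalized equally, regardless of the amount of the transaction.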

12
Logistic Regression
Under-sampling procedure
- Original fraud rate: 0.467%
- Select all the frauds and a random sample of the legitimate transactions.
- Fraud rates after under-sampling: 1%, 5%, 10%, 20%, 50%
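
The under-sampling procedure can be sketched as follows. This is one plausible reading of the slide, not the author's code: all frauds are kept, and enough legitimate transactions are drawn at random to reach a chosen fraud rate. The function name `undersample`, the `seed` parameter, and the toy data are hypothetical.

```python
import random


def undersample(frauds, legit, target_rate, seed=0):
    """Keep every fraud and draw a random subset of legitimate
    transactions so that frauds make up about `target_rate` of the result."""
    if not 0 < target_rate < 1:
        raise ValueError("target_rate must be strictly between 0 and 1")
    # n_fraud / (n_fraud + n_legit) = target_rate, solved for n_legit.
    n_legit = round(len(frauds) * (1 - target_rate) / target_rate)
    n_legit = min(n_legit, len(legit))  # cannot draw more than exist
    rng = random.Random(seed)
    return frauds + rng.sample(legit, n_legit)


# Toy data with the same 0.467% fraud rate as the slide (35 of 7,500).
frauds = [("fraud", i) for i in range(35)]
legit = [("legit", i) for i in range(7465)]

for rate in (0.01, 0.05, 0.10, 0.20, 0.50):
    sample = undersample(frauds, legit, rate)
    print(rate, len(sample))
```

At a 50% target rate only as many legitimate transactions as frauds are kept, so most of the legitimate data is discarded.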

13
Logistic Regression Results

14
Financial evaluation
Motivation
- False positives carry a different cost than false negatives
- Frauds range from a few to thousands of euros (dollars, pounds, etc.)
- There is a need for a real comparison measure

15
Financial evaluation
Cost matrix

                    Actual positive  Actual negative
Predicted positive  Ca               Ca
Predicted negative  Amt              0

where:
Ca   Administrative costs
Amt  Amount of transaction i

Evaluation measure
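
The evaluation measure formula itself is missing from the transcript. A hedged sketch of how a total cost follows from the cost matrix: it assumes every flagged transaction (true or false positive) costs Ca, every missed fraud costs its amount Amt, and a correctly passed legitimate transaction costs 0. The function name, the toy data, and the value used for `admin_cost` are hypothetical.

```python
def total_cost(y_true, y_pred, amounts, admin_cost):
    """Sum the cost-matrix entry of each transaction.

    y_true: 1 = fraud, 0 = legitimate
    y_pred: 1 = flagged as fraud, 0 = passed
    """
    cost = 0.0
    for y, c, amt in zip(y_true, y_pred, amounts):
        if c == 1:
            cost += admin_cost  # flagged: TP or FP, assumed to cost Ca
        elif y == 1:
            cost += amt         # missed fraud (FN): the amount is lost
        # correctly passed legitimate transaction (TN) costs nothing
    return cost


# Toy example; admin_cost = 2 is a made-up value, not from the slides.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 0]
amounts = [500, 1210, 60, 8]
print(total_cost(y_true, y_pred, amounts, admin_cost=2))  # 1214.0
```

In this toy case a single missed fraud of 1210 dominates the total, which a count-based measure such as the F1-Score cannot reflect.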

16
Logistic Regression Results
- Selecting the algorithm by F1-Score
- Selecting the algorithm by Cost

17
Logistic Regression
- The best model selected using the traditional F1-Score does not give the best results in terms of cost
- The model selected by cost is trained using less than 1% of the database, meaning a lot of information is excluded
- The algorithm is trained to minimize the misclassification (approx.) but is then evaluated based on cost
- Why not train the algorithm to minimize the cost instead?

18
Cost Sensitive Logistic Regression
Cost matrix

                    Actual positive  Actual negative
Predicted positive  Ca               Ca
Predicted negative  Amt              0

Cost function
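
The cost function on this slide did not survive extraction. One natural form, shown here as an assumption rather than as the author's exact formulation, replaces the hard prediction in the cost matrix with the predicted fraud probability h. The expected cost of a fraud is then h * Ca + (1 - h) * Amt, and that of a legitimate transaction is h * Ca. The function names, the toy data, and the value of `admin_cost` are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def cost_sensitive_loss(theta, X, y, amounts, admin_cost):
    """Average expected cost under the assumed cost matrix:
    flagged -> admin_cost, missed fraud -> amount, otherwise 0."""
    total = 0.0
    for x_i, y_i, amt_i in zip(X, y, amounts):
        h = sigmoid(sum(t * v for t, v in zip(theta, x_i)))
        if y_i == 1:
            total += h * admin_cost + (1 - h) * amt_i
        else:
            total += h * admin_cost
    return total / len(y)


# Toy data: first feature is a bias term; values are made up.
X = [[1.0, 0.2], [1.0, 1.5]]
y = [0, 1]
amounts = [60, 1210]

# With theta = 0 every probability is 0.5.
print(cost_sensitive_loss([0.0, 0.0], X, y, amounts, admin_cost=2))  # 303.5
```

Unlike the log-loss, this objective weights each fraud by its amount, so a parameter vector that assigns the 1210 fraud a probability near 1 lowers the loss far more than one that only catches small frauds.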

19
Cost sensitive Logistic Regression Results

20
Cost sensitive Logistic Regression Results

21
Conclusion
- Selecting models based on traditional statistics does not give the best results in terms of cost
- Models should be evaluated taking into account the real financial costs of the application
- Algorithms should be developed to incorporate those financial costs

22
Thank you!

23
Contact information
Alejandro Correa Bahnsen
University of Luxembourg
Luxembourg
al.bahnsen@gmail.com
http://www.linkedin.com/in/albahnsen
http://www.slideshare.net/albahnsen
