Data Mining for Credit Card Fraud: A Comparative Study | Xxxxxxxx | DSCI 5240 | Dr. Nick Evangelopoulos | Graduate Presentation


1 Data Mining for Credit Card Fraud: A Comparative Study
Xxxxxxxx
DSCI 5240 | Dr. Nick Evangelopoulos
Graduate Presentation

2 Overview
- Credit Card Fraud
- Data Mining Techniques
- Data
- Experimental Setup
- Results

3 Credit Card Fraud
- Two types:
  - Application fraud
    - Obtain new cards using false information
  - Behavioral fraud
    - Mail theft
    - Stolen/lost card
    - Counterfeit card

4 Credit Card Fraud
- Online revenue loss due to fraud (cybersource.com)

5 Data Mining Techniques
- Logistic Regression
  - Used to predict the outcome of a categorical dependent variable
  - The fraud variable is binary
- Support Vector Machines
- Random Forest
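
To illustrate why logistic regression fits a binary fraud flag, here is a minimal sketch that turns a weighted sum of features into a probability using the logistic function. The two features, the weights, and the bias are hypothetical values chosen for illustration; they do not come from the study.

```python
import math

def fraud_probability(features, weights, bias):
    """Logistic regression prediction: sigmoid of a weighted sum."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights for two hypothetical transaction features
weights = [0.8, 1.5]
bias = -3.0

p = fraud_probability([1.0, 2.0], weights, bias)
label = 1 if p >= 0.5 else 0  # 1 = flagged as fraud, 0 = legitimate
print(f"P(fraud) = {p:.3f}, predicted label = {label}")
```

The 0.5 cut-off is only a convention. With heavily imbalanced fraud data, a different threshold or a ranking by score is often more useful.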

6 Support Vector Machines (SVM)
- Supervised learning models with associated learning algorithms that analyze and recognize patterns
- Linear classifiers that operate in a high-dimensional feature space, which is a non-linear mapping of the input space
- Two properties of SVM:
  - Kernel representation
  - Margin optimization
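
The kernel representation can be sketched with one widely used kernel, the Gaussian radial basis function, K(x, y) = exp(-gamma * ||x - y||^2). The code below is only an illustration of the kernel itself, not of SVM training; the value of `gamma` and the sample points are assumptions.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel: similarity decays with squared distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Identical points have similarity 1; distant points approach 0.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0])
near = rbf_kernel([0.0, 0.0], [1.0, 1.0])
far = rbf_kernel([0.0, 0.0], [5.0, 5.0])
print(same, near, far)
```

The kernel value stands in for an inner product in the high-dimensional feature space, so the classifier never has to compute the mapping explicitly.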

7 Random Forest (RF)
- Ensemble of classification trees
- Performs well when individual members are dissimilar
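
A toy sketch of why dissimilar members help: if each tree errs on a different transaction, a majority vote can cancel the errors out. The three "trees" below are hand-written prediction lists, not trained models, and the data are invented for illustration.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label among the members' predictions."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical ground truth and three hypothetical trees,
# each wrong on a different transaction.
truth  = [1, 0, 1, 0]
tree_a = [0, 0, 1, 0]  # wrong on transaction 0
tree_b = [1, 1, 1, 0]  # wrong on transaction 1
tree_c = [1, 0, 0, 0]  # wrong on transaction 2

ensemble = [majority_vote(votes) for votes in zip(tree_a, tree_b, tree_c)]
print(ensemble == truth)
```

If all three trees made the same mistake, the vote would repeat it, which is why diversity among members matters.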

8 Data: Datasets
- 13 months of data (Jan 2006 – Jan 2007)
- 50 million credit card transactions on 1 million credit cards
- 2,420 known fraudulent transactions involving 506 credit cards
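
The class imbalance implied by these counts can be worked out directly. The sketch below treats the slide's rounded figures (50 million transactions, 1 million cards) as exact, which is an assumption.

```python
transactions = 50_000_000
cards = 1_000_000
fraud_transactions = 2_420
fraud_cards = 506

fraud_txn_rate = fraud_transactions / transactions
fraud_card_rate = fraud_cards / cards
legit_per_fraud = (transactions - fraud_transactions) / fraud_transactions

print(f"Fraudulent transactions: {fraud_txn_rate:.5%}")
print(f"Cards with known fraud:  {fraud_card_rate:.4%}")
print(f"Legitimate transactions per fraud: {legit_per_fraud:,.0f}")
```

Under that assumption, fewer than 5 in 100,000 transactions are known fraud, which motivates resampling the training data.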

9 Percentage of Transactions by Transaction Type

10 Data Selection

11 Primary Attributes in Dataset

12 Derived Attributes

13 Experimental Setup
- For SVM, the Gaussian radial basis function was used as the kernel function
- For Random Forest, the number of attributes considered at each node and the number of trees were set
- Data were sampled at different rates using random undersampling of the majority class
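
Random undersampling of the majority class can be sketched as follows: keep every fraud case and randomly drop legitimate cases until fraud reaches a chosen share of the sample. The function name, the `target_fraud_rate` parameter, and the toy data are assumptions for illustration; the slides do not state the actual sampling rates or procedure details.

```python
import random

def undersample_majority(rows, labels, target_fraud_rate, seed=0):
    """Keep all fraud rows; randomly keep only enough legitimate rows
    that fraud makes up roughly target_fraud_rate of the sample."""
    fraud = [r for r, y in zip(rows, labels) if y == 1]
    legit = [r for r, y in zip(rows, labels) if y == 0]
    n_legit = round(len(fraud) * (1 - target_fraud_rate) / target_fraud_rate)
    n_legit = min(n_legit, len(legit))  # cannot keep more than exist
    rng = random.Random(seed)           # seeded for reproducibility
    kept_legit = rng.sample(legit, n_legit)
    sample = [(r, 1) for r in fraud] + [(r, 0) for r in kept_legit]
    rng.shuffle(sample)
    return sample

# Toy data: 1,000 rows, of which the first 10 are fraud (1% fraud rate)
rows = list(range(1000))
labels = [1 if i < 10 else 0 for i in rows]

sample = undersample_majority(rows, labels, target_fraud_rate=0.10)
n_fraud = sum(y for _, y in sample)
print(len(sample), n_fraud)
```

Calling the function with several values of `target_fraud_rate` yields training sets with different fraud proportions, similar in spirit to sampling the data at different rates.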

14 Training and Testing Data

15 Results

16 Proportion of Fraud Captured at Different Depths
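
A sketch of how such a measure is typically computed, assuming "depth" means the top-scored fraction of the test file: rank transactions by model score, take the top share, and count what fraction of all fraud cases falls inside it. The scores and labels below are invented for illustration.

```python
def fraud_captured_at_depth(scores, labels, depth):
    """Fraction of all fraud cases found in the top `depth` share of
    transactions when ranked by descending model score."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    cutoff = max(1, int(len(ranked) * depth))
    total_fraud = sum(labels)
    if total_fraud == 0:
        return 0.0
    captured = sum(y for _, y in ranked[:cutoff])
    return captured / total_fraud

# Hypothetical model scores and true labels (1 = fraud)
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1,   0,   1,   0,   0,   0,   1,   0,   0,   0]

top_20 = fraud_captured_at_depth(scores, labels, 0.20)
top_50 = fraud_captured_at_depth(scores, labels, 0.50)
print(f"Top 20%: {top_20:.2f}, top 50%: {top_50:.2f}")
```

A model that concentrates fraud near the top of the ranking captures more fraud at shallow depths, which matters when only a small share of transactions can be reviewed.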

17 Fraud Capture Rate with Different Fraud Rates in Training Data

18 Conclusion
- Examined the performance of two data mining techniques
  - SVM and RF, together with logistic regression
- Used a real-life data set from Jan 2006 – Jan 2007
- Used an undersampling approach to sample the data
- Random forest showed much higher performance at upper file depths
- SVM performance at the upper file depths tended to increase with a lower proportion of fraud in the training data
- Random forest demonstrated better overall performance

19 Questions

