Comparison of Classification Methods for Customer Attrition Analysis Xiaohua Hu, Ph.D. Drexel University Philadelphia, PA, 19104


1 Comparison of Classification Methods for Customer Attrition Analysis Xiaohua Hu, Ph.D. Drexel University Philadelphia, PA, 19104 thu@cis.drexel.edu

2 Outline Introduction to the Business Problem; Data Selection and Data Processing; Data Mining Model Development Process; Data Mining Findings; Q & A

3 Data Mining for Customer Attrition Analysis In the financial industry, data mining has been applied successfully to: target-oriented campaigns; identifying and understanding customer segments (attriters vs. loyal customers, profitable vs. regular customers); identifying cross-sell and up-sell opportunities to increase the customers' wallet share; risk analysis for loan applications and credit fraud detection; and financial planning and asset evaluation.

4 Customer Attrition Analysis The goal of attrition analysis is to identify a group of customers who have a high probability of attriting, so that the company can conduct marketing campaigns that change their behavior in the desired direction and reduce the attrition rate.

5 Business Problem Our client is one of the largest banks in the world. This attrition analysis project relates to one type of credit loan service. Over 750,000 customers currently use this service, with $1.5 billion in outstanding balances. Every month, about 5,700 customers close their accounts or transfer to other banks, mostly because of rates, credit lines, and fees.

6 Problem Definition Slow attriters: customers who slowly pay down their outstanding balance until they become inactive. Fast attriters: customers who quickly pay down their balance and then either let the account lapse or close it by phone call or written request.

7 Data Mining Tasks 1. Using data on accounts that remained continuously open for the last 4 months, predict, with 60 days' advance notice, the likelihood that a particular customer will voluntarily close his or her account, either by phone or by write-in. 2. Using data on accounts that remained continuously open for the last 4 months, predict, with 60 days' advance notice, the likelihood that a particular customer will have his or her account transferred to a competing institution. The account may or may not remain open.
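The two task definitions fix a look-back window (continuously open for 4 months) and a 60-day prediction horizon. The Python sketch below shows one way to turn them into target labels. It assumes a hypothetical `Account` record (the slides give no schema), approximates "4 months" as 120 days, and reads "60 days' advance notice" as "the event falls within the 60 days after the scoring date"; these are illustrative assumptions, not the project's actual definitions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical account record; the slides give no schema, so every field
# name here is illustrative only.
@dataclass
class Account:
    account_id: str
    opened_on: date
    closed_on: Optional[date] = None       # date the account was closed, if ever
    close_channel: Optional[str] = None    # "phone" or "write_in" (hypothetical codes)
    transferred_on: Optional[date] = None  # date a transfer to a competitor was detected

HORIZON = timedelta(days=60)        # 60 days' advance notice
LOOKBACK = timedelta(days=4 * 30)   # "last 4 months", approximated as 120 days (assumption)

def eligible(acct: Account, as_of: date) -> bool:
    """True if the account was open for the whole look-back window and is still open at as_of."""
    opened_in_time = acct.opened_on <= as_of - LOOKBACK
    still_open = acct.closed_on is None or acct.closed_on > as_of
    return opened_in_time and still_open

def within_horizon(event: Optional[date], as_of: date) -> bool:
    """True if the event happens in the 60 days after the scoring date (assumption)."""
    return event is not None and as_of < event <= as_of + HORIZON

def task1_target(acct: Account, as_of: date) -> int:
    """Task 1: voluntary closure by phone or write-in within the horizon."""
    voluntary = acct.close_channel in {"phone", "write_in"}
    return int(voluntary and within_horizon(acct.closed_on, as_of))

def task2_target(acct: Account, as_of: date) -> int:
    """Task 2: transfer to a competitor within the horizon; the account may stay open."""
    return int(within_horizon(acct.transferred_on, as_of))

as_of = date(2003, 6, 1)  # arbitrary scoring date
a = Account("A1", opened_on=date(2001, 1, 15), closed_on=date(2003, 7, 10), close_channel="phone")
b = Account("B2", opened_on=date(2002, 3, 1), transferred_on=date(2003, 6, 20))
print(eligible(a, as_of), task1_target(a, as_of), task2_target(a, as_of))  # True 1 0
print(eligible(b, as_of), task1_target(b, as_of), task2_target(b, as_of))  # True 0 1
```

Keeping Task 2 on its own date field reflects the slide's note that a transferred account may or may not remain open.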

8 Challenging Issues in Our Project Highly skewed data: 3% attriters vs. 97% regular customers. Time-series data: our data warehouse holds the past 12 months of credit loan service information. High dimensionality: 850 attributes for each customer. Lots of dirty data and missing values in the records.

9 Data Mining Process for Customer Attrition Analysis 1. Problem definition: formulation of business problems in the area of customer retention. 2. Data review and initial selection. 3. Problem formulation in terms of existing data. 4. Data gathering, cataloging, and formatting. 5. Data processing: (a) data cleansing, data unfolding, time-sensitive variable definition, and target variable definition; (b) statistical analysis; (c) sensitivity analysis; (d) feature selection; (e) leaker detection. 6. Data modeling via classification models: decision trees, neural networks, Bayesian networks, and an ensemble of classifiers. 7. Result review and analysis: use the data mining model to predict the likely attriters among current customers. 8. Result deployment: target the likely attriters (called rollout).
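The leaker-detection step is not elaborated in the slides. Assuming "leakers" means fields whose values are recorded because of the attrition itself (for example, a closure-request flag), one common screen is to score each field on its own and flag any field that separates attriters almost perfectly. The Python sketch below does this with a single-field AUC; the field names and the 0.95 cut-off are hypothetical.

```python
from typing import Dict, List

def single_field_auc(values: List[float], target: List[int]) -> float:
    """Mann-Whitney AUC of one field used on its own as a score (ties count 1/2).

    O(n_pos * n_neg): fine for a sketch, too slow for a full production file.
    """
    pos = [v for v, t in zip(values, target) if t == 1]
    neg = [v for v, t in zip(values, target) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def suspected_leakers(fields: Dict[str, List[float]], target: List[int],
                      threshold: float = 0.95) -> List[str]:
    """Flag fields that separate attriters almost perfectly by themselves.

    `threshold` is a hypothetical cut-off; a flagged field still needs manual review.
    """
    flagged = []
    for name, values in fields.items():
        auc = single_field_auc(values, target)
        if max(auc, 1.0 - auc) >= threshold:  # near-perfect in either direction
            flagged.append(name)
    return flagged

target = [0, 0, 0, 1, 1]
fields = {
    "balance": [5.0, 3.0, 4.0, 2.0, 6.0],                     # hypothetical field
    "days_since_close_request": [0.0, 0.0, 0.0, 10.0, 12.0],  # hypothetical leaker
}
print(suspected_leakers(fields, target))  # ['days_since_close_request']
```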

10 Data Sources Data warehouse: a credit card data warehouse containing about 200 product-specific fields. Third-party data: a set of account-related demographic and credit bureau information. Segmentation files: a set of account-related segmentation values based on our client's segmentation scheme, which combines risk, profitability, and external potential. Payment database: a database that stores all processed checks and can categorize the source of each check.

11 Data Processing Goals Reflects data changes over time. Recognizes and removes statistically insignificant fields. Defines and introduces the "target" field. Allows for second-stage preprocessing and statistical analysis.
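The slides do not say which statistic was used to decide that a field is "statistically insignificant". As one possible screen, the Python sketch below compares each field's mean for attriters and non-attriters with Welch's t statistic and keeps only fields with a large enough |t|. The field names and the cut-off of 2.0 are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Dict, List

def welch_t(a: List[float], b: List[float]) -> float:
    """Welch's t statistic for the difference in means of two samples (each needs >= 2 values)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        return 0.0 if mean(a) == mean(b) else math.inf
    return (mean(a) - mean(b)) / se

def significant_fields(fields: Dict[str, List[float]], target: List[int],
                       t_min: float = 2.0) -> List[str]:
    """Keep fields whose attriter vs. non-attriter means differ by |t| >= t_min (hypothetical cut-off)."""
    kept = []
    for name, values in fields.items():
        attriters = [v for v, t in zip(values, target) if t == 1]
        others = [v for v, t in zip(values, target) if t == 0]
        if abs(welch_t(attriters, others)) >= t_min:
            kept.append(name)
    return kept

target = [0, 0, 0, 0, 1, 1, 1, 1]
fields = {
    "utilization": [0.10, 0.20, 0.15, 0.12, 0.80, 0.90, 0.85, 0.70],  # hypothetical
    "zip_last_digit": [1, 5, 3, 7, 2, 6, 4, 8],                        # hypothetical noise
}
print(significant_fields(fields, target))  # ['utilization']
```

A mean-difference test only catches fields that shift on average; fields with nonlinear effects would need a different screen.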

12 Data Processing Steps Time-series "unrolling"; target value definition; first-stage statistical analysis; field sensitivity analysis and field reduction; file set generation.
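Time-series "unrolling" turns month-by-month records into one row per account, so that a standard classifier can see several months of history at once. The Python sketch below assumes a hypothetical record layout of (account id, months before the scoring date, field values) and a 4-month window. It also derives one time-sensitive variable, the change from the oldest to the latest month, which would expose a slow attriter paying down a balance.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (account_id, months before the scoring date, {field: value}); hypothetical layout
Record = Tuple[str, int, Dict[str, float]]

def unroll(records: List[Record], n_months: int = 4) -> Dict[str, Dict[str, float]]:
    """Turn one-row-per-account-per-month data into one row per account.

    Each monthly field becomes n_months columns, e.g. balance_m1 (most recent
    month) ... balance_m4 (oldest month).
    """
    rows: Dict[str, Dict[str, float]] = defaultdict(dict)
    for account_id, lag, values in records:
        if 1 <= lag <= n_months:
            for name, value in values.items():
                rows[account_id][f"{name}_m{lag}"] = value
    return dict(rows)

def add_trend(row: Dict[str, float], field: str, n_months: int = 4) -> None:
    """Add a time-sensitive variable: change from the oldest to the latest month."""
    oldest, latest = row.get(f"{field}_m{n_months}"), row.get(f"{field}_m1")
    if oldest is not None and latest is not None:
        row[f"{field}_trend"] = latest - oldest

records = [
    ("A1", 1, {"balance": 100.0}), ("A1", 2, {"balance": 400.0}),
    ("A1", 3, {"balance": 700.0}), ("A1", 4, {"balance": 1000.0}),
    ("A1", 9, {"balance": 1200.0}),  # outside the 4-month window, ignored
]
rows = unroll(records)
add_trend(rows["A1"], "balance")
print(rows["A1"]["balance_trend"])  # -900.0
```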

13 Data Mining Algorithms for Attrition Analysis 1. Boosted Naïve Bayesian (BNB); 2. NeuralWare Predict (a commercial neural network from NeuralWare Inc.); 3. Decision tree (based on C4.5 with some modifications); 4. Selective Naïve Bayesian (SNB); 5. An ensemble of the above four classifiers.
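The slides do not say how the four classifiers are combined into the ensemble. One simple option, shown here purely as an assumption, is rank averaging: convert each model's scores to percentile ranks and average them per customer. This sidesteps the fact that a neural network output and a naïve Bayes probability are on different scales. The model keys and scores below are hypothetical.

```python
from typing import Dict, List

def percentile_ranks(scores: List[float]) -> List[float]:
    """Map scores to [0, 1] by rank so models with different score scales are comparable.

    Tied scores get consecutive ranks in input order (kept simple for the sketch).
    """
    n = len(scores)
    if n == 1:
        return [0.5]
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    for r, i in enumerate(order):
        ranks[i] = r / (n - 1)
    return ranks

def rank_average_ensemble(model_scores: Dict[str, List[float]]) -> List[float]:
    """Average each customer's percentile rank across all models."""
    per_model = [percentile_ranks(s) for s in model_scores.values()]
    n = len(per_model[0])
    return [sum(r[i] for r in per_model) / len(per_model) for i in range(n)]

# Hypothetical scores for three customers from the four base models.
scores = {
    "BNB":  [0.90, 0.10, 0.50],
    "NN":   [0.70, 0.20, 0.80],
    "Tree": [0.60, 0.30, 0.40],
    "SNB":  [0.95, 0.05, 0.50],
}
print(rank_average_ensemble(scores))  # [0.875, 0.0, 0.625]
```

Only the ordering matters for lift, so a rank-based combination loses nothing the evaluation uses.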

14 Classification Accuracy Is Not a Proper Measure for Attrition Analysis The goal of attrition analysis is not to predict the behavior of every customer, but to find a good subset of customers in which the percentage of attriters is high. Classification errors (false positives and false negatives) have different economic consequences in attrition analysis and need to be treated differently.
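A small numeric illustration of this point, using the 3% attriter rate from slide 8: a "model" that predicts that nobody attrites is 97% accurate yet finds no attriters, while a model that flags 100 customers is less accurate but cheaper once errors are priced differently. The two per-error costs below are hypothetical placeholders, not figures from the project.

```python
from typing import List

def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def error_cost(y_true: List[int], y_pred: List[int], cost_fn: float, cost_fp: float) -> float:
    """Total cost when a missed attriter (FN) and a wasted contact (FP) cost different amounts."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return fn * cost_fn + fp * cost_fp

# 1,000 customers with 3% attriters.
y_true = [1] * 30 + [0] * 970
always_loyal = [0] * 1000                           # predicts "no attrition" for everyone
model = [1] * 20 + [0] * 10 + [1] * 80 + [0] * 890  # flags 100 customers, catches 20 attriters

COST_FN, COST_FP = 500.0, 20.0  # hypothetical: lost customer value vs. cost of one offer
print(accuracy(y_true, always_loyal), error_cost(y_true, always_loyal, COST_FN, COST_FP))  # 0.97 15000.0
print(accuracy(y_true, model), error_cost(y_true, model, COST_FN, COST_FP))                # 0.91 6600.0
```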

15 Criterion for Attrition Analysis: Lift Lift, rather than classification accuracy, is the better measure for attrition analysis. Lift reflects how the responders in the testing set are redistributed after the testing examples are ranked by model score. Lift at p% is calculated by taking the cumulative targets captured in the top p% of the ranked list, as a percentage of all targets, and dividing by p%. For example, if the top 10% of the sorted list contains 35% of the attriters, the model has a lift of 35/10 = 3.5 at 10%.
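The lift definition translates directly into code. The Python sketch below ranks customers by score, counts the attriters captured in the top fraction p, and divides that share by the share of customers examined. The function name and the toy data (built to reproduce the slide's 35%-in-the-top-10% example) are illustrative.

```python
from typing import List

def lift_at(scores: List[float], labels: List[int], p: float) -> float:
    """Cumulative lift in the top fraction p (0 < p <= 1) of customers ranked by score.

    lift = (share of all attriters captured in the top k) / (k / n)
    """
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    k = max(1, round(n * p))
    captured, total = sum(ranked[:k]), sum(ranked)
    return (captured * n) / (total * k)  # same ratio, kept in integers until the final division

# The slide's example: the top 10% of the list holds 35% of the attriters.
scores = [float(s) for s in range(100, 0, -1)]    # highest score first
labels = [1] * 7 + [0] * 3 + [1] * 13 + [0] * 77  # 20 attriters, 7 of them in the top 10
print(lift_at(scores, labels, 0.10))  # 3.5
```

At p = 1.0 the lift is always 1.0, which matches the last row of every table in the results slides.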

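16 Boosted Naïve Bayesian Network
Pct | Cases | Hits | BBN %hits | Lift | Hits (no model)
1 | 70 | 3 | 4.3 | 1.9 | 1.5
5 | 354 | 33 | 9.3 | 4.2 | 7.8
10 | 709 | 62 | 8.7 | 4.0 | 15.6
15 | 1063 | 71 | 6.7 | 3.0 | 23.4
20 | 1418 | 78 | 5.5 | 2.5 | 31.2
25 | 1772 | 93 | 5.2 | 2.4 | 39.0
30 | 2127 | 100 | 4.7 | 2.1 | 46.8
40 | 3109 | 115 | 4.1 | 1.8 | 62.4
50 | 3545 | 134 | 4.8 | 1.7 | 78.0
80 | 6027 | 152 | 2.7 | 1.2 | 124.8
100 | 7091 | 156 | 2.2 | 1.0 | 156.0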

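17 Decision Tree (revised C4.5)
Pct | Cases | Hits | Decision Tree %hits | Lift | Hits (no model)
1 | 70 | 6 | 8.6 | 3.9 | 1.5
4 | 283 | 25 | 8.8 | 4.0 | 6.2
8 | 567 | 47 | 8.3 | 3.8 | 12.5
9 | 638 | 56 | 8.8 | 4.0 | 14.0
10 | 709 | 60 | 8.5 | 3.8 | 15.6
20 | 1418 | 95 | 6.7 | 3.0 | 31.2
25 | 1772 | 101 | 5.7 | 2.6 | 39.0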

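18 Neural Network (Predict)
Pct | Cases | Hits | NN %hits | Lift | Hits (no model)
1 | 70 | 9 | 12.9 | 5.8 | 1.5
5 | 354 | 41 | 11.6 | 5.3 | 7.8
10 | 709 | 53 | 7.5 | 3.4 | 15.6
15 | 1063 | 73 | 6.9 | 3.1 | 23.4
20 | 1418 | 86 | 6.1 | 2.8 | 31.2
25 | 1772 | 105 | 5.9 | 2.7 | 39.0
30 | 2127 | 116 | 5.5 | 2.5 | 46.8
40 | 3109 | 125 | 4.4 | 2.0 | 62.4
50 | 3545 | 134 | 3.8 | 1.7 | 78.0
80 | 6027 | 150 | 2.6 | 1.2 | 124.8
100 | 7091 | 156 | 2.2 | 1.0 | 156.0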

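19 Selective Naïve Bayesian Network
Pct | Cases | Hits | SNB %hits | Lift | Hits (no model)
1 | 70 | 5 | 7.1 | 3.2 | 1.5
5 | 354 | 34 | 9.6 | 4.4 | 7.8
10 | 709 | 69 | 9.7 | 4.0 | 15.6
15 | 1063 | 83 | 7.8 | 3.5 | 23.4
20 | 1418 | 92 | 6.5 | 2.9 | 31.2
25 | 1772 | 105 | 5.9 | 2.7 | 39.0
30 | 2127 | 112 | 5.3 | 2.4 | 46.8
40 | 3109 | 118 | 4.2 | 1.9 | 62.4
50 | 3545 | 125 | 3.5 | 1.6 | 78.0
80 | 6027 | 153 | 2.7 | 1.2 | 124.8
100 | 7091 | 156 | 2.2 | 1.0 | 156.0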

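20 An Ensemble of Classifiers
Pct | Cases | Hits | Ensemble %hits | Lift | Hits (no model)
1 | 70 | 4 | 5.7 | 2.6 | 1.5
5 | 354 | 36 | 10.3 | 4.6 | 7.8
10 | 709 | 63 | 8.9 | 4.0 | 15.6
15 | 1063 | 81 | 7.7 | 3.5 | 23.4
20 | 1418 | 96 | 6.5 | 3.0 | 31.2
25 | 1772 | 104 | 5.9 | 2.6 | 39.0
30 | 2127 | 121 | 5.7 | 2.6 | 46.8
40 | 3109 | 144 | 5.4 | 2.3 | 62.4
50 | 3545 | 154 | 4.4 | 2.0 | 78.0
80 | 6027 | 156 | 2.7 | 1.2 | 124.8
100 | 7091 | 156 | 2.2 | 1.0 | 156.0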
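In the tables on slides 16 to 20, the Lift column is the row's hit rate (Hits / Cases) divided by the overall hit rate of 156/7091 ≈ 2.2%. This is algebraically the same as the slide-15 definition (share of all attriters captured divided by share of cases examined), and "Hits (no model)" is the number of attriters expected from a random pick of the same size. The Python sketch below recomputes a few of those entries from the Cases and Hits columns; the helper names are illustrative.

```python
def lift_from_row(cases: int, hits: int, total_cases: int, total_hits: int) -> float:
    """Lift for one table row: hit rate in the row divided by the overall hit rate."""
    return (hits / cases) / (total_hits / total_cases)

def lift_from_capture(cases: int, hits: int, total_cases: int, total_hits: int) -> float:
    """Same lift via the capture definition: share of attriters captured / share of cases."""
    return (hits / total_hits) / (cases / total_cases)

TOTAL_CASES, TOTAL_HITS = 7091, 156  # the 100% row shared by all five tables

print(round(lift_from_row(354, 33, TOTAL_CASES, TOTAL_HITS), 1))      # BNB, top 5%: 4.2
print(round(lift_from_capture(354, 33, TOTAL_CASES, TOTAL_HITS), 1))  # same value: 4.2
print(round(lift_from_row(70, 9, TOTAL_CASES, TOTAL_HITS), 1))        # NN, top 1%: 5.8
print(round(TOTAL_HITS * 0.05, 1))  # "Hits (no model)" at 5%: 7.8
```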

21 Field Test The field test aims to verify two points: (1) the top of the ranked customer attrition list does contain a high concentration of attriters; (2) the data-mining-based marketing approach is effective for reducing attrition.

22 Field Test Results The top 5% of the 750,000 customers on the prediction list (sorted by score) is 37,500 customers. Two groups of 10,000 customers each were created by random sampling from these 37,500 top-ranked customers. Group 1: the marketing department contacted each customer and offered incentive packages to encourage the customer to stay with the company. Group 2: no action was taken. Two months later, the customers in Group 1 and Group 2 were examined. Group 1 had an attrition rate of 0.8%, while Group 2 had an attrition rate of 10.6% (the average attrition rate is 2.2%). The lift is therefore 10.6/2.2 ≈ 4.8.
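The field-test figures can be checked with a few lines of arithmetic. The Python sketch below recomputes the size of the top-5% list, the control-group lift, and the attriter counts implied by the two rates. The difference between the groups is only a rough estimate of customers retained by the campaign: it relies on the random assignment and ignores sampling error and incentive costs.

```python
customers = 750_000
top_list = int(customers * 5 / 100)  # top 5% of the ranked list
group_size = 10_000                  # each group randomly sampled from top_list
treated_rate = 0.008                 # Group 1: contacted with incentives
control_rate = 0.106                 # Group 2: no action
average_rate = 0.022                 # average attrition rate

control_lift = control_rate / average_rate
control_attriters = round(group_size * control_rate)
treated_attriters = round(group_size * treated_rate)

print(top_list)                               # 37500
print(round(control_lift, 1))                 # 4.8
print(control_attriters, treated_attriters)   # 1060 80
print(control_attriters - treated_attriters)  # 980
```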

23 Q & A ?

