Making Data Mining Models Useful to Model Non-paying Customers of Exchange Carriers. Wei Fan (IBM T.J. Watson); Janek Mathuria and Chang-tien Lu (Virginia Tech, Northern Virginia Center)
Our Selling Points: A real, practical problem for an actual CLEC company. A whole process: start with a great goal; reality taught us a lesson; settle on a realistic solution. A new set of algorithms to calibrate probability outputs (as distinguished from Zadrozny and Elkan's calibration methods).
Challenging Problem: Differentiate between Late and Default. Late: one month past due. Default: two months past due. Default percentage: 20%. Designed feature set (details in the paper): calling summary, billing summary, the obvious ones. Other ones out there? Maybe.
Failure: Failure of commonly used methods: they predict nearly every customer as paying on time and still reach 80% accuracy. What this means: Is our feature set incomplete? Probably. The problem itself is stochastic in nature. Natural next step: cost-sensitive learning? The costs are impossible to define precisely due to complexity.
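To make the 80% figure concrete, here is a minimal sketch on synthetic labels (the 100 customers below are made up for illustration, matching only the 20% default rate stated earlier). It shows that a model which never predicts a default reaches 80% accuracy while catching no defaulters at all.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model flags as positive."""
    flagged = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in flagged) / len(flagged)


# Synthetic labels (assumption): 1 = default, 0 = pays on time, 20% default rate.
y_true = [1] * 20 + [0] * 80
# A trivial model that predicts "pays on time" for every customer.
y_pred = [0] * 100

baseline_accuracy = accuracy(y_true, y_pred)
baseline_recall = recall(y_true, y_pred)
print(baseline_accuracy, baseline_recall)
```

Accuracy alone therefore says little here, which is one way to read why the slides move on to probability scores instead of hard labels.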
A Compromise Solution: Predict a reliable probability score. A customer is uniquely distinguished by its feature vector. If the model predicts that a customer has a 20% chance of defaulting, and the customer indeed has a 20% chance of defaulting, the predicted score is considered reliable.
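One way to check reliability in this sense is to group customers by predicted score and compare each group's mean score with its observed default rate. The sketch below is an illustration only: the function name `reliability_table`, the equal-width binning, and the synthetic data are assumptions, not taken from the slides.

```python
def reliability_table(scores, labels, n_bins=5):
    """Bin predictions by score and compare the mean predicted score
    with the observed default rate in each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[idx].append((s, y))
    rows = []
    for members in bins:
        if not members:
            continue
        mean_score = sum(s for s, _ in members) / len(members)
        observed_rate = sum(y for _, y in members) / len(members)
        rows.append((mean_score, observed_rate, len(members)))
    return rows


# Synthetic example (assumption): ten customers scored 0.2, two of whom default.
scores = [0.2] * 10
labels = [1, 1] + [0] * 8
rows = reliability_table(scores, labels)
for mean_score, observed_rate, count in rows:
    print(f"predicted={mean_score:.2f} observed={observed_rate:.2f} n={count}")
```

A reliable model produces rows where the predicted and observed columns roughly agree; large gaps indicate scores that need calibration.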
Previously Proposed Calibration Methods: The scores output by existing approaches are not reliable (Zadrozny and Elkan): decision trees, Naïve Bayes, SVM, logistic regression. Use a function mapping to calibrate unreliable scores into reliable ones. Assumption: the original unreliable scores need to be monotonic. Otherwise, the mapping is not applicable.
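The slide does not say which mapping function is meant. As an assumption, the sketch below uses isotonic regression fitted with the pool-adjacent-violators procedure, a common monotone mapping. Because the fitted function never reverses the order of the raw scores, it cannot repair scores that rank customers in the wrong order, which is why the monotonicity assumption matters. The function names and the data are hypothetical.

```python
def pav_calibrate(scores, labels):
    """Pool-adjacent-violators: fit a non-decreasing step function
    from raw scores to observed positive rates."""
    blocks = []  # each block: [sum of labels, count, largest score in block]
    for s, y in sorted(zip(scores, labels)):
        if blocks and blocks[-1][2] == s:
            # Tied scores always share one block.
            blocks[-1][0] += y
            blocks[-1][1] += 1
        else:
            blocks.append([y, 1, s])
        # Merge backwards while the previous block's rate exceeds this one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            y_sum, count, s_max = blocks.pop()
            blocks[-1][0] += y_sum
            blocks[-1][1] += count
            blocks[-1][2] = s_max
    return [(s_max, y_sum / count) for y_sum, count, s_max in blocks]


def apply_calibration(steps, score):
    """Return the rate of the first block whose upper score covers `score`."""
    for s_max, rate in steps:
        if score <= s_max:
            return rate
    return steps[-1][1]  # scores above the training range use the last block


# Synthetic raw scores and outcomes (assumption, for illustration only).
raw_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
outcomes = [0, 1, 0, 0, 1, 1]
steps = pav_calibrate(raw_scores, outcomes)
calibrated = apply_calibration(steps, 0.3)
```

Here the raw scores 0.2, 0.3 and 0.4 are pooled into one block, so a raw score of 0.3 maps to the pooled rate of one default in three customers.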
A Good Calibration
A Bad Calibration
Random Decision Trees: Amazingly simple and counter-intuitive. Do not use any purity check function. Pick a feature randomly. For a continuous feature, pick a random splitting point. A discrete feature can be picked only once in one decision path. A continuous feature can be picked multiple times. Tree depth is up to the number of features. Use the original feature set. No bootstrap! Each tree computes a probability at the leaf node: with 10 fraud and 90 normal transactions, p(fraud|x) = 0.1. Use multiple trees (a minimum of 10; 30 is enough) and average their probabilities.
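The rules above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the dictionary-based tree, the function names, the stop on a single remaining example, and the fallback for unseen branches are all assumptions, and the customer data is synthetic.

```python
import random


def build_tree(rows, labels, features, depth, rng):
    """rows: list of dicts; features: dict of name -> 'discrete' or 'continuous'."""
    rate = sum(labels) / len(labels)  # p(positive) among examples at this node
    # Stop when no feature is left, the depth limit is hit, or one example remains.
    if not features or depth == 0 or len(rows) <= 1:
        return {"leaf": True, "p": rate}
    name = rng.choice(sorted(features))  # random pick, no purity check
    if features[name] == "continuous":
        values = [r[name] for r in rows]
        threshold = rng.uniform(min(values), max(values))  # random split point
        remaining = features  # a continuous feature may be picked again
    else:
        threshold = None
        # A discrete feature is used only once per decision path.
        remaining = {f: k for f, k in features.items() if f != name}
    groups = {}
    for r, y in zip(rows, labels):
        key = r[name] <= threshold if threshold is not None else r[name]
        group = groups.setdefault(key, ([], []))
        group[0].append(r)
        group[1].append(y)
    if len(groups) < 2:  # the split did not separate anything
        return {"leaf": True, "p": rate}
    children = {
        key: build_tree(rs, ys, remaining, depth - 1, rng)
        for key, (rs, ys) in groups.items()
    }
    return {"leaf": False, "name": name, "threshold": threshold,
            "children": children, "p": rate}


def predict_tree(node, row):
    """Walk down one tree and return the probability stored at the leaf."""
    while not node["leaf"]:
        value = row[node["name"]]
        key = value <= node["threshold"] if node["threshold"] is not None else value
        child = node["children"].get(key)
        if child is None:  # branch not seen in training: use this node's rate
            return node["p"]
        node = child
    return node["p"]


def predict_forest(trees, row):
    """Average the leaf probabilities over all trees."""
    return sum(predict_tree(t, row) for t in trees) / len(trees)


# Synthetic customers (assumption): 1 = default, 0 = pays on time.
rng = random.Random(0)
features = {"plan": "discrete", "minutes": "continuous"}
rows = [{"plan": p, "minutes": m}
        for p, m in [("a", 10.0), ("a", 250.0), ("b", 40.0), ("b", 300.0),
                     ("a", 90.0), ("b", 15.0), ("a", 500.0), ("b", 120.0)]]
labels = [1, 0, 1, 0, 0, 1, 0, 0]
# Every tree sees the full original data set: no bootstrap sampling.
trees = [build_tree(rows, labels, features, len(features), rng) for _ in range(10)]
score = predict_forest(trees, {"plan": "a", "minutes": 20.0})
```

Each tree is grown to a depth of at most the number of features, and the ensemble score is the plain average of the ten leaf probabilities.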
Random Forest +: A marriage between Random Decision Tree and Random Forest. Pick a feature subset randomly. Compute the information gain for each feature in the subset. Choose the one with the highest information gain. Use the original dataset. No bootstrap. Each leaf node computes a probability. Use 10 to 30 trees.
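The split-selection step can be sketched as below: draw a random feature subset, score each candidate by information gain, and keep the best. This is an illustration under assumptions: it handles discrete features only, and the function names, the `subset_size` parameter, and the toy data are hypothetical.

```python
import math
import random


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / total
        result -= p * math.log2(p)
    return result


def info_gain(rows, labels, name):
    """Entropy reduction from splitting on the discrete feature `name`."""
    groups = {}
    for r, y in zip(rows, labels):
        groups.setdefault(r[name], []).append(y)
    remainder = sum(len(ys) / len(labels) * entropy(ys) for ys in groups.values())
    return entropy(labels) - remainder


def choose_split(rows, labels, features, subset_size, rng):
    """Pick a random feature subset, then the feature with the highest gain."""
    candidates = rng.sample(sorted(features), min(subset_size, len(features)))
    return max(candidates, key=lambda f: info_gain(rows, labels, f))


# Toy data (assumption): feature "a" separates the classes, "b" does not.
rows = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 1}]
labels = [0, 0, 1, 1]
rng = random.Random(0)
gain_a = info_gain(rows, labels, "a")
gain_b = info_gain(rows, labels, "b")
best = choose_split(rows, labels, ["a", "b"], 2, rng)
```

With both features in the candidate subset, the informative feature "a" is chosen. As in the slide, the rows passed in would be the original dataset rather than a bootstrap sample, and each leaf would store a class probability.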