Asymmetric Gradient Boosting with Application to Spam Filtering


1 Asymmetric Gradient Boosting with Application to Spam Filtering
Jingrui He (Carnegie Mellon University), Bo Thiesson (Microsoft Research)

2 Roadmap
Background
MarginBoost Framework
Boosting with Different Costs (BDC)
Cost Functions
BDC in the Low False Positive Region
Parameter Study
Experimental Results
Conclusion

3 Background
Classification: e.g., neural networks, support vector machines
Boosting: an ensemble classifier, built by repeatedly reweighting the training data and fitting a weak learner
Symmetric loss function: the same cost for misclassified instances from different classes

4 Spam Filtering
Classification task: logistic regression, AdaBoost, SVMs, naïve Bayes, decision trees, neural networks, etc.
Misclassification of good e-mails: false positives
False positives are more expensive than false negatives
Stratification: de-emphasizes all spam e-mails in the same way, so it cannot differentiate between noisy and characteristic spam e-mails
Boosting with Different Costs (BDC) to the rescue!

5 MarginBoost Framework (Mason et al., 1999)
Training set: $\{(x_1, y_1), \ldots, (x_m, y_m)\}$ with labels $y_i \in \{-1, +1\}$
Strong classifier: a voted combination of weak learners, $F(x) = \sum_t \alpha_t f_t(x)$
Weak learner: $f_t(x) \in \{-1, +1\}$, the classification result
Weight: the coefficient $\alpha_t$ of each weak learner
Margin: $y F(x)$, positive for a correct prediction and negative for an incorrect one
Loss functional: the sample average of the cost function, $S(F) = \frac{1}{m} \sum_{i=1}^{m} C(y_i F(x_i))$
Cost function: $C(\cdot)$, applied to the margin of each training instance

6 MarginBoost Framework cont. (Mason et al., 1999)
To minimize the loss functional: no traditional parameter optimization; instead, gradient descent in function space
In iteration $t$, with the current classifier $F_t$, find the direction $f_{t+1}$ such that $S(F_t + \epsilon f_{t+1})$ decreases most rapidly
This direction is the negative functional derivative of $S$ at $F_t$: $-\nabla S(F_t)(x) = -\frac{1}{m} \sum_{i=1}^{m} \mathbf{1}(x = x_i)\, y_i\, C'(y_i F_t(x_i))$, where $\mathbf{1}(x = x_i)$ is the indicator function at $x_i$ and $C'$ is the derivative of $C$ with respect to the margin

7 MarginBoost Framework cont. (Mason et al., 1999)
If $f_{t+1}$ comes from some fixed parameterized class, it should maximize the weighted margins over all data points, $\sum_{i=1}^{m} w_i\, y_i\, f_{t+1}(x_i)$, where the weight is $w_i \propto -C'(y_i F_t(x_i))$
Coefficient for $f_{t+1}$: chosen by line search (or a more sophisticated method)
Stopping criterion: e.g., the maximum number of iterations is reached
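A minimal Python sketch of this generic loop, only to make the flow concrete: instance weights come from the negative cost derivative at the current margins, the weak learner is fit to maximize the weighted margin, and the coefficient is found by a crude line search. The helper names (`cost`, `cost_grad`, `fit_weak_learner`) and the grid-based line search are illustrative assumptions, not the implementation used in the talk.

```python
import numpy as np

def marginboost(X, y, cost, cost_grad, fit_weak_learner, n_iters=50):
    """Sketch of the generic MarginBoost loop (after Mason et al., 1999).

    X : (m, d) feature matrix; y : array of labels in {-1, +1}.
    cost(z), cost_grad(z) : the margin cost C and its derivative C'.
    fit_weak_learner(X, y, w) : returns a callable f with f(X) in {-1, +1},
        chosen to (approximately) maximize sum_i w_i * y_i * f(x_i).
    """
    m = X.shape[0]
    F = np.zeros(m)                        # strong classifier on the training set
    learners, alphas = [], []
    for _ in range(n_iters):
        w = -cost_grad(y * F)              # weights = negative derivative of C at the margins
        w = w / w.sum()
        f = fit_weak_learner(X, y, w)      # steepest-descent direction in function space
        pred = f(X)
        # coefficient by a crude grid line search on the empirical loss
        grid = np.linspace(0.0, 2.0, 201)
        losses = [np.mean(cost(y * (F + a * pred))) for a in grid]
        alpha = float(grid[int(np.argmin(losses))])
        if alpha == 0.0:                   # no descent achieved: stop early
            break
        F = F + alpha * pred
        learners.append(f)
        alphas.append(alpha)
    return learners, alphas
```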

8 MarginBoost Specialization
Cost function: differentiable and monotonically decreasing in the margin
Examples: the costs underlying AdaBoost, LogitBoost, and logistic regression
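For concreteness, these are the textbook margin costs usually associated with the methods named above; the slide's own formulas are not preserved in this transcript, so treat the exact forms as assumptions.

```python
import numpy as np

# Standard margin costs (assumed, not copied from the slide). Both are
# differentiable and monotonically decreasing in the margin z = y * F(x).
def exp_cost(z):                 # associated with AdaBoost
    return np.exp(-z)

def exp_cost_grad(z):
    return -np.exp(-z)

def logistic_cost(z):            # associated with LogitBoost / logistic regression
    return np.log1p(np.exp(-z))

def logistic_cost_grad(z):
    return -1.0 / (1.0 + np.exp(z))
```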

9 MarginBoost Specialization cont.
Cost function: the logistic cost
Weak learner: decision stumps; each $f_t$ tests the most discriminating feature in that iteration
Strong classifier: the voted combination $F(x) = \sum_t \alpha_t f_t(x)$
Output: the score $F(x)$
Upon convergence: equivalent to logistic regression
Stopping earlier: performs feature selection
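A small sketch of a decision-stump weak learner consistent with this slide: it picks the single most discriminating feature under the current weights. It assumes 0/1 features and uses the hypothetical name `fit_stump`; the talk's actual feature representation and stump selection may differ.

```python
import numpy as np

def fit_stump(X, y, w):
    """Choose the single 0/1 feature (and sign) maximizing the weighted margin
    sum_i w_i * y_i * f(x_i), where f(x) = s * (2 * x[j] - 1)."""
    scores = (w * y) @ (2 * X - 1)          # one weighted-margin score per feature
    j = int(np.argmax(np.abs(scores)))      # most discriminating feature this iteration
    s = 1.0 if scores[j] >= 0 else -1.0
    return lambda Xq: s * (2 * Xq[:, j] - 1)
```

Passed as `fit_weak_learner` to the earlier loop together with the logistic cost, the converged model is a linear function of the selected features, matching the slide's remark that full convergence recovers logistic regression while early stopping acts as feature selection.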

10 Boosting with Different Costs
Advantages (weights of mislabeled messages):
Mislabeled spam, regular boosting: larger and larger weights as more weak learners are combined
Mislabeled spam, BDC: large weights for moderately misclassified spam, small weights for extremely misclassified spam
Mislabeled ham: always high

11 Boosting with Different Costs cont.
Cost function
Ham: a cost whose weight stays high for misclassified ham, no matter how badly it is misclassified
Spam: a bounded cost whose weight is large for moderately misclassified spam and small for extremely misclassified spam
Weight for training instances: the negative derivative of the cost at the current margin
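The transcript does not preserve the BDC cost formulas, so the following is only a plausible sketch that reproduces the behaviour described on slides 10, 11, and 17: an unbounded logistic cost for ham and a bounded, saturating cost for spam with two parameters (a maximum cost and a slope around zero margin). The particular sigmoid form, the parameter names `lam` and `gamma`, and the function name `bdc_cost_grad` are all assumptions, not the paper's exact definitions.

```python
import numpy as np

def bdc_cost_grad(margins, y, lam=1.0, gamma=1.0):
    """Derivative of an illustrative asymmetric margin cost (weights are its negative).

    Ham (y = +1): logistic cost log(1 + exp(-z)); the weight keeps growing
        toward 1 as a ham message is misclassified more severely.
    Spam (y = -1): bounded cost 2*lam / (1 + exp(gamma*z)); it saturates for
        very negative margins, so the weight peaks for moderately misclassified
        spam and shrinks toward 0 for extremely misclassified (likely noisy) spam.
    """
    z = np.asarray(margins, dtype=float)
    y = np.asarray(y)
    grad = np.empty_like(z)
    ham = (y == 1)
    grad[ham] = -1.0 / (1.0 + np.exp(z[ham]))            # logistic-cost derivative
    s = 1.0 / (1.0 + np.exp(gamma * z[~ham]))            # spam cost divided by 2*lam
    grad[~ham] = -2.0 * lam * gamma * s * (1.0 - s)      # derivative of the bounded cost
    return grad
```

Because the cost now depends on the class, plugging this into the generic loop above would mean passing the labels to the cost-derivative call as well.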

12 BDC at Low False Positive Region
Figure: a linear threshold with a noisy spam message, shown after one iteration of regular boosting and after one iteration of BDC, in the high and low false positive regions.

13 Parameter Study in BDC
Two parameters of the spam cost:
the maximum cost for spam (the stratification level)
the slope of the cost around zero margin
Noisy data sets: constructed with noise probabilities 0.03, 0.05, and 0.1
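The deck does not show how the noisy training sets were built beyond the three noise probabilities, so this is only one simple way to create comparable data; flipping both classes symmetrically is an assumption, as is the name `add_label_noise`.

```python
import numpy as np

def add_label_noise(y, p, seed=0):
    """Flip each label in {-1, +1} with probability p (0.03, 0.05, or 0.1 on the slide)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    flip = rng.random(len(y)) < p
    return np.where(flip, -y, y)
```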

14 Parameter Study in BDC cont.
Figures: the effect of each of the two parameters, plotted as FN at FP 0.03, FN at FP 0.05, and FN at FP 0.1.

15 Experimental Results
Data: Hotmail Feedback Loop data
Training set: 200,000 messages received between July 1st, 2005 and August 9th, 2005
Test set: 60,000 messages received between December 1st, 2005 and December 6th, 2005
Methods for comparison:
Logistic regression and regularized logistic regression
LogitBoost
LogitBoost and logistic regression with stratification

16 Experimental Results cont.
Figures: comparison of the methods with decision stumps as the weak learner, and with decision trees of depth 2 as the weak learner.

17 Conclusion
MarginBoost in e-mail spam filtering:
Logistic regression as a special instance
Smart feature selection within logistic regression
BDC, an asymmetric boosting method:
Different cost functions for ham and spam
Misclassified ham always gets a large weight
Moderately misclassified spam gets a large weight; extremely misclassified spam gets a small weight
Improves the false negative rate in the low false positive region

18 Thank you!
www.cs.cmu.edu/~jingruih
Q&A

