Asymmetric Gradient Boosting with Application to Spam Filtering

Presentation transcript:

Asymmetric Gradient Boosting with Application to Spam Filtering. Jingrui He (Carnegie Mellon University), Bo Thiesson (Microsoft Research).

Roadmap: Background; MarginBoost Framework; Boosting with Different Costs (BDC); Cost Functions; BDC in the Low False Positive Region; Parameter Study; Experimental Results; Conclusion.

Background. Classification: e.g., neural networks, support vector machines. Boosting: an ensemble classifier built by repeatedly reweighting the training data and fitting a weak learner. Symmetric loss function: the same cost for misclassified instances from different classes.

Email Spam Filtering. A classification task handled by logistic regression, AdaBoost, SVMs, Naïve Bayes, decision trees, neural networks, etc. Misclassifying good emails (ham) produces false positives, which are more expensive than false negatives. Stratification de-emphasizes all spam emails in the same way and cannot differentiate between noisy and characteristic spam emails. Boosting with Different Costs (BDC) to the rescue!

MarginBoost Framework (Mason et al., 1999). Training set: $\{(x_i, y_i)\}_{i=1}^m$ with labels $y_i \in \{-1, +1\}$. Strong classifier: a voted combination of weak learners, $F(x) = \sum_t w_t f_t(x)$, where each weak learner $f_t(x) \in \{-1, +1\}$ is a classification result and $w_t$ its weight. Margin: $y_i F(x_i)$, positive for a correct prediction and negative for an incorrect one. Loss functional: the sample average of a cost function of the margin, $S(F) = \frac{1}{m} \sum_{i=1}^m C(y_i F(x_i))$.

MarginBoost Framework cont. (Mason et al., 1999). To minimize the loss functional, there is no traditional parameter optimization; instead, gradient descent is performed in function space. In iteration $t$, with current classifier $F_t$, find the direction $f_{t+1}$ in which $S$ decreases most rapidly: the negative functional derivative of $S$ at $F_t$, which is an indicator function at each training point scaled by the derivative of $C$ with respect to the margin.
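The gradient expressions on this slide were shown as images in the original deck; the following is a standard reconstruction of the MarginBoost functional gradient from the definitions above, not a copy of the original slide.

```latex
% Functional derivative of S(F) = (1/m) \sum_i C(y_i F(x_i)); it is nonzero only
% at the training points, and its size there is the derivative of C at the margin:
\nabla S(F)(x) = \frac{1}{m} \sum_{i=1}^{m} y_i \, C'\bigl(y_i F(x_i)\bigr)\, \mathbf{1}[x = x_i]
\quad\Longrightarrow\quad
w_i \;\propto\; -\,C'\bigl(y_i F_t(x_i)\bigr).
```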

MarginBoost Framework cont. (Mason et al., 1999). If $f_{t+1}$ comes from some fixed parameterized class, it should maximize the inner product with the negative gradient; equivalently, it maximizes the weighted margins $\sum_i w_i \, y_i \, f_{t+1}(x_i)$ over all the data points, where the weight is $w_i \propto -C'(y_i F_t(x_i))$. The coefficient for $f_{t+1}$ is then chosen by line search (or a more sophisticated method). Stopping criterion: the maximum number of iterations is reached.
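The loop described on the last three slides maps onto a short training routine. Below is a minimal sketch under stated assumptions (labels in {-1, +1}, decision stumps as the fixed weak-learner class, user-supplied cost and cost-derivative callables, a crude grid line search); function and variable names are illustrative and not taken from the authors' code.

```python
import numpy as np

def fit_stump(X, y, w):
    """Return the decision stump f maximizing the weighted margin sum_i w_i * y_i * f(x_i)."""
    best_score, best = -np.inf, None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1.0, -1.0):
                pred = np.where(X[:, j] > thr, sign, -sign)
                score = np.dot(w, y * pred)
                if score > best_score:
                    best_score, best = score, (j, thr, sign)
    j, thr, sign = best
    return lambda Z, j=j, thr=thr, sign=sign: np.where(Z[:, j] > thr, sign, -sign)

def margin_boost(X, y, cost, cost_grad, T=50):
    """Functional gradient descent: reweight by -C'(margin), fit a stump, line-search its coefficient."""
    learners, coeffs = [], []
    F = np.zeros(len(y))                      # F_t evaluated at the training points
    for t in range(T):
        w = -cost_grad(y * F)                 # instance weights = -C'(margin), nonnegative
        if w.sum() <= 0:
            break
        w = w / w.sum()
        f = fit_stump(X, y, w)
        pred = f(X)
        steps = np.linspace(0.01, 2.0, 200)   # crude grid line search for the coefficient
        losses = [cost(y * (F + s * pred)).mean() for s in steps]
        s = float(steps[np.argmin(losses)])
        F = F + s * pred
        learners.append(f)
        coeffs.append(s)
    return learners, coeffs

def predict(learners, coeffs, Z):
    """Sign of the voted combination F(x) = sum_t w_t f_t(x)."""
    F = sum(c * f(Z) for c, f in zip(coeffs, learners))
    return np.sign(F)
```

Plugging in a particular cost function specializes this loop, which is exactly what the next slides do.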

MarginBoost Specialization. The cost function must be differentiable and monotonically decreasing in the margin. Different choices recover well-known algorithms: AdaBoost uses $C(z) = e^{-z}$; LogitBoost and logistic regression use $C(z) = \ln(1 + e^{-z})$.
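For concreteness, the two specializations named here can be written as (cost, derivative) pairs that plug directly into the margin_boost sketch above; the exact expressions on the original slide were images, so these are the usual textbook forms.

```python
import numpy as np

# AdaBoost: exponential cost of the margin
adaboost_cost      = lambda z: np.exp(-z)                  # C(z) = e^{-z}
adaboost_cost_grad = lambda z: -np.exp(-z)                 # C'(z) = -e^{-z}

# LogitBoost / logistic regression: log-loss of the margin
logistic_cost      = lambda z: np.logaddexp(0.0, -z)       # C(z) = ln(1 + e^{-z})
logistic_cost_grad = lambda z: -1.0 / (1.0 + np.exp(z))    # C'(z) = -1 / (1 + e^{z})
```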

MarginBoost Specialization cont. Cost function: the logistic cost above. Weak learner: decision stumps, each built on the most discriminating feature in that iteration. Strong classifier: the voted combination of the chosen stumps. Output: run to convergence and the result is logistic regression; stop earlier and the procedure performs feature selection.

Boosting with Different Costs: Advantages. Weights of mislabeled spam: regular boosting assigns larger and larger weights as more weak learners are combined; BDC assigns large weights to moderately misclassified spam and small weights to extremely misclassified spam. Weights of mislabeled ham: regular boosting again assigns larger and larger weights; under BDC they are always high.

Boosting with Different Costs cont. Cost function: ham keeps a cost whose derivative stays large for misclassified examples (as in the symmetric specialization), while spam gets a cost that flattens out for large negative margins. Weight for training instances: the negative derivative of the cost with respect to the margin, so misclassified ham always receives a large weight, whereas the weight of misclassified spam peaks for moderate misclassification and shrinks toward zero for extreme misclassification.
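The BDC cost curves on this slide were figures, so the exact formulas are not recoverable from the transcript. Purely as an illustration of the qualitative behavior just described, the sketch below pairs an unbounded logistic-style cost for ham with a hypothetical bounded, sigmoid-shaped cost for spam, whose ceiling gamma plays the role of the stratification level and whose slope parameter lam shapes the cost near zero margin; the paper's actual functional forms may differ.

```python
import numpy as np

def ham_cost(z):                      # unbounded: keeps pushing on misclassified ham
    return np.logaddexp(0.0, -z)      # ln(1 + e^{-z})

def spam_cost(z, gamma=2.0, lam=1.0): # bounded by gamma: gives up on extreme spam outliers
    return gamma / (1.0 + np.exp(lam * z))

def weight(cost, z, eps=1e-4):
    """Instance weight = -dC/d(margin), evaluated numerically."""
    return -(cost(z + eps) - cost(z - eps)) / (2 * eps)

margins = np.array([-6.0, -2.0, -0.5, 0.5, 2.0])
print("ham  weights:", np.round(weight(ham_cost, margins), 3))   # stays large as margin -> -6
print("spam weights:", np.round(weight(spam_cost, margins), 3))  # peaks near 0, nearly 0 at -6
```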

BDC at the Low False Positive Region. [Figure: a data set with a noisy spam message and a linear threshold, comparing the threshold after one iteration of regular boosting with the threshold after one iteration of BDC, in both the high false positive region and the low false positive region.]

Parameter Study in BDC. Two parameters are studied: the maximum cost for spam (the stratification level) and the slope of the cost around zero margin. Noisy data sets are created with noise probability 0.03, 0.05, and 0.1.

Parameter Study in BDC cont. [Plots: the effect of each parameter, reported as the false negative (FN) rate at false positive (FP) rates of 0.03, 0.05, and 0.1.]
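A hedged sketch of how the "FN at FP" operating points used in these plots can be read off a set of classifier scores, assuming a higher score means more spam-like; the function and names here are illustrative, not from the paper.

```python
import numpy as np

def fn_at_fp(scores, labels, fp_rate):
    """labels: +1 = spam, -1 = ham. False negative rate at a threshold whose
    false positive rate stays at or below fp_rate."""
    ham_scores = np.sort(scores[labels == -1])[::-1]     # ham scores, descending
    k = int(np.floor(fp_rate * len(ham_scores)))         # ham messages allowed above threshold
    threshold = ham_scores[k] if k < len(ham_scores) else -np.inf
    spam_scores = scores[labels == +1]
    return float(np.mean(spam_scores <= threshold))      # spam that slips past the filter

# e.g. fn_at_fp(model_scores, y_test, 0.03) for the FP = 0.03 operating point
# (model_scores and y_test are placeholders for a trained model's outputs).
```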

Experimental Results. Data: Hotmail Feedback Loop data; the training set contains 200,000 messages received between July 1st, 2005 and August 9th, 2005, and the test set contains 60,000 messages received between December 1st, 2005 and December 6th, 2005. Methods for comparison: logistic regression, regularized logistic regression, LogitBoost, and LogitBoost and logistic regression with stratification.

Experimental Results cont. [Comparison plots for two settings: weak learner = decision stumps, and weak learner = decision trees of depth 2.]

Conclusion. MarginBoost in email spam filtering: logistic regression is a special instance, and early stopping gives smart feature selection in logistic regression. BDC, an asymmetric boosting method: different cost functions for ham and spam; misclassified ham always has a large weight; moderately misclassified spam has a large weight; extremely misclassified spam has a small weight; the result is improved false negative rates in the low false positive region.

Thank you! Q&A. www.cs.cmu.edu/~jingruih