Boosting. Ashok Veeraraghavan. Presentation transcript:
Boosting Methods: Combine many weak classifiers to produce a powerful committee. Resembles bagging and other committee-based methods.
General Idea of Boosting: Weak classifiers have an error rate only slightly better than random guessing. Sequentially apply weak classifiers to modified versions of the data. The predictions of these classifiers are combined to produce a powerful classifier. Recall "Bagging".
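Preliminaries: Two-class classification. Output: $Y \in \{-1, 1\}$. Predictor variables: $X$. Weak classifier: $G(X)$. Training error: $\overline{\mathrm{err}} = \frac{1}{N} \sum_{i=1}^{N} I(y_i \neq G(x_i))$.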
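The training error rate is the fraction of training points that the classifier labels incorrectly. A minimal sketch, assuming labels coded as -1/+1 and a list of predictions from some weak classifier (the function name and the sample data are illustrative, not from the slides):

```python
def training_error(y, predictions):
    """Fraction of training points where the prediction differs from the label."""
    if len(y) != len(predictions):
        raise ValueError("y and predictions must have the same length")
    mistakes = sum(1 for yi, gi in zip(y, predictions) if yi != gi)
    return mistakes / len(y)


# Hypothetical labels in {-1, +1} and hypothetical outputs of a weak classifier G.
y = [1, 1, -1, -1, 1]
g = [1, -1, -1, 1, 1]
print(training_error(y, g))  # 2 mistakes out of 5 points -> 0.4
```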
Schematic of AdaBoost: [Figure: the training sample is used to fit $G_1(x)$; successively reweighted samples are used to fit $G_2(x)$, $G_3(x)$, ..., $G_M(x)$.]
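AdaBoost: (1) Initialize the observation weights $w_i = 1/N$, $i = 1, \ldots, N$. (2) For $m = 1$ to $M$: (a) fit a classifier $G_m(x)$ to the training data using weights $w_i$; (b) compute $\mathrm{err}_m = \sum_{i=1}^{N} w_i I(y_i \neq G_m(x_i)) / \sum_{i=1}^{N} w_i$; (c) compute $\alpha_m = \log((1 - \mathrm{err}_m)/\mathrm{err}_m)$; (d) set $w_i \leftarrow w_i \exp[\alpha_m I(y_i \neq G_m(x_i))]$. (3) Output $G(x) = \mathrm{sign}[\sum_{m=1}^{M} \alpha_m G_m(x)]$.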
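The loop above can be sketched in plain Python. This is an illustration under stated assumptions, not the presenter's code: it follows the standard AdaBoost.M1 steps (weighted error, alpha = log((1 - err) / err), upweighting of misclassified points), uses single-feature threshold rules ("stumps") as the weak learner, and all function names and the toy data are hypothetical.

```python
import math


def fit_stump(X, y, w):
    """Weak learner (an assumption): best one-feature threshold rule under weights w."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[j] <= t else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best[1:]


def stump_predict(stump, row):
    j, t, sign = stump
    return sign if row[j] <= t else -sign


def adaboost(X, y, M):
    n = len(X)
    w = [1.0 / n] * n                                # initialize observation weights
    ensemble = []
    for _ in range(M):
        stump = fit_stump(X, y, w)                   # fit G_m using the current weights
        miss = [stump_predict(stump, r) != yi for r, yi in zip(X, y)]
        err = sum(wi for wi, m in zip(w, miss) if m) / sum(w)
        err = min(max(err, 1e-10), 1 - 1e-10)        # guard against log(0)
        alpha = math.log((1 - err) / err)
        # Increase the weight of every misclassified observation.
        w = [wi * math.exp(alpha) if m else wi for wi, m in zip(w, miss)]
        ensemble.append((alpha, stump))
    return ensemble


def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all weak classifiers."""
    score = sum(a * stump_predict(s, row) for a, s in ensemble)
    return 1 if score >= 0 else -1


# Toy 1-D data that no single stump can classify perfectly.
X = [[0], [1], [2], [3], [4], [5]]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(X, y, M=3)
print([predict(ensemble, row) for row in X])
```

On this toy set each individual stump misclassifies at least two points, while the weighted vote of three stumps separates the classes, which is the committee effect described in the opening slides.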
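Additive Model: $f(x) = \sum_{m=1}^{M} \beta_m b(x; \gamma_m)$, e.g., neural networks, wavelets, MARS. Fitting is an optimization problem: $\min_{\{\beta_m, \gamma_m\}_1^M} \sum_{i=1}^{N} L\left(y_i, \sum_{m=1}^{M} \beta_m b(x_i; \gamma_m)\right)$. This is computationally intensive, so resort to "Forward Stagewise Additive Modeling".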
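Forward stagewise additive modeling avoids the joint optimization by adding one basis function at a time and leaving earlier terms fixed. A sketch of the recursion, assuming the usual notation (not given on the slide) of a loss $L$, basis functions $b(x; \gamma)$ with parameters $\gamma$, and expansion coefficients $\beta$:

```latex
\begin{align*}
f_0(x) &= 0, \\
(\beta_m, \gamma_m) &= \arg\min_{\beta, \gamma} \sum_{i=1}^{N}
    L\bigl(y_i,\; f_{m-1}(x_i) + \beta\, b(x_i; \gamma)\bigr),
    \qquad m = 1, \ldots, M, \\
f_m(x) &= f_{m-1}(x) + \beta_m\, b(x; \gamma_m).
\end{align*}
```

Each step solves for a single pair $(\beta_m, \gamma_m)$, so the cost grows linearly with $M$ instead of requiring a search over all $M$ terms at once.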
Exponential Loss and AdaBoost: AdaBoost is equivalent to "Forward Stagewise Additive Modeling" with loss function $L(y, f(x)) = \exp(-y f(x))$ and the individual weak classifiers $G_m(x)$ as basis functions. Exponential loss makes AdaBoost computationally efficient.
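The equivalence can be made concrete with a short derivation. This sketch assumes labels $y_i \in \{-1, 1\}$, weak classifiers $G(x) \in \{-1, 1\}$, and the exponential loss $\exp(-y f(x))$; the symbols $\beta_m$, $w_i^{(m)}$ and $\mathrm{err}_m$ are notation introduced here for illustration.

```latex
\begin{align*}
(\beta_m, G_m) &= \arg\min_{\beta, G} \sum_{i=1}^{N}
    \exp\bigl[-y_i \bigl(f_{m-1}(x_i) + \beta\, G(x_i)\bigr)\bigr]
  = \arg\min_{\beta, G} \sum_{i=1}^{N} w_i^{(m)} \exp\bigl(-\beta\, y_i G(x_i)\bigr),
  \qquad w_i^{(m)} = \exp\bigl(-y_i f_{m-1}(x_i)\bigr), \\
G_m &= \arg\min_{G} \sum_{i=1}^{N} w_i^{(m)} I\bigl(y_i \neq G(x_i)\bigr), \\
\beta_m &= \tfrac{1}{2} \log \frac{1 - \mathrm{err}_m}{\mathrm{err}_m},
  \qquad \mathrm{err}_m = \frac{\sum_{i} w_i^{(m)} I\bigl(y_i \neq G_m(x_i)\bigr)}{\sum_{i} w_i^{(m)}}, \\
w_i^{(m+1)} &= w_i^{(m)} \exp\bigl(-\beta_m y_i G_m(x_i)\bigr)
  = w_i^{(m)} \exp\bigl[\alpha_m I\bigl(y_i \neq G_m(x_i)\bigr)\bigr] \exp(-\beta_m),
  \qquad \alpha_m = 2 \beta_m.
\end{align*}
```

The factor $\exp(-\beta_m)$ multiplies every weight equally and so has no effect, leaving exactly the AdaBoost reweighting rule. The efficiency comes from the fact that each stage reduces to a weighted classification fit plus a closed-form coefficient.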
Loss Functions and Performance: Misclassification, exponential, binomial deviance, squared error. The choice of loss function depends on the task at hand (classification vs. regression) and the distribution of the data.
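For two-class problems these losses are often compared as functions of the margin y * f(x). A minimal sketch, assuming y in {-1, +1}, a real-valued score f, and the common textbook forms of each loss (the slide's own formulas are not in the transcript, so these definitions are assumptions):

```python
import math


def misclassification(y, f):
    # 1 when the sign of f disagrees with y; a score of exactly 0 is treated as correct here.
    return 1.0 if y * f < 0 else 0.0


def exponential(y, f):
    return math.exp(-y * f)


def binomial_deviance(y, f):
    return math.log(1 + math.exp(-2 * y * f))


def squared_error(y, f):
    return (y - f) ** 2


# Compare the losses for a true label y = +1 at a few scores f.
for f in (-1.0, 0.0, 1.0, 3.0):
    print(f, misclassification(1, f), exponential(1, f),
          binomial_deviance(1, f), squared_error(1, f))
```

At f = 3 the point is classified correctly with a large margin, yet squared error is 4 while the exponential and deviance losses are close to 0. This is one reason squared error is a poor fit for classification.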
Off-the-Shelf Procedures for Data Mining: Off-the-shelf means minimal preprocessing of data. Requirements: qualitative understanding of the relationship between predictors and output; filter out irrelevant features; tackle outliers. Candidate: decision trees.
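Trees: Partition the predictor space into disjoint regions $R_j$, $j = 1, 2, \ldots, J$, with $f(x) = \rho_j$ for $x \in R_j$. Formally, $T(x; \Theta) = \sum_{j=1}^{J} \rho_j I(x \in R_j)$ with $\Theta = \{R_j, \rho_j\}_1^J$, found by $\hat{\Theta} = \arg\min_{\Theta} \sum_{j=1}^{J} \sum_{x_i \in R_j} L(y_i, \rho_j)$. Computationally intensive search.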
Trees Contd.: Divide the minimization into two parts and iterate. (1) Find $\rho_j$ given $R_j$. (2) Find $R_j$ by greedy recursive partitioning. Advantages: qualitative understanding of output; internal feature selection.
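The first of the two parts, finding the constants once the regions are fixed, is simple under squared-error loss: the best constant for a region is the mean of the responses that fall in it. A minimal sketch under that assumption (the slide does not fix the loss; the function name and data are illustrative):

```python
def region_constants(regions, y):
    """regions[i] is the index j of the region containing x_i; returns {j: mean of y in region j}."""
    sums, counts = {}, {}
    for j, yi in zip(regions, y):
        sums[j] = sums.get(j, 0.0) + yi
        counts[j] = counts.get(j, 0) + 1
    return {j: sums[j] / counts[j] for j in sums}


# Hypothetical assignment of five observations to two regions.
regions = [0, 0, 1, 1, 1]
y = [1.0, 3.0, 2.0, 4.0, 6.0]
print(region_constants(regions, y))  # {0: 2.0, 1: 4.0}
```

The second part, choosing the regions themselves, is the expensive step, which is why it is handled by greedy recursive partitioning.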
Boosting Trees: Boosting is done by Forward Stagewise Additive Modeling or by AdaBoost. With squared-error loss, each stage reduces to fitting a regression tree to the current residuals.
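With squared-error loss, the loss of adding a tree T to the current fit is (y - f - T)^2, so each stage amounts to fitting a regression tree to the residuals y - f. A minimal sketch of this, assuming 1-D inputs and two-leaf trees (stumps) as the base learner; the function names and the toy data are hypothetical:

```python
def fit_regression_stump(x, r):
    """Best single split of 1-D inputs x for targets r under squared error."""
    best = None
    for t in sorted(set(x))[:-1]:      # skip the largest value: the right side would be empty
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        cl, cr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - cl) ** 2 for ri in left) + sum((ri - cr) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, cl, cr)
    return best[1:]


def boost(x, y, M):
    """Forward stagewise boosting of stumps; returns the stumps and the fitted values."""
    f = [0.0] * len(x)
    trees = []
    for _ in range(M):
        residuals = [yi - fi for yi, fi in zip(y, f)]
        t, cl, cr = fit_regression_stump(x, residuals)   # tree fit to current residuals
        trees.append((t, cl, cr))
        f = [fi + (cl if xi <= t else cr) for xi, fi in zip(x, f)]
    return trees, f


def sse(y, f):
    return sum((yi - fi) ** 2 for yi, fi in zip(y, f))


# Toy data with roughly three levels; a single stump cannot capture them.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 3.1, 2.9, 5.0, 5.2]
for M in (1, 2, 5):
    _, fitted = boost(x, y, M)
    print(M, sse(y, fitted))
```

The training error can only go down as M grows, which is why the number of iterations has to be controlled separately.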
Optimal Size of Trees: Usually oversized trees are grown and then pruned to a predetermined size. Instead, grow all trees to the same size. The optimal size of each tree is estimated beforehand from the data and the target function. ANOVA (analysis of variance) expansion.
Regularization: How many boosting iterations $M$? If $M$ is too large, generalization is poor, i.e., overfitting. $M$ is usually selected by validation.
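Selecting M by validation can be as simple as recording the error on a held-out set after each boosting iteration and keeping the iteration count with the smallest error. A minimal sketch, assuming such a list of validation errors is already available; the function name and the numbers are invented purely for illustration:

```python
def select_num_iterations(validation_errors):
    """validation_errors[m - 1] is the held-out error after m boosting iterations."""
    best_index = min(range(len(validation_errors)), key=validation_errors.__getitem__)
    return best_index + 1   # convert 0-based index to iteration count


# Hypothetical validation curve: error falls, then rises as the model overfits.
errors = [0.30, 0.22, 0.18, 0.17, 0.19, 0.21]
print(select_num_iterations(errors))  # 4
```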
Shrinkage: Scale the contribution of each tree by a factor $0 < \epsilon < 1$. There is a trade-off between $\epsilon$ and $M$: small $\epsilon$ requires large $M$. Computation is proportional to $M$!
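Written as an update rule, shrinkage multiplies each newly fitted tree by the factor before adding it to the model. A sketch assuming the tree notation used earlier in the deck, with regions $R_{jm}$ and constants $\rho_{jm}$ for the $m$-th tree (the $m$ subscripts are added here for illustration):

```latex
f_m(x) = f_{m-1}(x) + \epsilon \sum_{j=1}^{J} \rho_{jm}\, I(x \in R_{jm}),
\qquad 0 < \epsilon < 1 .
```

Each tree then corrects only a fraction $\epsilon$ of the current error, so more trees are needed to reach the same training fit, which is the trade-off against $M$.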
Applications: Non-intrusive power monitoring system. Tumor classification with gene expression data. Text classification.
References: www.boosting.org. T. Hastie, R. Tibshirani, J. Friedman. "The Elements of Statistical Learning: Data Mining, Inference, and Prediction." Springer Verlag. R. Meir and G. Rätsch. An introduction to boosting and leveraging. In S. Mendelson and A. Smola, editors, Advanced Lectures on Machine Learning, LNCS, pages 119-184. Springer, 2003. R. E. Schapire. A brief introduction to boosting. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, 1999. Y. Freund and R. E. Schapire. A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14(5):771-780, September 1999. Appearing in Japanese, translation by Naoki Abe.