Optimization of Signal Significance by Bagging Decision Trees. Ilya Narsky, Caltech; presented by Harrison Prosper.


Tools for Multivariate Classification in HEP
Physicists have some experience with:
- linear discriminant analysis (Fisher)
- feedforward neural nets
- radial basis function neural nets
- Bayesian neural nets
- decision trees
- support vector machines
- probability density estimation using kernels
- nearest neighbor methods

Tools for Multivariate Classification in HEP
New methods for combining classifiers:
- developed by statisticians in the last decade and largely unexplored in HEP
- boosting (AdaBoost). Example: particle identification at MiniBooNE using boosted decision trees (B. Roe).
- bagging and random forests. Example: the B → γℓν search at BaBar (presented in this talk).

Problems with some popular methods
Kernel-based methods, methods based on local averaging, nearest neighbors:
- Suffer from the "curse of dimensionality" (data are sparse in many dimensions):
  - Kernel methods are very sensitive to the choice of the distance metric.
  - "Local" methods can become non-local.
Feedforward neural nets:
- Slow to train: CPU time scales as O(D²) if the number of hidden units is proportional to the number of input units D.
- Strongly correlated inputs can be a problem.
- Continuous and discrete inputs can be difficult to handle.

Boosted and bagged decision trees
Why are they better?
- Decision trees are robust in many dimensions (D); CPU time generally scales as O(D).
- A decision tree by itself is not powerful; one needs to boost or bag.
- With boosting or bagging one obtains what some regard as the best off-the-shelf method for classification in many dimensions.

StatPatternRecognition: a C++ package for multivariate classification
Implemented classifiers and algorithms:
- binary split
- linear and quadratic discriminant analysis
- decision trees
- bump hunting algorithm (PRIM, Friedman & Fisher)
- AdaBoost
- bagging and random forest algorithms
- AdaBoost and Bagger are capable of boosting/bagging any classifier implemented in the package
Described in: I. Narsky, physics/… and physics/…

StatPatternRecognition
C++ framework:
- tools for imposing selection criteria and histogramming output
- general-purpose tools: bootstrap, estimation of data moments, and a correlation test
- OO design: the package can be extended by supplying new implementations to existing interfaces
Can be installed outside BaBar without too much effort. More details in the poster session!
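The extension mechanism described above can be pictured with a small sketch. This is an illustrative interface of my own, not the actual StatPatternRecognition API: a user-defined figure of merit is supplied by implementing an abstract class that the tree consults when scoring candidate splits.

```cpp
#include <cmath>

// Hypothetical abstract interface for a user-supplied optimization criterion.
// (Illustration only; the real StatPatternRecognition interface differs.)
class SplitCriterion {
public:
    virtual ~SplitCriterion() = default;
    // Larger is better: the tree maximizes this over all candidate splits.
    virtual double figureOfMerit(double signal, double background) const = 0;
};

// Example implementation: the signal significance S/sqrt(S+B).
class SignificanceCriterion : public SplitCriterion {
public:
    double figureOfMerit(double s, double b) const override {
        return (s + b > 0.0) ? s / std::sqrt(s + b) : 0.0;
    }
};
```

With such a design the tree-building code takes a SplitCriterion at construction time and stays agnostic about which figure of merit it is actually optimizing.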

Decision trees in StatPatternRecognition
[Diagram: a node with S signal and B background events is split into two daughter nodes containing (S1, B1) and (S2, B2), with S = S1 + S2 and B = B1 + B2.]
Conventional decision tree, e.g., CART:
- Each split minimizes the Gini index summed over the daughter nodes, 2·S1B1/(S1+B1) + 2·S2B2/(S2+B2).
- The tree spends 50% of its time finding clean signal nodes and 50% of its time finding clean background nodes.
Decision tree in StatPatternRecognition:
- Each split maximizes the signal significance of a daughter node, Si/√(Si+Bi).
StatPatternRecognition allows the user to supply an arbitrary criterion for tree optimization by providing an implementation of the abstract C++ interface. At the moment, the following criteria are implemented:
- Conventional: Gini index, cross-entropy, and correctly classified fraction of events.
- Physics: signal purity, S/√(S+B), 90% Bayesian upper limit, and 2·(√(S+B) − √B).
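As a toy illustration of the two split criteria (not StatPatternRecognition code), the sketch below scans candidate cuts on a single variable for a small hand-made sample and reports the cut preferred by the Gini index (minimized) and by the signal significance (maximized). The sample and all numbers are invented for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

struct Point { double x; bool isSignal; };

// Gini impurity of one node (summed over daughters and minimized).
double gini(double s, double b)   { return (s + b > 0.0) ? 2.0 * s * b / (s + b) : 0.0; }
// Signal significance of one node (the split keeps the best daughter).
double signif(double s, double b) { return (s + b > 0.0) ? s / std::sqrt(s + b) : 0.0; }

int main() {
    // Toy 1-D sample: background tends to sit at small x, signal at large x.
    std::vector<Point> data = {
        {0.1, false}, {0.3, false}, {0.4, false}, {0.5, true},
        {0.6, false}, {0.7, true},  {0.8, true},  {0.9, true}};
    std::sort(data.begin(), data.end(),
              [](const Point& a, const Point& b) { return a.x < b.x; });

    double bestGiniCut = 0.0, bestGini = 1e30;
    double bestSigCut  = 0.0, bestSig  = -1.0;
    // Try a cut between every pair of neighbouring points; x > cut goes "right".
    for (std::size_t i = 0; i + 1 < data.size(); ++i) {
        const double cut = 0.5 * (data[i].x + data[i + 1].x);
        double sL = 0, bL = 0, sR = 0, bR = 0;
        for (const Point& p : data) {
            if (p.x > cut) { (p.isSignal ? sR : bR) += 1.0; }
            else           { (p.isSignal ? sL : bL) += 1.0; }
        }
        const double g = gini(sL, bL) + gini(sR, bR);
        const double z = std::max(signif(sL, bL), signif(sR, bR));
        if (g < bestGini) { bestGini = g; bestGiniCut = cut; }
        if (z > bestSig)  { bestSig  = z; bestSigCut  = cut; }
    }
    std::printf("Gini index prefers x > %.2f, signal significance prefers x > %.2f\n",
                bestGiniCut, bestSigCut);
    return 0;
}
```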

Phystat Harrison Prosper Boost or bag? Boosting  increase weights of misclassified events and reduce weights of correctly classified events  train a new classifier on the re-weighted sample  repeat => apply many classifiers sequentially  classify new data by a weighted vote of the classifiers Bagging (Bootstrap AGGregatING)  draw a bootstrap replica of the training sample  random forest: at each split, select a random sub-set of variables  train a new classifier on this replica  parallel algorithm => classifiers built on bootstrap replicas  classify new data by the majority vote of the classifiers

Phystat Harrison Prosper Boost or bag? Boosting  Boosting generally gives better predictive power but bagging tends to perform better in the presence of outliers and noise Bauer and Kohavi, “An empirical comparison of voting classification algorithms”, Machine Learning 36 (1999) An interesting possibility for physics analysis:  bag decision trees optimizing a figure of merit suited for physics analysis

Application to the search for B → γℓν at BaBar
- 11-D data: 10 continuous and 1 discrete variable.
- Some variables are strongly correlated, e.g. ρ(Fisher, acthrust) = 0.8; costheblg and nuEP are also strongly correlated.
- Signal significance S/√(S+B) is optimized assuming Br(B → γℓν) = 3×10⁻⁶ in both electron and muon modes.
- Monte Carlo samples for each mode:
  - training = 500K
  - validation = 250K
  - test = 250K
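For concreteness, a back-of-the-envelope sketch of the quantity being optimized: with an assumed branching fraction, a dataset size, and selection efficiencies taken from the Monte Carlo samples, the expected yields and S/√(S+B) follow directly. Only Br = 3×10⁻⁶ comes from the slide; every other number below is a placeholder, not a value from the analysis.

```cpp
#include <cmath>
#include <cstdio>

int main() {
    // Assumed branching fraction from the slide; all other numbers are placeholders.
    const double br     = 3e-6;    // Br(B -> gamma l nu), per lepton mode
    const double nBBbar = 2.3e8;   // hypothetical number of produced B Bbar pairs
    const double effSig = 0.05;    // hypothetical signal selection efficiency
    const double nBkg   = 50.0;    // hypothetical expected background after selection

    const double S = 2.0 * nBBbar * br * effSig;   // two B mesons per B Bbar event
    const double B = nBkg;
    std::printf("expected S = %.1f, B = %.1f, S/sqrt(S+B) = %.2f\n",
                S, B, S / std::sqrt(S + B));
    return 0;
}
```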

Results
[Table: signal significances obtained with different classification methods.]
- Training 50 boosted decision trees or 100 bagged decision trees took several hours in a batch queue at SLAC.
- W1 and W0 are the expected numbers of signal and background events in 210 fb⁻¹ of data.
- The best signal significance is obtained by bagging decision trees. Bagged decision trees give a 14% improvement over boosted decision trees:
  - 8% due to using bagging instead of boosting
  - 9% due to optimizing S/√(S+B) instead of the Gini index

[Figures: bagging 100 decision trees optimizing the signal significance; boosting 50 decision trees optimizing the Gini index.]

Summary
- Bagging decision trees that optimize a figure of merit suited for physics analysis is an interesting option. There is at least one example where this method is superior to all others.
- StatPatternRecognition is available for distribution to physicists. Email the author to request a copy.