BOF Trees Visualization, Zagreb, June 12, 2004
“BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles


“BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles
Vesna Luzar-Stiffler, Ph.D., University Computing Centre and CAIR Research Centre, Zagreb, Croatia
Charles Stiffler, Ph.D., CAIR Research Centre, Zagreb, Croatia

Outline
- Introduction/Background: Trees, Ensemble Trees, Visualization Tools
- Simulation Results
- Web Survey Results
- Conclusions/Recommendations

Introduction / Background
Classification / Decision Trees
- A data mining (statistical learning) method for classification
- Invented twice:
  - Statistical community: Breiman, Friedman et al. (1984)
  - Machine learning community: Quinlan (1986)
Many positive features
- Interpretability, ability to handle data of mixed type and missing values, robustness to outliers, etc.
Disadvantages
- Unstable vis-à-vis seemingly minor data perturbations
- Low predictive power

Introduction / Background
Possible improvements: Ensembles
- Bagging, i.e., bootstrapping trees (Breiman, 1996)
- Boosting, e.g., AdaBoost (Freund & Schapire, 1997)
- Random Forests (Breiman, 2001)
- Stacking, randomized trees, etc.
Advantage
- Improved prediction
Disadvantage
- Loss of interpretability (“black box”)

Classification Tree
Let $\hat{f}(x)$ be the classification tree prediction at input $x$ obtained from the full “training” data $Z = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$.

Bagging Classification Tree
Let $\hat{f}^{*b}(x)$ be the classification tree prediction at input $x$ obtained from the bootstrap sample $Z^{*b}$, $b = 1, 2, \dots, B$.
Bagging estimate: $\hat{f}_{\mathrm{bag}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{*b}(x)$ (for classification, the predicted class is the majority vote over the $B$ trees).
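A minimal sketch of the bagging estimate above, assuming binary classes coded 0/1 and using scikit-learn's DecisionTreeClassifier with bootstrap resampling; the names X_train, y_train, X_new are placeholders, not from the original slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

def bagged_tree_predict(X_train, y_train, X_new, B=100, random_state=0):
    """Bagging: grow one tree per bootstrap sample Z*b and combine by majority vote (classes coded 0/1)."""
    rng = np.random.RandomState(random_state)
    votes = np.zeros((B, X_new.shape[0]))
    for b in range(B):
        Xb, yb = resample(X_train, y_train, random_state=rng.randint(10**6))  # bootstrap sample Z*b
        votes[b] = DecisionTreeClassifier(random_state=0).fit(Xb, yb).predict(X_new)
    return (votes.mean(axis=0) > 0.5).astype(int)  # majority vote over the B trees
```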

Visualization tools
Graphs based on the predictor “importances” (B×p) matrix F (p = # of predictors). For bagged trees, we take the average over the B trees:
- Diagram 1: importance mean bar chart
- Diagram 2 (“BOF Clusters”): the cluster means chart (NEW)
- Diagram 3 (“BOF MDPREF”): the multidimensional preference bi-plot (NEW)
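A hedged sketch of how the B×p importance matrix F and Diagrams 1 and 2 could be assembled. The slides do not say which importance measure was used, so scikit-learn's impurity-based feature_importances_ stands in here, and the synthetic X, y are placeholders rather than the original data.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))                 # placeholder predictors
y = (X[:, 0] > 0.5).astype(int)               # placeholder response

B, p = 50, X.shape[1]
F = np.zeros((B, p))                          # B x p matrix of per-tree predictor importances
for b in range(B):
    Xb, yb = resample(X, y, random_state=rng.randint(10**6))
    F[b] = DecisionTreeClassifier(random_state=0).fit(Xb, yb).feature_importances_

# Diagram 1: mean importance bar chart
plt.bar(np.arange(p), F.mean(axis=0))
plt.xlabel("predictor"); plt.ylabel("mean importance")

# Diagram 2 ("BOF Clusters"): cluster the trees by their importance profiles and inspect cluster means
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(F)
print(km.cluster_centers_)                    # one row of mean importances per cluster of trees
```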

Visualization tools
Graphs based on the proximity (n×n) matrix P (n = # of cases):
- Diagram 4 (“Proximity Clusters”): the cluster means chart (Breiman, 2002)
- Diagram 5 (“Proximity MDS”): the multidimensional scaling plot of “similar” cases (Breiman, 2002)
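A sketch, under assumptions, of the Breiman-style proximity matrix: the proximity of two cases is the fraction of trees in which they land in the same terminal node. A RandomForestClassifier is used here for convenience (the slides use bagged trees), and multidimensional scaling on 1 − P gives a plot in the spirit of Diagram 5.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import MDS

def proximity_matrix(X, y, n_trees=200, random_state=0):
    """n x n proximity matrix P: share of trees in which cases i and j fall in the same leaf."""
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=random_state).fit(X, y)
    leaves = forest.apply(X)                  # shape (n, n_trees): leaf index of each case in each tree
    return (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)

# Diagram 5 ("Proximity MDS"): embed the dissimilarities 1 - P in two dimensions
# P = proximity_matrix(X, y)                  # X, y: training data (placeholders)
# coords = MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(1 - P)
```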

Simulation experiments
S1: Generate a sample of size n = 30, two classes, and p = 5 variables (x1–x5), with a standard normal distribution and pair-wise correlation. The responses are generated according to Pr(Y=1 | x1 ≤ 0.5) = 0.2, Pr(Y=1 | x1 > 0.5) = 0.8.
S2: Generate a sample of size n = 30, two classes, and p = 5 variables (x1–x5), with a standard normal distribution and pair-wise correlation 0.95 between x1 and x2, and 0 among the other predictors. The responses are generated according to Pr(Y=1 | x1 ≤ 0.5) = 0.2, Pr(Y=1 | x1 > 0.5) = 0.8.
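The S2 design is fully specified above, so a small generator can be sketched (the S1 correlation value did not survive the transcript, so only S2 is shown); this is an illustrative reconstruction, not the authors' original simulation code.

```python
import numpy as np

def simulate_s2(n=30, random_state=0):
    """S2: 5 standard-normal predictors, corr(x1, x2) = 0.95, all other pairs uncorrelated."""
    rng = np.random.RandomState(random_state)
    cov = np.eye(5)
    cov[0, 1] = cov[1, 0] = 0.95
    X = rng.multivariate_normal(mean=np.zeros(5), cov=cov, size=n)
    p_y1 = np.where(X[:, 0] <= 0.5, 0.2, 0.8)     # Pr(Y=1 | x1 <= 0.5) = 0.2, else 0.8
    y = rng.binomial(1, p_y1)
    return X, y
```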

Diagram 1, “Mean importance” (panels: S1, S2)

Diagram 2, “BOF Clusters” (panels: S1, S2)

Diagram 3, “BOF MDPREF” (panels: S1, S2)

Diagram 4, “Proximity Clusters” (panels: S1, S2)

Web Survey data
- ICT infrastructure/usage in Croatian primary and secondary schools
- 25,000+ teachers (cases)
- 200+ variables
- Response: “classroom use of a computer by educators” (yes/no)
- Partition: 50% training, 25% validation, 25% test
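A minimal sketch of the 50/25/25 partition using scikit-learn's train_test_split; X and y stand in for the survey predictors and the yes/no response, and stratification on the response is an added assumption, not stated on the slide.

```python
from sklearn.model_selection import train_test_split

def partition_50_25_25(X, y, random_state=0):
    """Split cases into 50% training, 25% validation, 25% test, stratified on the yes/no response."""
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, train_size=0.50, stratify=y, random_state=random_state)
    X_valid, X_test, y_valid, y_test = train_test_split(
        X_rest, y_rest, train_size=0.50, stratify=y_rest, random_state=random_state)
    return (X_train, y_train), (X_valid, y_valid), (X_test, y_test)
```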

Initial tree (before bagging)

Diagram 1, “Mean importance”

Diagram 2, “BOF Clusters”

Diagram 3, “BOF MDPREF”

Bootstrap tree 11

Bootstrap tree 22

Bootstrap tree 12

Clustering trees

Diagram 5, “Proximity MDS”

Conclusions / Recommendations
- There is software for trees
- There is some software for tree ensembles
- There are some visualization tools (old and new)
- The problem is that they are not “interfaced” (integrated)