BOF Trees Visualization, Zagreb, June 12, 2004
“BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles


1 “BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles
Vesna Luzar-Stiffler, Ph.D., University Computing Centre and CAIR Research Centre, Zagreb, Croatia
Charles Stiffler, Ph.D., CAIR Research Centre, Zagreb, Croatia
vluzar@srce.hr, charles.stiffler@cair-center.hr

2 Outline
- Introduction/Background
  - Trees
  - Ensemble Trees
  - Visualization Tools
- Simulation Results
- Web Survey Results
- Conclusions/Recommendations

3 Introduction / Background
Classification / Decision Trees
- Data mining (statistical learning) method for classification
- Invented twice:
  - Statistical community: Breiman, Friedman, et al. (1984)
  - Machine learning community: Quinlan (1986)
Many positive features
- Interpretability, ability to handle data of mixed type and missing values, robustness to outliers, etc.
Disadvantages
- Unstable under seemingly minor data perturbations
- Low predictive power

4 Introduction / Background
Possible improvements: Ensembles
- Bagging, i.e., bootstrapping trees (Breiman, 1996)
- Boosting, e.g., AdaBoost (Freund & Schapire, 1997)
- Random Forests (Breiman, 2001)
- Stacking, randomized trees, etc.
Advantage
- Improved prediction
Disadvantage
- Loss of interpretability (“black box”)

5 Classification Tree
Let $\hat{f}(x)$ be the classification tree prediction at input $x$ obtained from the full “training” data $Z = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$.

6 Bagging Classification Tree
Let $\hat{f}^{*b}(x)$ be the classification tree prediction at input $x$ obtained from the bootstrap sample $Z^{*b}$, $b = 1, 2, \ldots, B$.
Bagging estimate: $\hat{f}_{\mathrm{bag}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{*b}(x)$
[Figure: trees 1, 2, …, B]
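As a rough illustration of the bagging estimate on this slide, the sketch below fits B base classifiers to bootstrap samples of Z and combines them by majority vote, the usual classification counterpart of averaging. It assumes one-split trees (stumps) as a stand-in for full classification trees. The names `fit_stump` and `bagged_classifier` are hypothetical and do not come from the presentation.

```python
import random
from collections import Counter


def majority(labels, default):
    """Most frequent label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(X, y):
    """Fit a one-split tree: choose the feature/threshold with the fewest training errors."""
    overall = majority(y, None)
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            lpred = majority(left, overall)
            rpred = majority(right, overall)
            errors = sum(yi != lpred for yi in left) + sum(yi != rpred for yi in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, lpred, rpred)
    _, j, t, lpred, rpred = best
    return lambda x: lpred if x[j] <= t else rpred


def bagged_classifier(X, y, B=25, seed=0):
    """Fit B stumps on bootstrap samples Z*b and return a majority-vote predictor."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n cases with replacement
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))

    def predict(x):
        votes = Counter(tree(x) for tree in trees)
        return votes.most_common(1)[0][0]

    return predict


# Toy data (made up for illustration): class 1 when the single predictor is large.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]
predict = bagged_classifier(X, y, B=25, seed=0)
```

An odd B avoids tied votes in the two-class case.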

7 Visualization tools
Graphs based on the predictor “importances” ($B \times p$) matrix $F$ ($p$ = number of predictors)
For bagged trees, we take the average over the $B$ trees:
- Diagram 1 is the importance mean bar chart
- Diagram 2 (“BOF Clusters”) is the cluster means chart (NEW)
- Diagram 3 (“BOF MDPREF”) is the multidimensional preference bi-plot (NEW)
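The averaging step behind Diagram 1 can be sketched as a column mean of F, with one row per bootstrap tree and one column per predictor. The importance values below are invented for illustration, and `mean_importance` is a hypothetical helper name. How each tree's importances are computed is not stated on the slide, so the sketch takes F as given.

```python
def mean_importance(F):
    """Column means of a B x p importance matrix F (rows = bootstrap trees)."""
    B = len(F)
    p = len(F[0])
    return [sum(F[b][j] for b in range(B)) / B for j in range(p)]


# Hypothetical importances for B = 3 trees and p = 4 predictors.
F = [
    [0.6, 0.3, 0.1, 0.0],
    [0.5, 0.4, 0.0, 0.1],
    [0.7, 0.2, 0.1, 0.0],
]
mean_F = mean_importance(F)

# Predictor indices ordered from most to least important on average.
ranking = sorted(range(len(mean_F)), key=lambda j: mean_F[j], reverse=True)
```

The values in `mean_F` would be the bar heights of the mean importance chart.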

8 Visualization tools
Graphs based on the proximity ($n \times n$) matrix $P$ ($n$ = number of cases)
- Diagram 4 (“Proximity Clusters”) is the cluster means chart (Breiman, 2002)
- Diagram 5 (“Proximity MDS”) is the multidimensional scaling plot of “similar” cases (Breiman, 2002)
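A minimal sketch of how a proximity matrix of this kind is commonly built: the proximity of cases i and j is taken as the fraction of trees in which both cases fall in the same terminal node. The input layout (`leaf_ids[b][i]` holding the terminal node of case i in tree b) and the function name are assumptions made for the example, not details from the slides.

```python
def proximity_matrix(leaf_ids):
    """Return the n x n matrix P from per-tree terminal-node assignments.

    leaf_ids[b][i] is the terminal node reached by case i in tree b.
    """
    B = len(leaf_ids)
    n = len(leaf_ids[0])
    counts = [[0] * n for _ in range(n)]
    for leaves in leaf_ids:
        for i in range(n):
            for j in range(n):
                if leaves[i] == leaves[j]:
                    counts[i][j] += 1  # cases i and j share a terminal node in this tree
    return [[counts[i][j] / B for j in range(n)] for i in range(n)]


# Hypothetical assignments for B = 2 trees and n = 3 cases.
leaf_ids = [
    [0, 0, 1],
    [0, 1, 1],
]
P = proximity_matrix(leaf_ids)
```

A dissimilarity such as 1 - P[i][j] could then be passed to a clustering or multidimensional scaling routine.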

9 Simulation experiments
S1: Generate a sample of size $n = 30$, two classes, and $p = 5$ variables ($x_1$ to $x_5$), with a standard normal distribution and pair-wise correlation 0.95. The responses are generated according to $\Pr(Y = 1 \mid x_1 \le 0.5) = 0.2$, $\Pr(Y = 1 \mid x_1 > 0.5) = 0.8$.
S2: Generate a sample of size $n = 30$, two classes, and $p = 5$ variables ($x_1$ to $x_5$), with a standard normal distribution and pair-wise correlation 0.95 between $x_1$ and $x_2$, and 0 among other predictors. The responses are generated according to $\Pr(Y = 1 \mid x_1 \le 0.5) = 0.2$, $\Pr(Y = 1 \mid x_1 > 0.5) = 0.8$.
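One possible way to generate data matching the S1 and S2 descriptions is sketched below. It assumes a shared-factor construction for the correlated predictors (each correlated variable is sqrt(rho) times a common standard normal plus sqrt(1 - rho) times independent noise), which gives unit variance and pair-wise correlation rho. The presentation does not say how its samples were drawn, so the construction, the function names and the seeds are all illustrative.

```python
import math
import random


def simulate(n=30, p=5, rho=0.95, scenario="S1", seed=0):
    """Draw (X, y) for scenario S1 (all predictors correlated) or S2 (only x1, x2 correlated)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)  # common factor behind the correlated predictors
        row = []
        for j in range(p):
            noise = rng.gauss(0.0, 1.0)
            if scenario == "S1" or j < 2:
                row.append(math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * noise)
            else:
                row.append(noise)  # independent standard normal predictor
        prob = 0.8 if row[0] > 0.5 else 0.2  # Pr(Y = 1 | x1)
        X.append(row)
        y.append(1 if rng.random() < prob else 0)
    return X, y


def corr(a, b):
    """Sample correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


X1, y1 = simulate(scenario="S1", seed=1)
X2, y2 = simulate(scenario="S2", seed=1)
```

With n = 30 the sample correlations vary noticeably from draw to draw, so checking the construction with `corr` is more informative on a much larger n.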

10 Diagram 1, Mean importance (panels: S1, S2)

11 Diagram 2, “BOF Clusters” (panels: S1, S2)

12 Diagram 3, “BOF MDPREF” (panels: S1, S2)

13 Diagram 4, “Proximity Clusters” (panels: S1, S2)

14 Web Survey data
ICT infrastructure/usage in Croatian primary and secondary schools
25,000+ teachers (cases)
200+ variables
Response: “classroom use of a computer by educators” (yes/no)
Partition
- 50% training
- 25% validation
- 25% test
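The 50/25/25 partition could be produced along the following lines. The sketch assumes a simple random split of case indices. The slides do not say whether the split was random or stratified, and the case count of 25,000 is only a stand-in for the "25,000+" figure.

```python
import random


def partition(n_cases, fractions=(0.50, 0.25, 0.25), seed=0):
    """Randomly split case indices into training, validation and test sets."""
    idx = list(range(n_cases))
    random.Random(seed).shuffle(idx)
    n_train = int(fractions[0] * n_cases)
    n_valid = int(fractions[1] * n_cases)
    train = idx[:n_train]
    valid = idx[n_train:n_train + n_valid]
    test = idx[n_train + n_valid:]  # remainder, so every case is used exactly once
    return train, valid, test


train, valid, test = partition(25000)
```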

15 Initial tree (before bagging)

16 Diagram 1, “Mean importance”

17 Diagram 2, “BOF Clusters”

18 Diagram 3, “BOF MDPREF”

19 Bootstrap tree 11

20 Bootstrap tree 22

21 Bootstrap tree 12

22 Clustering trees

23 Diagram 5, “Proximity MDS”

24 Conclusions / Recommendations
There is software for trees
There is some software for tree ensembles
There are some visualization tools (old and new)
The problem is
- they are not “interfaced” (integrated)

