# Introduction to Machine Learning, Fall 2013: Decision Trees

Koby Crammer, Department of EE, Technion. Most figures courtesy of Ben Taskar.



## Course outline

- Supervised
- Unsupervised

## Supervised

Topics, ranging from mainly generative models to mainly discriminative models: parameter estimation, Bayesian reasoning, nearest neighbor, decision trees, regression, classification, boosting, linear models, regularization, theory.

## Material

Sections 9.2 and 9.5.2.

## Outline

- Example and inference (8.1)
- Tree learning (8.2)
- Impurity (8.3)
- Issues (8.4)
- Regression (8.5)

## Usage

- http://research.microsoft.com/pubs/145347/CVPR%202011%20-%20Final%20Video.mp4
- http://www.slate.com/articles/news_and_politics/politics/2010/08/can_rangel_hold_on.html

## Example and inference (8.1)

## Example

## Example: regression (HTF, 2001)

## Building decision trees (8.2)

- Input: a training sample; output: a tree
- Q: can we fit a tree to any sample?
- Goals:
  - accuracy
  - size (simplicity, generalization)

## Approach

- Top-down: start from the root
- Greedy / myopic search: one node at a time
- Main question: given a tree, how to grow it; in other words, how to choose a feature and a splitting criterion

## Example

## Intuition

Starting from class counts {8,12}:

- Feature a splits {8,12} into {8,0} and {0,12} (a pure split)
- Feature b splits {8,12} into {0,0} and {8,12} (an uninformative split)

## Intuition II

Again starting from class counts {8,12}:

- Feature c splits {8,12} into {4,6} and {4,6}
- Feature d splits {8,12} into {2,3} and {6,9}
- Feature e splits {8,12} into {2,3}, {3,5}, and {3,4}

## Stage 1

## Stage 2

## Impurity (8.3)

- Given a set S (the training set or a subset of it)
- Denote the empirical distribution of labels in S
- Goal: measure the impurity of this distribution

## Impurity functions

- Bayes-optimal error
- Gini index
- Entropy

Properties:

- zero for a point distribution
- maximal for the uniform distribution
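The three impurity functions can be written down directly; a minimal sketch (function names are my own):

```python
import math

def bayes_error(p):
    # Bayes-optimal error: 1 - max_k p_k
    return 1.0 - max(p)

def gini(p):
    # Gini index: 1 - sum_k p_k^2
    return 1.0 - sum(q * q for q in p)

def entropy(p):
    # Entropy: -sum_k p_k log2(p_k), with 0*log(0) taken as 0
    return -sum(q * math.log2(q) for q in p if q > 0)
```

All three are zero for a point distribution (e.g. `[1.0, 0.0]`) and maximal for the uniform distribution.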

## Illustration

## Information of a split

- Pick a node with a set S of size N
- Compute the impurity Q(S) of the set
- Pick a criterion A that splits S into M subsets S_1, ..., S_M of sizes N_1, ..., N_M
- The average impurity of these subsets is the (N_m / N)-weighted sum of Q(S_m)
- The reduction of impurity (or increase of purity) is Q(S) minus this average
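The reduction of impurity can be sketched as follows, using the Gini index as Q (function names are my own):

```python
from collections import Counter

def gini(labels):
    # Gini impurity of a list of labels
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_reduction(S, subsets):
    # Q(S) minus the size-weighted average impurity of the subsets S_1..S_M
    N = len(S)
    avg = sum(len(Sm) / N * gini(Sm) for Sm in subsets)
    return gini(S) - avg
```

For the class counts {8,12} above, the pure split {8,0} / {0,12} recovers the full impurity 0.48, while a proportional split reduces nothing.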

## Algorithm

- Pick the test A that maximizes the reduction of impurity
- Q: for a continuous feature, how many splitting values must be considered?
- Lemma: it suffices to consider thresholds between consecutive sorted feature values, so at most N-1 per feature
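The lemma can be illustrated with a small sketch (assuming a continuous feature; the function name is my own):

```python
def candidate_thresholds(values):
    # The impurity of a split is constant for all thresholds between the
    # same pair of consecutive sorted feature values, so it suffices to
    # check one threshold (e.g. the midpoint) per gap: at most N-1 candidates.
    v = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(v, v[1:])]
```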

## Algorithm

- Initialize: a single leaf (what label?)
- Iterate:
  - go over all leaves
  - go over all features d
  - go over all splitting values
  - pick the (leaf, feature, splitting value) that most reduces impurity
  - replace the leaf with a new node and two new leaves (their labels?)
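A minimal sketch of the greedy procedure, written recursively rather than leaf-by-leaf, with the Gini index as the impurity (names and tree representation are my own):

```python
from collections import Counter

def gini(labels):
    # Gini impurity of a list of labels
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    # Search over all features d and all candidate splitting values.
    best = None  # (average impurity, feature, threshold)
    for d in range(len(X[0])):
        vals = sorted(set(row[d] for row in X))
        for a, b in zip(vals, vals[1:]):
            t = (a + b) / 2
            left = [yi for row, yi in zip(X, y) if row[d] <= t]
            right = [yi for row, yi in zip(X, y) if row[d] > t]
            avg = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or avg < best[0]:
                best = (avg, d, t)
    return best

def grow(X, y):
    split = best_split(X, y)
    if len(set(y)) == 1 or split is None:
        # Leaf label: the majority class of the labels reaching it.
        return ('leaf', Counter(y).most_common(1)[0][0])
    _, d, t = split
    L = [(row, yi) for row, yi in zip(X, y) if row[d] <= t]
    R = [(row, yi) for row, yi in zip(X, y) if row[d] > t]
    return ('node', d, t,
            grow([r for r, _ in L], [yi for _, yi in L]),
            grow([r for r, _ in R], [yi for _, yi in R]))

def predict(tree, x):
    # Walk down the tree following the threshold tests.
    while tree[0] == 'node':
        _, d, t, lt, rt = tree
        tree = lt if x[d] <= t else rt
    return tree[1]
```

On the XOR sample discussed later, this greedy loop needs two levels of splits before the leaves become pure.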

## Issues (8.4)

- Number of splits
- Missing features
- Preventing over-fitting:
  - early stopping
  - pruning
- Optimality vs. greediness (Hyafil & Rivest, 1976)

## Example: XOR

Function: the label is +1 when x_1 and x_2 have the same sign, and -1 otherwise.

| input   | label |
|---------|-------|
| (1,1)   | +1    |
| (-1,-1) | +1    |
| (-1,1)  | -1    |
| (1,-1)  | -1    |

- A tree with a single test node cannot represent this function: any one threshold misclassifies some of the points.
- A tree with two levels of tests can: the root tests x_1 > 0, and each branch tests x_2 > 0, with opposite leaf labels (+1 / -1 on the "yes" branch, -1 / +1 on the "no" branch).
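The depth-two tree from the slide can be written out directly (a sketch; the function name is my own):

```python
def xor_tree(x1, x2):
    # Root tests x1 > 0; each branch tests x2 > 0 with opposite leaf labels.
    if x1 > 0:
        return +1 if x2 > 0 else -1
    else:
        return -1 if x2 > 0 else +1
```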

## Regression (8.5)

- Value of a leaf: replace the single (majority) label with the mean of the outputs reaching the leaf
- Impurity of a leaf: replace the discrete impurity functions above with the variance of the outputs
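The two replacements above can be sketched as (function names are my own):

```python
def leaf_value(ys):
    # Regression leaf: the mean of the outputs reaching the leaf.
    return sum(ys) / len(ys)

def variance_impurity(ys):
    # Replace the discrete impurity functions with the variance of the outputs;
    # the variance is minimized exactly by the mean, so the two choices match.
    m = leaf_value(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)
```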
