Introduction to Machine Learning Fall 2013 Decision Trees Koby Crammer Department of EE Technion Most figures courtesy of Ben Taskar zl.

1 Introduction to Machine Learning Fall 2013 Decision Trees Koby Crammer Department of EE Technion Most figures courtesy of Ben Taskar zl

2 Course outline Supervised Unsupervised

3 Supervised: Parameter Estimation, Decision Tree, Regression, Bayesian Reasoning, Classification, Boosting, Nearest Neighbor, Theory, Regularization, Linear (Mainly Generative Models; Mainly Discriminative Models)

4 Material Section 9.5.2, Section 9.2

5 Outline Example and inference (8.1) Tree learning (8.2) Impurity (8.3) Issues (8.4) Regression (8.5)

6 Usage http://research.microsoft.com/pubs/145347/CVPR%202011%20-%20Final%20Video.mp4 http://www.slate.com/articles/news_and_politics/politics/2010/08/can_rangel_hold_on.html

7 Example and inference (8.1)

8 example

9 Example Regression (HTF, 2001)

10 Building decision trees (8.2) Input to algorithm Output: tree Q: can we fit a tree to any sample? Goals: – accuracy – size (simplicity, generalization)

11 Approach Top-down – Start from the root Greedy / myopic search – One node at a time Main question: – Given a tree, how to grow it – In other words, choose a feature and a criterion

12 example

13 Intuition
Feature a (children A1, A2): {8,12} -> {8,0}, {0,12}
Feature b (children B1, B2): {8,12} -> {0,0}, {8,12}

14 Intuition II
Feature c (children C1, C2): {8,12} -> {4,6}, {4,6}
Feature d (children D1, D2): {8,12} -> {2,3}, {6,9}
Feature e (children E1, E2, E3): {8,12} -> {2,3}, {3,5}, {3,4}

15
mpg cylinders displacement horsepower weight acceleration modelyear maker
Bad 8 350 150 4699 14.5 74 America
Bad 8 400 170 4746 12 71 America
Bad 8 400 175 4385 12 72 America
Bad 6 250 72 3158 19.5 75 America
Bad 8 304 150 3892 12.5 72 America
Bad 8 350 145 4440 14 75 America
Bad 6 250 105 3897 18.5 75 America
Bad 6 163 133 3410 15.8 78 Asia
Bad 8 260 110 4060 19 77 America
Bad 8 305 130 3840 15.4 79 America
Bad 6 250 110 3520 16.4 77 America
Bad 6 258 95 3193 17.8 76 America
Bad 4 121 112 2933 14.5 72 Asia
Bad 6 225 105 3613 16.5 74 America
Bad 4 121 112 2868 15.5 73 Asia
Bad 6 225 95 3264 16 75 America
Bad 6 200 85 2990 18.2 79 America
OK 4 121 98 2945 14.5 75 Asia
OK 6 232 90 3085 17.6 76 America
OK 4 120 97 2506 14.5 72 Europe
OK 4 151 85 2855 17.6 78 America
OK 4 116 75 2158 15.5 73 Asia
OK 4 119 97 2545 17 75 Europe
OK 6 146 120 2930 13.8 81 Europe
OK 4 116 81 2220 16.9 76 Asia
OK 4 156 92 2620 14.4 81 America
OK 4 140 88 2870 18.1 80 America
OK 4 97 60 1834 19 71 Asia
OK 4 134 95 2560 14.2 78 Europe
OK 4 97 75 2171 16 75 Europe
OK 4 97 78 1940 14.5 77 Asia
OK 4 98 83 2219 16.5 74 Asia
Good 4 79 70 2074 19.5 71 Asia
Good 4 91 68 1970 17.6 82 Europe
Good 4 89 71 1925 14 79 Asia
Good 4 83 61 2003 19 74 Europe
Good 4 112 88 2395 18 82 America
Good 4 81 60 1760 16.1 81 Europe
Good 4 135 84 2370 13 82 America
Good 4 105 63 2125 14.7 82 America
Bad 4 135 84 2370 13 82 America
Bad 4 105 63 2125 14.7 82 America

16 Stage 1

18 Stage 2

26 Impurity (8.3) Given a set (training set or subset of it) Denote empirical distribution of labels Goal: measure the impurity of the distribution

27 Impurity functions Bayes-optimal error Gini index Entropy Properties: – For a point distribution: impurity is zero – For the uniform distribution: impurity is maximal
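
The formulas for the three impurity functions did not survive in this transcript. The sketch below uses their standard textbook definitions over a label distribution p (a list of probabilities); the function names are illustrative assumptions, not taken from the slides. It also checks the two properties on a point distribution and on the uniform distribution over two labels.

```python
import math

def bayes_error(p):
    """Misclassification impurity: 1 - max_k p_k."""
    return 1.0 - max(p)

def gini(p):
    """Gini index: 1 - sum_k p_k^2."""
    return 1.0 - sum(pk * pk for pk in p)

def entropy(p):
    """Shannon entropy in bits; terms with p_k = 0 contribute 0."""
    return -sum(pk * math.log2(pk) for pk in p if pk > 0)

def empirical_distribution(labels):
    """Fraction of samples carrying each distinct label."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return [c / len(labels) for c in counts.values()]

point = [1.0, 0.0]      # all mass on one label
uniform = [0.5, 0.5]    # two equally likely labels

for name, q in [("bayes", bayes_error), ("gini", gini), ("entropy", entropy)]:
    print(name, q(point), q(uniform))
```

All three are zero on the point distribution; on the two-label uniform distribution the Bayes error and Gini index are 0.5 and the entropy is 1 bit.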

28 illustration

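29 Information of a split Pick a node, with a set S of size N Compute the impurity of the set Q(S) Pick a criterion A that splits the set S into M subsets S_1, ..., S_M of sizes N_1, ..., N_M The average impurity of these sets is $\sum_{m=1}^{M} \frac{N_m}{N} Q(S_m)$ Reduction of impurity (or increase of purity): $Q(S) - \sum_{m=1}^{M} \frac{N_m}{N} Q(S_m)$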
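
A minimal sketch of the reduction of impurity, assuming the Gini index as Q and the usual size-weighted average over the subsets. The function names are illustrative; the class counts are the ones from the Intuition slides (features a, b and d applied to a node with counts {8,12}).

```python
def gini_from_counts(counts):
    """Gini index of a node described by its per-class counts."""
    n = sum(counts)
    if n == 0:
        return 0.0  # an empty subset contributes nothing
    return 1.0 - sum((c / n) ** 2 for c in counts)

def impurity_reduction(parent, children, impurity=gini_from_counts):
    """Q(S) minus the size-weighted average impurity of the subsets."""
    n = sum(parent)
    average = sum(sum(child) / n * impurity(child) for child in children)
    return impurity(parent) - average

parent = [8, 12]
gain_a = impurity_reduction(parent, [[8, 0], [0, 12]])   # pure children
gain_b = impurity_reduction(parent, [[0, 0], [8, 12]])   # nothing separated
gain_d = impurity_reduction(parent, [[2, 3], [6, 9]])    # same proportions

print(gain_a, gain_b, gain_d)
```

Feature a recovers the full parent impurity (about 0.48), while features b and d give essentially no reduction because every non-empty child keeps the parent's label proportions.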

30 Algorithm Pick the test A which maximizes the reduction of impurity Q: how many values to consider? Lemma:
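
The statement of the lemma is not preserved in this transcript. One common answer to the question, given here as an assumption rather than as the slide's lemma: for a numeric feature, two thresholds that fall between the same pair of consecutive observed values induce the same split, so it suffices to try one threshold per gap, at most N - 1 in total. The helper name `candidate_thresholds` is hypothetical.

```python
def candidate_thresholds(values):
    """Midpoints between consecutive distinct observed values."""
    distinct = sorted(set(values))
    return [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]

# Cylinder counts as they appear in the data table: only 4, 6 and 8 occur.
cylinders = [8, 8, 6, 4, 6, 4, 4]
print(candidate_thresholds(cylinders))  # [5.0, 7.0]
```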

31 Algorithm Initialize: single leaf (what label?) Iterate: – Go over all leaves – Go over all features d – Go over all splitting values N – Pick (leaf, feature, splitting value) that reduces impurity the most – Replace leaf with: new node two new leaves (their label?)
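
The loop above can be sketched as follows. This is an illustration under stated assumptions, not the course's implementation: Gini impurity, binary threshold splits on numeric features, majority-vote leaf labels, gains weighted by leaf size so they are comparable across leaves, and a hypothetical `max_leaves` stopping rule. The seven training rows (cylinders, weight) are copied from the data table.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def make_leaf(y, idx):
    return {"idx": idx, "label": majority([y[i] for i in idx])}

def best_split(X, y, idx):
    """Best (gain, feature, threshold, left, right) for one leaf, or None."""
    parent = gini([y[i] for i in idx])
    best = None
    for d in range(len(X[0])):                       # all features
        values = sorted({X[i][d] for i in idx})
        for lo, hi in zip(values, values[1:]):       # all splitting values
            t = (lo + hi) / 2
            left = [i for i in idx if X[i][d] <= t]
            right = [i for i in idx if X[i][d] > t]
            avg = (len(left) * gini([y[i] for i in left])
                   + len(right) * gini([y[i] for i in right])) / len(idx)
            gain = len(idx) * (parent - avg)         # weighted by leaf size
            if gain > 1e-12 and (best is None or gain > best[0]):
                best = (gain, d, t, left, right)
    return best

def grow_tree(X, y, max_leaves=8):
    root = make_leaf(y, list(range(len(y))))         # single leaf, majority label
    leaves = [root]
    while len(leaves) < max_leaves:
        candidates = [(best_split(X, y, leaf["idx"]), leaf) for leaf in leaves]
        candidates = [(s, leaf) for s, leaf in candidates if s is not None]
        if not candidates:
            break                                    # no split reduces impurity
        (gain, d, t, left, right), leaf = max(candidates, key=lambda c: c[0][0])
        leaf["feature"], leaf["threshold"] = d, t    # the leaf becomes a node
        leaf["left"], leaf["right"] = make_leaf(y, left), make_leaf(y, right)
        leaves = [l for l in leaves if l is not leaf] + [leaf["left"], leaf["right"]]
    return root

def predict(node, x):
    while "feature" in node:
        side = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[side]
    return node["label"]

X = [[8, 4699], [8, 4746], [6, 3158], [4, 2945], [4, 2506], [4, 2074], [4, 1970]]
y = ["Bad", "Bad", "Bad", "OK", "OK", "Good", "Good"]
tree = grow_tree(X, y)
print(predict(tree, [4, 2000]))  # Good
```

On this small sample the first split is on cylinders (threshold 5.0) and the second on weight, after which every leaf is pure and the loop stops.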

32 Issues (8.4) number of splits Missing features Prevent over-fitting – Early stopping – pruning Optimality vs greediness (Rivest et al, 76)

33 Example: xor Function (label, input): 1 (1,1); 1 (-1,-1); -1 (-1,1); -1 (1,-1) Tree with single node? Tree with two nodes [Figure: a tree that tests X1 > 0 and then X2 > 0, with yes/no branches and leaves labeled +1 or -1]
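
A short check of why xor is awkward for greedy growth, assuming the Gini index as the impurity and a threshold of 0 on each coordinate (the helper names are illustrative). A single test on either coordinate leaves both children with one +1 and one -1, so the reduction of impurity is zero, yet testing both coordinates in sequence classifies every point.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gain(X, y, feature, threshold=0.0):
    """Reduction of impurity for the test x[feature] > threshold."""
    yes = [yi for xi, yi in zip(X, y) if xi[feature] > threshold]
    no = [yi for xi, yi in zip(X, y) if xi[feature] <= threshold]
    average = (len(yes) * gini(yes) + len(no) * gini(no)) / len(y)
    return gini(y) - average

def two_level_tree(x):
    """Test X1 > 0 at the root, then X2 > 0 in each branch."""
    if x[0] > 0:
        return 1 if x[1] > 0 else -1
    return -1 if x[1] > 0 else 1

X = [(1, 1), (-1, -1), (-1, 1), (1, -1)]
y = [1, 1, -1, -1]

gains = [gain(X, y, d) for d in range(2)]
print(gains)                           # [0.0, 0.0]
print([two_level_tree(x) for x in X])  # [1, 1, -1, -1]
```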

34 Regression (8.5) Value of leaf – Replace the majority label with the mean of the outputs Impurity of a leaf – Replace the discrete impurity functions above with the variance
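
For regression, the two replacements can be sketched as below, assuming the leaf predicts the mean of its outputs and that a split is scored by the size-weighted drop in variance. The function names and the toy outputs are illustrative assumptions, not data from the slides.

```python
def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Impurity of a regression leaf: mean squared deviation from its mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def variance_reduction(parent, children):
    """Variance of the parent minus the size-weighted variance of the children."""
    n = len(parent)
    average = sum(len(child) / n * variance(child) for child in children)
    return variance(parent) - average

outputs = [1.0, 1.0, 3.0, 3.0]           # toy regression targets
left, right = [1.0, 1.0], [3.0, 3.0]     # a split that separates them

print(mean(left), mean(right))           # leaf values: 1.0 3.0
print(variance_reduction(outputs, [left, right]))  # 1.0
```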

