Published by Alexander Nelson. Modified over 9 years ago.
1
Machine Learning Lecture 10: Decision Trees. G53MLE Machine Learning, Dr Guoping Qiu
2
Trees: Node, Root, Leaf, Branch, Path, Depth
3
Decision Trees: A hierarchical data structure that represents data by implementing a divide-and-conquer strategy. Can be used as a non-parametric classification method. Given a collection of examples, learn a decision tree that represents it, then use this representation to classify new examples.
4
Decision Trees: Each node is associated with a feature (one of the elements of the feature vector that represents an object). Each node tests the value of its associated feature. There is one branch for each value of the feature. Leaves specify the categories (classes). A tree can categorize instances into multiple disjoint categories (multi-class).
[Figure: Play Tennis decision tree. Outlook at the root; Sunny → Humidity (High → No, Normal → Yes); Overcast → Yes; Rain → Wind (Strong → No, Weak → Yes).]
5
Decision Trees: Play Tennis Example. Feature vector = (Outlook, Temperature, Humidity, Wind).
[Figure: Play Tennis decision tree rooted at Outlook, with Humidity and Wind as internal nodes.]
6
Decision Trees: Node associated with a feature.
[Figure: Play Tennis decision tree; the nodes Outlook, Humidity and Wind are each associated with a feature.]
7
Decision Trees: Play Tennis Example. Feature values: Outlook = (sunny, overcast, rain); Temperature = (hot, mild, cool); Humidity = (high, normal); Wind = (strong, weak).
8
Decision Trees: One branch for each value. Outlook = (sunny, overcast, rain), so the Outlook node has the branches Sunny, Overcast and Rain.
[Figure: Play Tennis decision tree with one branch per feature value.]
9
Decision Trees: Leaf nodes specify classes. Class = (Yes, No).
[Figure: Play Tennis decision tree with leaves labelled Yes or No.]
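The node, branch and leaf roles described on the slides above can be sketched in a few lines of Python. The nested-dict encoding and the names `PLAY_TENNIS_TREE` and `classify` are illustrative assumptions, not something the lecture prescribes; the tree itself is the Play Tennis tree shown in the figures.

```python
# Illustrative encoding (an assumption, not from the slides):
# an internal node is {"feature": name, "branches": {value: subtree}},
# and a leaf is simply the class label as a string.
PLAY_TENNIS_TREE = {
    "feature": "Outlook",
    "branches": {
        "Sunny": {"feature": "Humidity", "branches": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"feature": "Wind", "branches": {"Strong": "No", "Weak": "Yes"}},
    },
}


def classify(tree, example):
    """Test one feature per node and follow the matching branch until a leaf."""
    while isinstance(tree, dict):
        value = example[tree["feature"]]
        tree = tree["branches"][value]
    return tree


example = {"Outlook": "Sunny", "Temperature": "Hot", "Humidity": "Normal", "Wind": "Weak"}
label = classify(PLAY_TENNIS_TREE, example)
print(label)
```

Temperature is part of the feature vector but never tested, because no node of this particular tree is associated with it.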
10
Decision Trees: Designing a decision tree classifier. Two steps: picking the root node, then recursively branching.
11
Decision Trees: Picking the root node. Consider data with two Boolean attributes (A, B) and two classes, + and -:
{ (A=0, B=0), - }: 50 examples
{ (A=0, B=1), - }: 50 examples
{ (A=1, B=0), - }: 3 examples
{ (A=1, B=1), + }: 100 examples
12
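Decision Trees: Picking the root node. The two trees look structurally similar; which attribute should we choose?
[Figure: two candidate trees over the same data, one testing A at the root and B below it, the other testing B at the root and A below it.]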
13
Decision Trees: Picking the root node. The goal is to make the resulting decision tree as small as possible (Occam's Razor). The main decision in the algorithm is the selection of the next attribute to condition on, starting from the root node. We want attributes that split the examples into sets that are relatively pure in one label; this way we are closer to a leaf node. The most popular heuristic is based on information gain, which originated with Quinlan's ID3 system.
14
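Entropy: S is a sample of training examples; $p_+$ is the proportion of positive examples in S; $p_-$ is the proportion of negative examples in S. Entropy measures the impurity of S: $\mathrm{Entropy}(S) = -p_+ \log_2 p_+ - p_- \log_2 p_-$.
[Figure: entropy plotted against $p_+$.]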
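A minimal sketch of the entropy computation, assuming the usual two-class definition Entropy(S) = -p+ log2 p+ - p- log2 p-, with 0 log2 0 taken as 0. The function name `entropy` and its count-based signature are illustrative choices.

```python
import math


def entropy(pos, neg):
    """Entropy in bits of a sample with `pos` positive and `neg` negative examples."""
    total = pos + neg
    if total == 0:
        return 0.0
    result = 0.0
    for count in (pos, neg):
        if count:  # skip empty classes: 0 * log2(0) is treated as 0
            p = count / total
            result -= p * math.log2(p)
    return result


print(entropy(5, 5))  # evenly mixed sample: maximally impure
print(entropy(4, 0))  # pure sample: no impurity
print(round(entropy(9, 5), 3))
```

An evenly mixed sample has entropy 1 bit, and a pure sample has entropy 0.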
15
[Figure: two collections of + and - examples, one thoroughly mixed and one sorted into groups.] Highly disorganized: high entropy, much information required. Highly organized: low entropy, little information required.
16
Information Gain: $\mathrm{Gain}(S, A)$ is the expected reduction in entropy due to sorting on A, that is, the reduction caused by knowing the value of attribute A: $\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \mathrm{Entropy}(S_v)$. Here $\mathrm{Values}(A)$ is the set of all possible values for attribute A, $S_v$ is the subset of S for which attribute A has value v, and $|S|$ and $|S_v|$ are the numbers of samples in S and $S_v$ respectively.
17
Information Gain Example: Choose A or B?
Split on A: A=1 gives 100+, 3-; A=0 gives 100-.
Split on B: B=1 gives 100+, 50-; B=0 gives 53-.
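The choice between A and B can be checked numerically. The sketch below assumes the standard information-gain definition (parent entropy minus the size-weighted entropy of the subsets) and uses the class counts from the A/B data set; the helper names `entropy` and `gain` are illustrative.

```python
import math


def entropy(pos, neg):
    """Two-class entropy in bits, with 0 * log2(0) treated as 0."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            result -= p * math.log2(p)
    return result


def gain(parent, children):
    """Information gain; `parent` and each child are (pos, neg) count pairs."""
    total = sum(parent)
    remainder = sum((p + n) / total * entropy(p, n) for p, n in children)
    return entropy(*parent) - remainder


parent = (100, 103)                          # 100 positive, 50 + 50 + 3 negative
gain_a = gain(parent, [(100, 3), (0, 100)])  # A=1, A=0
gain_b = gain(parent, [(100, 50), (0, 53)])  # B=1, B=0
print(f"Gain(S, A) = {gain_a:.3f}, Gain(S, B) = {gain_b:.3f}")
```

Splitting on A gives a gain of roughly 0.90 against roughly 0.32 for B, so A is the better root: its A=1 subset is almost pure.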
18
Example: Play Tennis Example.
[Table: the Play Tennis training examples, numbered 1 to 14.]
19
Example: Split on Humidity. High: 3+, 4-, E = 0.985. Normal: 6+, 1-, E = 0.592. $\mathrm{Gain}(S, \mathrm{Humidity}) = 0.94 - \frac{7}{14}(0.985) - \frac{7}{14}(0.592) = 0.151$
20
Example: Split on Wind. Weak: 6+, 2-, E = 0.811. Strong: 3+, 3-, E = 1.0. $\mathrm{Gain}(S, \mathrm{Wind}) = 0.94 - \frac{8}{14}(0.811) - \frac{6}{14}(1.0) = 0.048$
21
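Example: Split on Outlook. Sunny: examples 1, 2, 8, 9, 11 (2+, 3-), E = 0.970. Overcast: examples 3, 7, 12, 13 (4+, 0-), E = 0.0. Rain: examples 4, 5, 6, 10, 14 (3+, 2-), E = 0.970. $\mathrm{Gain}(S, \mathrm{Outlook}) = 0.246$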
22
Example: Pick Outlook as the root, since it has the largest gain. Gain(S, Outlook) = 0.246; Gain(S, Humidity) = 0.151; Gain(S, Wind) = 0.048; Gain(S, Temperature) = 0.029.
[Figure: Outlook at the root with branches Sunny, Overcast and Rain.]
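The four root gains can be reproduced with a short script. The transcript does not include the data table, so the rows below are an assumption: they are the standard 14-example Play Tennis table from Mitchell's textbook, which matches every count quoted on the slides (for example Sunny = examples 1, 2, 8, 9, 11 with 2+, 3-). The names `DATA`, `FEATURES`, `entropy` and `gain` are illustrative.

```python
import math
from collections import Counter

FEATURES = ("Outlook", "Temperature", "Humidity", "Wind")
# Assumed data: standard Play Tennis table; the last column is the class.
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),           # 1
    ("Sunny", "Hot", "High", "Strong", "No"),         # 2
    ("Overcast", "Hot", "High", "Weak", "Yes"),       # 3
    ("Rain", "Mild", "High", "Weak", "Yes"),          # 4
    ("Rain", "Cool", "Normal", "Weak", "Yes"),        # 5
    ("Rain", "Cool", "Normal", "Strong", "No"),       # 6
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),  # 7
    ("Sunny", "Mild", "High", "Weak", "No"),          # 8
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),       # 9
    ("Rain", "Mild", "Normal", "Weak", "Yes"),        # 10
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),     # 11
    ("Overcast", "Mild", "High", "Strong", "Yes"),    # 12
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),     # 13
    ("Rain", "Mild", "High", "Strong", "No"),         # 14
]


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def gain(rows, feature_index):
    """Entropy of all labels minus the size-weighted entropy after the split."""
    remainder = 0.0
    for value in {row[feature_index] for row in rows}:
        subset = [row[-1] for row in rows if row[feature_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([row[-1] for row in rows]) - remainder


gains = {name: gain(DATA, i) for i, name in enumerate(FEATURES)}
root = max(gains, key=gains.get)
for name, g in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"Gain(S, {name}) = {g:.3f}")
print("root:", root)
```

The printed values can differ from the slide figures in the third decimal place, because the slides round the intermediate entropies before combining them.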
23
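Example: Pick Outlook as the root. Sunny: examples 1, 2, 8, 9, 11 (2+, 3-), subtree still to be chosen (?). Overcast: examples 3, 7, 12, 13 (4+, 0-), leaf Yes. Rain: examples 4, 5, 6, 10, 14 (3+, 2-), subtree still to be chosen (?). Continue until every attribute is included in the path, or all examples in the leaf have the same label.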
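The recursion and its two stopping conditions can be sketched as a small ID3-style routine. This is a simplified sketch rather than Quinlan's full system: the majority vote used when the attributes run out is an assumption, branches are created only for attribute values that occur in the data, and the names `id3`, `gain` and `entropy` are illustrative. It is run here on the two-attribute A/B data set from the earlier slide.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def gain(rows, attr):
    """Information gain of splitting (features, label) pairs on `attr`."""
    remainder = 0.0
    for value in {x[attr] for x, _ in rows}:
        subset = [y for x, y in rows if x[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([y for _, y in rows]) - remainder


def id3(rows, attributes):
    labels = [y for _, y in rows]
    if len(set(labels)) == 1:   # all examples share one label: make a leaf
        return labels[0]
    if not attributes:          # every attribute already used on this path
        return Counter(labels).most_common(1)[0][0]  # assumed tie-breaker: majority
    best = max(attributes, key=lambda a: gain(rows, a))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in sorted({x[best] for x, _ in rows}):
        subset = [(x, y) for x, y in rows if x[best] == value]
        branches[value] = id3(subset, remaining)
    return {"attribute": best, "branches": branches}


rows = (
    [({"A": 0, "B": 0}, "-")] * 50
    + [({"A": 0, "B": 1}, "-")] * 50
    + [({"A": 1, "B": 0}, "-")] * 3
    + [({"A": 1, "B": 1}, "+")] * 100
)
tree = id3(rows, ["A", "B"])
print(tree)
```

On this data the routine tests A at the root, makes A=0 a "-" leaf, and tests B only under A=1.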
24
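Example: Choosing the attribute under the Sunny branch (examples 1, 2, 8, 9, 11; 2+, 3-). $\mathrm{Gain}(S_{sunny}, \mathrm{Humidity}) = 0.97 - \frac{3}{5}(0) - \frac{2}{5}(0) = 0.97$; $\mathrm{Gain}(S_{sunny}, \mathrm{Temp}) = 0.97 - 0 - \frac{2}{5}(1) = 0.57$; $\mathrm{Gain}(S_{sunny}, \mathrm{Wind}) = 0.97 - \frac{2}{5}(1) - \frac{3}{5}(0.92) = 0.02$.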
25
Example: Humidity has the largest gain on $S_{sunny}$ (0.97, against 0.57 for Temp and 0.02 for Wind), so it is tested under the Sunny branch: High → No, Normal → Yes.
[Figure: tree so far. Outlook at the root; Sunny → Humidity (High → No, Normal → Yes); Overcast → Yes; Rain still open.]
26
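Example: Choosing the attribute under the Rain branch (examples 4, 5, 6, 10, 14; 3+, 2-). $\mathrm{Gain}(S_{rain}, \mathrm{Humidity}) = ?$; $\mathrm{Gain}(S_{rain}, \mathrm{Temp}) = ?$; $\mathrm{Gain}(S_{rain}, \mathrm{Wind}) = ?$
[Figure: tree so far. Outlook at the root; Sunny → Humidity (High → No, Normal → Yes); Overcast → Yes; Rain → ?]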
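The three gains left open for the Rain branch can be worked out the same way as for the Sunny branch. The rows below are an assumption: they are examples 4, 5, 6, 10 and 14 of the standard Play Tennis table from Mitchell's textbook, since the transcript does not reproduce the data. The names `S_RAIN`, `entropy` and `gain` are illustrative.

```python
import math
from collections import Counter

FEATURES = ("Temperature", "Humidity", "Wind")
# Assumed rows (standard Play Tennis table); the last column is the class.
S_RAIN = [
    ("Mild", "High", "Weak", "Yes"),     # 4
    ("Cool", "Normal", "Weak", "Yes"),   # 5
    ("Cool", "Normal", "Strong", "No"),  # 6
    ("Mild", "Normal", "Weak", "Yes"),   # 10
    ("Mild", "High", "Strong", "No"),    # 14
]


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def gain(rows, feature_index):
    """Entropy of all labels minus the size-weighted entropy after the split."""
    remainder = 0.0
    for value in {row[feature_index] for row in rows}:
        subset = [row[-1] for row in rows if row[feature_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([row[-1] for row in rows]) - remainder


gains = {name: gain(S_RAIN, i) for i, name in enumerate(FEATURES)}
best = max(gains, key=gains.get)
for name, g in gains.items():
    print(f"Gain(S_rain, {name}) = {g:.3f}")
print("best:", best)
```

Under this assumption Wind separates the Rain examples perfectly (Weak gives only Yes, Strong gives only No), so its gain equals the full entropy of the subset.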
27
Example: The final tree.
[Figure: Outlook at the root; Sunny → Humidity (High → No, Normal → Yes); Overcast → Yes; Rain → Wind (Strong → No, Weak → Yes).]
28
Tutorial/Exercise Questions: An experiment has produced the following 3-D feature vectors X = (x1, x2, x3) belonging to two classes. Design a decision tree classifier to classify an unknown feature vector X = (1, 2, 1).
x1  x2  x3  Class
1   1   1   1
1   1   1   2
1   1   1   1
2   1   1   2
2   1   2   1
2   2   2   2
2   2   2   1
2   2   1   2
1   2   2   2
1   1   2   1
1   2   1   ?