
1 Machine Learning Lecture 10: Decision Trees (G53MLE Machine Learning, Dr Guoping Qiu)

2 Trees
- Node
- Root
- Leaf
- Branch
- Path
- Depth

3 Decision Trees
- A hierarchical data structure that represents data by implementing a divide-and-conquer strategy.
- Can be used as a non-parametric classification method.
- Given a collection of examples, learn a decision tree that represents it.
- Use this representation to classify new examples.

4 Decision Trees
- Each node is associated with a feature (one of the elements of the feature vector that represents an object).
- Each node tests the value of its associated feature.
- There is one branch for each value of the feature.
- Leaves specify the categories (classes).
- Can categorize instances into multiple disjoint categories (multi-class).
[Figure: the Play Tennis decision tree. Outlook is the root with branches Sunny, Overcast and Rain; Sunny leads to a Humidity test (High -> No, Normal -> Yes), Overcast leads directly to Yes, and Rain leads to a Wind test (Strong -> No, Weak -> Yes).]
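
To make the structure concrete, here is a minimal sketch in Python (not from the lecture; the Node class, the classify helper and the hand-built tree are illustrative): each internal node tests one feature, has one child per feature value, and each leaf carries a class.

```python
class Node:
    def __init__(self, feature=None, children=None, label=None):
        self.feature = feature          # feature tested at this node (None for a leaf)
        self.children = children or {}  # feature value -> child Node
        self.label = label              # class label, set only at leaves

def classify(node, x):
    """Follow the branch matching x's value for each tested feature until a leaf."""
    while node.label is None:
        node = node.children[x[node.feature]]
    return node.label

# The Play Tennis tree from the figure, written out by hand:
tree = Node("Outlook", {
    "Sunny":    Node("Humidity", {"High": Node(label="No"), "Normal": Node(label="Yes")}),
    "Overcast": Node(label="Yes"),
    "Rain":     Node("Wind", {"Strong": Node(label="No"), "Weak": Node(label="Yes")}),
})

print(classify(tree, {"Outlook": "Sunny", "Humidity": "Normal"}))  # -> Yes
```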

5 Decision Trees
- Play Tennis example
- Feature vector = (Outlook, Temperature, Humidity, Wind)
[Figure: the Play Tennis decision tree as above.]

6 Decision Trees
[Figure: the Play Tennis decision tree again, highlighting that each internal node (Outlook, Humidity, Wind) is associated with a feature.]

7 Decision Trees
- Play Tennis example
- Feature values:
  - Outlook = (sunny, overcast, rain)
  - Temperature = (hot, mild, cool)
  - Humidity = (high, normal)
  - Wind = (strong, weak)

8 Decision Trees
- Outlook = (sunny, overcast, rain)
[Figure: the Play Tennis tree, highlighting that the Outlook node has one branch for each of its values.]

9 Decision Trees
- Class = (Yes, No)
[Figure: the Play Tennis tree, highlighting that the leaf nodes specify the classes Yes and No.]

10 Decision Trees
- Designing a decision tree classifier:
  - Picking the root node
  - Branching recursively

11 Decision Trees
- Picking the root node
- Consider data with two Boolean attributes (A, B) and two classes + and -:
  - { (A=0, B=0), - }: 50 examples
  - { (A=0, B=1), - }: 50 examples
  - { (A=1, B=0), - }: 3 examples
  - { (A=1, B=1), + }: 100 examples

12 Decision Trees
- Picking the root node
- The candidate trees look structurally similar; which attribute should we choose?
[Figure: two one-level trees, one splitting first on A and one splitting first on B, each with branches for values 0 and 1.]

13 Decision Trees
- Picking the root node
- The goal is to have the resulting decision tree as small as possible (Occam's Razor).
- The main decision in the algorithm is the selection of the next attribute to condition on (starting from the root node).
- We want attributes that split the examples into sets that are relatively pure in one label; this way we are closer to a leaf node.
- The most popular heuristic is based on information gain, which originated with the ID3 system of Quinlan.

14 Entropy
- S is a sample of training examples.
- p+ is the proportion of positive examples in S.
- p- is the proportion of negative examples in S.
- Entropy measures the impurity of S:
  Entropy(S) = -p+ log2(p+) - p- log2(p-)
[Figure: Entropy(S) plotted as a function of p+.]
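
A small sketch of this definition in Python (the function name and example calls are ours, not the lecture's):

```python
import math

def entropy(p_pos):
    """Entropy of a Boolean-labelled sample with a proportion p_pos of positives."""
    return -sum(p * math.log2(p) for p in (p_pos, 1.0 - p_pos) if p > 0)  # 0*log(0) taken as 0

print(entropy(0.5))     # 1.0    - a 50/50 split is maximally impure
print(entropy(9 / 14))  # ~0.940 - the Play Tennis training set (9+, 5-)
print(entropy(1.0))     # -0.0   - a pure sample has zero entropy
```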

15
[Figure: two collections of + and - examples. A highly disorganized mixture has high entropy and requires much information to describe; a highly organized collection has low entropy and requires little information.]

16 Information Gain
- Gain(S, A) = expected reduction in entropy due to sorting S on attribute A:
  Gain(S, A) = Entropy(S) - sum over v in Values(A) of (|S_v| / |S|) * Entropy(S_v)
- Values(A) is the set of all possible values for attribute A; S_v is the subset of S for which attribute A has value v; |S| and |S_v| are the number of samples in S and S_v respectively.
- Gain(S, A) is therefore the expected reduction in entropy caused by knowing the value of attribute A.
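
A corresponding sketch of Gain(S, A) in Python (illustrative names, not the lecture's; examples are assumed to be dicts mapping attribute names to values plus a "label" key):

```python
import math
from collections import Counter

def label_entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attribute):
    """Gain(S, A): Entropy(S) minus the size-weighted entropy of each subset S_v."""
    n = len(examples)
    remainder = 0.0
    for v in {ex[attribute] for ex in examples}:
        subset = [ex["label"] for ex in examples if ex[attribute] == v]
        remainder += (len(subset) / n) * label_entropy(subset)
    return label_entropy([ex["label"] for ex in examples]) - remainder
```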

17 Information Gain
- Example: choose A or B?
- Split on A: A=0 -> 100 negatives; A=1 -> 100 positives, 3 negatives.
- Split on B: B=0 -> 53 negatives; B=1 -> 100 positives, 50 negatives.
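
Working through the slide's numbers (logs base 2, values rounded): the full sample has 100 positives and 103 negatives, so Entropy(S) is roughly 1.0. Splitting on A, the A=0 subset (100 examples) is all negative (entropy 0) and the A=1 subset (100+, 3-) has entropy of about 0.19, so Gain(S, A) is about 1.0 - (103/203) * 0.19, roughly 0.90. Splitting on B, the B=0 subset (53 negatives) is pure and the B=1 subset (100+, 50-) has entropy of about 0.92, so Gain(S, B) is about 1.0 - (150/203) * 0.92, roughly 0.32. On this measure A is the better choice for the root.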

18 Example
- Play Tennis example
[Table: the 14 training examples (Outlook, Temperature, Humidity, Wind, PlayTennis) used in the following slides; 9 are labelled Yes and 5 No.]

19 Example
- Split on Humidity: High -> 3+, 4- (E = 0.985); Normal -> 6+, 1- (E = 0.592)
- Gain(S, Humidity) = 0.94 - 7/14 * 0.985 - 7/14 * 0.592 = 0.151

20 Example
- Split on Wind: Weak -> 6+, 2- (E = 0.811); Strong -> 3+, 3- (E = 1.0)
- Gain(S, Wind) = 0.94 - 8/14 * 0.811 - 6/14 * 1.0 = 0.048

21 Example
- Split on Outlook: Sunny -> days 1, 2, 8, 9, 11 (2+, 3-; E = 0.970); Overcast -> days 3, 7, 12, 13 (4+, 0-; E = 0.0); Rain -> days 4, 5, 6, 10, 14 (3+, 2-)
- Gain(S, Outlook) = 0.246
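
As a quick check, these gains can be recomputed directly from the per-branch counts on slides 19-21; a sketch (the helper names are ours):

```python
import math

def entropy_counts(pos, neg):
    """Entropy of a set containing pos positive and neg negative examples."""
    total = pos + neg
    return -sum((c / total) * math.log2(c / total) for c in (pos, neg) if c > 0)

def gain_from_counts(splits):
    """Gain(S, A) computed from the (pos, neg) counts of each branch of the split."""
    total = sum(p + n for p, n in splits)
    pos, neg = sum(p for p, _ in splits), sum(n for _, n in splits)
    remainder = sum(((p + n) / total) * entropy_counts(p, n) for p, n in splits)
    return entropy_counts(pos, neg) - remainder

print(gain_from_counts([(3, 4), (6, 1)]))          # Humidity: ~0.151
print(gain_from_counts([(6, 2), (3, 3)]))          # Wind:     ~0.048
print(gain_from_counts([(2, 3), (4, 0), (3, 2)]))  # Outlook:  ~0.247 (0.246 on the slide, rounding)
```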

22 Example
- Gain(S, Outlook) = 0.246; Gain(S, Humidity) = 0.151; Gain(S, Wind) = 0.048; Gain(S, Temperature) = 0.029
- Outlook has the largest gain, so pick Outlook as the root (branches: Sunny, Overcast, Rain).

23 Example
- Pick Outlook as the root:
  - Sunny -> days 1, 2, 8, 9, 11 (2+, 3-): ?
  - Overcast -> days 3, 7, 12, 13 (4+, 0-): Yes
  - Rain -> days 4, 5, 6, 10, 14 (3+, 2-): ?
- Continue until every attribute is included in the path, or all examples in the leaf have the same label.
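
A compact recursive sketch of this procedure in Python, in the style of ID3 (illustrative code, not the lecture's; ties in the gain comparison are broken arbitrarily):

```python
import math
from collections import Counter

def _entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def _gain(examples, attribute, target):
    remainder = 0.0
    for v in {ex[attribute] for ex in examples}:
        subset = [ex[target] for ex in examples if ex[attribute] == v]
        remainder += (len(subset) / len(examples)) * _entropy(subset)
    return _entropy([ex[target] for ex in examples]) - remainder

def id3(examples, attributes, target="label"):
    """Stop when the node is pure or no attributes remain; otherwise split on the
    attribute with the highest information gain and recurse on each subset."""
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:                  # all examples share one label -> leaf
        return labels[0]
    if not attributes:                         # every attribute already used on this path
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: _gain(examples, a, target))
    return {best: {v: id3([ex for ex in examples if ex[best] == v],
                          [a for a in attributes if a != best], target)
                   for v in {ex[best] for ex in examples}}}
```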

24 Example
- At the Sunny branch (days 1, 2, 8, 9, 11; 2+, 3-):
  - Gain(S_sunny, Humidity) = 0.97 - (3/5) * 0 - (2/5) * 0 = 0.97
  - Gain(S_sunny, Temp) = 0.97 - 0 - (2/5) * 1 = 0.57
  - Gain(S_sunny, Wind) = 0.97 - (2/5) * 1 - (3/5) * 0.92 = 0.02

25 Example
- Humidity has the largest gain on the Sunny branch (0.97 vs 0.57 and 0.02), so split Sunny on Humidity: High -> No, Normal -> Yes.

26 Example
- Next, expand the Rain branch (days 4, 5, 6, 10, 14; 3+, 2-):
  - Gain(S_rain, Humidity) = ?
  - Gain(S_rain, Temp) = ?
  - Gain(S_rain, Wind) = ?

27 Example
- Final tree:
  - Outlook = Sunny -> Humidity: High -> No, Normal -> Yes
  - Outlook = Overcast -> Yes
  - Outlook = Rain -> Wind: Strong -> No, Weak -> Yes
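
For example, reading a new day off this tree: (Outlook = Rain, Wind = Weak) follows the Rain branch to the Wind node and is classified Yes, while (Outlook = Sunny, Humidity = High) ends at No.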

28 Tutorial/Exercise Questions
An experiment has produced the following 3D feature vectors X = (x1, x2, x3) belonging to two classes. Design a decision tree classifier to classify the unknown feature vector X = (1, 2, 1).

x1  x2  x3  Class
1   1   1   1
1   1   1   2
1   1   1   1
2   1   1   2
2   1   2   1
2   2   2   2
2   2   2   1
2   2   1   2
1   2   2   2
1   1   2   1
1   2   1   ?
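
One way to start, or to check your working, is to compute the root-node information gains in code; a sketch along the lines of the earlier helpers, assuming the table above has been read correctly (names are ours):

```python
import math
from collections import Counter

# The exercise data read off the table above: (x1, x2, x3, class).
rows = [(1, 1, 1, 1), (1, 1, 1, 2), (1, 1, 1, 1), (2, 1, 1, 2), (2, 1, 2, 1),
        (2, 2, 2, 2), (2, 2, 2, 1), (2, 2, 1, 2), (1, 2, 2, 2), (1, 1, 2, 1)]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, i):
    """Information gain of splitting the rows on feature index i (0, 1 or 2)."""
    labels = [r[3] for r in rows]
    remainder = 0.0
    for v in {r[i] for r in rows}:
        subset = [r[3] for r in rows if r[i] == v]
        remainder += (len(subset) / len(rows)) * entropy(subset)
    return entropy(labels) - remainder

for i, name in enumerate(("x1", "x2", "x3")):
    print(name, round(gain(rows, i), 3))
```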

