2 Agenda
Basics of Decision Trees
Introduction to ID3
Entropy and Information Gain
Two Examples
3 Basics
What is a decision tree?
A tree where each branching (decision) node represents a choice between two or more alternatives, with every branching node being part of a path to a leaf node.
Decision node: specifies a test of some attribute.
Leaf node: indicates the classification of an example.
4 ID3
Invented by J. Ross Quinlan.
Performs a top-down, greedy search through the space of possible decision trees. Greedy because there is no backtracking: at each node it commits to the locally best split and never revisits the choice.
At each step it selects the attribute that is most useful for classifying the examples, i.e., the attribute with the highest Information Gain.
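The greedy, no-backtracking recursion described above can be sketched in Python. This is a minimal illustration, not Quinlan's implementation; the function names and the dict-based tree representation are choices made here for clarity.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def id3(rows, labels, attributes):
    """Top-down greedy tree construction: pick the best attribute, recurse,
    never backtrack. rows: list of dicts {attribute: value};
    labels: parallel list of class labels."""
    if len(set(labels)) == 1:              # pure node -> leaf
        return labels[0]
    if not attributes:                     # no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]

    def gain(a):                           # information gain of splitting on a
        remainder = 0.0
        for v in set(r[a] for r in rows):
            sv = [l for r, l in zip(rows, labels) if r[a] == v]
            remainder += len(sv) / len(labels) * entropy(sv)
        return entropy(labels) - remainder

    best = max(attributes, key=gain)       # greedy choice: highest IG, no backtracking
    tree = {best: {}}
    for v in set(r[best] for r in rows):
        sub_rows = [r for r in rows if r[best] == v]
        sub_labels = [l for r, l in zip(rows, labels) if r[best] == v]
        tree[best][v] = id3(sub_rows, sub_labels,
                            [a for a in attributes if a != best])
    return tree
```

Run on the lays-eggs data from the example that follows, the root it selects is Feathers, matching the hand computation on the later slides.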
5 Entropy
Entropy measures the impurity of an arbitrary collection of examples.
For a collection S containing positive and negative examples:
Entropy(S) = -p+ log2 p+ - p- log2 p-
where p+ is the proportion of positive examples and p- is the proportion of negative examples.
Entropy(S) = 0 if all members of S belong to the same class.
Entropy(S) = 1 (the maximum) when the positive and negative examples are split equally.
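The two-class entropy formula above can be written directly as a small function; a minimal sketch, with the convention that 0 · log2(0) is treated as 0.

```python
import math

def entropy(pos, neg):
    """Entropy of a collection with `pos` positive and `neg` negative examples."""
    total = pos + neg
    e = 0.0
    for count in (pos, neg):
        if count:                     # 0 * log2(0) is taken as 0
            p = count / total
            e -= p * math.log2(p)
    return e
```

As the slide states, a pure collection gives 0 (e.g. `entropy(6, 0)`), an even split gives the maximum 1 (e.g. `entropy(3, 3)`), and a [4+, 2-] collection gives about 0.918.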
6 Information Gain
Information Gain (IG) measures the expected reduction in entropy caused by partitioning the examples on an attribute. The higher the IG, the greater the expected reduction in entropy.
Gain(S, A) = Entropy(S) - Σ (v ∈ Values(A)) (|Sv| / |S|) Entropy(Sv)
where Values(A) is the set of all possible values for attribute A, and Sv is the subset of S for which attribute A has value v.
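The Gain(S, A) formula translates directly into code: compute the entropy of S, then subtract the size-weighted entropies of the subsets Sv. A minimal sketch (the function names are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Gain(S, A) = Entropy(S) - sum over v of |Sv|/|S| * Entropy(Sv).
    values: attribute A's value for each example; labels: each example's class."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        sv = [l for a, l in zip(values, labels) if a == v]   # labels of subset Sv
        remainder += len(sv) / n * entropy(sv)
    return entropy(labels) - remainder
```

On the Feathers attribute of the upcoming example (values per animal, labels = Lays Eggs), this reproduces the slide's Gain(S, Feathers) = 0.45914.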
7 Example 1
Sample training data to determine whether an animal lays eggs.
Independent/condition attributes: Warm-blooded, Feathers, Fur, Swims. Dependent/decision attribute: Lays Eggs.

Animal     Warm-blooded  Feathers  Fur  Swims  Lays Eggs
Ostrich    Yes           Yes       No   No     Yes
Crocodile  No            No        No   Yes    Yes
Raven      Yes           Yes       No   No     Yes
Albatross  Yes           Yes       No   No     Yes
Dolphin    Yes           No        No   Yes    No
Koala      Yes           No        Yes  No     No
8 S contains 4 positive and 2 negative examples, so:
Entropy(S) = -(4/6) log2(4/6) - (2/6) log2(2/6) = 0.91830
Now we find the IG for all four attributes: Warm-blooded, Feathers, Fur, Swims.
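The arithmetic can be checked in one line:

```python
import math

# Entropy(S) for S = [4+, 2-]
e = -(4/6) * math.log2(4/6) - (2/6) * math.log2(2/6)
assert abs(e - 0.91830) < 1e-4
```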
11 Gain(S, Warm-blooded) = 0.10916
Gain(S, Feathers) = 0.45914
Gain(S, Fur) = 0.31669
Gain(S, Swims) = 0.04411
Gain(S, Feathers) is the maximum, so Feathers becomes the root node.
Feathers = Y: [Ostrich, Raven, Albatross] contains only positive examples and becomes a leaf node with classification 'Lays Eggs'.
Feathers = N: [Crocodile, Dolphin, Koala] still mixes classes, so it must be split further.
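The four gains and the root choice can be reproduced from the training table. A minimal sketch (data taken from the table above; this is an illustration, not the slides' code):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(values, labels):
    """Information gain of splitting `labels` on the attribute `values`."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        sv = [l for a, l in zip(values, labels) if a == v]
        remainder += len(sv) / n * entropy(sv)
    return entropy(labels) - remainder

# Attribute values per animal: Ostrich, Crocodile, Raven, Albatross, Dolphin, Koala
data = {
    'Warm-blooded': ['Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes'],
    'Feathers':     ['Yes', 'No', 'Yes', 'Yes', 'No', 'No'],
    'Fur':          ['No', 'No', 'No', 'No', 'No', 'Yes'],
    'Swims':        ['No', 'Yes', 'No', 'No', 'Yes', 'No'],
}
lays_eggs = ['Yes', 'Yes', 'Yes', 'Yes', 'No', 'No']

gains = {a: gain(v, lays_eggs) for a, v in data.items()}
root = max(gains, key=gains.get)   # attribute with the highest information gain
```

`gains` matches the slide's values to five decimal places and `root` is `'Feathers'`.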
12 We now repeat the procedure on the 'N' branch:
S = [Crocodile, Dolphin, Koala], i.e. S: [1+, 2-]

Animal     Warm-blooded  Feathers  Fur  Swims  Lays Eggs
Crocodile  No            No        No   Yes    Yes
Dolphin    Yes           No        No   Yes    No
Koala      Yes           No        Yes  No     No

Entropy(S) = -(1/3) log2(1/3) - (2/3) log2(2/3) = 0.91830
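Continuing this step in code (subset values taken from the table above; an illustrative sketch): the subset's entropy matches the slide, and splitting on Warm-blooded separates the remaining classes perfectly, so its gain equals the full entropy of the subset.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(values, labels):
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        sv = [l for a, l in zip(values, labels) if a == v]
        remainder += len(sv) / n * entropy(sv)
    return entropy(labels) - remainder

# Subset S = [Crocodile, Dolphin, Koala] -> [1+, 2-]
labels = ['Yes', 'No', 'No']
warm_blooded = ['No', 'Yes', 'Yes']   # perfectly separates the subset

subset_entropy = entropy(labels)                  # ~ 0.91830
g = gain(warm_blooded, labels)                    # equals subset_entropy
```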