Presentation on theme: "Iterative Dichotomiser 3 (ID3) Algorithm" — Presentation transcript:
1 Iterative Dichotomiser 3 (ID3) Algorithm. Medha Pradhan, CS 157B, Spring 2007.
2 Agenda
- Basics of decision trees
- Introduction to ID3
- Entropy and Information Gain
- Two examples
3 Basics
What is a decision tree? A tree where each branching (decision) node represents a choice between two or more alternatives, and every branching node lies on a path to a leaf node.
- Decision node: specifies a test of some attribute.
- Leaf node: indicates the classification of an example.
4 ID3
Invented by J. Ross Quinlan. Employs a top-down greedy search through the space of possible decision trees.
- Greedy because there is no backtracking: at each step it commits to the attribute that looks best at that point.
- Selects the attribute that is most useful for classifying the examples, i.e. the attribute with the highest Information Gain.
5 Entropy
Entropy measures the impurity of an arbitrary collection of examples. For a collection S containing positive and negative examples:
Entropy(S) = -p+ log2 p+ - p- log2 p-
where p+ is the proportion of positive examples and p- is the proportion of negative examples.
Entropy(S) = 0 if all members of S belong to the same class.
Entropy(S) = 1 (its maximum) when the members are split equally between the two classes.
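A minimal sketch of this two-class entropy in Python (the function name entropy_from_counts and the counts-based interface are choices made here for illustration, not part of the slides):

    import math

    def entropy_from_counts(pos, neg):
        """Entropy of a two-class collection, given counts of positive and negative examples."""
        total = pos + neg
        if pos == 0 or neg == 0:
            return 0.0  # all members belong to the same class
        p_pos, p_neg = pos / total, neg / total
        return -p_pos * math.log2(p_pos) - p_neg * math.log2(p_neg)

    print(entropy_from_counts(3, 0))  # 0.0 -> pure collection
    print(entropy_from_counts(3, 3))  # 1.0 -> equal split (maximum)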
6 Information Gain
Measures the expected reduction in entropy from partitioning the examples on an attribute; the higher the gain, the larger the expected reduction in entropy.
Gain(S, A) = Entropy(S) - Σ (|Sv| / |S|) Entropy(Sv), summed over v in Values(A)
where Values(A) is the set of all possible values for attribute A, and Sv is the subset of S for which attribute A has value v.
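A sketch of the corresponding gain computation, reusing entropy_from_counts from the previous sketch. The representation of the data as a list of (attribute-dict, boolean-label) pairs is an assumption made here for illustration:

    def information_gain(examples, attribute):
        """Expected reduction in entropy from splitting `examples` on `attribute`.
        `examples` is a list of (attributes_dict, label) pairs with boolean labels."""
        def entropy(subset):
            pos = sum(1 for _, label in subset if label)
            return entropy_from_counts(pos, len(subset) - pos)

        total = len(examples)
        gain = entropy(examples)  # Entropy(S)
        for value in {attrs[attribute] for attrs, _ in examples}:  # Values(A)
            subset = [(attrs, label) for attrs, label in examples if attrs[attribute] == value]  # Sv
            gain -= len(subset) / total * entropy(subset)
        return gain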
7 Example 1
Sample training data to determine whether an animal lays eggs.
Independent/condition attributes: Warm-blooded, Feathers, Fur, Swims. Dependent/decision attribute: Lays Eggs.

Animal      Warm-blooded  Feathers  Fur  Swims  Lays Eggs
Ostrich     Yes           Yes       No   No     Yes
Crocodile   No            No        No   Yes    Yes
Raven       Yes           Yes       No   No     Yes
Albatross   Yes           Yes       No   No     Yes
Dolphin     Yes           No        No   Yes    No
Koala       Yes           No        Yes  No     No
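The same training data expressed in the (attribute-dict, label) form used by the sketches above, with the label True meaning "Lays Eggs"; the cell values follow the table as reconstructed here:

    animals = [
        ({"Warm-blooded": "Yes", "Feathers": "Yes", "Fur": "No",  "Swims": "No"},  True),   # Ostrich
        ({"Warm-blooded": "No",  "Feathers": "No",  "Fur": "No",  "Swims": "Yes"}, True),   # Crocodile
        ({"Warm-blooded": "Yes", "Feathers": "Yes", "Fur": "No",  "Swims": "No"},  True),   # Raven
        ({"Warm-blooded": "Yes", "Feathers": "Yes", "Fur": "No",  "Swims": "No"},  True),   # Albatross
        ({"Warm-blooded": "Yes", "Feathers": "No",  "Fur": "No",  "Swims": "Yes"}, False),  # Dolphin
        ({"Warm-blooded": "Yes", "Feathers": "No",  "Fur": "Yes", "Swims": "No"},  False),  # Koala
    ]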
8 Entropy of the full collection (4 positive, 2 negative):
Entropy(S) = -(4/6) log2(4/6) - (2/6) log2(2/6) = 0.9183
Now we have to find the Information Gain for all four attributes: Warm-blooded, Feathers, Fur, and Swims.
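As a quick check, the entropy_from_counts sketch above reproduces this value:

    print(entropy_from_counts(4, 2))  # ≈ 0.9183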
11 Gain(S, Warm-blooded) = 0.10916
Gain(S, Feathers) = 0.45914
Gain(S, Fur) =
Gain(S, Swims) =
Gain(S, Feathers) is the maximum, so Feathers becomes the root node.
The 'Y' descendant contains only positive examples, so it becomes a leaf node with classification 'Lays Eggs'; the 'N' descendant still has to be classified.

Feathers
  Y -> [Ostrich, Raven, Albatross]: Lays Eggs
  N -> [Crocodile, Dolphin, Koala]: Lays Eggs?
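A sketch that reproduces the root selection, using the animals list and the helper functions sketched above (the attributes list is defined here for illustration):

    attributes = ["Warm-blooded", "Feathers", "Fur", "Swims"]
    for a in attributes:
        print(a, information_gain(animals, a))
    # Warm-blooded ≈ 0.109 and Feathers ≈ 0.459, in line with the gains quoted above

    root = max(attributes, key=lambda a: information_gain(animals, a))
    print(root)  # Feathers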
12 We now repeat the procedure on the remaining examples.
S = [Crocodile, Dolphin, Koala], i.e. [1+, 2-]

Animal      Warm-blooded  Feathers  Fur  Swims  Lays Eggs
Crocodile   No            No        No   Yes    Yes
Dolphin     Yes           No        No   Yes    No
Koala       Yes           No        Yes  No     No

Entropy(S) = -(1/3) log2(1/3) - (2/3) log2(2/3) = 0.9183
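This "repeat the procedure" step is the recursion at the heart of ID3. A minimal sketch of the full recursive construction, built on the same helpers; the dict-based tree representation and the leaf labels are choices made here for illustration:

    def id3(examples, attributes):
        """Recursively build a decision tree: stop on a pure node, otherwise
        split on the attribute with the highest information gain."""
        labels = [label for _, label in examples]
        if all(labels):
            return "Lays Eggs"
        if not any(labels):
            return "Does Not Lay Eggs"
        if not attributes:
            # no attributes left: fall back to the majority class
            return "Lays Eggs" if labels.count(True) >= labels.count(False) else "Does Not Lay Eggs"
        best = max(attributes, key=lambda a: information_gain(examples, a))
        remaining = [a for a in attributes if a != best]
        tree = {best: {}}
        for value in {attrs[best] for attrs, _ in examples}:
            subset = [(attrs, label) for attrs, label in examples if attrs[best] == value]
            tree[best][value] = id3(subset, remaining)
        return tree

    # Continues the construction on [Crocodile, Dolphin, Koala] exactly as on this slide.
    print(id3(animals, attributes))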