1
**Iterative Dichotomiser 3 (ID3) Algorithm**

Medha Pradhan CS 157B, Spring 2007

2
**Agenda**

- Basics of Decision Tree
- Introduction to ID3
- Entropy and Information Gain
- Two Examples

3
**Basics**

What is a decision tree?

A tree in which each branching (decision) node represents a choice between two or more alternatives, and every branching node lies on a path to a leaf node.

- Decision node: specifies a test of some attribute.
- Leaf node: indicates the classification of an example.

4
**ID3**

- Invented by J. Ross Quinlan.
- Employs a top-down, greedy search through the space of possible decision trees. It is greedy because there is no backtracking: once an attribute has been chosen for a node, that choice is never reconsidered.
- At each node, it selects the attribute that is most useful for classifying the examples, that is, the attribute with the highest Information Gain.

5
**Entropy**

Entropy measures the impurity of an arbitrary collection of examples. For a collection S containing positive and negative examples, entropy is given as:

$$\mathrm{Entropy}(S) = -p_{+}\log_2 p_{+} - p_{-}\log_2 p_{-}$$

where $p_{+}$ is the proportion of positive examples and $p_{-}$ is the proportion of negative examples.

In general:

- Entropy(S) = 0 if all members of S belong to the same class.
- Entropy(S) = 1 (maximum) when the members are split equally between the two classes.
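The definition above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `entropy` and the convention of passing per-class example counts are assumptions made for the example, and empty classes are skipped so that 0·log2(0) is treated as 0.

```python
import math

def entropy(counts):
    """Entropy in bits of a collection, given the number of examples per class.

    `counts` is a hypothetical input format chosen for this sketch,
    e.g. [4, 2] for 4 positive and 2 negative examples.
    """
    total = sum(counts)
    result = 0.0
    for c in counts:
        if c:  # skip empty classes: 0 * log2(0) is treated as 0
            p = c / total
            result -= p * math.log2(p)
    return result

print(entropy([6, 0]))            # all members in one class
print(entropy([3, 3]))            # members split equally
print(round(entropy([4, 2]), 4))  # a mixed collection
```

The first two calls correspond to the two boundary cases listed above (entropy 0 and entropy 1).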

6
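**Information Gain**

Information gain measures the expected reduction in entropy obtained by partitioning the examples on an attribute. The higher the IG, the greater the expected reduction in entropy.

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$

where Values(A) is the set of all possible values for attribute A, and $S_v$ is the subset of S for which attribute A has value v.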
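As a rough sketch of how this quantity can be computed, the snippet below takes the class counts of S and of each subset S_v and returns the gain. The function names and the count-based input format are assumptions made for illustration, and the two small collections are made-up examples chosen to show the extremes (a perfect split and a useless split).

```python
import math

def entropy(counts):
    """Entropy in bits from per-class example counts (empty classes skipped)."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c)

def information_gain(parent_counts, subsets):
    """Gain = Entropy(S) minus the size-weighted entropy of each subset S_v.

    parent_counts: class counts of S, e.g. [2, 2]
    subsets: class counts of every S_v, e.g. [[2, 0], [0, 2]]
    """
    total = sum(parent_counts)
    remainder = sum(sum(sv) / total * entropy(sv) for sv in subsets)
    return entropy(parent_counts) - remainder

# Hypothetical collection S = [2+, 2-]:
perfect = information_gain([2, 2], [[2, 0], [0, 2]])  # attribute separates the classes
useless = information_gain([2, 2], [[1, 1], [1, 1]])  # attribute tells us nothing
print(perfect, useless)
```

An attribute that separates the classes completely removes all of the entropy, while one whose subsets look just like S removes none, which is why ID3 prefers the former.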

7
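**Example 1**

Sample training data to determine whether an animal lays eggs.

Independent/Condition attributes: Warm-blooded, Feathers, Fur, Swims. Dependent/Decision attribute: Lays Eggs.

| Animal | Warm-blooded | Feathers | Fur | Swims | Lays Eggs |
|---|---|---|---|---|---|
| Ostrich | Yes | Yes | No | No | Yes |
| Crocodile | No | No | No | Yes | Yes |
| Raven | Yes | Yes | No | No | Yes |
| Albatross | Yes | Yes | No | No | Yes |
| Dolphin | Yes | No | No | Yes | No |
| Koala | Yes | No | Yes | No | No |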

8
**Entropy(4Y,2N) = -(4/6)log2(4/6) - (2/6)log2(2/6) ≈ 0.9183**

Now we have to find the IG for all four attributes: Warm-blooded, Feathers, Fur and Swims.

9
**For attribute ‘Warm-blooded’:**

- Values(Warm-blooded): [Yes, No]
- S = [4Y,2N]
- SYes = [3Y,2N], E(SYes) ≈ 0.9710
- SNo = [1Y,0N], E(SNo) = 0 (all members belong to the same class)
- Gain(S,Warm-blooded) = 0.9183 - [(5/6) × 0.9710 + (1/6) × 0] ≈ 0.1092

**For attribute ‘Feathers’:**

- Values(Feathers): [Yes, No]
- S = [4Y,2N]
- SYes = [3Y,0N], E(SYes) = 0
- SNo = [1Y,2N], E(SNo) ≈ 0.9183
- Gain(S,Feathers) = 0.9183 - [(3/6) × 0 + (3/6) × 0.9183] ≈ 0.4591

10
**For attribute ‘Fur’:**

- Values(Fur): [Yes, No]
- S = [4Y,2N]
- SYes = [0Y,1N], E(SYes) = 0
- SNo = [4Y,1N], E(SNo) ≈ 0.7219
- Gain(S,Fur) = 0.9183 - [(1/6) × 0 + (5/6) × 0.7219] ≈ 0.3167

**For attribute ‘Swims’:**

- Values(Swims): [Yes, No]
- S = [4Y,2N]
- SYes = [1Y,1N], E(SYes) = 1 (equal members in both classes)
- SNo = [3Y,1N], E(SNo) ≈ 0.8113
- Gain(S,Swims) = 0.9183 - [(2/6) × 1 + (4/6) × 0.8113] ≈ 0.0441

11
**Summary of gains**

- Gain(S,Warm-blooded) = 0.10916
- Gain(S,Feathers) = 0.45914
- Gain(S,Fur) ≈ 0.3167
- Gain(S,Swims) ≈ 0.0441

Gain(S,Feathers) is the maximum, so Feathers is chosen as the root node. The ‘Y’ descendant has only positive examples and becomes a leaf node with classification ‘Lays Eggs’. The ‘N’ descendant still contains examples of both classes, so it has to be split further.

Tree so far:

- Feathers = Y: [Ostrich, Raven, Albatross] → Lays Eggs
- Feathers = N: [Crocodile, Dolphin, Koala] → ?
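The root selection can be checked with a short script. This is a sketch under stated assumptions: the helper names are invented for the example, and the per-animal attribute values are reconstructed from the class counts given for each attribute (for example SYes = [3Y,2N] for Warm-blooded), so they should be read as an assumption rather than as a verbatim copy of the slide's table.

```python
import math
from collections import Counter, defaultdict

ATTRIBUTES = ["Warm-blooded", "Feathers", "Fur", "Swims"]
TARGET = "Lays Eggs"

# Assumed attribute values, reconstructed from the class counts in the slides.
RAW = [
    ("Ostrich",   "Yes", "Yes", "No",  "No",  "Yes"),
    ("Crocodile", "No",  "No",  "No",  "Yes", "Yes"),
    ("Raven",     "Yes", "Yes", "No",  "No",  "Yes"),
    ("Albatross", "Yes", "Yes", "No",  "No",  "Yes"),
    ("Dolphin",   "Yes", "No",  "No",  "Yes", "No"),
    ("Koala",     "Yes", "No",  "Yes", "No",  "No"),
]
ROWS = [dict(zip(["Animal"] + ATTRIBUTES + [TARGET], r)) for r in RAW]

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())

def gain(rows, attribute, target=TARGET):
    """Information gain of splitting `rows` on `attribute`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder

gains = {a: gain(ROWS, a) for a in ATTRIBUTES}
best = max(gains, key=gains.get)  # greedy choice: highest information gain

for attribute, value in gains.items():
    print(f"Gain(S,{attribute}) = {value:.4f}")
print("Root attribute:", best)
```

Under these assumed values the script selects Feathers, in agreement with the slide.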

12
**We now repeat the procedure for the ‘N’ branch, S: [Crocodile, Dolphin, Koala], S: [1+,2-]**

| Animal | Warm-blooded | Feathers | Fur | Swims | Lays Eggs |
|---|---|---|---|---|---|
| Crocodile | No | No | No | Yes | Yes |
| Dolphin | Yes | No | No | Yes | No |
| Koala | Yes | No | Yes | No | No |

Entropy(S) = -(1/3)log2(1/3) - (2/3)log2(2/3) ≈ 0.9183

13
**For attribute ‘Warm-blooded’:**

- Values(Warm-blooded): [Yes, No]
- S = [1Y,2N]
- SYes = [0Y,2N], E(SYes) = 0
- SNo = [1Y,0N], E(SNo) = 0
- Gain(S,Warm-blooded) = 0.9183 - [(2/3) × 0 + (1/3) × 0] ≈ 0.9183

**For attribute ‘Fur’:**

- Values(Fur): [Yes, No]
- S = [1Y,2N]
- SYes = [0Y,1N], E(SYes) = 0
- SNo = [1Y,1N], E(SNo) = 1
- Gain(S,Fur) = 0.9183 - [(1/3) × 0 + (2/3) × 1] ≈ 0.2516

**For attribute ‘Swims’:**

- Values(Swims): [Yes, No]
- S = [1Y,2N]
- SYes = [1Y,1N], E(SYes) = 1
- SNo = [0Y,1N], E(SNo) = 0
- Gain(S,Swims) = 0.9183 - [(2/3) × 1 + (1/3) × 0] ≈ 0.2516

Gain(S,Warm-blooded) is the maximum, so Warm-blooded is chosen as the test for this branch.
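The arithmetic for this second level can be reproduced directly from the [Y, N] counts listed above. The sketch below is illustrative only; the function names and the `(positive, negative)` pair representation are assumptions made for this example.

```python
import math

def entropy(pos, neg):
    """Two-class entropy in bits from positive/negative example counts."""
    total = pos + neg
    return sum(-(c / total) * math.log2(c / total) for c in (pos, neg) if c)

def gain(parent, partitions):
    """parent and every partition are (positive, negative) count pairs."""
    size = sum(parent)
    remainder = sum((p + n) / size * entropy(p, n) for p, n in partitions)
    return entropy(*parent) - remainder

S = (1, 2)  # [1Y,2N]: Crocodile, Dolphin, Koala
splits = {
    "Warm-blooded": [(0, 2), (1, 0)],  # SYes, SNo as given in the slide
    "Fur":          [(0, 1), (1, 1)],
    "Swims":        [(1, 1), (0, 1)],
}
second_level = {a: gain(S, parts) for a, parts in splits.items()}
for attribute, value in second_level.items():
    print(f"Gain(S,{attribute}) = {value:.4f}")
```

Because both Warm-blooded subsets are pure, its gain equals the full entropy of S, which no other attribute can exceed.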

14
**The final decision tree will be:**

- Feathers = Y: Lays eggs
- Feathers = N: test Warm-blooded
  - Warm-blooded = Y: Does not lay eggs
  - Warm-blooded = N: Lays eggs
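Putting the steps together, the whole procedure can be sketched as a short recursive function. This is a simplified illustration, not Quinlan's original implementation: the function names, the nested-dictionary tree representation and the majority-class fallback are choices made for this example, and the training rows are reconstructed from the class counts in the slides, so they are an assumption.

```python
import math
from collections import Counter

ATTRIBUTES = ["Warm-blooded", "Feathers", "Fur", "Swims"]
TARGET = "Class"
YES, NO = "Lays eggs", "Does not lay eggs"

# Assumed training data, reconstructed from the counts in the slides.
RAW = [
    ("Yes", "Yes", "No",  "No",  YES),  # Ostrich
    ("No",  "No",  "No",  "Yes", YES),  # Crocodile
    ("Yes", "Yes", "No",  "No",  YES),  # Raven
    ("Yes", "Yes", "No",  "No",  YES),  # Albatross
    ("Yes", "No",  "No",  "Yes", NO),   # Dolphin
    ("Yes", "No",  "Yes", "No",  NO),   # Koala
]
ROWS = [dict(zip(ATTRIBUTES + [TARGET], r)) for r in RAW]

def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())

def gain(rows, attribute):
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[TARGET] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([row[TARGET] for row in rows]) - remainder

def id3(rows, attributes):
    labels = [row[TARGET] for row in rows]
    if len(set(labels)) == 1:   # pure node: becomes a leaf
        return labels[0]
    if not attributes:          # nothing left to test: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(rows, a))  # greedy, no backtracking
    remaining = [a for a in attributes if a != best]
    return {best: {value: id3([r for r in rows if r[best] == value], remaining)
                   for value in sorted({r[best] for r in rows})}}

def classify(tree, example):
    """Follow the tests from the root until a leaf (a class label) is reached."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][example[attribute]]
    return tree

tree = id3(ROWS, ATTRIBUTES)
print(tree)
print(classify(tree, {"Warm-blooded": "No", "Feathers": "No",
                      "Fur": "No", "Swims": "Yes"}))
```

The nested dictionary mirrors the tree above: each key is the attribute tested at a node, and each value maps an attribute value either to a subtree or to a class label. A fuller implementation would also need to handle attribute values that never appear in the training data.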

15
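**Example 2**

Factors affecting sunburn

| Name | Hair | Height | Weight | Lotion | Sunburned |
|---|---|---|---|---|---|
| Sarah | Blonde | Average | Light | No | Yes |
| Dana | Blonde | Tall | Average | Yes | No |
| Alex | Brown | Short | Average | Yes | No |
| Annie | Blonde | Short | Average | No | Yes |
| Emily | Red | Average | Heavy | No | Yes |
| Pete | Brown | Tall | Heavy | No | No |
| John | Brown | Average | Heavy | No | No |
| Katie | Blonde | Short | Light | Yes | No |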

16
**In this case, the final decision tree will be:**

- Hair = Blonde: test Lotion
  - Lotion = N: Sunburned
  - Lotion = Y: Not Sunburned
- Hair = Brown: Not Sunburned
- Hair = Red: Sunburned

