CS344: Introduction to Artificial Intelligence. Pushpak Bhattacharyya, CSE Dept., IIT Bombay. Lectures 29 and 30: Decision Tree Learning; ID3; Entropy.

1 CS344: Introduction to Artificial Intelligence. Pushpak Bhattacharyya, CSE Dept., IIT Bombay. Lectures 29 and 30: Decision Tree Learning; ID3; Entropy

2 Information in the data

3 To-play-or-not-to-play-tennis data vs. climatic conditions, from Ross Quinlan's papers on ID3 (1986) and C4.5 (1993):

Outlook (O) | Temp (T) | Humidity (H) | Windy (W) | Decision to play (D)
Sunny       | High     | High         | F        | N
Sunny       | High     | High         | T        | N
Cloudy      | High     | High         | F        | Y
Rain        | Med      | High         | F        | Y
Rain        | Cold     | Low          | F        | Y
Rain        | Cold     | Low          | T        | N
Cloudy      | Cold     | Low          | T        | Y

4 Remaining rows of the same data:

Outlook (O) | Temp (T) | Humidity (H) | Windy (W) | Decision (D)
Sunny       | Med      | High         | F        | N
Sunny       | Cold     | Low          | F        | Y
Rain        | Med      | Low          | F        | Y
Sunny       | Med      | Low          | T        | Y
Cloudy      | Med      | High         | T        | Y
Cloudy      | High     | Low          | F        | Y
Rain        | High     | High         | T        | N

5 [Decision tree: root node Outlook with branches Sunny, Cloudy, and Rainy. Sunny leads to a Humidity test (High -> No, Low -> Yes), Cloudy leads directly to Yes, and Rainy leads to a Windy test (T / F) with Yes and No leaves.]

6 Rule Base
R1: If outlook is sunny and humidity is high, then the decision is No.
R2: If outlook is sunny and humidity is low, then the decision is Yes.
R3: If outlook is cloudy, then the decision is Yes.
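
Such a rule base maps directly onto nested conditionals. A minimal Python sketch (the function name and the string encodings of attribute values are my own, not from the slides) implementing R1-R3; the rainy cases are not covered by these three rules, so it returns None for them:

def decide(outlook, humidity):
    # R1: sunny and high humidity -> No
    if outlook == "sunny" and humidity == "high":
        return "No"
    # R2: sunny and low humidity -> Yes
    if outlook == "sunny" and humidity == "low":
        return "Yes"
    # R3: cloudy -> Yes
    if outlook == "cloudy":
        return "Yes"
    return None  # rainy: not covered by R1-R3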

7 Making Sense of Information: classification, clustering, and giving a short, clear description.

8 Short Description: the Occam's Razor principle (the shortest/simplest description is best for generalization).

9 Representation Languages: decision tree, neural network, rule base, Boolean expression.

10 Information & Entropy: The example data, presented as rows with labels, carries little ordered/structured information compared with the succinct descriptions (the decision tree and the rule base). This lack of structure in the information is measured by "entropy".

11 Define the entropy of S (the labeled data):
E(S) = -(P+ log2 P+ + P- log2 P-)
where P+ is the proportion of positively labeled data and P- is the proportion of negatively labeled data.

12 Example: P+ = 9/14, P- = 5/14
E(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
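
A quick numerical check of this value (Python sketch; the helper name entropy is my own):

import math

def entropy(p_pos, p_neg):
    # Binary entropy E(S) = -(P+ log2 P+ + P- log2 P-); a term with
    # probability 0 contributes 0 by convention.
    return -sum(p * math.log2(p) for p in (p_pos, p_neg) if p > 0)

print(f"{entropy(9/14, 5/14):.3f}")  # 0.940, matching the slide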

13 Partitioning the Data: choose "Windy" as the attribute, with values Windy = [T, F], and partition the data by this value (e.g., the subset with Windy = T).

14 Partitioning the Data (contd.): Partitioning by focusing on a particular attribute produces "information gain", i.e., a reduction in entropy.

15 Partitioning the Data (contd.): Information gain when we split on Windy = [T, F]:
Windy = T: #+ = 6, #- = 2
Windy = F: #+ = 3, #- = 3
(# denotes "number of")

16 Partitioning the Data (contd.): [Diagram: a Windy node with branch T containing 6 positive and 2 negative examples, and branch F containing 3 positive and 3 negative examples.]
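
The partition step itself is just grouping labeled examples by an attribute's value and counting labels in each group. A small Python sketch (the record layout is an assumption; the toy data is constructed only to reproduce the counts on slide 15):

from collections import defaultdict

def partition(examples, attribute):
    # Group (attributes, label) pairs by the chosen attribute's value
    # and count positive/negative labels in each group.
    groups = defaultdict(lambda: {"+": 0, "-": 0})
    for attrs, label in examples:
        groups[attrs[attribute]][label] += 1
    return dict(groups)

# Toy examples whose Windy/label counts match slide 15.
data = ([({"Windy": "T"}, "+")] * 6 + [({"Windy": "T"}, "-")] * 2 +
        [({"Windy": "F"}, "+")] * 3 + [({"Windy": "F"}, "-")] * 3)
print(partition(data, "Windy"))
# {'T': {'+': 6, '-': 2}, 'F': {'+': 3, '-': 3}}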

17 Partitioning the Data (contd.):
Gain(S, A) = E(S) - Σ_v (|S_v| / |S|) E(S_v), where v ranges over the values of A
E(S) = 0.940
For the Windy partition: E(Windy=T) = -(6/8) log2(6/8) - (2/8) log2(2/8) = 0.811

18 Partitioning the Data (contd.): E(Windy=F) = -(3/6) log2(3/6) - (3/6) log2(3/6) = 1.0

19 Partitioning the Data (contd.):
Gain(S, Windy) = 0.940 - (8/14 * 0.811 + 6/14 * 1.0) = 0.048
Exercise: find the information gain for each attribute: Outlook, Temp, Humidity, and Windy.
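
A sketch of this gain computation in Python (reusing a binary-entropy helper like the one above; the counts per Windy value are taken from slide 15):

import math

def entropy(p_pos, p_neg):
    # Binary entropy; a term with probability 0 contributes 0.
    return -sum(p * math.log2(p) for p in (p_pos, p_neg) if p > 0)

splits = {"T": (6, 2), "F": (3, 3)}               # (positives, negatives) per value
n_total = sum(p + n for p, n in splits.values())  # 14
e_s = entropy(9/14, 5/14)                         # 0.940

gain = e_s - sum((p + n) / n_total * entropy(p / (p + n), n / (p + n))
                 for p, n in splits.values())
print(f"{gain:.3f}")  # 0.048, matching the slide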

20 ID3 Algorithm: Calculating the gain for every attribute, splitting on the one with maximum gain, and recursing on each partition until a decision tree is obtained is the "ID3" algorithm for building a classifier.
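
A minimal recursive sketch of this procedure (Python; the function names and the (attribute-dict, label) record layout are my own, as in the earlier sketches):

import math
from collections import Counter

def entropy(labels):
    # Entropy of a list of class labels.
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def id3(examples, attributes):
    # examples: list of (attrs_dict, label); attributes: names not yet used.
    labels = [label for _, label in examples]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority label

    def gain(a):
        # Information gain of splitting the current examples on attribute a.
        rem = 0.0
        for v in set(attrs[a] for attrs, _ in examples):
            subset = [lab for attrs, lab in examples if attrs[a] == v]
            rem += len(subset) / len(examples) * entropy(subset)
        return entropy(labels) - rem

    best = max(attributes, key=gain)                  # attribute with maximum gain
    tree = {best: {}}
    for v in set(attrs[best] for attrs, _ in examples):
        subset = [(attrs, lab) for attrs, lab in examples if attrs[best] == v]
        tree[best][v] = id3(subset, [a for a in attributes if a != best])
    return tree

Called on the labeled rows above, this would return a nested-dictionary tree analogous to the one drawn on slide 5.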

21 Summary: A haphazard presentation of data is not acceptable to the mind. Focusing attention on an attribute automatically leads to information gain. We defined entropy and, in parallel, information gain, and related them to message communication.

