ID3 Algorithm.


Agenda
- Decision Trees
- What is ID3?
- Entropy
- Calculating Entropy with Code
- Information Gain
- Advantages and Disadvantages
- Example

Decision Trees A decision tree encodes rules for classifying data using its attributes. The tree consists of decision nodes and leaf nodes. A decision node tests an attribute and has two or more branches, each representing a value (or range of values) of that attribute. A leaf node holds a homogeneous result (all examples in one class) and requires no further testing.
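As a concrete illustration of this node structure, here is a minimal sketch in Python; the class and field names are illustrative, not taken from the slides:

class Node:
    """A decision-tree node: either a decision node or a leaf."""

    def __init__(self, attribute=None, label=None):
        self.attribute = attribute   # attribute tested at a decision node (None for leaves)
        self.label = label           # class label stored at a leaf (None for decision nodes)
        self.branches = {}           # attribute value -> child Node

    def is_leaf(self):
        return self.label is not None

    def classify(self, example):
        """Follow the branch matching the example's attribute value until a leaf is reached."""
        if self.is_leaf():
            return self.label
        return self.branches[example[self.attribute]].classify(example)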

Decision Tree Example

Outlook?
  sunny    -> Humidity?
                high   -> No
                normal -> Yes
  overcast -> Yes
  rain     -> Windy?
                true   -> No
                false  -> Yes

What is ID3? A mathematical algorithm for building decision trees. Developed by J. Ross Quinlan in 1979. Uses concepts from information theory, introduced by Shannon in 1948. Builds the tree from the top down, greedily, with no backtracking. Information gain is used to select the most useful attribute at each step.

Entropy A measure of the homogeneity of a sample. A completely homogeneous sample has entropy 0; an equally divided sample has entropy 1. For a sample S containing positive and negative examples, the formula is: Entropy(S) = -p+ log2(p+) - p- log2(p-), where p+ and p- are the proportions of positive and negative examples in S.
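More generally (the standard multi-class form, of which the slide's two-class formula is the special case c = 2): for a sample S whose examples fall into c classes, Entropy(S) = -Σi pi log2(pi), where pi is the proportion of examples in S belonging to class i.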

Entropy Example For a 14-example sample with 9 examples in one class and 5 in the other: Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940

Calculating Entropy with Code Many programming languages and calculators do not provide a log2 function directly. Use a conversion factor: take the base-10 log of the value and divide it by log10(2). Example: log10(2) ≈ 0.301, so log2(3/5) = log10(3/5) / 0.301.

Calculating Entropy with Code (cont’d) Taking log10(0) produces an error, so do not try to evaluate a term such as (0/3) log2(0/3) directly. By convention, a term with a zero proportion contributes 0 to the entropy, so substitute 0 for it.
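Putting the two slides above together, here is a minimal Python sketch (the function names are my own); it uses the log10 conversion factor from the slide and treats zero-proportion terms as contributing 0:

from math import log10

def log2(x):
    # log2 via the conversion factor from the slide: log2(x) = log10(x) / log10(2)
    return log10(x) / log10(2)

def entropy(class_counts):
    """Entropy of a sample described by a list of per-class counts, e.g. [9, 5]."""
    total = sum(class_counts)
    result = 0.0
    for count in class_counts:
        if count == 0:
            continue  # a zero proportion contributes 0 (avoids taking the log of 0)
        p = count / total
        result -= p * log2(p)
    return result

print(entropy([9, 5]))   # 0.940... (the worked example above)
print(entropy([4, 5]))   # 0.9911... (4 females, 5 males in the Simpsons example below)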

Information Gain (IG) Information gain measures the expected decrease in entropy when a dataset is split on an attribute: which attribute creates the most homogeneous branches? First the entropy of the whole dataset is calculated. The dataset is then split on each candidate attribute, the entropy of each resulting branch is calculated, and these branch entropies are summed, weighted by the proportion of examples in each branch, to give the total entropy after the split. Subtracting this from the entropy before the split gives the information gain, i.e. the decrease in entropy. The attribute that yields the largest IG is chosen for the decision node.
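In symbols (the standard ID3 formulation, not spelled out on the slide): Gain(S, A) = Entropy(S) - Σv (|Sv|/|S|) · Entropy(Sv), where the sum runs over the values v of attribute A and Sv is the subset of S with A = v. A minimal Python sketch of this computation (function names are illustrative):

from math import log2   # Python's math module does provide log2; the log10
                        # conversion on the earlier slide is only needed where it is missing

def entropy(class_counts):
    total = sum(class_counts)
    return -sum((c / total) * log2(c / total) for c in class_counts if c > 0)

def information_gain(parent_counts, branch_counts):
    """parent_counts: class counts before the split, e.g. [4, 5].
    branch_counts: one list of class counts per branch, e.g. [[1, 3], [3, 2]]."""
    total = sum(parent_counts)
    weighted = sum(sum(branch) / total * entropy(branch) for branch in branch_counts)
    return entropy(parent_counts) - weighted

# Splitting 4 females / 5 males on "Hair Length <= 5" (see the Simpsons example below):
print(information_gain([4, 5], [[1, 3], [3, 2]]))   # ~0.0911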

Information Gain (cont’d) A branch set with entropy of 0 is a leaf node. Otherwise, the branch needs further splitting to classify its dataset. The ID3 algorithm is run recursively on the non-leaf branches, until all data is classified.
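The recursion described above can be sketched as follows; this is a simplified outline in Python for categorical attributes only (the names and data layout are mine, not from the slides), not a complete ID3 implementation:

from collections import Counter
from math import log2

def entropy(examples):
    counts = Counter(ex["class"] for ex in examples)
    total = len(examples)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(examples, attribute):
    subsets = {}
    for ex in examples:
        subsets.setdefault(ex[attribute], []).append(ex)
    remainder = sum(len(s) / len(examples) * entropy(s) for s in subsets.values())
    return entropy(examples) - remainder

def id3(examples, attributes):
    """Return either a class label or a nested dict {'attribute': ..., 'branches': {...}}."""
    classes = {ex["class"] for ex in examples}
    if len(classes) == 1:
        return classes.pop()                       # entropy 0: make a leaf node
    if not attributes:
        return Counter(ex["class"] for ex in examples).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, a))
    tree = {"attribute": best, "branches": {}}
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        remaining = [a for a in attributes if a != best]
        tree["branches"][value] = id3(subset, remaining)   # recurse on non-leaf branches
    return tree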

Advantages of using ID3 Understandable prediction rules are created from the training data. Builds a tree quickly, and tends to build a short tree. Only enough attributes need to be tested to classify all of the data. Finding leaf nodes early allows test data to be pruned, reducing the number of tests. The whole training set is used to build the tree.

Disadvantages of using ID3 It can only deal with nominal data. Data may be over-fitted or over-classified if a small sample is used. Only one attribute at a time is tested when making a decision. It is not able to deal with noisy data sets. Handling continuous data may be computationally expensive, as many candidate split points must be evaluated to see where to break the continuum.

Advantages of C4.5 over ID3 C4.5 enables the system to: Be more robust in the presence of noise. Avoid overfitting. Deal with continuous attributes. Deal with missing data. Convert trees to rules.

Gain Ratio The information gain used by ID3 is biased toward attributes that have a large number of values. For example, if an attribute D has a distinct value for each record, then Info(D,T) is 0 and Gain(D,T) is maximal. To compensate for this, Quinlan suggests using the following ratio instead of the raw gain: GainRatio(D,T) = Gain(D,T) / SplitInfo(D,T), where SplitInfo(D,T) is the information due to the split of T on the basis of the values of the categorical attribute D.
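SplitInfo is not defined on the slide; the standard definition from Quinlan's C4.5 is: SplitInfo(D,T) = -Σi (|Ti|/|T|) log2(|Ti|/|T|), where T1, ..., Tk are the subsets of T induced by the k distinct values of D. An attribute with many distinct values has a large SplitInfo, which pulls its gain ratio back down.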

Example: The Simpsons

Person   Hair Length   Weight   Age   Class
Homer    0"            250      36    M
Marge    10"           150      34    F
Bart     2"            90       10    M
Lisa     6"            78       8     F
Maggie   4"            20       1     F
Abe      1"            170      70    M
Selma    8"            160      41    F
Otto     -             180      38    M
Krusty   -             200      45    M
Comic    8"            290      38    ?

(Hair lengths for Otto and Krusty are not given; the hair-length split below places both in the > 5 group.)

Let us try splitting on Hair Length <= 5?
Parent: Entropy(4F,5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911
yes branch: Entropy(1F,3M) = -(1/4)log2(1/4) - (3/4)log2(3/4) = 0.8113
no branch: Entropy(3F,2M) = -(3/5)log2(3/5) - (2/5)log2(2/5) = 0.9710
Gain(Hair Length <= 5) = 0.9911 - (4/9 * 0.8113 + 5/9 * 0.9710) = 0.0911

Let us try splitting on Weight <= 160?
Parent: Entropy(4F,5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911
yes branch: Entropy(4F,1M) = -(4/5)log2(4/5) - (1/5)log2(1/5) = 0.7219
no branch: Entropy(0F,4M) = -(0/4)log2(0/4) - (4/4)log2(4/4) = 0
Gain(Weight <= 160) = 0.9911 - (5/9 * 0.7219 + 4/9 * 0) = 0.5900

Let us try splitting on Age <= 40?
Parent: Entropy(4F,5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911
yes branch: Entropy(3F,3M) = -(3/6)log2(3/6) - (3/6)log2(3/6) = 1
no branch: Entropy(1F,2M) = -(1/3)log2(1/3) - (2/3)log2(2/3) = 0.9183
Gain(Age <= 40) = 0.9911 - (6/9 * 1 + 3/9 * 0.9183) = 0.0183
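These three gains can be checked with the information_gain sketch shown earlier; repeated here as a self-contained Python snippet (class counts are written as [females, males]):

from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def information_gain(parent, branches):
    total = sum(parent)
    return entropy(parent) - sum(sum(b) / total * entropy(b) for b in branches)

print(information_gain([4, 5], [[1, 3], [3, 2]]))   # Hair Length <= 5:  ~0.0911
print(information_gain([4, 5], [[4, 1], [0, 4]]))   # Weight <= 160:     ~0.5900
print(information_gain([4, 5], [[3, 3], [1, 2]]))   # Age <= 40:         ~0.0183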

Of the 3 features we had, Weight was best. But while people who weigh over 160 are perfectly classified (as males), the under-160 people are not perfectly classified… so we simply recurse on the Weight <= 160 branch. This time we find that we can split on Hair Length <= 2, and we are done!

We don’t need to keep the data around, just the test conditions. How would these people be classified?

Weight <= 160?
  no  -> Male
  yes -> Hair Length <= 2?
           yes -> Male
           no  -> Female

It is trivial to convert decision trees to rules…

Rules to Classify Males/Females:
If Weight greater than 160, classify as Male
Else if Hair Length less than or equal to 2, classify as Male
Else classify as Female
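The same rule set as a small Python function (a direct transcription of the rules above; the parameter names are mine):

def classify(weight, hair_length):
    """Classify a person as 'Male' or 'Female' using the learned tree."""
    if weight > 160:
        return "Male"
    elif hair_length <= 2:
        return "Male"
    else:
        return "Female"

print(classify(weight=290, hair_length=8))   # the 'Comic' row from the table -> 'Male'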

References
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81-106.
http://dms.irb.hr/tutorial/tut_dtrees.php
http://www.dcs.napier.ac.uk/~peter/vldb/dm/node11.html
http://www2.cs.uregina.ca/~dbd/cs831/notes/ml/dtrees/4_dtrees2.html
Professor Sin-Min Lee, SJSU: http://cs.sjsu.edu/~lee/cs157b/cs157b.html