Decision Tree Algorithm (C4.5)


Decision Tree Algorithm (C4.5)

Training Examples for PlayTennis (Mitchell 1997)

Decision Tree for PlayTennis (Mitchell 1997)

Decision Tree Algorithm (C4.5) The equations used by the C4.5 algorithm are as follows. First, calculate Info(S), the entropy of the training set S, which measures how mixed the classes in S are, where |S| is the number of cases in the training set, Ci is a class (i = 1, 2, ..., k), k is the number of classes, and freq(Ci, S) is the number of cases in S that belong to class Ci. (For the PlayTennis data: 9 Yes, 5 No.)
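The equation referenced here, written out in LaTeX from the definitions above, is

Info(S) = -\sum_{i=1}^{k} \frac{freq(C_i, S)}{|S|} \log_2\!\left(\frac{freq(C_i, S)}{|S|}\right)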

Decision Tree Algorithm (C4.5) Next, calculate the expected information value InfoX(S) after using feature X to partition S, where L is the number of outcomes (values) of feature X, Si is the subset of S corresponding to the ith outcome, and |Si| is the number of cases in subset Si.
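Written out from the definitions above, the expected information is the weighted average of the subset entropies:

Info_X(S) = \sum_{i=1}^{L} \frac{|S_i|}{|S|}\, Info(S_i)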

Decision Tree Algorithm (C4.5) Then calculate Gain(X), the information gained by partitioning S according to feature X, and SplitInfo(X), the partition information value generated by splitting S into L subsets.
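Written out, the two quantities referenced here are

Gain(X) = Info(S) - Info_X(S)

SplitInfo(X) = -\sum_{i=1}^{L} \frac{|S_i|}{|S|} \log_2\!\left(\frac{|S_i|}{|S|}\right)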

Decision Tree Algorithm (C4.5) Finally, calculate the gain ratio, Gain(X) divided by SplitInfo(X). Using the gain ratio rather than the raw gain reduces the bias toward attributes with many values. If two attributes tie on the gain ratio, either one can be chosen at random.
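Stated explicitly, the selection criterion is

GainRatio(X) = \frac{Gain(X)}{SplitInfo(X)}

and, as the worked example below shows, the attribute with the largest gain ratio is chosen for the split.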

Decision Tree Algorithm (C4.5) Step 1: Decide which attribute to consider first: Outlook? Temperature? Humidity? Wind? Use the entropy -p+ log2(p+) - p- log2(p-), where p+ and p- are the proportions of positive and negative cases. At the start we have 9 Yes and 5 No, so the starting entropy is -9/14 log2(9/14) - 5/14 log2(5/14) = 0.94.
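The slides show only the arithmetic; as a minimal sketch (not taken from the slides, helper name is illustrative), the same number can be reproduced with a small entropy function:

```python
import math

def entropy(counts):
    """Entropy in bits of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

# 9 Yes and 5 No at the root of the PlayTennis data
print(round(entropy([9, 5]), 2))  # 0.94
```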

Entropy (Info)

Decision Tree Algorithm (C4.5) Step 2: Compute the gain ratio for each attribute. Information gain = (entropy before the split) - (expected entropy after the split); we want the attribute that maximizes it. (Slide diagram: candidate splits on Humidity {High, Normal} and Outlook {Sunny, Overcast, Rain}.)

Decision Tree Algorithm (C4.5) Step 2 (cont'd): Compute the gain ratio for each attribute. (Slide diagram: candidate splits on Wind {Strong, Weak} and Temperature {Hot, Mild, Cool}.)

Decision Tree Algorithm (C4.5) Step 2 (cont'd): Compute the gain ratio for Outlook (Sunny: 2+ 3-, Overcast: 4+ 0-, Rain: 3+ 2-).
E1 (Sunny) = -2/5 log2(2/5) - 3/5 log2(3/5) = 0.971
E2 (Overcast) = -4/4 log2(4/4) - 0 = 0
E3 (Rain) = -3/5 log2(3/5) - 2/5 log2(2/5) = 0.971
Info gain = 0.94 - 5/14*0.971 - 4/14*0 - 5/14*0.971 = 0.94 - 0.693 = 0.247
SplitInfo = -5/14 log2(5/14) - 4/14 log2(4/14) - 5/14 log2(5/14) = 1.577
Gain ratio = 0.247 / 1.577 = 0.157

Decision Tree Algorithm (C4.5) Step 2 (cont'd): Compute the gain ratio for Humidity (High: 3+ 4-, Normal: 6+ 1-).
E1 (High) = -3/7 log2(3/7) - 4/7 log2(4/7) = 0.985
E2 (Normal) = -6/7 log2(6/7) - 1/7 log2(1/7) = 0.592
Info gain = 0.94 - 7/14*0.985 - 7/14*0.592 = 0.151
SplitInfo = -7/14 log2(7/14) - 7/14 log2(7/14) = 1
Gain ratio = 0.151 / 1 = 0.151

Decision Tree Algorithm (C4.5) Step 2 (cont'd): Compute the gain ratio for Temperature (Hot: 2+ 2-, Mild: 4+ 2-, Cool: 3+ 1-).
E1 (Hot) = -2/4 log2(2/4) - 2/4 log2(2/4) = 1
E2 (Mild) = -4/6 log2(4/6) - 2/6 log2(2/6) = 0.9182
E3 (Cool) = -3/4 log2(3/4) - 1/4 log2(1/4) = 0.8112
Info gain = 0.94 - 4/14*1 - 6/14*0.9182 - 4/14*0.8112 = 0.0292
SplitInfo = -4/14 log2(4/14) - 6/14 log2(6/14) - 4/14 log2(4/14) = 1.556
Gain ratio = 0.0292 / 1.556 = 0.01876

Decision Tree Algorithm (C4.5) Step 2 (cont'd): Compute the gain ratio for Wind (Strong: 3+ 3-, Weak: 6+ 2-).
E1 (Strong) = -3/6 log2(3/6) - 3/6 log2(3/6) = 1
E2 (Weak) = -6/8 log2(6/8) - 2/8 log2(2/8) = 0.8112
Info gain = 0.94 - 6/14*1 - 8/14*0.8112 = 0.048
SplitInfo = -6/14 log2(6/14) - 8/14 log2(8/14) = 0.9852
Gain ratio = 0.048 / 0.9852 = 0.049

Decision Tree Algorithm (C4.5) Step 2 (cont'd): Summary of gain ratios: Outlook = 0.157, Humidity = 0.151, Wind = 0.049, Temperature = 0.01876. Outlook has the largest gain ratio, so it becomes the root node.
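None of the following appears in the slides, but as a self-contained sketch the whole Step 2 computation can be reproduced on the standard Mitchell (1997) PlayTennis examples, whose class counts match the ones quoted above (the names DATA, ATTRS, entropy and gain_ratio are illustrative):

```python
import math

# Standard PlayTennis examples from Mitchell (1997); each row is
# (Outlook, Temperature, Humidity, Wind, PlayTennis).
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRS = {"Outlook": 0, "Temperature": 1, "Humidity": 2, "Wind": 3}

def entropy(rows):
    """Info(S): entropy of the class labels of a list of rows."""
    total = len(rows)
    counts = {}
    for row in rows:
        counts[row[-1]] = counts.get(row[-1], 0) + 1
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def gain_ratio(rows, attr_index):
    """Gain(X) / SplitInfo(X) for splitting `rows` on one attribute."""
    total = len(rows)
    subsets = {}
    for row in rows:
        subsets.setdefault(row[attr_index], []).append(row)
    info_x = sum(len(s) / total * entropy(s) for s in subsets.values())
    split_info = -sum(len(s) / total * math.log2(len(s) / total)
                      for s in subsets.values())
    gain = entropy(rows) - info_x
    return gain / split_info if split_info > 0 else 0.0

for name, idx in ATTRS.items():
    print(name, round(gain_ratio(DATA, idx), 3))
# Expected: Outlook 0.156, Temperature 0.019, Humidity 0.152, Wind 0.049
# (the slides' 0.157 and 0.151 come from rounding intermediate values)
```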

Decision Tree Algorithm (C4.5) Step 3: Decide which attribute to place under the root node. Temperature, Humidity, or Wind may be chosen under the Sunny and Rain branches. (The Overcast branch is not considered further: all four of its cases are Yes, so it simply becomes a Yes leaf.) (Slide diagram: Outlook with branches Sunny → ..., Overcast → Yes, Rain → ...)

Decision Tree Algorithm (C4.5) Step 3 (cont'd): Look under Outlook = Sunny (2+ 3-); Temperature, Humidity, or Wind may be chosen here. E(Outlook = Sunny) = -2/5 log2(2/5) - 3/5 log2(3/5) = 0.971

Decision Tree Algorithm (C4.5) Step 3 (cont'd): Humidity under Outlook = Sunny (High: 0+ 3-, Normal: 2+ 0-).
E1 (Sunny, Humidity = High) = -0/3 log2(0/3) - 3/3 log2(3/3) = 0
E2 (Sunny, Humidity = Normal) = -2/2 log2(2/2) - 0/2 log2(0/2) = 0
Info gain = 0.971 - 3/5*0 - 2/5*0 = 0.971
SplitInfo = -3/5 log2(3/5) - 2/5 log2(2/5) = 0.971
Gain ratio = 0.971 / 0.971 = 1

Decision Tree Algorithm (C4.5) Step 3 (cont'd): Temperature under Outlook = Sunny (Hot: 0+ 2-, Mild: 1+ 1-, Cool: 1+ 0-).
E1 (Sunny, Temperature = Hot) = -0/2 log2(0/2) - 2/2 log2(2/2) = 0
E2 (Sunny, Temperature = Mild) = -1/2 log2(1/2) - 1/2 log2(1/2) = 1
E3 (Sunny, Temperature = Cool) = -1/1 log2(1/1) - 0/1 log2(0/1) = 0
Info gain = 0.971 - 2/5*0 - 2/5*1 - 1/5*0 = 0.571
SplitInfo = -2/5 log2(2/5) - 2/5 log2(2/5) - 1/5 log2(1/5) = 1.522
Gain ratio = 0.571 / 1.522 = 0.375

Decision Tree Algorithm (C4.5) Step 3 (cont'd): Wind under Outlook = Sunny (Strong: 1+ 1-, Weak: 1+ 2-).
E1 (Sunny, Wind = Strong) = -1/2 log2(1/2) - 1/2 log2(1/2) = 1
E2 (Sunny, Wind = Weak) = -1/3 log2(1/3) - 2/3 log2(2/3) = 0.918
Info gain = 0.971 - 2/5*1 - 3/5*0.918 = 0.020
SplitInfo = -2/5 log2(2/5) - 3/5 log2(3/5) = 0.971
Gain ratio = 0.020 / 0.971 = 0.02

Decision Tree Algorithm (C4.5) Step 3 (cont'd): Summary of gain ratios under Outlook = Sunny: Humidity = 1, Temperature = 0.375, Wind = 0.02. So Humidity is chosen for the Sunny branch. (Slide diagram: Outlook; Sunny → Humidity with Normal → Yes (2 cases) and High → No (3 cases); Overcast → Yes (4 cases); Rain → ...)
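Again not part of the slides, but reusing the DATA, ATTRS and gain_ratio names from the sketch after Step 2 (an assumption of that earlier sketch), the Sunny-branch ranking follows from the same function applied to the subset:

```python
# Restrict to the five Outlook = Sunny cases and re-rank the remaining attributes.
sunny = [row for row in DATA if row[0] == "Sunny"]
for name, idx in ATTRS.items():
    if name != "Outlook":
        print(name, round(gain_ratio(sunny, idx), 3))
# Expected: Temperature 0.375, Humidity 1.0, Wind 0.021
```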

Decision Tree Algorithm (C4.5) Step 4: Look under Outlook = Rain; Temperature, Humidity, or Wind may be chosen here. E(Outlook = Rain) = -3/5 log2(3/5) - 2/5 log2(2/5) = 0.971
Humidity under Outlook = Rain (High: 1+ 1-, Normal: 2+ 1-).
E1 (Rain, Humidity = High) = -1/2 log2(1/2) - 1/2 log2(1/2) = 1
E2 (Rain, Humidity = Normal) = -2/3 log2(2/3) - 1/3 log2(1/3) = 0.918
Info gain = 0.971 - 2/5*1 - 3/5*0.918 = 0.020
SplitInfo = -2/5 log2(2/5) - 3/5 log2(3/5) = 0.971
Gain ratio = 0.020 / 0.971 = 0.02

Decision Tree Algorithm (C4.5) Step 4 (cont'd): Temperature under Outlook = Rain (Hot: 0+ 0-, Mild: 2+ 1-, Cool: 1+ 1-).
E1 (Rain, Temperature = Hot) = 0 (no cases)
E2 (Rain, Temperature = Mild) = -2/3 log2(2/3) - 1/3 log2(1/3) = 0.918
E3 (Rain, Temperature = Cool) = -1/2 log2(1/2) - 1/2 log2(1/2) = 1
Info gain = 0.971 - 0/5*0 - 3/5*0.918 - 2/5*1 = 0.020
SplitInfo = -3/5 log2(3/5) - 2/5 log2(2/5) = 0.971
Gain ratio = 0.020 / 0.971 = 0.02

Decision Tree Algorithm (C4.5) Step 4 (cont'd): Wind under Outlook = Rain (Strong: 0+ 2-, Weak: 3+ 0-).
E1 (Rain, Wind = Strong) = -0/2 log2(0/2) - 2/2 log2(2/2) = 0
E2 (Rain, Wind = Weak) = -3/3 log2(3/3) - 0/3 log2(0/3) = 0
Info gain = 0.971 - 2/5*0 - 3/5*0 = 0.971
SplitInfo = -2/5 log2(2/5) - 3/5 log2(3/5) = 0.971
Gain ratio = 0.971 / 0.971 = 1

Decision Tree Algorithm (C4.5) Step 4 (cont'd): Summary of gain ratios under Outlook = Rain: Wind = 1, Humidity = 0.02, Temperature = 0.02. So Wind is chosen for the Rain branch. (Slide diagram, the finished tree: Outlook; Sunny → Humidity with High → No (3 cases) and Normal → Yes (2 cases); Overcast → Yes (4 cases); Rain → Wind with Strong → No (2 cases) and Weak → Yes (3 cases).)
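As a compact restatement (not in the slides; the function name is illustrative), the finished tree can be written as a plain classification function whose branches and leaf labels follow the diagram described above:

```python
def classify(outlook, temperature, humidity, wind):
    """PlayTennis decision tree learned above (temperature is never tested)."""
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Rain":
        return "Yes" if wind == "Weak" else "No"
    raise ValueError("unknown outlook value")

print(classify("Sunny", "Hot", "High", "Weak"))  # No
```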

Decision Tree Algorithm (C4.5) Additional situation: continuous values. (Slide table: 14 weather examples with columns Outlook (sunny/overcast/rainy), Temperature (numeric), Humidity (High/Normal), Windy (true/false) and Play (yes/no); the numeric Temperature values are 85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71.)

Decision Tree Algorithm (C4.5) Additional situation: continuous values. The Temperature values are 85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71. Step 1: Sort them: 64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85, and map each value to its target class: Y, N, Y, Y, Y, N, N, Y, Y, Y, N, Y, Y, N.

Decision Tree Algorithm (C4.5) Step 2: Decide the candidate cut points, placed wherever the class changes (Y → N or N → Y) in the sorted sequence 64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85 with classes Y, N, Y, Y, Y, N, N, Y, Y, Y, N, Y, Y, N. The candidate cuts split the cases as follows:
64.5: E1(1+ 0-) & E2(8+ 5-)
66.5: E1(1+ 1-) & E2(8+ 4-)
70.5: E1(4+ 1-) & E2(5+ 4-)
71.5: E1(4+ 2-) & E2(5+ 3-)
73.5: E1(5+ 3-) & E2(4+ 2-)
77.5: E1(7+ 3-) & E2(2+ 2-)
80.5: E1(7+ 4-) & E2(2+ 1-)
84.0: E1(9+ 4-) & E2(0+ 1-)
Example: cut point at 64.5.
E1 (≤ 64.5) = -1/1 log2(1/1) - 0/1 log2(0/1) = 0
E2 (> 64.5) = -8/13 log2(8/13) - 5/13 log2(5/13) = 0.9612
Info gain = 0.94 - 1/14*0 - 13/14*0.9612 = 0.048
The best cut point is then compared, on gain ratio, with the non-continuous attributes.

Decision Tree Algorithm (C4.5)
If the cut point is 64.5, gain ratio = 0.0477/0.3712 = 0.1285 [E1(1+ 0-) & E2(8+ 5-)]
If the cut point is 66.5, gain ratio = 0.0103/0.5917 = 0.0174 [E1(1+ 1-) & E2(8+ 4-)]
If the cut point is 70.5, gain ratio = 0.0453/0.9403 = 0.0482 [E1(4+ 1-) & E2(5+ 4-)]
If the cut point is 71.5, gain ratio = 0.0013/0.9852 = 0.0014 [E1(4+ 2-) & E2(5+ 3-)]

Decision Tree Algorithm (C4.5)
If the cut point is 73.5, gain ratio = 0.0013/0.9852 = 0.0014 [E1(5+ 3-) & E2(4+ 2-)]
If the cut point is 77.5, gain ratio = 0.0251/0.8631 = 0.0291 [E1(7+ 3-) & E2(2+ 2-)]
If the cut point is 80.5, gain ratio = 0.0005/0.7496 = 0.0007 [E1(7+ 4-) & E2(2+ 1-)]

Decision Tree Algorithm (C4.5) We then look for the maximum gain ratio. If the cut point is 84.0, gain ratio = 0.1134/0.3712 = 0.3055 [E1(9+ 4-) & E2(0+ 1-)]. The maximum gain ratio is about 0.31, obtained at the cut point 84.0, so this is the threshold used for Temperature.
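As a sketch of the procedure just described (not taken from the slides), candidate thresholds can be scored by gain ratio; this version scores every midpoint between adjacent distinct values, a superset of the class-change points the slides restrict to, and the value/label sequence is the one used in this example:

```python
import math

def entropy(pos, neg):
    """Entropy in bits of a (positive, negative) count pair."""
    h = 0.0
    for c in (pos, neg):
        if c:
            p = c / (pos + neg)
            h -= p * math.log2(p)
    return h

# Sorted Temperature values with their Play labels (1 = Yes, 0 = No).
values = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
labels = [1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0]

total_pos, total_neg = sum(labels), len(labels) - sum(labels)
base = entropy(total_pos, total_neg)           # 0.940 for 9 Yes / 5 No
n = len(values)

best = None
for i in range(1, n):
    if values[i] == values[i - 1]:
        continue                               # no threshold between equal values
    cut = (values[i - 1] + values[i]) / 2      # midpoint threshold
    lp = sum(labels[:i]); ln_ = i - lp         # left-side class counts
    rp = total_pos - lp; rn = total_neg - ln_  # right-side class counts
    gain = base - i / n * entropy(lp, ln_) - (n - i) / n * entropy(rp, rn)
    split = entropy(i, n - i)                  # SplitInfo of the binary partition
    ratio = gain / split if split else 0.0
    if best is None or ratio > best[0]:
        best = (ratio, cut)

print(best)  # roughly (0.31, 84.0): the cut point chosen in this example
```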

Decision Tree Algorithm (C4.5) Parameter: minimum cases. How many minimum cases should we set? If the training set contains fewer than 1000 cases, 2 is the recommended value. Changing the minimum-case setting changes the tree structure, the rule length, and the number of rules.

Decision Tree Algorithm (C4.5) Parameter: minimum cases. To help avoid over-fitting, a split is only created if a specified threshold (e.g. the minimum number of cases required for a split search) is met. This is the so-called minimum-case parameter.

Decision Tree Algorithm (C4.5) (Slide diagram: two versions of the tree, each with Outlook at the root over 14 cases. Without the restriction, the Sunny branch (5 cases) is split on Humidity into Normal → Yes (2+ 0-) and High → No (0+ 3-). If the minimum case is set to 6, the 5-case Sunny node is too small to split, so it is left as a No leaf (2+ 3-).)
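As a minimal sketch of this effect (not C4.5 itself, and with a simplified stopping rule that checks the node size rather than the branch sizes), a recursive splitter can reuse the DATA, ATTRS and gain_ratio names assumed in the Step 2 sketch:

```python
from collections import Counter

def build_tree(rows, attrs, min_cases=2):
    """Grow a tree; nodes with fewer than `min_cases` rows become leaves."""
    labels = [row[-1] for row in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or len(rows) < min_cases or not attrs:
        return majority                                  # leaf with majority class
    best = max(attrs, key=lambda a: gain_ratio(rows, attrs[a]))
    idx = attrs[best]
    rest = {k: v for k, v in attrs.items() if k != best}
    branches = {}
    for value in set(row[idx] for row in rows):
        subset = [row for row in rows if row[idx] == value]
        branches[value] = build_tree(subset, rest, min_cases)
    return (best, branches)

print(build_tree(DATA, ATTRS, min_cases=2))  # full PlayTennis tree
print(build_tree(DATA, ATTRS, min_cases=6))  # Sunny and Rain collapse to leaves
```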

Decision Tree Algorithm (C4.5) Parameter: pruning confidence level. UCF(E, N), where E is the number of errors and N is the number of training instances at a leaf, is the upper confidence limit on the error rate (e.g. U0.25(0, 6) = 0.206, the estimated error rate, so the expected number of errors is 6 * 0.206 = 1.236). The estimated error is used to decide whether the tree built in the growth phase should be pruned at certain nodes. The true probability of error cannot be determined exactly; however, it follows a probability distribution (the binomial distribution) that is generally summarized as a pair of confidence limits. C4.5 simply equates the estimated error rate at a leaf with the upper limit, on the argument that the tree has been constructed to minimize the observed error rate.

Decision Tree Algorithm (C4.5) Parameter: pruning confidence level (cont'd). UCF(E, N), where E is the number of errors and N is the number of training instances (e.g. U0.25(0, 6) = 0.206, so the expected error is 6 * 0.206 = 1.236). (Slide diagram: a tree with a root node, node 1 below it, and node 6 below that; node 6 has three leaves covering 6, 9 and 1 training cases. Keeping node 6's subtree gives an expected error of 6*0.206 + 9*0.143 + 1*0.750 = 3.273; if the expected error of turning node 6 into a leaf is 2.63, which is smaller, the subtree is pruned. The corresponding expected error at node 1 is 4.21.)
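The slides quote UCF values without showing how to obtain them. As a sketch, under the assumption (suggested by the slide itself) that UCF(E, N) is the upper confidence limit of a binomial error rate with CF = 0.25, the quoted numbers can be reproduced by solving for the rate numerically; C4.5's own code may use an approximation rather than this exact bisection, and the function names here are illustrative:

```python
from math import comb

def binom_cdf(e, n, p):
    """P(X <= e) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(e + 1))

def ucf(e, n, cf=0.25):
    """Upper confidence limit on the error rate: the p with P(X <= e) = cf,
    found by bisection (for e = 0 this reduces to 1 - cf**(1/n))."""
    lo, hi = 0.0, 1.0
    for _ in range(60):                 # bisection is plenty precise here
        mid = (lo + hi) / 2
        if binom_cdf(e, n, mid) > cf:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(ucf(0, 6), 3))   # 0.206
print(round(ucf(0, 9), 3))   # 0.143
print(round(ucf(0, 1), 3))   # 0.75
print(round(6 * ucf(0, 6) + 9 * ucf(0, 9) + 1 * ucf(0, 1), 3))  # 3.273
```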

Decision Tree Algorithm (C4.5) Future research: multiple cut points for continuous values.