Classification Algorithms


Classification Algorithms: Decision Tree Algorithms

The problem. Given a set of training cases/objects and their attribute values, determine the target attribute value of new cases. This is the task of classification, a form of prediction.

Why decision trees? Decision trees are powerful and popular tools for classification and prediction. Decision trees represent rules that can be understood by humans and used in knowledge systems such as databases.

Key requirements. Attribute-value description: each object or case must be expressible in terms of a fixed collection of properties or attributes (e.g., hot, mild, cold). Predefined classes (target values): the target function has discrete output values (Boolean or multiclass). Sufficient data: enough training cases must be provided to learn the model.

Random split. If the attribute tested at each node is chosen at random, the tree can grow huge. Such trees are hard to understand, and larger trees are typically less accurate than smaller trees.

Principled criterion. Select the attribute to test at each node by choosing the most useful attribute for classifying examples. Information gain measures how well a given attribute separates the training examples according to their target classes; this measure is used to select among the candidate attributes at each step while growing the tree.

Entropy: a measure of the impurity (non-homogeneity) of a set of examples. Given a set S of positive and negative examples of some target concept (a 2-class problem), with p1 the proportion of positive examples and p2 the proportion of negative examples, the entropy of S relative to this binary classification is E(S) = -p1*log2(p1) - p2*log2(p2).
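As an illustration only (not part of the original slides), this two-class entropy can be computed with a short Python helper; the function name binary_entropy is our own:

    from math import log2

    def binary_entropy(p1, p2):
        """Entropy in bits of a two-class distribution (p1 + p2 = 1).
        The convention 0 * log2(0) = 0 is applied."""
        return -sum(p * log2(p) for p in (p1, p2) if p > 0)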

Example. Suppose S has 25 examples, 15 positive and 10 negative [15+, 10-]. Then the entropy of S relative to this classification is E(S) = -(15/25) log2(15/25) - (10/25) log2(10/25) ≈ 0.971 bits.
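Checking this arithmetic with the binary_entropy sketch above:

    print(binary_entropy(15/25, 10/25))   # approx. 0.971 bits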

Entropy is minimized (equal to 0) when all values of the target attribute are the same: if we know that Joe always plays Center Offence, then the entropy of Offence is 0. Entropy is maximized when all values of the target attribute are equally likely (i.e. the result looks random): if Offence = center in 9 instances and forward in 9 instances, entropy is maximized (equal to 1).

Information Gain. Information gain measures the expected reduction in entropy, or uncertainty: Gain(S, A) = Entropy(S) - sum over v in Values(A) of (|Sv|/|S|) * Entropy(Sv), where Values(A) is the set of all possible values for attribute A and Sv is the subset of S for which attribute A has value v, i.e. Sv = {s in S | A(s) = v}. The first term in the equation for Gain is just the entropy of the original collection S; the second term is the expected value of the entropy after S is partitioned using attribute A.

Information Gain. Gain(S, A) is simply the expected reduction in entropy caused by partitioning the examples according to attribute A. Equivalently, it is the number of bits saved when encoding the target value of an arbitrary member of S, given the value of attribute A.
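A hedged sketch of these two quantities in Python, for examples stored as dictionaries (the helper names entropy and information_gain are ours, not from the slides):

    from collections import Counter, defaultdict
    from math import log2

    def entropy(labels):
        """Entropy in bits of a list of class labels (any number of classes)."""
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def information_gain(examples, attribute, target):
        """Gain(S, A): entropy of S minus the expected entropy after
        partitioning the examples on the given attribute."""
        subsets = defaultdict(list)
        for ex in examples:
            subsets[ex[attribute]].append(ex[target])
        remainder = sum(len(sub) / len(examples) * entropy(sub)
                        for sub in subsets.values())
        return entropy([ex[target] for ex in examples]) - remainder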

Examples. Before partitioning (20 examples, 10 in each class), the entropy is H(10/20, 10/20) = -10/20 log(10/20) - 10/20 log(10/20) = 1. Using the "where" attribute, divide into 2 subsets. Entropy of the first set: H(home) = -6/12 log(6/12) - 6/12 log(6/12) = 1. Entropy of the second set: H(away) = -4/8 log(4/8) - 4/8 log(4/8) = 1. Expected entropy after partitioning: 12/20 * H(home) + 8/20 * H(away) = 1, so the information gain for "where" is 1 - 1 = 0.

Using the "when" attribute, divide into 3 subsets. Entropy of the first set: H(5pm) = -1/4 log(1/4) - 3/4 log(3/4) ≈ 0.811. Entropy of the second set: H(7pm) = -9/12 log(9/12) - 3/12 log(3/12) ≈ 0.811. Entropy of the third set: H(9pm) = -0/4 log(0/4) - 4/4 log(4/4) = 0. Expected entropy after partitioning: 4/20 * H(1/4, 3/4) + 12/20 * H(9/12, 3/12) + 4/20 * H(0/4, 4/4) = 0.65. Information gain: 1 - 0.65 = 0.35.
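These figures can be reproduced with the binary_entropy helper sketched earlier:

    H = binary_entropy
    after_when = 4/20 * H(1/4, 3/4) + 12/20 * H(9/12, 3/12) + 4/20 * H(0/4, 4/4)
    print(round(after_when, 2), round(1 - after_when, 2))   # 0.65 0.35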

Decision. Knowing the "when" attribute values provides a larger information gain than "where"; therefore the "when" attribute should be chosen for testing before the "where" attribute. Similarly, we can compute the information gain for the other attributes. At each node, choose the attribute with the largest information gain.
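Putting these steps together, a minimal sketch of the recursive tree-growing loop in the style of ID3, reusing the imports and information_gain helper above (the function name id3 is ours):

    from collections import Counter

    def id3(examples, attributes, target):
        """Grow a decision tree as nested dicts; leaves are class labels."""
        labels = [ex[target] for ex in examples]
        if len(set(labels)) == 1:              # pure node -> leaf
            return labels[0]
        if not attributes:                     # nothing left to test -> majority class
            return Counter(labels).most_common(1)[0][0]
        best = max(attributes, key=lambda a: information_gain(examples, a, target))
        tree = {best: {}}
        for value in {ex[best] for ex in examples}:
            subset = [ex for ex in examples if ex[best] == value]
            rest = [a for a in attributes if a != best]
            tree[best][value] = id3(subset, rest, target)
        return tree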

Decision Tree: Example (PlayTennis data)

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
1    Sunny     Hot          High      Weak    No
2    Sunny     Hot          High      Strong  No
3    Overcast  Hot          High      Weak    Yes
4    Rain      Mild         High      Weak    Yes
5    Rain      Cool         Normal    Weak    Yes
6    Rain      Cool         Normal    Strong  No
7    Overcast  Cool         Normal    Strong  Yes
8    Sunny     Mild         High      Weak    No
9    Sunny     Cool         Normal    Weak    Yes
10   Rain      Mild         Normal    Weak    Yes
11   Sunny     Mild         Normal    Strong  Yes
12   Overcast  Mild         High      Strong  Yes
13   Overcast  Hot          Normal    Weak    Yes
14   Rain      Mild         High      Strong  No

The resulting decision tree tests Outlook at the root: Sunny -> test Humidity (High -> No, Normal -> Yes); Overcast -> Yes; Rain -> test Wind (Strong -> No, Weak -> Yes).
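For instance, running the id3 sketch above on this table (encoded as a list of dictionaries) reproduces the tree just described:

    columns = ["Outlook", "Temperature", "Humidity", "Wind", "PlayTennis"]
    rows = [
        ("Sunny", "Hot", "High", "Weak", "No"),          ("Sunny", "Hot", "High", "Strong", "No"),
        ("Overcast", "Hot", "High", "Weak", "Yes"),      ("Rain", "Mild", "High", "Weak", "Yes"),
        ("Rain", "Cool", "Normal", "Weak", "Yes"),       ("Rain", "Cool", "Normal", "Strong", "No"),
        ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
        ("Sunny", "Cool", "Normal", "Weak", "Yes"),      ("Rain", "Mild", "Normal", "Weak", "Yes"),
        ("Sunny", "Mild", "Normal", "Strong", "Yes"),    ("Overcast", "Mild", "High", "Strong", "Yes"),
        ("Overcast", "Hot", "Normal", "Weak", "Yes"),    ("Rain", "Mild", "High", "Strong", "No"),
    ]
    playtennis = [dict(zip(columns, row)) for row in rows]
    tree = id3(playtennis, columns[:-1], "PlayTennis")
    print(tree)
    # {'Outlook': {'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}},
    #              'Overcast': 'Yes',
    #              'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}}}}
    # (branch order may differ from run to run)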

Weather Data: Play or not Play? This is the same 14-example weather data set, here with nominal attributes Outlook (sunny, overcast, rainy), Temperature (hot, mild, cool), Humidity (high, normal) and Windy (true, false), and the class attribute Play? (Yes/No). Note: Outlook here means the weather forecast; it has no relation to the Microsoft email program.

Example tree for "Play?": test Outlook at the root; sunny -> test Humidity (high -> No, normal -> Yes); overcast -> Yes; rainy -> test Windy (true -> No, false -> Yes).
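A tree in the nested-dictionary form used above can be applied to a new day with a few lines of Python (a sketch matching the id3 output format; the function name classify is ours):

    def classify(tree, instance):
        """Follow attribute tests down the tree until a leaf label is reached."""
        while isinstance(tree, dict):
            attribute = next(iter(tree))          # attribute tested at this node
            tree = tree[attribute][instance[attribute]]
        return tree

    # e.g. for the tree grown earlier:
    # classify(tree, {"Outlook": "Sunny", "Temperature": "Cool",
    #                 "Humidity": "High", "Wind": "Strong"})  ->  "No"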

Which attribute to select?

Example: attribute "Outlook". "Outlook" = "Sunny": 2 yes / 3 no, entropy = 0.971 bits. "Outlook" = "Overcast": 4 yes / 0 no, entropy = 0 bits. "Outlook" = "Rainy": 3 yes / 2 no, entropy = 0.971 bits. Expected information for the attribute: (5/14)*0.971 + (4/14)*0 + (5/14)*0.971 = 0.693 bits. Note: log(0) is not defined, but we evaluate 0*log(0) as zero.

Computing the information gain: (information before split) - (information after split). Information gain for the attributes of the weather data: gain("Outlook") = 0.940 - 0.693 = 0.247 bits, gain("Temperature") = 0.029 bits, gain("Humidity") = 0.152 bits, gain("Windy") = 0.048 bits.
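Using the information_gain helper and the playtennis list defined earlier (the same 14 examples, with Wind = Weak/Strong in place of Windy = false/true), these gains can be verified:

    for a in ["Outlook", "Temperature", "Humidity", "Wind"]:
        print(a, round(information_gain(playtennis, a, "PlayTennis"), 3))
    # Outlook 0.247, Temperature 0.029, Humidity 0.152, Wind 0.048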

Continuing to split the sunny branch: gain("Temperature") = 0.571 bits, gain("Humidity") = 0.971 bits, gain("Windy") = 0.020 bits, so Humidity is selected as the next test; its subsets are pure and become leaves.

The final decision tree. Note: not all leaves need to be pure; sometimes identical instances have different classes. Splitting stops when the data cannot be split any further.

A worked example

Weekend (Example)  Weather  Parents  Money  Decision (Category)
W1                 Sunny    Yes      Rich   Cinema
W2                 Sunny    No       Rich   Tennis
W3                 Windy    Yes      Rich   Cinema
W4                 Rainy    Yes      Poor   Cinema
W5                 Rainy    No       Rich   Stay in
W6                 Rainy    Yes      Poor   Cinema
W7                 Windy    No       Poor   Cinema
W8                 Windy    No       Rich   Shopping
W9                 Windy    Yes      Rich   Cinema
W10                Sunny    No       Rich   Tennis

Determining the best attribute.

Entropy(S) = -pcinema log2(pcinema) - ptennis log2(ptennis) - pshopping log2(pshopping) - pstay_in log2(pstay_in)
           = -(6/10) * log2(6/10) - (2/10) * log2(2/10) - (1/10) * log2(1/10) - (1/10) * log2(1/10)
           = -(6/10) * -0.737 - (2/10) * -2.322 - (1/10) * -3.322 - (1/10) * -3.322
           = 0.4422 + 0.4644 + 0.3322 + 0.3322 = 1.571

and we need to determine the best of:

Gain(S, weather) = 1.571 - (|Ssun|/10)*Entropy(Ssun) - (|Swind|/10)*Entropy(Swind) - (|Srain|/10)*Entropy(Srain)
                 = 1.571 - (0.3)*Entropy(Ssun) - (0.4)*Entropy(Swind) - (0.3)*Entropy(Srain)
                 = 1.571 - (0.3)*(0.918) - (0.4)*(0.81125) - (0.3)*(0.918) = 0.70

Gain(S, parents) = 1.571 - (|Syes|/10)*Entropy(Syes) - (|Sno|/10)*Entropy(Sno)
                 = 1.571 - (0.5)*0 - (0.5)*1.922 = 1.571 - 0.961 = 0.61

Gain(S, money) = 1.571 - (|Srich|/10)*Entropy(Srich) - (|Spoor|/10)*Entropy(Spoor)
               = 1.571 - (0.7)*(1.842) - (0.3)*0 = 1.571 - 1.2894 = 0.2816

Weather gives the largest information gain, so Weather is chosen as the attribute to test at the root node.
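As a cross-check only, the same figures come out of the entropy and information_gain sketches from earlier, applied to the reconstructed weekend table:

    cols = ["Weather", "Parents", "Money", "Decision"]
    weekend = [dict(zip(cols, row)) for row in [
        ("Sunny", "Yes", "Rich", "Cinema"),  ("Sunny", "No", "Rich", "Tennis"),
        ("Windy", "Yes", "Rich", "Cinema"),  ("Rainy", "Yes", "Poor", "Cinema"),
        ("Rainy", "No", "Rich", "Stay in"),  ("Rainy", "Yes", "Poor", "Cinema"),
        ("Windy", "No", "Poor", "Cinema"),   ("Windy", "No", "Rich", "Shopping"),
        ("Windy", "Yes", "Rich", "Cinema"),  ("Sunny", "No", "Rich", "Tennis"),
    ]]
    print(round(entropy([ex["Decision"] for ex in weekend]), 3))      # 1.571
    for a in ["Weather", "Parents", "Money"]:
        print(a, round(information_gain(weekend, a, "Decision"), 2))
    # Weather 0.7, Parents 0.61, Money 0.28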

Now we look at the first branch. Ssunny = {W1, W2, W10} is not empty, and the class labels of these rows are not all the same, so we add an internal node rather than a leaf. The same holds for the 2nd and 3rd branches, (Weather, Windy) and (Weather, Rainy), so we add a node for each of them as well.

Now we focus on the first branch of the tree, i.e. the data for the attribute value (Weather, Sunny), shown below:

Weekend (Example)  Weather  Parents  Money  Decision (Category)
W1                 Sunny    Yes      Rich   Cinema
W2                 Sunny    No       Rich   Tennis
W10                Sunny    No       Rich   Tennis

Hence we can calculate:

Gain(Ssunny, parents) = 0.918 - (|Syes|/|S|)*Entropy(Syes) - (|Sno|/|S|)*Entropy(Sno)
                      = 0.918 - (1/3)*0 - (2/3)*0 = 0.918
Gain(Ssunny, money) = 0.918 - (|Srich|/|S|)*Entropy(Srich) - (|Spoor|/|S|)*Entropy(Spoor)
                    = 0.918 - (3/3)*0.918 - (0/3)*0 = 0.918 - 0.918 = 0

Parents gives the larger gain, so it is the attribute tested at this node.
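And for the Sunny branch, reusing the same helpers and the weekend list above:

    sunny = [ex for ex in weekend if ex["Weather"] == "Sunny"]
    print(round(information_gain(sunny, "Parents", "Decision"), 3))   # 0.918
    print(round(information_gain(sunny, "Money", "Decision"), 3))     # 0.0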

Remembering that we replaced the set S by the set Ssunny, and looking at Syes, we see that the only example is W1, so the Yes branch ends at a categorisation leaf with the category Cinema. Similarly, Sno contains W2 and W10, which are both in the same category (Tennis), so the No branch also ends at a categorisation leaf. Hence the upgraded tree has Weather at the root, with the Sunny branch testing Parents (Yes -> Cinema, No -> Tennis), while the Windy and Rainy branches remain to be expanded. Finishing this tree off is left as a tutorial exercise.