Machine Learning Lecture 10: Decision Trees
G53MLE Machine Learning, Dr Guoping Qiu

Trees
- Node
- Root
- Leaf
- Branch
- Path
- Depth

Decision Trees
- A hierarchical data structure that represents data by implementing a divide-and-conquer strategy.
- Can be used as a non-parametric classification method.
- Given a collection of examples, learn a decision tree that represents it.
- Use this representation to classify new examples.

Decision Trees
- Each node is associated with a feature (one of the elements of the feature vector that represents an object).
- Each node tests the value of its associated feature.
- There is one branch for each value of the feature.
- Leaves specify the categories (classes).
- Can categorize instances into multiple disjoint categories (multi-class).
(Tree diagram: Outlook at the root with branches Sunny, Overcast, Rain; the Sunny branch tests Humidity (High → No, Normal → Yes), Overcast leads to Yes, and the Rain branch tests Wind (Strong → No, Weak → Yes).)
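As a concrete illustration of this representation, here is a minimal sketch in Python (the names and structure are my own, not code from the lecture): each internal node stores the feature it tests and one child per feature value, and classification simply walks from the root to a leaf.

```python
class Node:
    """One node of a decision tree: either an internal test or a leaf."""
    def __init__(self, feature=None, label=None):
        self.feature = feature   # feature tested at this node (None for a leaf)
        self.label = label       # class label (only meaningful for a leaf)
        self.branches = {}       # feature value -> child Node


def classify(node, example):
    """Follow the branch matching the example's value for each tested feature."""
    while node.feature is not None:
        node = node.branches[example[node.feature]]
    return node.label


# The PlayTennis tree shown on the slide, built by hand:
root = Node(feature="Outlook")
root.branches["Overcast"] = Node(label="Yes")

sunny = Node(feature="Humidity")
sunny.branches["High"] = Node(label="No")
sunny.branches["Normal"] = Node(label="Yes")
root.branches["Sunny"] = sunny

rain = Node(feature="Wind")
rain.branches["Strong"] = Node(label="No")
rain.branches["Weak"] = Node(label="Yes")
root.branches["Rain"] = rain

print(classify(root, {"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Weak"}))  # Yes
```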

Decision Trees
- Play Tennis example.
- Feature vector = (Outlook, Temperature, Humidity, Wind).
(Tree diagram: the same PlayTennis tree as above.)

Decision Trees
(Tree diagram, with the annotation: each node is associated with a feature.)

Decision Trees
Play Tennis example. Feature values:
- Outlook = (sunny, overcast, rain)
- Temperature = (hot, mild, cool)
- Humidity = (high, normal)
- Wind = (strong, weak)

Decision Trees
- Outlook = (sunny, overcast, rain)
(Tree diagram, with the annotation: one branch for each value.)

Decision Trees
- Class = (Yes, No)
(Tree diagram, with the annotation: leaf nodes specify classes.)

Decision Trees
Designing a decision tree classifier involves two steps:
- Picking the root node
- Recursively branching

Decision Trees
Picking the root node. Consider data with two Boolean attributes (A, B) and two classes, + and -:
- { (A=0, B=0), - }: 50 examples
- { (A=0, B=1), - }: 50 examples
- { (A=1, B=0), - }: 3 examples
- { (A=1, B=1), + }: 100 examples

Decision Trees
Picking the root node: the two candidate trees look structurally similar, so which attribute should we choose?
(Diagram: one tree splits on A at the root, the other splits on B at the root.)

Decision Trees
Picking the root node:
- The goal is to make the resulting decision tree as small as possible (Occam's razor).
- The main decision in the algorithm is the selection of the next attribute to condition on (starting from the root node).
- We want attributes that split the examples into sets that are relatively pure in one label; this way we are closer to a leaf node.
- The most popular heuristic is based on information gain, which originated with Quinlan's ID3 system.

Entropy
- S is a sample of training examples.
- p+ is the proportion of positive examples in S.
- p- is the proportion of negative examples in S.
- Entropy measures the impurity of S:
  Entropy(S) = -p+ log2(p+) - p- log2(p-)
(Plot: Entropy(S) as a function of p+; it is 0 when S is pure and reaches its maximum of 1 at p+ = 0.5.)
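A small sketch of this definition in Python (my own illustration, not code from the lecture); it takes the positive and negative counts of a sample and returns its entropy:

```python
import math

def entropy(pos, neg):
    """Entropy of a sample S with `pos` positive and `neg` negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count > 0:                 # convention: 0 * log2(0) = 0
            p = count / total
            result -= p * math.log2(p)
    return result

print(entropy(7, 7))    # 1.0   (maximally impure)
print(entropy(14, 0))   # 0.0   (pure)
print(entropy(9, 5))    # ~0.940 (the PlayTennis training set used below)
```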

Highly disorganized: high entropy, much information required.
Highly organized: low entropy, little information required.

Information Gain
- Gain(S, A) = expected reduction in entropy due to sorting on A:
  Gain(S, A) = Entropy(S) - sum over v in Values(A) of (|S_v| / |S|) * Entropy(S_v)
- Values(A) is the set of all possible values for attribute A; S_v is the subset of S for which attribute A has value v; |S| and |S_v| are the number of samples in S and S_v respectively.
- Gain(S, A) is the expected reduction in entropy caused by knowing the value of attribute A.
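Continuing the sketch, and reusing the entropy() helper defined above, the gain can be computed directly from the class counts of S and of each subset S_v:

```python
def information_gain(parent, children):
    """Gain(S, A). `parent` is the (pos, neg) class count of S; `children` is
    one (pos, neg) pair per value v of A, i.e. the class counts of each S_v.
    Uses entropy() from the sketch above."""
    total = sum(parent)
    weighted = sum((p + n) / total * entropy(p, n) for p, n in children)
    return entropy(*parent) - weighted
```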

Information Gain
Example: choose A or B?
(Diagram: the toy data above split on A versus split on B.)
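Using the two helpers above on the toy data from the earlier slide (100 positive and 103 negative examples in total), splitting on A is clearly the better choice:

```python
S = (100, 103)                                        # 100 '+' and 103 '-' examples overall
gain_A = information_gain(S, [(0, 100), (100, 3)])    # A=0: all 100 '-'; A=1: 100 '+', 3 '-'
gain_B = information_gain(S, [(0, 53), (100, 50)])    # B=0: all 53 '-';  B=1: 100 '+', 50 '-'
print(gain_A, gain_B)                                 # roughly 0.90 vs 0.32
```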

Example
Play Tennis training examples (the standard PlayTennis data set):

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No
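For use with the code sketches in this section, the same table can be written as a list of dictionaries (one per day, attribute name → value); the key "PlayTennis" holds the class label and is assumed by the id3() sketch given later:

```python
data = [
    {"Outlook": "Sunny",    "Temperature": "Hot",  "Humidity": "High",   "Wind": "Weak",   "PlayTennis": "No"},
    {"Outlook": "Sunny",    "Temperature": "Hot",  "Humidity": "High",   "Wind": "Strong", "PlayTennis": "No"},
    {"Outlook": "Overcast", "Temperature": "Hot",  "Humidity": "High",   "Wind": "Weak",   "PlayTennis": "Yes"},
    {"Outlook": "Rain",     "Temperature": "Mild", "Humidity": "High",   "Wind": "Weak",   "PlayTennis": "Yes"},
    {"Outlook": "Rain",     "Temperature": "Cool", "Humidity": "Normal", "Wind": "Weak",   "PlayTennis": "Yes"},
    {"Outlook": "Rain",     "Temperature": "Cool", "Humidity": "Normal", "Wind": "Strong", "PlayTennis": "No"},
    {"Outlook": "Overcast", "Temperature": "Cool", "Humidity": "Normal", "Wind": "Strong", "PlayTennis": "Yes"},
    {"Outlook": "Sunny",    "Temperature": "Mild", "Humidity": "High",   "Wind": "Weak",   "PlayTennis": "No"},
    {"Outlook": "Sunny",    "Temperature": "Cool", "Humidity": "Normal", "Wind": "Weak",   "PlayTennis": "Yes"},
    {"Outlook": "Rain",     "Temperature": "Mild", "Humidity": "Normal", "Wind": "Weak",   "PlayTennis": "Yes"},
    {"Outlook": "Sunny",    "Temperature": "Mild", "Humidity": "Normal", "Wind": "Strong", "PlayTennis": "Yes"},
    {"Outlook": "Overcast", "Temperature": "Mild", "Humidity": "High",   "Wind": "Strong", "PlayTennis": "Yes"},
    {"Outlook": "Overcast", "Temperature": "Hot",  "Humidity": "Normal", "Wind": "Weak",   "PlayTennis": "Yes"},
    {"Outlook": "Rain",     "Temperature": "Mild", "Humidity": "High",   "Wind": "Strong", "PlayTennis": "No"},
]
```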

Example
Split on Humidity:
- High: [3+, 4-], E = 0.985
- Normal: [6+, 1-], E = 0.592
Gain(S, Humidity) = 0.940 - (7/14) * 0.985 - (7/14) * 0.592 = 0.151

Example
Split on Wind:
- Weak: [6+, 2-], E = 0.811
- Strong: [3+, 3-], E = 1.0
Gain(S, Wind) = 0.940 - (8/14) * 0.811 - (6/14) * 1.0 = 0.048

Example
Split on Outlook:
- Sunny: days 1, 2, 8, 9, 11, [2+, 3-]
- Overcast: days 3, 7, 12, 13, [4+, 0-]
- Rain: days 4, 5, 6, 10, 14, [3+, 2-]
Gain(S, Outlook) = 0.246
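These three gains can be checked directly from the class counts on each branch, using the entropy() and information_gain() helpers sketched earlier:

```python
S = (9, 5)                                            # the full training set: 9 Yes, 5 No
print(information_gain(S, [(3, 4), (6, 1)]))          # Humidity: ~0.151
print(information_gain(S, [(6, 2), (3, 3)]))          # Wind:     ~0.048
print(information_gain(S, [(2, 3), (4, 0), (3, 2)]))  # Outlook:  ~0.246
```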

Example
Pick Outlook as the root:
- Gain(S, Outlook) = 0.246
- Gain(S, Humidity) = 0.151
- Gain(S, Wind) = 0.048
- Gain(S, Temperature) = 0.029
Outlook has the highest information gain, so it becomes the root, with branches Sunny, Overcast and Rain.

Example
Pick Outlook as the root:
- Overcast branch (days 3, 7, 12, 13): [4+, 0-] → leaf Yes
- Sunny branch (days 1, 2, 8, 9, 11): [2+, 3-] → ? (needs a further split)
- Rain branch (days 4, 5, 6, 10, 14): [3+, 2-] → ? (needs a further split)
Continue until every attribute is included in the path, or all examples in the leaf have the same label.
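The whole procedure can be written as a short recursive sketch of ID3-style learning (my own illustration, not the lecture's code). Examples are assumed to be dictionaries mapping attribute names to values, with the class label stored under the key "PlayTennis"; for convenience it recomputes entropy from label lists rather than reusing the count-based helpers above.

```python
import math
from collections import Counter

def entropy_of(labels):
    """Entropy of a list of class labels."""
    counts = Counter(labels)
    return -sum(c / len(labels) * math.log2(c / len(labels)) for c in counts.values())

def gain(examples, attr, target):
    """Information gain of splitting `examples` on attribute `attr`."""
    base = entropy_of([ex[target] for ex in examples])
    for value in set(ex[attr] for ex in examples):
        subset = [ex[target] for ex in examples if ex[attr] == value]
        base -= len(subset) / len(examples) * entropy_of(subset)
    return base

def id3(examples, attributes, target="PlayTennis"):
    """Stop when all examples share a label or no attributes remain;
    otherwise split on the highest-gain attribute and recurse."""
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:                       # all examples have the same label
        return labels[0]
    if not attributes:                              # no attributes left: majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(examples, a, target))
    subtree = {}
    for value in set(ex[best] for ex in examples):
        subset = [ex for ex in examples if ex[best] == value]
        subtree[value] = id3(subset, [a for a in attributes if a != best], target)
    return {best: subtree}
```

Applied to the data list given after the training table, e.g. id3(data, ["Outlook", "Temperature", "Humidity", "Wind"]), this sketch reproduces the tree derived on the remaining slides.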

Example
Which attribute should test the Sunny branch (days 1, 2, 8, 9, 11, [2+, 3-])?
- Gain(S_sunny, Humidity) = 0.97 - (3/5) * 0 - (2/5) * 0 = 0.97
- Gain(S_sunny, Temperature) = 0.97 - (2/5) * 0 - (2/5) * 1 - (1/5) * 0 = 0.57
- Gain(S_sunny, Wind) = 0.97 - (2/5) * 1 - (3/5) * 0.92 = 0.02

Example
Humidity has the highest gain on S_sunny, so it tests the Sunny branch:
- Gain(S_sunny, Humidity) = 0.97
- Gain(S_sunny, Temperature) = 0.57
- Gain(S_sunny, Wind) = 0.02
(Tree so far: Outlook → Sunny → Humidity (High → No, Normal → Yes); Overcast → Yes; Rain → ?)

Example
Which attribute should test the Rain branch (days 4, 5, 6, 10, 14, [3+, 2-])?
- Gain(S_rain, Humidity) = 0.02
- Gain(S_rain, Temperature) = 0.02
- Gain(S_rain, Wind) = 0.97

Example
The final decision tree:
- Outlook = Overcast → Yes
- Outlook = Sunny → Humidity: High → No, Normal → Yes
- Outlook = Rain → Wind: Strong → No, Weak → Yes
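Written out as plain rules, the learned tree is just three nested tests (a sketch; the function name is my own):

```python
def play_tennis(outlook, humidity, wind):
    """Classify one day with the tree learned above."""
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    # outlook == "Rain"
    return "Yes" if wind == "Weak" else "No"

print(play_tennis("Sunny", "High", "Weak"))   # No  (matches day D1)
print(play_tennis("Rain", "High", "Weak"))    # Yes (matches day D4)
```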

Tutorial/Exercise Questions
An experiment has produced the following 3-D feature vectors X = (x1, x2, x3) belonging to two classes. Design a decision tree classifier to classify the unknown feature vector X = (1, 2, 1).
(Table: the training feature vectors x1, x2, x3 and their class labels.)