1 Machine Learning: Lecture 3 Decision Tree Learning (Based on Chapter 3 of Mitchell, T., Machine Learning, 1997)

2 Decision Tree Representation
A decision tree for the concept PlayTennis: Outlook is tested at the root with branches Sunny, Overcast, and Rain; the Sunny branch then tests Humidity (High, Normal) and the Rain branch tests Wind (Strong, Weak), while the Overcast branch leads directly to a leaf.
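As an illustration (not part of the original slides), this tree can be written down directly in Python as a nested dictionary, with internal nodes mapping an attribute to its branches and leaves holding class labels; the structure follows the PlayTennis tree above, and the classify helper name is my own.

# The PlayTennis tree above as a nested dict: internal nodes map an attribute
# name to {branch value: subtree}; leaves are class labels ("Yes"/"No").
play_tennis_tree = {
    "Outlook": {
        "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

def classify(tree, instance):
    """Follow the branch matching the instance's value for each tested attribute."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[instance[attribute]]
    return tree

# A sunny, high-humidity day is classified as "No".
print(classify(play_tennis_tree, {"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak"}))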

3 Appropriate Problems for Decision Tree Learning
§ Instances are represented by discrete attribute-value pairs (though the basic algorithm has been extended to real-valued attributes as well).
§ The target function has discrete output values (there can be more than two possible output values, i.e., classes).
§ Disjunctive hypothesis descriptions may be required.
§ The training data may contain errors.
§ The training data may contain missing attribute values.

4 ID3: The Basic Decision Tree Learning Algorithm
§ Training database of examples D1-D14 (see [Mitchell, p. 59]).
What is the “best” attribute to test at the root? Answer: Outlook [“best” = the attribute with the highest information gain]

5 ID3 (Cont’d)
Splitting the examples D1-D14 on Outlook creates three branches: Sunny, Overcast, and Rain. What are the “best” attributes to test next in the Sunny and Rain branches? Humidity and Wind, respectively.
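To make the whole procedure concrete, here is a hedged sketch of ID3 in Python: it greedily splits on the attribute with the highest information gain (the measure defined on the next slides) and recurses on each branch, returning trees in the nested-dict form used earlier. The training rows reproduce the standard PlayTennis table referenced above for illustration, and the names entropy, information_gain, and id3 are assumptions of this sketch, not identifiers from the lecture.

import math
from collections import Counter

# The PlayTennis examples D1-D14 as (attribute-value dict, label) pairs.
ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Wind"]
DATA = [
    ({"Outlook": "Sunny",    "Temperature": "Hot",  "Humidity": "High",   "Wind": "Weak"},   "No"),
    ({"Outlook": "Sunny",    "Temperature": "Hot",  "Humidity": "High",   "Wind": "Strong"}, "No"),
    ({"Outlook": "Overcast", "Temperature": "Hot",  "Humidity": "High",   "Wind": "Weak"},   "Yes"),
    ({"Outlook": "Rain",     "Temperature": "Mild", "Humidity": "High",   "Wind": "Weak"},   "Yes"),
    ({"Outlook": "Rain",     "Temperature": "Cool", "Humidity": "Normal", "Wind": "Weak"},   "Yes"),
    ({"Outlook": "Rain",     "Temperature": "Cool", "Humidity": "Normal", "Wind": "Strong"}, "No"),
    ({"Outlook": "Overcast", "Temperature": "Cool", "Humidity": "Normal", "Wind": "Strong"}, "Yes"),
    ({"Outlook": "Sunny",    "Temperature": "Mild", "Humidity": "High",   "Wind": "Weak"},   "No"),
    ({"Outlook": "Sunny",    "Temperature": "Cool", "Humidity": "Normal", "Wind": "Weak"},   "Yes"),
    ({"Outlook": "Rain",     "Temperature": "Mild", "Humidity": "Normal", "Wind": "Weak"},   "Yes"),
    ({"Outlook": "Sunny",    "Temperature": "Mild", "Humidity": "Normal", "Wind": "Strong"}, "Yes"),
    ({"Outlook": "Overcast", "Temperature": "Mild", "Humidity": "High",   "Wind": "Strong"}, "Yes"),
    ({"Outlook": "Overcast", "Temperature": "Hot",  "Humidity": "Normal", "Wind": "Weak"},   "Yes"),
    ({"Outlook": "Rain",     "Temperature": "Mild", "Humidity": "High",   "Wind": "Strong"}, "No"),
]

def entropy(examples):
    """I({P, N}) generalized to any number of classes: -sum_i p_i log2 p_i."""
    counts = Counter(label for _, label in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attribute):
    """Parent entropy minus the weighted entropy (the 'disorder' V) of the split."""
    total = len(examples)
    disorder = 0.0
    for value in {x[attribute] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[attribute] == value]
        disorder += (len(subset) / total) * entropy(subset)
    return entropy(examples) - disorder

def id3(examples, attributes):
    """Grow a tree top-down, greedily splitting on the highest-gain attribute."""
    labels = {label for _, label in examples}
    if len(labels) == 1:          # all examples agree: return a leaf
        return labels.pop()
    if not attributes:            # no attributes left: return the majority label
        return Counter(label for _, label in examples).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, a))
    tree = {best: {}}
    for value in {x[best] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[best] == value]
        tree[best][value] = id3(subset, [a for a in attributes if a != best])
    return tree

print(id3(DATA, ATTRIBUTES))      # root attribute: Outlook, as on the slide

Running the sketch prints a tree whose root is Outlook, with Humidity tested under Sunny and Wind under Rain, matching slides 2 and 5.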

6 What Attribute to choose to “best” split a node?
§ Choose the attribute that minimizes the disorder (or entropy) in the subtree rooted at a given node.
§ Disorder and information are related as follows: the more disorderly a set, the more information is required to correctly guess an element of that set.
§ Information: What is the best strategy for guessing a number from a finite set S of possible numbers, i.e., how many yes/no questions do you need to ask in order to know the answer (we are looking for the minimal number of questions)? Answer: Log_2(|S|), where |S| is the cardinality of S. E.g., Q1: is it smaller than 5? Q2: is it smaller than 2? Each question at best halves the remaining candidates, so a number drawn from a set of 8 can be identified with Log_2(8) = 3 such questions.

7 What Attribute to choose to “best” split a node? (Cont’d)
§ Log_2|S| can also be thought of as the information value of being told x (the number to be guessed) instead of having to guess it.
§ Let U be a subset of S. What is the information value of being told x after finding out whether or not x ∈ U?
Answer: Log_2|S| - [P(x ∈ U) Log_2|U| + P(x ∉ U) Log_2|S-U|]
§ Let S = P ∪ N (the positive and negative examples). The information value of being told x after finding out whether x ∈ P or x ∈ N is:
I({P,N}) = Log_2|S| - (|P|/|S|) Log_2|P| - (|N|/|S|) Log_2|N|
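As a quick numeric check (my own illustration, not from the slides): for the 14 PlayTennis examples, with |P| = 9 and |N| = 5, the formula above and the familiar -p Log_2 p - n Log_2 n form of entropy agree at about 0.940 bits.

import math

P, N = 9, 5                     # positive / negative PlayTennis examples
S = P + N

# The slide's form: I({P,N}) = Log_2|S| - (|P|/|S|) Log_2|P| - (|N|/|S|) Log_2|N|
i_slide = math.log2(S) - (P / S) * math.log2(P) - (N / S) * math.log2(N)

# The equivalent textbook entropy: -p log2(p) - n log2(n), with p = |P|/|S|, n = |N|/|S|
i_entropy = -(P / S) * math.log2(P / S) - (N / S) * math.log2(N / S)

print(round(i_slide, 3), round(i_entropy, 3))   # both print 0.94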

8 What Attribute to choose to “best” split a node? (Cont’d)
§ We want to use this measure to choose the attribute that minimizes the disorder in the partition it creates. Let {S_i | 1 ≤ i ≤ n} be the partition of S resulting from a particular attribute. The disorder associated with this partition is:
V({S_i | 1 ≤ i ≤ n}) = Σ_i (|S_i|/|S|) · I({P(S_i), N(S_i)})
where P(S_i) is the set of positive examples in S_i and N(S_i) is the set of negative examples in S_i.
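Continuing the same illustration for the Outlook split of the 14 PlayTennis examples (Sunny: 2 positive / 3 negative, Overcast: 4/0, Rain: 3/2), the disorder V of this partition comes out to about 0.694, so the information gain of Outlook is about 0.940 - 0.694 ≈ 0.247; the counts and the info helper name are mine, not from the slides.

import math

def info(p, n):
    """I({P, N}) for a subset with p positive and n negative examples (0 * log2 0 taken as 0)."""
    total = p + n
    return sum(-(c / total) * math.log2(c / total) for c in (p, n) if c > 0)

# Partition of the 14 examples induced by Outlook: (positives, negatives) per value.
partition = {"Sunny": (2, 3), "Overcast": (4, 0), "Rain": (3, 2)}
total = sum(p + n for p, n in partition.values())

disorder = sum(((p + n) / total) * info(p, n) for p, n in partition.values())
gain = info(9, 5) - disorder

print(round(disorder, 3), round(gain, 3))       # about 0.694 and 0.247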

9 Hypothesis Space Search in Decision Tree Learning
§ Hypothesis Space: the set of possible decision trees (i.e., the complete space of finite discrete-valued functions).
§ Search Method: simple-to-complex hill-climbing search; only a single current hypothesis is maintained (unlike the candidate-elimination method). No backtracking!
§ Evaluation Function: the information gain measure.
§ Batch Learning: ID3 uses all training examples at each step to make statistically based decisions (unlike the candidate-elimination method, which makes decisions incrementally) ==> the search is less sensitive to errors in individual training examples.

10 Inductive Bias in Decision Tree Learning
§ ID3’s Inductive Bias: shorter trees are preferred over longer trees, and trees that place high-information-gain attributes close to the root are preferred over those that do not.
§ Note: this type of bias is different from the type of bias used by Candidate-Elimination: the inductive bias of ID3 follows from its search strategy (a preference or search bias), whereas the inductive bias of the Candidate-Elimination algorithm follows from the definition of its hypothesis space (a restriction or language bias).

11 Why Prefer Short Hypotheses?
§ Occam’s razor: prefer the simplest hypothesis that fits the data [William of Occam (philosopher), circa 1320].
§ Scientists seem to do this: e.g., physicists seem to prefer simple explanations for the motion of the planets over more complex ones.
§ Argument: since there are fewer short hypotheses than long ones, it is less likely that one will find a short hypothesis that coincidentally fits the training data.
§ Problem with this argument: it can be made about many other constraints. Why is the “short description” constraint more relevant than others?
§ Nevertheless, Occam’s razor has been shown experimentally to be a successful strategy!

12 Issues in Decision Tree Learning: I. Avoiding Overfitting the Data
§ Definition: Given a hypothesis space H, a hypothesis h ∈ H is said to overfit the training data if there exists some alternative hypothesis h’ ∈ H such that h has smaller error than h’ over the training examples, but h’ has smaller error than h over the entire distribution of instances. (See the curves in [Mitchell, p. 67].)
§ There are two approaches to avoiding overfitting in decision trees (see the pruning sketch below):
- Stop growing the tree before it perfectly fits the data.
- Allow the tree to overfit the data, and then post-prune it.
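As a sketch of the second approach (an illustration, not code from the lecture), the snippet below applies a simplified, local variant of reduced-error post-pruning to the nested-dict trees used earlier: working bottom-up, each internal node is replaced by a majority-class leaf whenever that does not lower accuracy on the held-out validation examples reaching it. All names (classify, majority_label, accuracy, prune) are hypothetical.

from collections import Counter

def classify(tree, instance):
    """Follow branches to a leaf label; unseen branch values fall through to None."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches.get(instance[attribute])
    return tree

def majority_label(examples):
    return Counter(label for _, label in examples).most_common(1)[0][0]

def accuracy(tree, examples):
    return sum(classify(tree, x) == y for x, y in examples) / len(examples)

def prune(tree, validation):
    """Bottom-up reduced-error pruning (local variant): keep a subtree only if it
    beats a single majority-class leaf on the validation examples reaching it."""
    if not isinstance(tree, dict) or not validation:
        return tree
    attribute, branches = next(iter(tree.items()))
    pruned_branches = {
        value: prune(subtree, [(x, y) for x, y in validation if x[attribute] == value])
        for value, subtree in branches.items()
    }
    pruned = {attribute: pruned_branches}
    leaf = majority_label(validation)
    return pruned if accuracy(pruned, validation) > accuracy(leaf, validation) else leaf

In practice the tree would be grown with ID3 on a training split and then prune(tree, validation_split) would be called on a separate validation set, trading a little training-set accuracy for better generalization.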

13 Issues in Decision Tree Learning: II. Other Issues
§ Incorporating Continuous-Valued Attributes
§ Alternative Measures for Selecting Attributes
§ Handling Training Examples with Missing Attribute Values
§ Handling Attributes with Differing Costs