1 Universidad de Buenos Aires, Maestría en Data Mining y Knowledge Discovery. Machine Learning (Aprendizaje Automático), 5 - Decision Tree Induction (2/2). Eduardo Poggi, Ernesto Mislej. Autumn 2005.

2 Decision Trees
- Definition
- Mechanism
- Splitting functions
- Hypothesis space and bias
- Issues in decision-tree learning
  - Avoiding overfitting through pruning
  - Numeric and missing attributes

3 Example of a Decision Tree. Example: learning to classify stars. [Figure: a tree with Luminosity tested at the root (threshold T1) and Mass tested at the next level (threshold T2), with leaves labeled Type A, Type B, and Type C.]
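A minimal code sketch of how such a tree classifies an example, under one plausible reading of the figure (Luminosity tested at the root, Mass tested on the low-luminosity branch). The thresholds and the assignment of classes to leaves are hypothetical placeholders, not values from the slide.

```python
def classify_star(luminosity: float, mass: float,
                  t1: float = 1.0, t2: float = 5.0) -> str:
    """Hypothetical decision tree: luminosity at the root, mass below it."""
    if luminosity <= t1:
        if mass <= t2:
            return "Type A"
        return "Type B"
    return "Type C"

print(classify_star(luminosity=0.5, mass=3.0))  # classified by the left subtree
```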

4 Short vs. Long Hypotheses. We mentioned that the top-down, greedy approach to constructing decision trees denotes a preference for short hypotheses over long hypotheses. Why is this the right thing to do? Occam's Razor: prefer the simplest hypothesis that fits the data. The principle goes back to William of Occam (c. 1320) and has been the subject of great debate in the philosophy of science.

5 Issues in Decision Tree Learning. The practical issues in building a decision tree can be enumerated as follows:
1) How deep should the tree be?
2) How do we handle continuous attributes?
3) What is a good splitting function?
4) What happens when attribute values are missing?
5) How do we improve computational efficiency?

6 How deep should the tree be? Overfitting the data. A tree overfits the data if we let it grow deep enough that it begins to capture "aberrations" in the data that harm its predictive power on unseen examples. [Figure: examples plotted over humidity and size, with extra splits at t2 and t3 carved out around a few points that are possibly just noise, but the tree is grown deeper to capture them.]

7 Overfitting the Data: Definition. Assume a hypothesis space H. We say a hypothesis h in H overfits a dataset D if there is another hypothesis h' in H such that h has better classification accuracy than h' on the training data D but worse classification accuracy than h' on unseen data D'. [Figure: accuracy versus size of the tree for training data and testing data; past a certain size the two curves diverge, which is where overfitting begins.]
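In symbols, a compact restatement of the definition above, with acc_D denoting classification accuracy on the training data D and acc_{D'} denoting accuracy on unseen data D':

```latex
h \in H \ \text{overfits}\ D \;\iff\; \exists\, h' \in H:\;
\mathrm{acc}_{D}(h) > \mathrm{acc}_{D}(h')
\;\wedge\;
\mathrm{acc}_{D'}(h) < \mathrm{acc}_{D'}(h')
```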

8 Causes of Overfitting the Data. What causes a hypothesis to overfit the data?
1) Random errors or noise: examples have an incorrect class label or incorrect attribute values.
2) Coincidental patterns: by chance, examples seem to deviate from a pattern because of the small size of the sample.
Overfitting is a serious problem that can cause strong performance degradation.

9 Solutions for Overfitting the Data. There are two main classes of solutions:
1) Stop growing the tree early, before it begins to overfit the data. In practice this is hard to implement because it is not clear what a good stopping point is.
2) Grow the tree until the algorithm stops, even if the overfitting problem shows up, and then prune the tree as a post-processing step. This method has found great popularity in the machine learning community.

10 Decision Tree Pruning. 1) Grow the tree to learn the training data. 2) Prune the tree to avoid overfitting the data.

11 Methods to Validate the New Tree. 1. Training and validation set approach:
- Divide dataset D into a training set TR and a testing set TE.
- Build a decision tree on TR.
- Test pruned trees on TE to decide the best final tree.
[Figure: dataset D split into training set TR and testing set TE.]

12 Methods to Validate the New Tree. 2. Use a statistical test:
- Use the whole dataset D for training.
- Use a statistical test (e.g., chi-squared) to decide whether or not to expand a node.
[Figure: a node in the tree with the caption "Should I expand or not?"]
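A hedged sketch of how such a test might be applied at a node, using scipy's chi2_contingency on the contingency table of attribute values versus class labels; the node's data, the attribute, and the significance level are all hypothetical.

```python
import numpy as np
from scipy.stats import chi2_contingency

def worth_expanding(attribute_values, class_labels, alpha=0.05):
    """Return True if the attribute/class association at this node is
    statistically significant, i.e. the split seems worth making."""
    values = sorted(set(attribute_values))
    classes = sorted(set(class_labels))
    # Contingency table: rows = attribute values, columns = classes.
    table = np.array([[sum(1 for v, c in zip(attribute_values, class_labels)
                           if v == val and c == cls)
                       for cls in classes]
                      for val in values])
    _, p_value, _, _ = chi2_contingency(table)
    return p_value < alpha

# Hypothetical node: attribute "humidity" against the class "play".
expand = worth_expanding(["high", "high", "low", "low", "high", "low"],
                         ["no", "no", "yes", "yes", "no", "yes"])
```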

13 Methods to Validate the New Tree. 3. Use an encoding scheme that captures both the size of the tree and the errors made by the tree:
- Use the whole dataset D for training.
- Use the encoding scheme to know when to stop growing the tree.
This method is known as the minimum description length principle.
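Schematically, the MDL criterion chooses the tree that minimizes the total description length; this is a generic formulation, not the exact encoding used by any particular system:

```latex
T^{*} \;=\; \arg\min_{T}\; \big[\, L(T) \;+\; L(D \mid T) \,\big]
% L(T): bits needed to encode the tree itself
% L(D | T): bits needed to encode the training examples that T misclassifies
```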

14 Training and Validation. There are two approaches:
A. Reduced error pruning
B. Rule post-pruning
[Figure: dataset D split into a training set TR (normally 2/3 of D) and a testing set TE (normally 1/3 of D).]
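A minimal sketch of the 2/3 vs. 1/3 split using scikit-learn's train_test_split; the iris data is just a stand-in for any dataset D.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Stand-in dataset; any feature matrix X and label vector y would do.
X, y = load_iris(return_X_y=True)

# Hold out roughly 1/3 of D for validation (TE), train on the remaining 2/3 (TR).
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=1/3, random_state=0, stratify=y)
```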

15 Reduced Error Pruning. Main idea:
1) Consider all internal nodes in the tree.
2) For each node, check whether removing it (along with the subtree below it) and assigning it the most common class does not harm accuracy on the validation set.
3) Pick the node n* that yields the best performance and prune its subtree.
4) Go back to (2) until no more improvements are possible.
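A minimal sketch of this loop on a toy dict-based tree; the tree representation, field names, and helper functions are hypothetical, not taken from the slides.

```python
# Hypothetical tree representation:
#   leaf:     {"label": "A"}
#   internal: {"attr": "x1", "children": {value: subtree, ...},
#              "majority": "A"}   # most common class among the node's training examples

def predict(tree, example):
    while "label" not in tree:
        child = tree["children"].get(example.get(tree["attr"]))
        if child is None:                      # unseen value: fall back to the majority class
            return tree["majority"]
        tree = child
    return tree["label"]

def accuracy(tree, validation):
    return sum(predict(tree, x) == y for x, y in validation) / len(validation)

def internal_nodes(tree):
    """Yield every internal node of the tree (by reference)."""
    if "label" not in tree:
        yield tree
        for child in tree["children"].values():
            yield from internal_nodes(child)

def reduced_error_prune(tree, validation):
    """Repeatedly turn into a leaf the internal node whose removal gives the best
    validation accuracy, as long as accuracy does not drop (steps 2-4 above)."""
    while True:
        best_node, best_acc = None, accuracy(tree, validation)
        for node in internal_nodes(tree):
            saved = dict(node)                 # remember the node's contents
            node.clear()
            node["label"] = saved["majority"]  # tentatively replace the subtree by a leaf
            acc = accuracy(tree, validation)
            node.clear()
            node.update(saved)                 # undo the tentative pruning
            if acc >= best_acc:
                best_node, best_acc = node, acc
        if best_node is None:
            return tree
        label = best_node["majority"]
        best_node.clear()
        best_node["label"] = label             # commit the best pruning
```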

16 Example. [Figure: the original tree and the possible trees after pruning one node.]

17 Example. [Figure: the pruned tree and the possible trees after a second pruning step.]

18 Example. The process continues until no improvement is observed on the validation set. [Figure: accuracy on the validation data versus size of the tree; pruning stops where validation accuracy no longer improves.]

19 Reduced Error Pruning. Disadvantage: if the original data set is small, setting examples aside for validation may leave you with few examples for training. [Figure: a small dataset D split into training TR and testing TE; the training set is too small, and so is the validation set.]

20 Rule Post-Pruning. Main idea:
1) Convert the tree into a rule-based system (one rule per root-to-leaf path).
2) Prune every rule by removing redundant conditions.
3) Sort the rules by accuracy.
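A minimal sketch of step 2, assuming rules are represented as lists of (attribute, value) conditions and examples as dicts; all names and the tiny validation set are hypothetical.

```python
from typing import List, Tuple

Condition = Tuple[str, bool]   # ("x1", False) means "~x1", ("x2", True) means "x2"

def rule_accuracy(conds: List[Condition], label: str, validation) -> float:
    """Accuracy of a single rule on the validation examples it covers."""
    covered = [(x, y) for x, y in validation
               if all(x.get(attr) == val for attr, val in conds)]
    if not covered:
        return 0.0
    return sum(y == label for _, y in covered) / len(covered)

def prune_rule(conds: List[Condition], label: str, validation) -> List[Condition]:
    """Greedily drop any condition whose removal does not lower rule accuracy."""
    conds = list(conds)
    changed = True
    while changed and conds:
        changed = False
        base = rule_accuracy(conds, label, validation)
        for cond in list(conds):
            trial = [c for c in conds if c != cond]
            if rule_accuracy(trial, label, validation) >= base:
                conds, changed = trial, True
                break
    return conds

# Hypothetical rule (~x1 & x2 -> Class B) and a tiny validation set.
validation = [({"x1": False, "x2": True}, "B"), ({"x1": True, "x2": True}, "C")]
pruned = prune_rule([("x1", False), ("x2", True)], "B", validation)
```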

21 Example. [Figure: original tree testing x1 at the root, x2 on the left branch and x3 on the right branch, with leaves A, B, A, C.]
Rules:
~x1 & ~x2 -> Class A
~x1 & x2 -> Class B
x1 & ~x3 -> Class A
x1 & x3 -> Class C
Possible rules after pruning (based on the validation set):
~x1 -> Class A
~x1 & x2 -> Class B
~x3 -> Class A
x1 & x3 -> Class C

22 Advantages of Rule Post-Pruning
- The language is more expressive.
- It improves interpretability.
- Pruning is more flexible.
- In practice this method yields high predictive accuracy.

23 Decision Trees
- Definition
- Mechanism
- Splitting functions
- Hypothesis space and bias
- Issues in decision-tree learning
  - Avoiding overfitting through pruning
  - Numeric and missing attributes

24 Discretizing Continuous Attributes. Example: attribute temperature.
1) Order all values in the training set.
2) Consider only those cut points where there is a change of class.
3) Choose the cut point that maximizes information gain.
[Figure: temperature values laid out on a line with the candidate cut points marked.]
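A minimal sketch of this procedure for one continuous attribute, computing entropy and information gain from their standard definitions; the temperature values and labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_cut_point(values, labels):
    """Sort by value, consider only midpoints where the class changes,
    and return the cut that maximizes information gain."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_cut = -1.0, None
    for i in range(1, len(pairs)):
        (v0, y0), (v1, y1) = pairs[i - 1], pairs[i]
        if y0 == y1 or v0 == v1:          # no class change (or tied values): skip
            continue
        cut = (v0 + v1) / 2
        left = [y for v, y in pairs if v <= cut]
        right = [y for v, y in pairs if v > cut]
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if gain > best_gain:
            best_gain, best_cut = gain, cut
    return best_cut, best_gain

# Hypothetical temperature readings with play / don't-play labels.
cut, gain = best_cut_point([40, 48, 60, 72, 80, 90],
                           ["no", "no", "yes", "yes", "yes", "no"])
```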

25 Missing Attribute Values. Suppose we are at node n in the decision tree and example X is missing a value for the attribute tested there. Example: X = (luminosity > T1, mass = ?). Different approaches:
1) Assign the most common value of that attribute among the examples in node n.
2) Assign the most common value in n among examples with the same classification as X.
3) Assign a probability to each value of the attribute based on the frequency of those values in node n; each fraction of X is then propagated down the corresponding branch of the tree.
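A minimal sketch of approaches 1 and 2, assuming the examples that reached node n are stored as (attribute-dict, class-label) pairs; all names and data are hypothetical.

```python
from collections import Counter

def most_common_value(node_examples, attribute):
    """Approach 1: most common value of `attribute` among the node's examples."""
    values = [x[attribute] for x, _ in node_examples if x.get(attribute) is not None]
    return Counter(values).most_common(1)[0][0]

def most_common_value_in_class(node_examples, attribute, target_class):
    """Approach 2: restrict the count to examples with the same class as X."""
    values = [x[attribute] for x, y in node_examples
              if y == target_class and x.get(attribute) is not None]
    return Counter(values).most_common(1)[0][0]

# Hypothetical examples at node n: ({attribute: value, ...}, class_label)
node = [({"mass": "high"}, "Type A"), ({"mass": "high"}, "Type A"),
        ({"mass": "low"}, "Type B")]
fill_a = most_common_value(node, "mass")                      # -> "high"
fill_b = most_common_value_in_class(node, "mass", "Type B")   # -> "low"
```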

26 Summary
- Decision-tree induction is a popular approach to classification that enables us to interpret the output hypothesis.
- The hypothesis space is very powerful: all possible DNF formulas.
- We prefer shorter trees over larger trees.
- Overfitting is an important issue in decision-tree induction.
- Different methods exist to avoid overfitting, such as reduced-error pruning and rule post-pruning.
- Techniques exist to deal with continuous attributes and missing attribute values.

27 Homework: read Chapter 3 of Mitchell, from Section 3.7 onward.