Model Overfitting
Introduction to Data Mining, 2nd Edition, by Tan, Steinbach, Karpatne, Kumar, with slides added by Ch. Eick

Classification Errors
- Training errors (apparent errors): errors committed on the training set
- Test errors: errors committed on the test set
- Generalization errors: the expected error of a model over a random selection of records from the same distribution

Example Data Set
Two-class problem:
- + class: 5200 instances (5000 generated from a Gaussian centered at (10, 10), plus 200 noisy instances)
- o class: 5200 instances generated from a uniform distribution
10% of the data is used for training and 90% for testing.
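The slide gives only the class sizes and the Gaussian center; a minimal NumPy/scikit-learn sketch of such a dataset could look like the following (the Gaussian spread, the bounds of the uniform distribution, and how the 200 noise points are drawn are assumptions, not values stated on the slide).

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# "+" class: 5000 points from a Gaussian centered at (10, 10),
# plus 200 noisy points scattered uniformly (spread and bounds are assumed).
X_plus = np.vstack([
    rng.normal(loc=10.0, scale=1.0, size=(5000, 2)),
    rng.uniform(low=0.0, high=20.0, size=(200, 2)),
])

# "o" class: 5200 points from a uniform distribution over the same square.
X_o = rng.uniform(low=0.0, high=20.0, size=(5200, 2))

X = np.vstack([X_plus, X_o])
y = np.array([1] * len(X_plus) + [0] * len(X_o))

# 10% of the data for training, 90% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.1, stratify=y, random_state=0)
```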

Increasing number of nodes in Decision Trees

Decision Tree with 4 nodes: decision boundaries on the training data

Decision Tree with 50 nodes: decision boundaries on the training data

Which tree is better? The decision tree with 4 nodes or the decision tree with 50 nodes?

Model Overfitting
- Underfitting: when the model is too simple, both training and test errors are large
- Overfitting: when the model is too complex, the training error is small but the test error is large
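To make the underfitting/overfitting pattern concrete, one can grow trees of increasing size and compare training and test error; a sketch using scikit-learn, assuming the X_train/X_test arrays from the dataset sketch above, with max_leaf_nodes used as a stand-in for the number of nodes.

```python
from sklearn.tree import DecisionTreeClassifier

for n_leaves in [2, 4, 8, 16, 32, 64, 128, 256]:
    tree = DecisionTreeClassifier(max_leaf_nodes=n_leaves, random_state=0)
    tree.fit(X_train, y_train)
    train_err = 1.0 - tree.score(X_train, y_train)
    test_err = 1.0 - tree.score(X_test, y_test)
    # Small trees: both errors are high (underfitting).
    # Large trees: training error keeps falling while test error rises (overfitting).
    print(f"{n_leaves:4d} leaves  train error {train_err:.3f}  test error {test_err:.3f}")
```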

Model Overfitting: Using Twice the Number of Data Instances
- If the training data is under-representative, test errors increase and training errors decrease as the number of nodes increases
- Increasing the size of the training data reduces the difference between training and test errors at a given number of nodes
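A sketch of the "twice the data" comparison, assuming the full (X, y) arrays from the earlier dataset sketch; the training-set sizes and tree size are illustrative choices, not values from the slide.

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

for n_train in [1000, 2000]:  # the second run uses twice as many training instances
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=n_train, stratify=y, random_state=0)
    tree = DecisionTreeClassifier(max_leaf_nodes=64, random_state=0).fit(X_tr, y_tr)
    gap = (1.0 - tree.score(X_te, y_te)) - (1.0 - tree.score(X_tr, y_tr))
    # With more training data, the gap between test and training error shrinks.
    print(f"{n_train} training instances: test-minus-train error gap = {gap:.3f}")
```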

Reasons for Model Overfitting
- Limited training set size
- Non-representative training examples
- High model complexity
- Multiple comparison procedure

Overfitting due to Noise: the decision boundary is distorted by noise points

Overfitting due to Insufficient Examples
- The lack of data points in the lower half of the diagram makes it difficult to correctly predict the class labels in that region
- The insufficient number of training records in the region causes the decision tree to predict the test examples using other training records that are irrelevant to the classification task

Occam’s Razor
- Given two models with similar generalization errors, one should prefer the simpler model over the more complex one
- For complex models, there is a greater chance that the model was fitted accidentally to errors in the data
- Usually, simpler models are more robust with respect to noise

How to Address Overfitting
Pre-pruning (early stopping rule): stop the algorithm before it grows a fully-grown tree.
Typical stopping conditions for a node:
- Stop if all instances belong to the same class
- Stop if all the attribute values are the same
More restrictive conditions:
- Stop if the number of instances is less than some user-specified threshold
- Stop if the class distribution of the instances is independent of the available features (e.g., using a chi-squared test)
- Stop if expanding the current node does not improve impurity measures (e.g., Gini or information gain)
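As a rough illustration, several of these early-stopping conditions map onto constructor parameters of scikit-learn's DecisionTreeClassifier; a minimal sketch (the threshold values are arbitrary, and the class-independence test has no direct scikit-learn equivalent).

```python
from sklearn.tree import DecisionTreeClassifier

# Pre-pruning via early-stopping parameters:
#   min_samples_split     - do not split a node with fewer than this many instances
#   min_impurity_decrease - do not split unless impurity (e.g., Gini) improves by at least this much
#   max_depth             - hard limit on how deep the tree may grow
pre_pruned = DecisionTreeClassifier(
    criterion="gini",
    min_samples_split=20,
    min_impurity_decrease=1e-3,
    max_depth=10,
    random_state=0,
)
pre_pruned.fit(X_train, y_train)
```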

How to Address Overfitting…
Post-pruning:
- Grow the decision tree to its entirety
- Trim the nodes of the decision tree in a bottom-up fashion
- If the generalization error improves after trimming, replace the sub-tree with a leaf node; the class label of the leaf node is determined by the majority class of instances in the sub-tree

Example of Post-Pruning
One approach is to use a separate validation set (a test set that is used during training): assess validation-set accuracy for different tree sizes and pick the tree with the highest validation-set accuracy, breaking ties in favor of smaller trees.
[Figure: a root node with 20 Yes / 10 No instances (error = 10/30) split into four leaves with Yes/No class counts 8/4, 3/4, 4/1, and 5/1]
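A sketch of this validation-set selection, assuming the X_train/y_train arrays from the dataset sketch above; it uses scikit-learn's cost-complexity pruning path to generate candidate trees of decreasing size, which is one particular post-pruning scheme rather than necessarily the exact procedure pictured on the slide.

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hold out part of the training data as a validation set.
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.3, random_state=0)

# Candidate pruning strengths derived from the fully grown tree.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
alphas = full_tree.cost_complexity_pruning_path(X_tr, y_tr).ccp_alphas

best = None  # (validation accuracy, node count, tree)
for alpha in alphas:
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_tr, y_tr)
    acc = tree.score(X_val, y_val)
    size = tree.tree_.node_count
    # Prefer higher validation accuracy; break ties in favor of smaller trees.
    if best is None or acc > best[0] or (acc == best[0] and size < best[1]):
        best = (acc, size, tree)

pruned_tree = best[2]
```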

“Final” Notes on Overfitting
- Overfitting results in decision trees that are more complex than necessary: after learning the genuine patterns, they tend to “learn noise”
- More complex models tend to have more complicated decision boundaries and tend to be more sensitive to noise, missing examples, etc.
- Training error no longer provides a good estimate of how well the tree will perform on previously unseen records
- When learning complex models, large representative training sets are needed
- We need “new” ways of estimating errors, e.g., an approach that considers both model complexity and training error
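One such approach is a pessimistic error estimate that charges a fixed complexity penalty per leaf on top of the training error; a sketch using the node counts from the post-pruning example above (the penalty of 0.5 per leaf is a conventional choice and an assumption here, not a value stated on these slides).

```python
def pessimistic_error(train_errors, n_leaves, n_instances, omega=0.5):
    """Estimate generalization error as (training errors + omega * leaves) / instances."""
    return (train_errors + omega * n_leaves) / n_instances

# The 4-leaf tree from the post-pruning example misclassifies 9 of 30 training
# instances; collapsing it to a single leaf misclassifies 10 of 30.
print(pessimistic_error(9, 4, 30))   # ~0.367: expanded tree
print(pessimistic_error(10, 1, 30))  # 0.350: pruned to a single leaf, preferred here
```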