Decision Tree Learning
Debapriyo Majumdar
Data Mining – Fall 2014
Indian Statistical Institute Kolkata
August 25, 2014

Example: Age, Income and Owning a flat
[Figure: scatter plot of the training set, with Age on the x-axis and Monthly income (thousand rupees) on the y-axis; points are marked as "Owns a house" or "Does not own a house", and two separating lines L1 and L2 are drawn.]
- If the training data was as above, could we define some simple rules by observation?
  – Any point above the line L1 → Owns a house
  – Any point to the right of L2 → Owns a house
  – Any other point → Does not own a house

Example: Age, Income and Owning a flat
[Figure: the same training-set scatter plot (Age on the x-axis, Monthly income in thousand rupees on the y-axis), with the separating lines L1 and L2 drawn in.]
These observations correspond to a small decision tree (written out as code in the sketch below):
- Root node: split at Income = 101
  – Income ≥ 101: Label = Yes
  – Income < 101: split at Age = 54
    – Age ≥ 54: Label = Yes
    – Age < 54: Label = No
- In general, the data will not be as clean as in this example
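As an illustration only (not part of the original slides), the small tree above can be written directly as nested conditionals; the function name is made up, and the thresholds 101 and 54 are the ones read off the example.

```python
def owns_house(income: float, age: float) -> bool:
    """Toy decision tree from the example: income in thousand rupees, age in years."""
    if income >= 101:   # root split on monthly income
        return True     # Label = Yes
    if age >= 54:       # second split on age
        return True     # Label = Yes
    return False        # Label = No

# Example: a 40-year-old earning 120 thousand rupees per month is classified as an owner.
print(owns_house(income=120, age=40))  # True
```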

Example: Age, Income and Owning a flat
[Figure: the same training-set scatter plot (Age vs. Monthly income in thousand rupees).]
- Approach: recursively split the data into partitions so that each partition becomes purer, till …
  – How to decide the split?
  – How to measure purity?
  – When to stop?

Approach for splitting
- What are the possible lines for splitting?
  – For each variable, the midpoints between pairs of consecutive values of that variable (see the sketch after this list)
  – How many? If N = number of points in the training set and m = number of variables, about O(N × m)
- How to choose which line to use for splitting?
  – The line which reduces impurity (~ heterogeneity of composition) the most
- How to measure impurity?
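As a rough sketch (not from the slides), the candidate split points for one numeric variable can be enumerated as midpoints between consecutive distinct sorted values; the function and variable names here are invented for illustration.

```python
def candidate_splits(values):
    """Midpoints between consecutive distinct sorted values of one numeric variable."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

# Example: monthly incomes (thousand rupees) from a toy training set.
incomes = [30, 45, 45, 80, 101, 150]
print(candidate_splits(incomes))  # [37.5, 62.5, 90.5, 125.5]
```

With m variables and at most N distinct values per variable, this gives the roughly O(N × m) candidate splits mentioned above.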

Gini Index for Measuring Impurity
- Suppose there are C classes
- Let p(i|t) = fraction of observations belonging to class i in rectangle (node) t
- Gini index: Gini(t) = 1 − Σ_{i=1..C} p(i|t)²  (see the sketch below)
- If all observations in t belong to one single class, Gini(t) = 0
- When is Gini(t) maximum?
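A minimal sketch (not part of the original slides) of computing the Gini index at a node from per-class counts; the helper name is made up for this example.

```python
def gini(class_counts):
    """Gini index of a node, given a list of per-class observation counts."""
    total = sum(class_counts)
    fractions = [c / total for c in class_counts]   # p(i|t) for each class i
    return 1.0 - sum(p * p for p in fractions)

print(gini([10, 0]))  # 0.0 -> pure node
print(gini([5, 5]))   # 0.5 -> maximum for two classes (1 - 1/C)
```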

Entropy
- Average amount of information contained
- From another point of view: the average amount of information expected, hence the amount of uncertainty
  – We will study this in more detail later
- Entropy: Entropy(t) = − Σ_{i=1..C} p(i|t) log₂ p(i|t), where 0 log₂ 0 is defined to be 0 (see the sketch below)
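The same kind of sketch (again not from the slides) for entropy, with the 0 log₂ 0 = 0 convention handled by skipping empty classes.

```python
import math

def entropy(class_counts):
    """Entropy of a node, given per-class observation counts (0*log2(0) treated as 0)."""
    total = sum(class_counts)
    fractions = [c / total for c in class_counts if c > 0]   # skip empty classes
    return 0.0 - sum(p * math.log2(p) for p in fractions)

print(entropy([10, 0]))  # 0.0 -> pure node
print(entropy([5, 5]))   # 1.0 -> maximum for two classes
```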

Classification Error
- What if we stop the tree building at a node?
  – That is, do not create any further branches for that node
  – Make that node a leaf
  – Classify the node with the most frequent class present in the node
- Classification error as a measure of impurity: Error(t) = 1 − max_i p(i|t)  (see the sketch below)
  – Intuitively, the error made by predicting the most frequent class in the rectangle (node)
[Figure: a partially split region in which one rectangle (node) is still impure.]
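For completeness, a sketch (not from the slides) of classification error as an impurity measure: one minus the fraction of the node's most frequent class.

```python
def classification_error(class_counts):
    """Classification error impurity of a node: 1 - max_i p(i|t)."""
    total = sum(class_counts)
    return 1.0 - max(class_counts) / total

print(classification_error([7, 2]))  # ~0.222 -> the majority prediction is wrong for 2 of 9 points
print(classification_error([9, 0]))  # 0.0 -> pure node
```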

The Full Blown Tree
- Recursive splitting
- Suppose we don't stop until all nodes are pure
- Result: a large decision tree with leaf nodes having very few data points
  – Does not represent the classes well
  – Overfitting
- Solution:
  – Stop earlier (see the growth sketch below), or
  – Prune back the tree
[Figure: a deep tree grown from the root; the number of points in its lower nodes is too small to be statistically significant.]
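To make the recursion and the "stop earlier" idea concrete, here is a rough self-contained sketch of top-down tree growth with a minimum-node-size stopping rule. The data layout, helper names, and the min_size threshold are all invented for illustration; this is not the exact algorithm from any particular slide or library.

```python
def gini_of(labels):
    """Gini index of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Try every midpoint split on every feature; return (feature, threshold,
    left_indices, right_indices) with the lowest weighted Gini, or None."""
    best, best_score = None, float("inf")
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [i for i, r in enumerate(rows) if r[f] < t]
            right = [i for i, r in enumerate(rows) if r[f] >= t]
            score = (len(left) * gini_of([labels[i] for i in left]) +
                     len(right) * gini_of([labels[i] for i in right])) / len(rows)
            if score < best_score:
                best, best_score = (f, t, left, right), score
    return best

def build_tree(rows, labels, min_size=5):
    """Top-down recursive partitioning; stop when a node is pure or too small."""
    majority = max(set(labels), key=labels.count)
    if len(labels) < min_size or len(set(labels)) == 1:
        return {"leaf": True, "label": majority}
    split = best_split(rows, labels)
    if split is None:                       # no split possible: make a leaf
        return {"leaf": True, "label": majority}
    f, t, left, right = split
    return {"leaf": False, "feature": f, "threshold": t,
            "left":  build_tree([rows[i] for i in left],  [labels[i] for i in left],  min_size),
            "right": build_tree([rows[i] for i in right], [labels[i] for i in right], min_size)}

# Toy usage: two numeric features (monthly income, age), binary class labels.
rows = [[120, 40], [30, 25], [90, 60], [45, 30], [150, 35], [60, 58]]
labels = ["Yes", "No", "Yes", "No", "Yes", "Yes"]
print(build_tree(rows, labels, min_size=2))
```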

Prune back
- Pruning step: collapse leaf nodes and make their immediate parent a leaf node (see the sketch below)
- Effect of pruning
  – Lose purity of nodes
  – But were they really pure, or was that just noise?
  – Too many nodes ≈ noise
- Trade-off between the loss of purity and the reduction in tree complexity
[Figure: a decision node (Freq = 7) with two leaf children, labelled Y (Freq = 5) and B (Freq = 2), is pruned into a single leaf node labelled Y (Freq = 7).]
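A small sketch (not from the slides) of the collapse step itself, using a dictionary node representation like the one in the growth sketch above; the frequencies match the figure described on this slide.

```python
def prune(decision_node):
    """Collapse a decision node whose children are leaves into one leaf,
    labelled with the majority class by frequency."""
    children = [decision_node["left"], decision_node["right"]]
    winner = max(children, key=lambda leaf: leaf["freq"])
    return {"leaf": True, "label": winner["label"],
            "freq": sum(leaf["freq"] for leaf in children)}

# The figure's example: children (label Y, freq 5) and (label B, freq 2).
subtree = {"leaf": False,
           "left":  {"leaf": True, "label": "Y", "freq": 5},
           "right": {"leaf": True, "label": "B", "freq": 2}}
print(prune(subtree))  # {'leaf': True, 'label': 'Y', 'freq': 7}
```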

Prune back: cost complexity
- Cost complexity of a (sub)tree T: CC(T) = Err(T) + α × L(T)
  – The classification error (based on the training data) plus a penalty for the size of the tree (see the sketch below)
  – Err(T) is the classification error
  – L(T) = number of leaves in T
  – The penalty factor α is between 0 and 1; if α = 0, there is no penalty for a bigger tree
[Figure: the same pruning example as before — a decision node (Freq = 7) with leaves labelled Y (Freq = 5) and B (Freq = 2) collapsed into one leaf labelled Y (Freq = 7).]
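A sketch (not from the slides) of how the cost-complexity trade-off can be evaluated for the pruning example above. Purely for illustration, the error here is measured as the fraction of the node's own 7 training points that are misclassified, and the value of α is arbitrary.

```python
def cost_complexity(err, num_leaves, alpha):
    """CC(T) = Err(T) + alpha * L(T): training error plus a penalty per leaf."""
    return err + alpha * num_leaves

# Keeping the subtree: its 2 leaves classify all 7 points correctly -> error 0, 2 leaves.
# Pruning to one leaf labelled Y: the 2 B-points are misclassified -> error 2/7, 1 leaf.
alpha = 0.2
keep_cost  = cost_complexity(err=0.0, num_leaves=2, alpha=alpha)   # 0.40
prune_cost = cost_complexity(err=2/7, num_leaves=1, alpha=alpha)   # ~0.49
print("prune" if prune_cost <= keep_cost else "keep")              # keep, at this alpha
```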

Different Decision Tree Algorithms
- Chi-square Automatic Interaction Detector (CHAID)
  – Gordon Kass (1980)
  – Stop subtree creation if the split is not statistically significant by a chi-square test
- Classification and Regression Trees (CART)
  – Breiman et al.
  – Decision tree building by Gini's index
- Iterative Dichotomizer 3 (ID3)
  – Ross Quinlan (1986)
  – Splitting by information gain (difference in entropy)
- C4.5
  – Quinlan's next algorithm, improved over ID3
  – Bottom-up pruning, both categorical and continuous variables
  – Handling of incomplete data points
- C5.0
  – Ross Quinlan's commercial version

Properties of Decision Trees
- Non-parametric approach
  – Does not require any prior assumptions regarding the probability distribution of the classes and attributes
- Finding an optimal decision tree is an NP-complete problem
  – Heuristics used: greedy, recursive partitioning, top-down construction, bottom-up pruning
- Fast to generate, fast to classify
- Easy to interpret or visualize
- Error propagation
  – An error at the top of the tree propagates all the way down

References
- Introduction to Data Mining, by Tan, Steinbach, Kumar
  – Chapter 4 is available online: http://www-users.cs.umn.edu/~kumar/dmbook/ch4.pdf