Eco 6380 Predictive Analytics For Economists Spring 2016 Professor Tom Fomby Department of Economics SMU

Presentation 12 Classification and Regression Trees: CART and CHAID Chapter 9 in SPB

Some Basic Facts about Recursively-Built Decision Trees Decision trees are made up of a “root” node, followed by “decision” nodes, and ending in “final” (terminal) nodes. The basic algorithm used to fit a decision tree to data (whether the target is interval or categorical) is recursive partitioning of the input space of the Training data set. The algorithm grows the tree one decision node at a time, and the order in which the nodes are added (and hence the “building” sequence) is UNIQUE. Once a node enters the tree it remains there, in its order, and its “split point” stays the same as the tree grows (hence the term “recursive”). This is shown in the following set of three slides, which describe the recursive partitioning of the Boston Housing data with the interval variable “medv” as the target variable.

First Split First Decision Node

Second Split Second Decision Node

Third Split Third Decision Node
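
For readers working in Python rather than XLMINER, here is a minimal sketch of how splits like these can be reproduced with scikit-learn, assuming the Boston Housing data are available locally as a CSV file (the file name “BostonHousing.csv” below is hypothetical):

```python
# Sketch: grow a small regression tree on the Boston Housing data and print
# its splits. Assumes a local CSV with the usual columns (LSTAT, RM, ..., medv).
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

df = pd.read_csv("BostonHousing.csv")     # hypothetical file name
X = df.drop(columns=["medv"])             # inputs
y = df["medv"]                            # interval-valued target

# Limiting max_depth stops growth after a few decision nodes, so the printed
# tree shows the same root-first, recursive ordering of splits as the slides.
tree = DecisionTreeRegressor(max_depth=2, random_state=1)
tree.fit(X, y)

print(export_text(tree, feature_names=list(X.columns)))
# Each leaf value is the mean of the training responses falling in that region.
```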

The Uniqueness of the Tree Notice that, as the tree is built recursively, the nodes enter in a particular order and the “split” points (node 1 splits at 9.54 on LSTAT, node 2 splits on RM, and node 3 splits again on LSTAT) remain unchanged as the tree grows larger. That is, as the tree is being built by recursive partitioning, the order of previous nodes and their respective split points stay the same. (In essence this is a directed search and fitting of the data.) After the first split, the fitted value for the region of the training-set input space with LSTAT > 9.54 is the mean of the training responses in that region, and likewise for the region with LSTAT < 9.54 (see the final node values on the slide). After the second split the input space is partitioned into 3 regions, and after the third split into 4 regions, with the fitted value in each region again being the mean of the training-set responses falling in that region. (Notice that the number of terminal nodes always exceeds the number of decision nodes by one.) Each split (and hence each further division of the input space) is chosen so as to maximize the reduction in the sum of squared errors of the tree’s fit to the training data, where the fit in each region of the partition is the mean of the training-set responses in that region. Notice that, in this case, the target variable is numeric; the determination of splits for classification problems is discussed on a later slide. For a more detailed discussion of the mathematical representation of the output as a step function of the inputs via decision trees, see the PDF titled “Regression and Classification Trees_Fomby.pdf” on the class website.
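
To make the split-selection rule concrete, here is a small self-contained sketch (with made-up numbers, not the Boston data) that, for a single numeric input, tries every candidate cut point and keeps the one minimizing the total sum of squared errors when each side of the cut is fit by its own mean:

```python
# Sketch of the regression-tree splitting criterion for one numeric input.
import numpy as np

def best_split(x, y):
    """Return (cut_point, total_SSE) for the best binary split of x."""
    order = np.argsort(x)
    x, y = np.asarray(x, dtype=float)[order], np.asarray(y, dtype=float)[order]
    best = (None, np.inf)
    for i in range(1, len(x)):
        left, right = y[:i], y[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[1]:
            best = ((x[i - 1] + x[i]) / 2.0, sse)   # cut at midpoint of adjacent values
    return best

# Illustrative numbers only:
x = np.array([3.0, 5.0, 9.0, 12.0, 15.0])
y = np.array([30.0, 28.0, 20.0, 18.0, 15.0])
print(best_split(x, y))
```

A full tree-growing algorithm simply applies this search to every input variable at every node and keeps the (variable, cut point) pair giving the largest SSE reduction, then recurses on the two resulting regions.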

Determining Splits for Classification Trees At least three criteria can be used to decide on splits when building a classification tree: the Accuracy (equivalently, misclassification error) Rate, Entropy, and the Gini Coefficient. These measures are called impurity measures. The goal is to choose each split so as to reduce the impurity of the classifications in the final nodes of the tree as much as possible. These impurity measures, for a node t of the tree, are defined on the following slide.

Impurity Measures
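
For reference, the standard textbook definitions of these three impurity measures, written for a node t with class proportions p(k | t), are:

```latex
% Node-level impurity measures, where p(k \mid t) is the proportion of
% training cases at node t belonging to class k.
\begin{align*}
\text{Misclassification error:} \quad & E(t) = 1 - \max_{k}\, p(k \mid t) \\
\text{Entropy:} \quad & H(t) = -\sum_{k} p(k \mid t)\,\log_{2} p(k \mid t) \\
\text{Gini index:} \quad & G(t) = 1 - \sum_{k} p(k \mid t)^{2}
\end{align*}
```

All three measures equal zero when a node contains cases from only one class and reach their maximum when the classes are evenly mixed; a candidate split is scored by the weighted reduction in impurity it achieves across its child nodes.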

Pruning Trees How do you keep from over-training a decision tree? Answer: you prune it. The idea behind pruning is that a very large tree is likely to overfit the training data set, in the sense that if a “full” tree is used to score an independent data set its accuracy is likely to be significantly attenuated. There are essentially two ways of pruning a full tree: (1) the pre-pruning (early stopping rule) approach, as in the CHAID model, or (2) the post-pruning approach, as in the CART model. In the pre-pruning approach the tree-growing algorithm is halted before it generates a fully grown tree that perfectly fits the entire training data set. Unfortunately, it is sometimes difficult to choose the right threshold for early termination: too high a threshold will result in underfitted models, while a threshold set too low may not be sufficient to overcome the overfitting problem. In the post-pruning approach the decision tree is initially grown to its maximum size; this is followed by a tree-pruning step that trims the fully grown tree in reverse order of growth to obtain an “optimal” trimmed tree. XLMINER supports two pre-pruning tuning parameters that limit the size of the tree: (1) the minimum number of cases required in every final node and (2) the maximum number of decision nodes allowed in building the tree.
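
As an illustration of those two pre-pruning controls, here is a sketch using scikit-learn’s decision tree as a stand-in for XLMINER (the parameter names below are scikit-learn’s, not XLMINER’s, and the data are simulated):

```python
# Sketch: pre-pruning a classification tree via size limits.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=1)

pre_pruned = DecisionTreeClassifier(
    min_samples_leaf=25,   # every final node must contain at least 25 training cases
    max_leaf_nodes=8,      # caps terminal nodes (terminal nodes = decision nodes + 1)
    random_state=1,
)
pre_pruned.fit(X, y)
print(pre_pruned.get_n_leaves(), "final nodes")
```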

CART and CHAID The CART (Classification and Regression Tree) method uses the Training data set to build a “full” tree with many decision nodes. Since there is a unique sequence in which the tree was built, it can be pruned in reverse sequence, giving rise to a sequence of “pruned” trees, each of which has one less decision node than the previous member of the family. The pruning stops when one finds the pruned tree with the minimal Validation RMSE for prediction (or the maximum Validation accuracy for classification). These models are called the minimum mean square error tree for prediction of a numeric target variable and the maximum accuracy rate tree for classification. In turn they can be used to score the Test data set and thereby be compared with other prediction or classification models. Thus the “tuning” parameter of the CART method is the number of decision nodes in the tree, chosen to optimize Validation data set performance. This pruning exercise on the Validation data set is, of course, intended to prevent the building of an over-trained decision tree that would perform poorly on an independent data set. The CHAID (Chi-Squared Automatic Interaction Detection) method, on the other hand, uses the in-sample (Training set) statistical significance of adding additional nodes to decide whether another node should be added. This also is intended to prevent over-training of the decision tree. The tuning parameter of the CHAID method is the chosen level of significance of the statistical test (a Chi-square statistic) of the reduction in impurity arising from a new split: the more stringent the required significance (i.e., the smaller the p-value threshold), the fewer the decision nodes of the final tree, and vice versa. The CHAID-chosen tree can then be used to score the Validation and Test data sets independently. In the next two presentations we discuss two additional classifiers, the Naïve Bayes Classifier and the Logit Classifier. Two other classification methods that we do not discuss are the C4.5 and SVM (Support Vector Machine) classifiers. See P. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining (Pearson, 2006) for more details on these two classifiers.
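
Here is a sketch of CART-style post-pruning using scikit-learn’s minimal cost-complexity pruning (simulated data, not the course’s XLMINER workflow): grow a full tree on the Training data, generate the nested sequence of pruned subtrees, and keep the subtree with the best Validation accuracy.

```python
# Sketch: CART post-pruning with subtree selection on a Validation set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=1)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.4, random_state=1)

# ccp_alpha indexes the nested sequence of pruned subtrees of the full tree.
path = DecisionTreeClassifier(random_state=1).cost_complexity_pruning_path(X_train, y_train)
alphas = np.clip(path.ccp_alphas, 0.0, None)   # guard against tiny negative round-off

best_alpha, best_acc = 0.0, -np.inf
for a in alphas:
    t = DecisionTreeClassifier(random_state=1, ccp_alpha=a).fit(X_train, y_train)
    acc = t.score(X_valid, y_valid)            # Validation accuracy, as in the slide
    if acc > best_acc:
        best_alpha, best_acc = a, acc

print(f"chosen alpha = {best_alpha:.5f}, validation accuracy = {best_acc:.3f}")
```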

More information on Decision Trees: For a discussion of how regression trees can be viewed as regression models on indicator variables, see the PDF file “Trees as Indicator Regressions.pdf” on the class website. Other PDF files that you might look at are “Regression and Classification Trees_Fomby.pdf” and “Regression and Classification Trees.pdf,” an excerpt from The Elements of Statistical Learning by T. Hastie, R. Tibshirani, and J. Friedman, also on the class website.

Classroom Exercise: Exercise 9