Decision Trees (2). Numerical attributes: tests in nodes are of the form f_i > constant.

Presentation transcript:

Decision Trees (2)

Numerical attributes. Tests in nodes are of the form f_i > constant.

Numerical attributes. Tests in nodes can be of the form f_i > constant. Such tests divide the feature space into axis-aligned rectangles.
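As a rough sketch of what such a test node might look like (the Node fields and the route helper are illustrative names, not from the slides), a node stores a feature index and a threshold, and routing a point is a single comparison per level:

```python
# A sketch of a numerical test node; field and function names are illustrative.
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Node:
    feature: int = 0               # index i of the tested feature f_i
    threshold: float = 0.0         # the constant in the test f_i > threshold
    left: Optional["Node"] = None  # child taken when the test is false
    right: Optional["Node"] = None # child taken when the test is true
    value: Any = None              # class label stored here if the node is a leaf

def route(node: Node, x) -> Any:
    """Follow the axis-aligned tests down to a leaf and return its value."""
    while node.value is None:
        node = node.right if x[node.feature] > node.threshold else node.left
    return node.value

# Hypothetical one-feature tree: f_0 > 0.9 -> class 1, otherwise class 0.
tree = Node(feature=0, threshold=0.9, left=Node(value=0), right=Node(value=1))
print(route(tree, [0.95]))  # -> 1
```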

Predicting Bankruptcy (Leslie Kaelbling's example, MIT courseware)

Considering splits. We consider splitting between each pair of adjacent data points in each dimension; here, that gives 9 different candidate splits in the R dimension.

Considering splits II. There are only another 6 possible splits in the L dimension: because L is an integer, many of its values are duplicated.
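A minimal sketch of enumerating the candidate thresholds for one feature, assuming the usual convention of taking midpoints between consecutive distinct values (the function name candidate_splits is invented for illustration):

```python
def candidate_splits(values):
    """Midpoints between consecutive distinct sorted values of one feature.
    Duplicates (e.g. the integer-valued L attribute) collapse, so there can be
    fewer candidates than data points minus one."""
    distinct = sorted(set(values))
    return [(a + b) / 2.0 for a, b in zip(distinct, distinct[1:])]

# Example: four points, but only three distinct values -> two candidate thresholds.
print(candidate_splits([1.0, 2.0, 2.0, 4.0]))  # -> [1.5, 3.0]
```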

Bankruptcy Example

We consider all the possible splits in each dimension and compute the average entropy of the children for each one. Conveniently, all the points with L not greater than 1.5 are of class 0, so we can make a leaf there.
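One plausible way to score a single threshold, in the spirit of this step: compute the entropy of each child and weight by child size. The helper names and the toy data are assumptions for illustration, not the slide's actual bankruptcy data:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def avg_child_entropy(xs, ys, threshold):
    """Weighted average entropy of the two children of the split x > threshold."""
    left  = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    n = len(ys)
    return (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)

# Hypothetical 1-D illustration: everything with L <= 1.5 is class 0,
# so the left child is pure and only the right child contributes.
L_values = [1, 1, 2, 3, 4, 5]
classes  = [0, 0, 1, 0, 1, 1]
print(avg_child_entropy(L_values, classes, 1.5))  # ~0.54
```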

Bankruptcy Example. Now we consider all the splits of the remaining part of the space. Note that the average entropies have to be recalculated, because the points that fall into the new leaf are taken out of consideration.

Bankruptcy Example. Now the best split is at R > 0.9, and all the points for which that test is true are positive, so we can make another leaf.

Bankruptcy Example. Continuing in this way, we finally obtain the complete tree.

Alternative Splitting Criteria based on GINI. So far we have used the entropy at the given node as the splitting criterion: Entropy(t) = -Σ_j p(j|t) log₂ p(j|t), where p(j|t) is the relative frequency of class j at node t. An alternative is the GINI index: GINI(t) = 1 - Σ_j [p(j|t)]². Both criteria are maximized when records are equally distributed among all classes (GINI reaches 1 - 1/n_c, entropy reaches log₂ n_c), implying the least interesting information, and both reach their minimum of 0 when all records belong to one class, implying the most interesting information. For a 2-class problem with class probability p, GINI(t) = 2p(1 - p) and Entropy(t) = -p log₂ p - (1 - p) log₂ (1 - p); both peak at p = 0.5.
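A small sketch comparing the two criteria on a class-probability vector p(j|t); the function names are illustrative:

```python
import math

def gini(p):
    """GINI index of a class-probability vector p(j|t): 1 - sum_j p_j^2."""
    return 1.0 - sum(pj * pj for pj in p)

def entropy(p):
    """Entropy (in bits) of a class-probability vector p(j|t)."""
    return sum(-pj * math.log2(pj) for pj in p if pj > 0)

# Equal class frequencies: both criteria are at their maximum (0.5 and 1.0 here).
print(gini([0.5, 0.5]), entropy([0.5, 0.5]))   # 0.5 1.0
# All records in one class: both criteria are at their minimum of 0.
print(gini([1.0, 0.0]), entropy([1.0, 0.0]))   # 0.0 0.0
```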

Regression Trees. Like decision trees, but with real-valued constant outputs at the leaves.

Leaf values. Assume that multiple training points are in the leaf and we have decided, for whatever reason, to stop splitting.
– In the boolean case, we use the majority output value as the value for the leaf.
– In the numeric case, we use the average output value.
So, if we are going to use the average value at a leaf as its output, we would like to split up the data so that the leaf averages are not too far away from the actual items in the leaf. Statistics has a good measure of how spread out a set of numbers is (and therefore how different the individuals are from the average): it is called the variance of the set.
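A tiny illustration of the two leaf rules (majority label vs. mean); the helper name leaf_value is invented for this sketch:

```python
from collections import Counter
from statistics import mean

def leaf_value(ys, numeric):
    """Value stored at a leaf: majority label in the boolean/class case,
    average of the y values in the numeric case."""
    return mean(ys) if numeric else Counter(ys).most_common(1)[0][0]

print(leaf_value([0, 1, 1, 1], numeric=False))    # -> 1   (majority vote)
print(leaf_value([2.0, 3.0, 7.0], numeric=True))  # -> 4.0 (average)
```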

Variance. A measure of how spread out a set of numbers is. The mean of m values z_1 through z_m is z̄ = (1/m) Σ_{i=1..m} z_i. The variance is essentially the average of the squared distances between the individual values and the mean: s² = (1/(m-1)) Σ_{i=1..m} (z_i - z̄)². If it is essentially an average, you might wonder why we divide by m - 1 instead of m: dividing by m - 1 makes it an unbiased estimator, which is a good thing.
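A direct transcription of that formula as a sketch (the function name is illustrative):

```python
def variance(zs):
    """Average squared distance from the mean, divided by m - 1 (unbiased)."""
    m = len(zs)
    z_bar = sum(zs) / m
    return sum((z - z_bar) ** 2 for z in zs) / (m - 1)

print(variance([2.0, 4.0, 6.0]))  # -> 4.0
```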

Splitting. We are going to use the average variance of the children to evaluate the quality of splitting on a particular feature. Here we have a data set for which only the y values have been indicated.

Splitting. Just as we did in the binary case, we can compute a weighted average variance, depending on the relative sizes of the two sides of the split.
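A minimal sketch of the weighted average child variance for one threshold; the helper names and the toy data are assumptions, not the slide's example:

```python
def _var(vals):
    """Sample variance (m - 1 in the denominator); 0.0 for fewer than two values."""
    m = len(vals)
    if m < 2:
        return 0.0
    mu = sum(vals) / m
    return sum((v - mu) ** 2 for v in vals) / (m - 1)

def avg_child_variance(xs, ys, threshold):
    """Average of the children's variances for the split x > threshold,
    weighted by the relative sizes of the two sides."""
    left  = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    n = len(ys)
    return (len(left) / n) * _var(left) + (len(right) / n) * _var(right)

# The threshold that separates the two y clusters gives a low average variance.
xs = [0.1, 0.2, 0.3, 0.8, 0.9]
ys = [10.0, 12.0, 11.0, 30.0, 31.0]
print(avg_child_variance(xs, ys, 0.5))  # -> 0.8
```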

Splitting. We can see that the average variance of splitting on feature f3 is much lower than that of splitting on f7, so we would choose to split on f3.

Stopping. Stop when the variance at the leaf is small enough. Then set the value at the leaf to be the mean of the y values of its elements.
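Putting the pieces together, here is a toy recursive builder for a single numeric feature that uses this stopping rule; it reuses the helpers sketched above (_var, candidate_splits, avg_child_variance), and the min_variance knob is an assumed parameter, not something specified in the slides:

```python
from statistics import mean

def build(points, min_variance=0.01):
    """Toy recursive regression-tree builder for one numeric feature.
    points: list of (x, y) pairs; min_variance: assumed stopping threshold."""
    ys = [y for _, y in points]
    if len(points) < 2 or _var(ys) <= min_variance:
        return ("leaf", mean(ys))            # leaf value = mean of the y values
    xs = [x for x, _ in points]
    thresholds = candidate_splits(xs)
    if not thresholds:                       # all x values identical: cannot split
        return ("leaf", mean(ys))
    best = min(thresholds, key=lambda t: avg_child_variance(xs, ys, t))
    left  = [(x, y) for x, y in points if x <= best]
    right = [(x, y) for x, y in points if x > best]
    return ("split", best, build(left, min_variance), build(right, min_variance))

# Using the toy data from the previous sketch; prints a nested ('split', ...) structure.
print(build(list(zip([0.1, 0.2, 0.3, 0.8, 0.9], [10.0, 12.0, 11.0, 30.0, 31.0]))))
```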