Microsoft Enterprise Consortium Data Mining Concepts. Introduction to Directed Data Mining: Decision Trees. Prepared by David Douglas, University of Arkansas. Hosted by the University of Arkansas.

Presentation transcript:

Microsoft Enterprise Consortium Data Mining Concepts. Introduction to Directed Data Mining: Decision Trees. Prepared by David Douglas, University of Arkansas. Hosted by the University of Arkansas.

Decision Trees

- A decision tree is a structure that can be used to divide a large collection of records into successively smaller sets of records by applying a sequence of simple decision rules. —Berry and Linoff
- It consists of a set of rules for dividing a large heterogeneous population into smaller and smaller homogeneous groups based on a target variable.
- A decision tree is a tree-structured plan of a set of attributes to test in order to predict the output. —Andrew Moore
- The target variable is usually categorical.

Uses of Decision Trees

- Decision trees are popular for both classification and prediction (supervised/directed data mining).
- They are attractive largely because decision trees represent rules, which can be expressed in both English and SQL.
- They can also be used for data exploration, making them a powerful first step in model building.

Example Decision Tree

- Note that this is a binary tree: each record is classified as likely to respond or not.
- Leaf nodes labeled 1 are likely to respond.
- There are rules for getting from the root node to each leaf node.

Adapted from Berry and Linoff

Scoring

- Binary classifications throw away useful information.
- Thus, the use of scores and probabilities is essential.

Decision Tree with Proportions (adapted from Berry and Linoff)

Some data mining tools produce trees with more than two splits at a node (adapted from Berry and Linoff).

Estimation

- Although decision trees can be used to estimate continuous values, there are better ways to do it, so decision trees will not be used for estimation in these discussions.
- Multiple linear regression and neural networks will be used for estimation.

Finding the Splits

- A decision tree is built by splitting the records at each node based on a single input field, so there must be a way to identify the input field that makes the best split with respect to the target variable.
- The measure used to evaluate a split is purity (Gini, entropy/information gain, and chi-square for categorical target variables; variance reduction and the F test for continuous target variables).
- Tree-building algorithms are exhaustive: they try each variable to determine the best one on which to split (the greatest increase in purity). They are also recursive: the same process repeats on each of the children. A sketch of the exhaustive search appears below.
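The following is a minimal sketch of the exhaustive search just described, assuming the records are dictionaries with categorical input fields and a categorical target. The function names and the Gini-style purity measure (sum of squared class proportions, as defined later in these slides) are illustrative, not any particular product's algorithm.

```python
# Minimal sketch: exhaustive search for the best single-field split.
# Assumes categorical inputs and a categorical target; names are illustrative.
from collections import Counter

def gini_purity(labels):
    """Sum of squared class proportions (the 'purity' form used in these slides)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((count / n) ** 2 for count in Counter(labels).values())

def split_purity(records, field, target):
    """Weighted purity of the children produced by splitting on one field."""
    n = len(records)
    children = {}
    for rec in records:
        children.setdefault(rec[field], []).append(rec[target])
    return sum(len(labels) / n * gini_purity(labels) for labels in children.values())

def best_split(records, fields, target):
    """Try every candidate field and keep the one with the highest gain in purity."""
    parent = gini_purity([rec[target] for rec in records])
    scored = [(split_purity(records, f, target) - parent, f) for f in fields]
    return max(scored)  # (gain, field) with the largest increase in purity
```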

Splitting on a Numeric Variable

- A binary split on a numeric input considers each value of the input variable as a candidate split point.
- The split takes the form X < N, where N is a constant.
- Because numeric inputs are used only to compare their values at the split points, decision trees are not sensitive to outliers or skewed distributions. A sketch of the threshold search appears below.
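A minimal sketch of scoring every candidate threshold X < N, reusing the illustrative gini_purity function from the previous sketch; the variable names are assumptions.

```python
# Minimal sketch: score every candidate threshold for a binary numeric split X < N.
# Assumes `values` and `labels` are parallel lists; reuses gini_purity from above.
def best_numeric_split(values, labels):
    n = len(values)
    best = (float("-inf"), None)                   # (weighted purity, threshold)
    for threshold in sorted(set(values)):
        left = [lab for v, lab in zip(values, labels) if v < threshold]
        right = [lab for v, lab in zip(values, labels) if v >= threshold]
        if not left or not right:                  # skip degenerate splits
            continue
        score = len(left) / n * gini_purity(left) + len(right) / n * gini_purity(right)
        if score > best[0]:
            best = (score, threshold)
    return best
```

Because only the ordering of the values matters here, replacing the values with their ranks would give the same split, which is why outliers and skewed distributions do not affect the result.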

Splitting on a Categorical Variable

- The simplest way is to split on each class (level) of the variable.
- However, this often yields poor results because high branching factors quickly reduce the population of training records available at lower nodes.
- One way around this is to group the classes that, taken individually, predict similar outcomes (see the sketch after this list).
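The sketch below shows one simple way such grouping could be done. It is an illustrative heuristic, not CHAID or any other named algorithm: adjacent levels (ordered by response rate) are merged when their rates differ by less than a tolerance. The field, target, and tolerance names are assumptions.

```python
# Minimal sketch: group levels of a categorical input that predict similar outcomes,
# instead of giving every level its own branch. Illustrative heuristic only.
from collections import defaultdict

def group_levels(records, field, target, positive, tol=0.05):
    counts = defaultdict(lambda: [0, 0])                   # level -> [positives, total]
    for rec in records:
        counts[rec[field]][0] += rec[target] == positive
        counts[rec[field]][1] += 1
    rates = sorted((pos / tot, level) for level, (pos, tot) in counts.items())
    groups, current = [], [rates[0][1]]
    for (prev_rate, _), (rate, level) in zip(rates, rates[1:]):
        if rate - prev_rate <= tol:                        # similar outcome: merge
            current.append(level)
        else:                                              # outcome jumps: new group
            groups.append(current)
            current = [level]
    groups.append(current)
    return groups                                          # list of merged level groups
```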

Splitting on Missing Values

- Missing values can be handled by treating null as a possible value with its own branch.
- This is preferable to throwing out the record or imputing a value.
- Null has been shown to have predictive value in a number of data mining projects.

Full Trees

- Fields with a single value are eliminated because they cannot be split.
- A tree is full when no more splits are possible, or when it reaches a predetermined depth.
- Note: a full tree may not be the best at classifying a set of new records.

Building Decision Trees

Key points in building a decision tree:

- Purity: the idea is to split on attributes in such a way as to move from heterogeneous nodes to homogeneous nodes with respect to the target variable.
- Splitting algorithm (criterion): repeat for each node. At a node, all attributes are analyzed to determine the best variable on which to split (how should this be measured?). There are a number of algorithms and various implementations of those algorithms.
- Stopping: when a node is pure (it becomes a leaf), when no more splits are possible, or when user-defined parameters such as maximum depth or the minimum number of records in a node are reached.

A minimal build loop putting these pieces together is sketched after this list.
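This sketch assumes the illustrative best_split helper from the earlier sketch; the parameter names and the dictionary tree representation are assumptions, not the structure any particular tool uses.

```python
# Minimal sketch of the build loop: find the best split, recurse on the children,
# and stop on purity, exhausted fields, depth, or node size.
def build_tree(records, fields, target, max_depth=5, min_size=10, depth=0):
    labels = [rec[target] for rec in records]
    majority = max(set(labels), key=labels.count)
    # Stopping rules: pure node, no fields left, too deep, or too few records.
    if len(set(labels)) == 1 or not fields or depth >= max_depth or len(records) < min_size:
        return {"leaf": majority}
    gain, field = best_split(records, fields, target)
    if gain <= 0:                                   # no split increases purity
        return {"leaf": majority}
    branches = {}
    for value in set(rec[field] for rec in records):
        subset = [rec for rec in records if rec[field] == value]
        remaining = [f for f in fields if f != field]
        branches[value] = build_tree(subset, remaining, target, max_depth, min_size, depth + 1)
    return {"split_on": field, "branches": branches, "majority": majority}
```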

Splitting Rules

The measure used to evaluate a split is purity:

- Gini: CART
- Entropy reduction (information gain): C5.0
- Chi-square: CHAID
- Chi-square and variance reduction: QUEST
- F test
- Variance reduction

Pruning

- A bushy tree may not be the best predictor, and a deep tree has complex rules.
- Pruning is used to cut the tree back.
- Depending on the pruning algorithm, pruning may happen as the tree is being constructed, or it may be done after the tree is completed.
- Stability-based pruning: automatic stability-based pruning is not yet available.

Example

Evaluate which split is better: the left or the right? The root node has 10 red and 10 blue cases for the target variable.

Gini

Gini (as used here) is the sum of the squares of the proportions of the classes; it ranges from 0 (no two items alike) to 1 (all items alike).

Left split:
- Root node: 0.5² + 0.5² = 0.5
- Left node: 0.1² + 0.9² = 0.82
- Right node: 0.9² + 0.1² = 0.82
- Multiply by the proportion of records in each node and add: ½(0.82) + ½(0.82) = 0.82, the Gini value for this split.

Right split: what is the Gini value?
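A quick check of the left-split numbers above, assuming each child holds half of the 20 cases with the 1/9 and 9/1 class counts shown; the right split is left as the exercise the slide poses.

```python
# Worked check of the left-split Gini values, using the slides'
# "sum of squared class proportions" form (higher = purer).
def gini(counts):
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)

root = gini([10, 10])                              # 0.5
left_node = gini([1, 9])                           # 0.82
right_node = gini([9, 1])                          # 0.82
split_value = 0.5 * left_node + 0.5 * right_node   # weighted by node proportions
print(root, round(left_node, 2), round(right_node, 2), round(split_value, 2))
```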

Entropy Reduction (Information Gain)

Entropy = −(P(red)·log₂P(red) + P(blue)·log₂P(blue))

Left split:
- Root node: −(0.5·log₂0.5 + 0.5·log₂0.5) = 1
- Left node: −(0.1·log₂0.1 + 0.9·log₂0.9) ≈ 0.47
- Right node: −(0.9·log₂0.9 + 0.1·log₂0.1) ≈ 0.47
- Multiply by the proportion of records in each node and add: ½(0.47) + ½(0.47) = 0.47
- Entropy reduction (information gain) = 1 − 0.47 = 0.53

Right split: what is the entropy reduction value?
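A quick check of the left-split entropy numbers, with the same assumed 1/9 and 9/1 child counts as in the Gini check; the right split is again left as the slide's exercise.

```python
# Worked check of the left-split entropy numbers above.
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c)

root = entropy([10, 10])                        # 1.0
left_node = entropy([1, 9])                     # ~0.47
right_node = entropy([9, 1])                    # ~0.47
weighted = 0.5 * left_node + 0.5 * right_node   # ~0.47
gain = root - weighted                          # ~0.53 information gain for this split
print(round(weighted, 2), round(gain, 2))
```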

Another Example

The root node has 10 red and 10 blue cases. Using Gini as the splitting criterion, which split should be taken: the left split or the right split?

Example - Entropy

Evaluate which split is better: the left or the right? The root node has 10 red and 10 blue cases for the target variable.

Reduction in Variance - F Test

- When the target variable is numeric, a good split is one that reduces the variance of the target variable (a small sketch follows below).
- F test: a large F value means that the proposed split has successfully divided the population into subpopulations with significantly different distributions.
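A minimal sketch of variance reduction for a numeric target; the function name, variable names, and tiny example data are illustrative only, and the F test itself (comparing between-group and within-group variance) is not shown.

```python
# Minimal sketch: variance reduction for a numeric target. A split is scored by how
# much the weighted within-child variance drops relative to the parent node.
from statistics import pvariance

def variance_reduction(parent_values, children_values):
    n = len(parent_values)
    within = sum(len(child) / n * pvariance(child) for child in children_values)
    return pvariance(parent_values) - within

# Illustrative data: a split that separates low and high target values is rewarded.
parent = [5, 6, 7, 8, 20, 21, 22, 23]
print(variance_reduction(parent, [[5, 6, 7, 8], [20, 21, 22, 23]]))   # 56.25
```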

Pruning the Tree

- As previously indicated, full trees may not be the best predictors on new data sets.
- Thus, a number of tree-pruning algorithms have been developed:
- CART (Classification and Regression Trees)
- C5.0
- Stability-based pruning: automatic stability-based pruning is not yet available.

Extracting Rules from Trees

- Fewer leaves are better for generating rules.
- It is easy to develop English rules.
- It is easy to develop SQL rules that can be applied to a database of new records that need classifying (see the sketch after this list).
- Rules can be reviewed by domain experts to see whether they are usable, or whether a rule is simply echoing a procedural policy.
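A minimal sketch of turning one root-to-leaf path into an English rule and an SQL query; the path conditions, leaf label, and prospects table name are hypothetical.

```python
# Minimal sketch: turn one root-to-leaf path into an English rule and an SQL query.
# The conditions, leaf label, and table name below are hypothetical examples.
path = [("age", "<", 35), ("num_purchases", ">=", 3)]
leaf_label = "likely responder"

conditions = " AND ".join(f"{col} {op} {val}" for col, op, val in path)
english_rule = f"IF {conditions} THEN {leaf_label}"
sql_rule = f"SELECT * FROM prospects WHERE {conditions}"

print(english_rule)   # IF age < 35 AND num_purchases >= 3 THEN likely responder
print(sql_rule)       # SELECT * FROM prospects WHERE age < 35 AND num_purchases >= 3
```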

Using More than One Field on a Split

- Most algorithms consider only a single variable to perform each split.
- This can lead to more nodes than necessary.
- Algorithms exist that consider multiple fields in combination to form a split.

Decision Trees in Practice

- As a data exploration tool.
- To predict future states of important variables in an industrial process.
- To form directed clusters of customers for a recommendation system.

Using the Software

Rule Induction (Decision Trees)

- Microsoft Business Intelligence Development Studio will be used to illustrate data mining.
- The first example will use decision trees.

Conclusion

- Decision trees are the single most popular data mining tool: easy to understand, easy to implement, easy to use, and computationally cheap.
- It is possible to get into trouble with overfitting.
- Mostly, decision trees predict a categorical output from categorical or numeric input variables.

Note: Overfitting occurs when the model fits noise, i.e., pays attention to parts of the data that are irrelevant. Another way of saying this is that the model memorizes the data and may not generalize.