Decision Tree.


Decision Tree
Outlook
  Sunny -> Humidity (High -> No, Normal -> Yes)
  Overcast -> Yes
  Rain -> Wind (Strong -> No, Weak -> Yes)

An Example
Outlook
  sunny -> humidity (high -> N, normal -> P)
  overcast -> P
  rain -> windy (true -> N, false -> P)

Example of a Decision Tree
Training data: Refund and Marital Status (MarSt) are categorical attributes, Taxable Income (TaxInc) is continuous, and the last column is the class. Splitting attributes:
Refund
  Yes -> NO
  No -> MarSt
    Single, Divorced -> TaxInc (< 80K -> NO, > 80K -> YES)
    Married -> NO
Training Data -> Model: Decision Tree

Another Example of Decision Tree
Using the same categorical (Refund, MarSt) and continuous (TaxInc) attributes and the same class:
MarSt
  Married -> NO
  Single, Divorced -> Refund
    Yes -> NO
    No -> TaxInc (< 80K -> NO, > 80K -> YES)
There could be more than one tree that fits the same data!

Classification by Decision Tree Induction
A decision tree is a flow-chart-like tree structure:
- an internal node denotes a test on an attribute,
- a branch represents a value of the node's attribute,
- leaf nodes represent class labels or a class distribution.

Decision trees
High income?
  yes -> Criminal record?
    yes -> NO
    no -> YES
  no -> NO

Constructing a decision tree, one step at a time
The eight training examples (attributes a = address, c = criminal record, i = income, plus three further Boolean attributes e, o, u; class Y or N):
  +a, -c, +i, +e, +o, +u: Y
  -a, +c, -i, +e, -o, -u: N
  +a, -c, +i, -e, -o, -u: Y
  -a, -c, +i, +e, -o, -u: Y
  -a, +c, +i, -e, -o, -u: N
  -a, -c, +i, -e, -o, +u: Y
  +a, -c, -i, -e, +o, -u: N
  +a, +c, +i, -e, +o, -u: N
Splitting first on address? sends the four -a examples down one branch and the four +a examples down the other, and neither branch is pure. On the -a branch, splitting on criminal? separates the examples perfectly (+c: N, N; -c: Y, Y). On the +a branch, criminal? sends the single +c example to N, and a further split on income? separates the remaining three (+i: Y, Y; -i: N).
Address was maybe not the best attribute to start with…

A Worked Example
Ten weekends, W1 to W10, each described by Weather (Sunny, Windy, Rainy), Parents (Yes/No) and Money (Rich/Poor), with the decision to be predicted (the category) being Cinema, Tennis, Shopping or Stay in. For instance, W1 is Sunny, Parents = Yes, Money = Rich, Decision = Cinema.

Decision Tree Learning: Building a Decision Tree
1. First test all attributes and select the one that would function as the best root.
2. Break up the training set into subsets based on the branches of the root node.
3. Test the remaining attributes to see which ones fit best underneath the branches of the root node.
4. Continue this process for all other branches until
   - all examples of a subset are of one type, or
   - there are no examples left (return the majority classification of the parent), or
   - there are no more attributes left (the default value should be the majority classification).

Decision Tree Learning: Determining which attribute is best (Entropy & Gain)
Entropy (E) is the minimum number of bits needed to encode the classification (yes or no) of an arbitrary example:
  E(S) = Σ_{i=1..c} -p_i log2 p_i
where S is a set of training examples, c is the number of classes, and p_i is the proportion of the training set that is of class i. For our entropy equation, 0 log2 0 = 0.
The information gain G(S, A), where A is an attribute:
  G(S, A) = E(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) E(S_v)
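A minimal sketch of these two quantities in Python (the function names and the representation of examples as attribute dictionaries are my own choices, not the slide's):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """E(S) = sum over classes of -p_i * log2(p_i), taking 0 * log2(0) = 0."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """G(S, A) = E(S) - sum over values v of A of (|S_v| / |S|) * E(S_v)."""
    remainder = 0.0
    for v in {ex[attribute] for ex in examples}:
        subset = [lab for ex, lab in zip(examples, labels) if ex[attribute] == v]
        remainder += (len(subset) / len(labels)) * entropy(subset)
    return entropy(labels) - remainder

# Example: gain of splitting four labelled days on Outlook.
days = [{"Outlook": "Sunny"}, {"Outlook": "Sunny"}, {"Outlook": "Rain"}, {"Outlook": "Rain"}]
print(information_gain(days, ["No", "No", "Yes", "Yes"], "Outlook"))  # 1.0
```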

Information Gain
Gain(S, A): the expected reduction in entropy due to sorting S on attribute A:
  Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v)
For a set S = [29+, 35-]:
  Entropy([29+, 35-]) = -(29/64) log2(29/64) - (35/64) log2(35/64) ≈ 0.99
Attribute A1 splits S into [21+, 5-] (True) and [8+, 30-] (False); attribute A2 splits S into [18+, 33-] (True) and [11+, 2-] (False).
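Working these numbers through (my own arithmetic, added only to make the comparison concrete): Entropy([21+, 5-]) ≈ 0.71 and Entropy([8+, 30-]) ≈ 0.74, so Gain(S, A1) ≈ 0.99 - (26/64)(0.71) - (38/64)(0.74) ≈ 0.26; Entropy([18+, 33-]) ≈ 0.94 and Entropy([11+, 2-]) ≈ 0.62, so Gain(S, A2) ≈ 0.99 - (51/64)(0.94) - (13/64)(0.62) ≈ 0.12. A1 therefore gives the larger reduction in entropy and would be preferred.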

Training Examples
The PlayTennis training set: fourteen days, D1 to D14, each described by Outlook (Sunny, Overcast, Rain), Temperature (Hot, Mild, Cool), Humidity (High, Normal) and Wind (Weak, Strong), with the target attribute PlayTennis (Yes/No). For instance, D1 is Sunny, Hot, High, Weak, PlayTennis = No. Overall the set contains nine positive and five negative examples.

ID3 Algorithm
Starting with all fourteen examples [D1, D2, ..., D14] = [9+, 5-] at the root, Outlook is chosen as the first test:
  Sunny -> S_sunny = [D1, D2, D8, D9, D11] = [2+, 3-] -> ?
  Overcast -> [D3, D7, D12, D13] = [4+, 0-] -> Yes
  Rain -> [D4, D5, D6, D10, D14] = [3+, 2-] -> ?
For the Sunny branch:
  Gain(S_sunny, Humidity) = 0.970 - (3/5) 0.0 - (2/5) 0.0 = 0.970
  Gain(S_sunny, Temp.) = 0.970 - (2/5) 0.0 - (2/5) 1.0 - (1/5) 0.0 = 0.570
  Gain(S_sunny, Wind) = 0.970 - (2/5) 1.0 - (3/5) 0.918 = 0.019
so Humidity is selected to test next under Sunny.

ID3 Algorithm (the resulting tree)
Outlook
  Sunny -> Humidity
    High -> No [D1, D2, D8]
    Normal -> Yes [D9, D11]
  Overcast -> Yes [D3, D7, D12, D13]
  Rain -> Wind
    Strong -> No [D6, D14]
    Weak -> Yes [D4, D5, D10]

Converting a Tree to Rules
The tree (Outlook at the root, Humidity under Sunny, Wind under Rain) yields one rule per root-to-leaf path:
R1: If (Outlook = Sunny) AND (Humidity = High) Then PlayTennis = No
R2: If (Outlook = Sunny) AND (Humidity = Normal) Then PlayTennis = Yes
R3: If (Outlook = Overcast) Then PlayTennis = Yes
R4: If (Outlook = Rain) AND (Wind = Strong) Then PlayTennis = No
R5: If (Outlook = Rain) AND (Wind = Weak) Then PlayTennis = Yes
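A small sketch of this path-tracing, using a nested-dictionary tree representation of my own choosing (not something defined on the slides):

```python
def tree_to_rules(tree, conditions=()):
    """Trace every root-to-leaf path; each path becomes one IF ... THEN rule."""
    if not isinstance(tree, dict):              # a leaf: tree is a class label
        lhs = " AND ".join(f"({a} = {v})" for a, v in conditions)
        return [f"If {lhs} Then PlayTennis = {tree}"]
    attribute, branches = next(iter(tree.items()))
    rules = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, conditions + ((attribute, value),)))
    return rules

play_tennis_tree = {"Outlook": {
    "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
}}

for rule in tree_to_rules(play_tennis_tree):
    print(rule)   # prints rules equivalent to R1-R5 above
```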

Decision tree classifiers
Decision tree classifiers do not require any prior knowledge of the data distribution and work well on noisy data. They have been applied, for example, to classify medical patients by disease, equipment malfunctions by cause, and loan applicants by likelihood of payment.

Decision trees
The internal nodes are simple decision rules on one or more attributes and the leaf nodes are predicted class labels. In the example tree, the tests are Salary < 1M, Prof = teaching and Age < 30, and the leaves are labelled Good or Bad.

Sample Experience Table
Thirteen example days, D1 to D13, each described by Hour (8 AM, 9 AM, 10 AM), Weather (Sunny, Cloudy, Rainy), Accident (Yes/No) and Stall (Yes/No), with the target being the Commute time (Short, Medium, Long). For instance, D1 is an 8 AM, Sunny day with a Long commute.

Decision Tree Analysis For choosing the best course of action when future outcomes are uncertain.

Classification
Data: the data has k attributes A1, ..., Ak, and each tuple (case or example) is described by values of the attributes and a class label.
Goal: to learn rules or to build a model that can be used to predict the classes of new (or future, or test) cases. The data used for building the model is called the training data.
Induction is different from deduction, and a DBMS does not support induction; the result of induction is higher-level information or knowledge: general statements about the data.
There are many approaches (refer to the lecture notes for CS3244 available at the Co-Op). We focus on three approaches here; other examples include instance-based learning, neural networks, concept learning (version space, Focus, AQ11, ...), genetic algorithms, and reinforcement learning.

Data and its format
- Data: attribute-value pairs, with or without a class label.
- Data type: continuous or discrete, nominal.
- Data format: flat. If the data is not flat, what should we do?

Induction Algorithm
We calculate the gain for each attribute and choose the one with the maximum gain to be the node in the tree. After building that node, we calculate the gain for the remaining attributes in each branch and again choose the maximum.

Decision Tree Induction
Many algorithms:
- Hunt's Algorithm (one of the earliest)
- CART
- ID3, C4.5
- SLIQ, SPRINT

Decision Tree Induction Algorithm
1. Create a root node for the tree.
2. If all cases are positive, return the single-node tree with label +.
3. If all cases are negative, return the single-node tree with label -.
4. Otherwise, for each possible value of the node's test attribute: add a new tree branch, let the cases for that branch be the subset of the data that have this value, and add a new node with a new subtree; repeat until a leaf node is reached.
A sketch of this recursion, with attribute selection by information gain, is shown below.
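A compact sketch of the recursion (the list-of-dictionaries data representation and the helper names are my own assumptions; attribute selection uses the information gain from the earlier slides):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(rows, labels, attr):
    rem = 0.0
    for v in {r[attr] for r in rows}:
        sub = [lab for r, lab in zip(rows, labels) if r[attr] == v]
        rem += len(sub) / len(labels) * entropy(sub)
    return entropy(labels) - rem

def build_tree(rows, labels, attrs, default=None):
    if not rows:                        # no examples left: majority class of the parent
        return default
    if len(set(labels)) == 1:           # all cases positive (or all negative): a single leaf
        return labels[0]
    if not attrs:                       # no attributes left: majority class at this node
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, labels, a))
    majority = Counter(labels).most_common(1)[0][0]
    node = {best: {}}
    for v in {r[best] for r in rows}:   # one branch per observed value of the chosen attribute
        pairs = [(r, lab) for r, lab in zip(rows, labels) if r[best] == v]
        node[best][v] = build_tree([r for r, _ in pairs], [lab for _, lab in pairs],
                                   [a for a in attrs if a != best], majority)
    return node

days = [{"Outlook": "Sunny", "Wind": "Weak"},
        {"Outlook": "Overcast", "Wind": "Weak"},
        {"Outlook": "Rain", "Wind": "Strong"}]
print(build_tree(days, ["No", "Yes", "No"], ["Outlook", "Wind"]))
```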

Tests for Choosing the Best Split
Purity (diversity) measures:
- Gini (population diversity)
- Entropy (information gain)
- Information gain ratio
- Chi-square test
We will only explore Gini in class.

Measures of Node Impurity
- Gini index
- Entropy
- Misclassification error

Alternative Splitting Criteria based on INFO
Entropy at a given node t: Entropy(t) = - Σ_j p(j|t) log2 p(j|t), where p(j|t) is the relative frequency of class j at node t. It measures the homogeneity of a node:
- maximum (log n_c) when records are equally distributed among all classes, implying the least information;
- minimum (0.0) when all records belong to one class, implying the most information.
Entropy-based computations are similar to the GINI index computations.

Impurity Measures
Information entropy: Entropy = - Σ_i p_i log2 p_i. It is zero when the node consists of only one class, and one when two classes are present in equal number.
Other measures of impurity include the Gini index: Gini = 1 - Σ_i p_i^2.
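A small sketch putting the three impurity measures side by side (each takes the vector of class proportions p_i at a node; the function names are my own):

```python
from math import log2

def gini(proportions):
    """Gini index: 1 - sum of p_i^2 (0.0 for a pure node)."""
    return 1.0 - sum(p * p for p in proportions)

def entropy(proportions):
    """Information entropy: -sum of p_i * log2(p_i), with 0 * log2(0) taken as 0."""
    return sum(-p * log2(p) for p in proportions if p > 0)

def misclassification_error(proportions):
    """1 - max p_i: the error made by always predicting the majority class."""
    return 1.0 - max(proportions)

# A pure node versus an evenly split two-class node:
print(gini([1.0, 0.0]), entropy([1.0, 0.0]), misclassification_error([1.0, 0.0]))  # 0.0 0.0 0.0
print(gini([0.5, 0.5]), entropy([0.5, 0.5]), misclassification_error([0.5, 0.5]))  # 0.5 1.0 0.5
```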

Splitting Based on INFO...
Information gain: when a parent node p is split into k partitions and n_i is the number of records in partition i,
  GAIN_split = Entropy(p) - Σ_{i=1..k} (n_i / n) Entropy(i).
It measures the reduction in entropy achieved because of the split; choose the split that achieves the most reduction (maximizes GAIN). Used in ID3 and C4.5.
Disadvantage: it tends to prefer splits that result in a large number of partitions, each being small but pure.

Review of log2
log2(0) is undefined (it tends to minus infinity; in the entropy formula the term 0 log2 0 is taken to be 0), log2(1) = 0, log2(2) = 1, log2(4) = 2.

Example
Six training instances, numbered 1 to 6, each described by two Boolean attributes A1 and A2 (values T and F) and a classification of + or -.

Example of Decision Tree
Objects described by Size (Medium, Small, Large), Shape (Brick, Wedge, Sphere, Pillar) and Color (Blue, Red, Green), with the target attribute Choice (Yes/No). For instance, a Medium Blue Brick has Choice = Yes and a Small Red Wedge has Choice = No.

Classification: A Two-Step Process
Model construction: describe a set of predetermined classes based on a training set (this step is also called learning). Each tuple/sample is assumed to belong to a predefined class, and the model is represented as classification rules, decision trees, or mathematical formulae.
Model usage: classify future test data/objects. First estimate the accuracy of the model: the known label of each test example is compared with the classified result from the model, and the accuracy rate is the percentage of test cases that are correctly classified. If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known.
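As a hedged illustration of the two steps with an off-the-shelf learner (scikit-learn's CART-style DecisionTreeClassifier on the bundled iris data; this is my own example, not one from the slides):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Step 1: model construction (learning) on the training portion of the data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = DecisionTreeClassifier(criterion="entropy", random_state=0)
model.fit(X_train, y_train)

# Step 2: model usage -- estimate accuracy on the held-out test data,
# then classify a new tuple whose class label is unknown.
print("test accuracy:", model.score(X_test, y_test))
print("predicted class for a new flower:", model.predict([[5.0, 3.4, 1.5, 0.2]]))
```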

Decision Tree Classification Task

Classification Process (1): Model Construction
A learning algorithm is run over the training data to produce a classifier (the model), for example the rule: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'.

Classification Process (2): Use the Model in Prediction
The classifier is first applied to labelled testing data to estimate its accuracy, and then to unseen data, e.g. the tuple (Jeff, Professor, 4): Tenured?

Example
Consider the problem of learning whether or not to go jogging on a particular day. The attribute WEATHER has possible values Warm, Cold and Raining; the attribute JOGGED_YESTERDAY has possible values Yes and No.

The Data
The training examples are listed by WEATHER (W = warm, C = cold, R = raining), JOGGED_YESTERDAY (Y/N) and CLASSIFICATION (+ or -).

Test Data
The held-out test examples are listed in the same format: WEATHER, JOGGED_YESTERDAY and CLASSIFICATION.

ID3: Some Issues
Sometimes we arrive at a node with no examples. This means that the combination of attribute values has not been observed; we simply assign the majority class of the node's parent.
Sometimes we arrive at a node with both positive and negative examples and no attributes left. This means that there is noise in the data; we simply assign the majority class of the examples at that node.

Problems with Decision Tree (ID3)
ID3 is not optimal: it uses the expected entropy reduction, not the actual reduction. It must also use discrete (or discretized) attributes. What if we left for work at 9:30 AM? We could break the attributes down into smaller values…

Problems with Decision Trees
While decision trees classify quickly, the time for building a tree may be higher than for other types of classifiers. Decision trees also suffer from errors propagating down the tree, which becomes a very serious problem as the number of classes increases.

Decision Tree characteristics The training data may contain missing attribute values. Decision tree methods can be used even when some training examples have unknown values.

Unknown Attribute Values
What if some examples are missing values of attribute A? Use the training example anyway and sort it through the tree. If node n tests A:
- assign the most common value of A among the other examples sorted to node n, or
- assign the most common value of A among the other examples with the same target value, or
- assign probability p_i to each possible value v_i of A and pass a fraction p_i of the example down each descendant in the tree.
A sketch of the first option follows.
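A tiny sketch of the first option (the dictionary-based example representation and the function name are my own assumptions):

```python
from collections import Counter

def impute_most_common(example, attribute, examples_at_node):
    """Fill a missing value of `attribute` with its most common value among the
    other examples sorted to this node (option 1 above)."""
    if example.get(attribute) is not None:
        return example
    observed = [e[attribute] for e in examples_at_node if e.get(attribute) is not None]
    filled = dict(example)
    filled[attribute] = Counter(observed).most_common(1)[0][0]
    return filled

node_examples = [{"Outlook": "Sunny"}, {"Outlook": "Sunny"}, {"Outlook": "Rain"}]
print(impute_most_common({"Outlook": None}, "Outlook", node_examples))  # {'Outlook': 'Sunny'}
```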

Rule Generation
Once a decision tree has been constructed, it is a simple matter to convert it into a set of rules. Converting to rules allows distinguishing among the different contexts in which a decision node is used.

Rule Generation Converting to rules improves readability. Rules are often easier for people to understand. To generate rules, trace each path in the decision tree, from root node to leaf node

Rule Simplification
Once a rule set has been devised: first, simplify the individual rules by eliminating redundant and unnecessary rules; then attempt to replace the rules that share the most common consequent by a default rule that is triggered when no other rule is triggered.

Attribute-Based Representations Examples of decisions

Continuous Valued Attributes
Create a discrete attribute to test a continuous one. For example, for Temperature = 24.5 °C, define the Boolean attribute (Temperature > 20.0 °C) with values {true, false}. Where should the threshold be set? The slide's example lists the sorted temperatures 15 °C, 18 °C, 19 °C, 22 °C, 24 °C and 27 °C together with their PlayTennis labels (No/Yes).
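A sketch of one common way to choose the threshold: evaluate the midpoints between consecutive sorted values by information gain and keep the best. The temperature/label assignment below is illustrative only, since the per-value labels are not legible in the transcript.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (gain, threshold) for the binary split on a continuous attribute with maximal gain."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = (0.0, None)
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue
        t = (v1 + v2) / 2.0                       # candidate threshold: midpoint
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        g = base - (len(left) / len(pairs)) * entropy(left) \
                 - (len(right) / len(pairs)) * entropy(right)
        if g > best[0]:
            best = (g, t)
    return best

temps = [15, 18, 19, 22, 24, 27]
play = ["No", "No", "Yes", "Yes", "Yes", "No"]    # illustrative labels only
print(best_threshold(temps, play))
```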

Metrics for Performance Evaluation
Focus on the predictive capability of a model, rather than on how fast it classifies or builds models, scalability, etc.
Confusion matrix (for a two-class problem):
                        PREDICTED Class=Yes       PREDICTED Class=No
  ACTUAL Class=Yes      a (TP, true positive)     b (FN, false negative)
  ACTUAL Class=No       c (FP, false positive)    d (TN, true negative)

Metrics for Performance Evaluation (continued)
With the same confusion matrix (a = TP, b = FN, c = FP, d = TN), the most widely-used metric is accuracy:
  Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)
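A tiny sketch of that computation (the variable names a, b, c, d follow the slide's layout):

```python
def accuracy(a, b, c, d):
    """Accuracy = (TP + TN) / (TP + FN + FP + TN), using the slide's a, b, c, d counts."""
    return (a + d) / (a + b + c + d)

# e.g. 50 true positives, 10 false negatives, 5 false positives, 100 true negatives:
print(accuracy(50, 10, 5, 100))  # 0.909...
```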

Pruning Trees
There is another technique for reducing the number of attributes used in a tree: pruning. Two types of pruning:
- pre-pruning (forward pruning)
- post-pruning (backward pruning)

Prepruning
In prepruning, we decide during the building process when to stop adding attributes (possibly based on their information gain). However, this may be problematic. Why? Sometimes attributes individually do not contribute much to a decision, but combined they may have a significant impact.

Postpruning
Postpruning waits until the full decision tree has been built and then prunes attributes away. Two techniques: subtree replacement and subtree raising.

Subtree Replacement
An entire subtree is replaced by a single leaf node.

Subtree Replacement
In the slide's example, a single leaf (node 6) replaces the subtree. This generalizes the tree a little more, but may increase accuracy.
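A hedged sketch of subtree replacement in the reduced-error style, over the nested-dictionary trees used in the earlier sketches: a subtree becomes a leaf labelled with the majority class whenever that does not hurt accuracy on a separate pruning set. The helper functions and the bottom-up traversal are my own framing, not the slide's.

```python
from collections import Counter

def classify(tree, example):
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches.get(example.get(attr))
    return tree

def accuracy(tree, rows, labels):
    return sum(classify(tree, r) == lab for r, lab in zip(rows, labels)) / len(labels)

def prune(tree, rows, labels):
    """Bottom-up subtree replacement: turn a subtree into a leaf if pruning-set
    accuracy does not drop."""
    if not isinstance(tree, dict) or not rows:
        return tree
    attr, branches = next(iter(tree.items()))
    for value in branches:
        sub = [(r, lab) for r, lab in zip(rows, labels) if r.get(attr) == value]
        branches[value] = prune(branches[value],
                                [r for r, _ in sub], [lab for _, lab in sub])
    leaf = Counter(labels).most_common(1)[0][0]   # candidate replacement leaf
    if accuracy(leaf, rows, labels) >= accuracy(tree, rows, labels):
        return leaf
    return tree
```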

Subtree Raising
An entire subtree is raised onto another node.

Subtree Raising
An entire subtree is raised onto another node. This was not discussed in detail, as it is not clear whether it is really worthwhile (it is very time consuming).