Ch. 6: Decision Trees
Longin Jan Latecki, Temple University, latecki@temple.edu
Based on Stephen Marsland, Machine Learning: An Algorithmic Perspective, CRC Press, 2009, and on slides by Stephen Marsland, Jia Li, and Ruoming Jin.

Illustrating Classification Task

Decision Trees Break classification down into a series of choices about the features, one at a time; lay these choices out as a tree; progress down the tree to the leaves to obtain the classification.

Play Tennis Example (decision tree): Outlook = Sunny → test Humidity (High → No, Normal → Yes); Outlook = Overcast → Yes; Outlook = Rain → test Wind (Strong → No, Weak → Yes).

Day  Outlook   Temp  Humid   Wind    Play?
1    Sunny     Hot   High    Weak    No
2    Sunny     Hot   High    Strong  No
3    Overcast  Hot   High    Weak    Yes
4    Rain      Mild  High    Weak    Yes
5    Rain      Cool  Normal  Weak    Yes
6    Rain      Cool  Normal  Strong  No
7    Overcast  Cool  Normal  Strong  Yes
8    Sunny     Mild  High    Weak    No
9    Sunny     Cool  Normal  Weak    Yes
10   Rain      Mild  Normal  Weak    Yes
11   Sunny     Mild  Normal  Strong  Yes
12   Overcast  Mild  High    Strong  Yes
13   Overcast  Hot   Normal  Weak    Yes
14   Rain      Mild  High    Strong  No

Rules and Decision Trees We can turn the tree into a set of rules for play = yes: (outlook = sunny & humidity = normal) | (outlook = overcast) | (outlook = rain & wind = weak). How do we generate the trees? We need to choose the features, and the order in which to test them.
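To make the correspondence between the tree and the rules concrete, here is a minimal Python sketch (not part of the original slides; the function name and lower-case string values are illustrative only) that encodes the tree above as a classification function.

def play_tennis(outlook, humidity, wind):
    # Classify an example using the rules read off the Play Tennis tree.
    if outlook == "overcast":
        return "Yes"
    if outlook == "sunny":
        return "Yes" if humidity == "normal" else "No"
    if outlook == "rain":
        return "Yes" if wind == "weak" else "No"
    raise ValueError("unknown outlook value: " + outlook)

# The test case used later in these slides
# (Outlook = Sunny, Humidity = High, Wind = Strong) comes out as "No".
print(play_tennis("sunny", "high", "strong"))  # -> No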

A tree-structured classification rule for a medical example (figure).

The construction of a tree involves the following three elements: 1. the selection of the splits; 2. the decision of when to declare a node terminal or to continue splitting; 3. the assignment of each terminal node to one of the classes.

Goodness of split The goodness of a split is measured by an impurity function defined for each node. Intuitively, we want each leaf node to be "pure", that is, dominated by a single class.

How to determine the Best Split Before splitting there are 10 records of class 0 and 10 records of class 1. Which test condition is the best?

How to determine the Best Split Greedy approach: nodes with a homogeneous class distribution are preferred, so we need a measure of node impurity. A non-homogeneous node has a high degree of impurity; a homogeneous node has a low degree of impurity.

Measures of Node Impurity: Entropy, Gini Index

Entropy Let F be a feature with possible values f1, …, fn. Let p be the probability distribution of F; usually p is simply given by a histogram (p1, …, pn), where pi is the proportion of the data with value F = fi. The entropy of p, Entropy(p) = – Σi pi log2 pi, tells us how much extra information we get from knowing the value of the feature, i.e., F = fi, for a given data point. Entropy measures the amount of impurity in a set, so it makes sense to pick the feature that provides the most information.

E.g., if F is a feature with two possible values +1 and -1, and p1 = 1 and p2 = 0, then we get no new information from knowing that F = +1 for a given example, so the entropy is zero. If p1 = 0.5 and p2 = 0.5, then the entropy is at its maximum (1 bit).

Entropy and Decision Trees Entropy at a given node t: Entropy(t) = – Σj p(j | t) log2 p(j | t), where p(j | t) is the relative frequency of class j at node t. It measures the homogeneity of a node: the maximum is log2(nc) (nc = number of classes), reached when the records are equally distributed among all classes, implying the least information; the minimum is 0.0, reached when all records belong to one class, implying the most information. Entropy-based computations are similar to the Gini index computations.

Examples for computing Entropy P(C1) = 0/6 = 0, P(C2) = 6/6 = 1: Entropy = – 0 log2 0 – 1 log2 1 = – 0 – 0 = 0. P(C1) = 1/6, P(C2) = 5/6: Entropy = – (1/6) log2 (1/6) – (5/6) log2 (5/6) = 0.65. P(C1) = 2/6, P(C2) = 4/6: Entropy = – (2/6) log2 (2/6) – (4/6) log2 (4/6) = 0.92.
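The calculations above are easy to check with a small helper; the following Python snippet (an illustrative sketch, not from the slides) computes the entropy of a class-count vector, using the convention 0 log 0 = 0.

import math

def entropy(counts):
    # Entropy (in bits) of a class-count vector; 0 * log(0) is taken as 0.
    n = sum(counts)
    return sum(-(c / n) * math.log2(c / n) for c in counts if c > 0)

print(entropy([0, 6]))            # 0.0
print(round(entropy([1, 5]), 2))  # 0.65
print(round(entropy([2, 4]), 2))  # 0.92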

Splitting Based on Information Gain The parent node p is split into k partitions, where ni is the number of records in partition i: GAINsplit = Entropy(p) – Σi=1..k (ni/n) Entropy(i). It measures the reduction in entropy achieved because of the split; we choose the split that achieves the most reduction (maximizes GAIN). Used in ID3 and C4.5. Disadvantage: it tends to prefer splits that result in a large number of partitions, each being small but pure.

Information Gain Choose the feature that provides the highest information gain over all examples. That is all there is to ID3: at each stage, pick the feature with the highest information gain.

Example Values(Wind) = {Weak, Strong}. S = [9+, 5-], S(Weak) = [6+, 2-], S(Strong) = [3+, 3-]. Gain(S, Wind) = Entropy(S) – (8/14) Entropy(S(Weak)) – (6/14) Entropy(S(Strong)) = 0.94 – (8/14) 0.811 – (6/14) 1.00 = 0.048.
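The same number can be reproduced with a few lines of Python (an illustrative sketch; the entropy helper is the same as above).

import math

def entropy(counts):
    n = sum(counts)
    return sum(-(c / n) * math.log2(c / n) for c in counts if c > 0)

S        = [9, 5]   # 9 positive, 5 negative examples in total
S_weak   = [6, 2]   # examples with Wind = Weak
S_strong = [3, 3]   # examples with Wind = Strong

gain = entropy(S) - (8 / 14) * entropy(S_weak) - (6 / 14) * entropy(S_strong)
print(round(gain, 3))  # 0.048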

ID3 (Quinlan) Searches over the space of all possible trees using a greedy search with no backtracking, so it is susceptible to local minima. Uses all the features (no pruning). Can deal with noise. Leaf labels are the most common value of the examples reaching them.

ID3 tree construction (figure): the root tests the feature F with the highest Gain(S, F). The branch for value v ends in a leaf node of category c1, because Sv contains only examples in category c1; the branch for value w is split further on the feature G with the highest Gain(Sw, G), and so on.
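The construction just illustrated can be written down compactly. The following Python sketch is a minimal ID3 (an illustration under assumed data structures, not the book's code): each example is a dict mapping feature names to values, and the returned tree is either a class label (a leaf) or a pair (feature, {value: subtree}).

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(examples, labels, feature):
    # Entropy(S) minus the weighted entropy of the subsets induced by the feature.
    gain = entropy(labels)
    n = len(labels)
    for value in set(ex[feature] for ex in examples):
        subset = [lab for ex, lab in zip(examples, labels) if ex[feature] == value]
        gain -= len(subset) / n * entropy(subset)
    return gain

def id3(examples, labels, features):
    if len(set(labels)) == 1:              # pure node: make a leaf
        return labels[0]
    if not features:                       # no features left: majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: info_gain(examples, labels, f))
    branches = {}
    for value in set(ex[best] for ex in examples):
        idx = [i for i, ex in enumerate(examples) if ex[best] == value]
        branches[value] = id3([examples[i] for i in idx],
                              [labels[i] for i in idx],
                              [f for f in features if f != best])
    return (best, branches)

Run on the Play Tennis table with features ['Outlook', 'Temp', 'Humid', 'Wind'], this should recover the tree shown later in these slides.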

Search (figure).

Example Splitting S = [9+, 5-] on Outlook: the Sunny branch receives [2+, 3-] and must be split further (?); the Overcast branch receives [4+, 0-] and becomes a Yes leaf; the Rain branch receives [3+, 2-] and must be split further (?).

Inductive Bias How does the algorithm generalise from the training examples? It chooses the features with the highest information gain, minimising the amount of information that is left. This gives a bias towards shorter trees (Occam's Razor) and puts the most useful features near the root.

Missing Data Suppose that one feature has no value for some example. We can skip that node and carry on down the tree, following all paths out of that node, and can therefore still get a classification. This is virtually impossible with neural networks.

C4.5 An improved version of ID3, also by Quinlan. It uses a validation set to avoid overfitting. One could just stop choosing features (early stopping), but better results come from post-pruning: build the whole tree, then chop off some parts of it afterwards.

Post-Pruning Run over the tree, pruning each node by replacing the subtree below it with a leaf. Evaluate the error and keep the change if the error stays the same or improves.

Rule Post-Pruning Turn the tree into a set of if-then rules. Remove the preconditions from each rule in turn and check the accuracy. Sort the rules according to accuracy. Rules are easy to read.

Rule Post-Pruning IF ((outlook = sunny) & (humidity = high)) THEN playTennis = no. Remove preconditions: consider IF (outlook = sunny) and IF (humidity = high) separately and test the accuracy of each. If one of them is better, try removing both.
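As a rough illustration of this procedure (a sketch under assumed data structures, not the book's algorithm), the Python snippet below greedily drops preconditions from a single rule as long as accuracy on a set of validation examples does not get worse; examples are assumed to be dicts of attribute values with a "class" key, and a rule that covers no validation examples is scored 1.0.

def rule_accuracy(conditions, predicted, validation):
    # Accuracy of the rule on the validation examples it covers.
    covered = [ex for ex in validation
               if all(ex[a] == v for a, v in conditions.items())]
    if not covered:
        return 1.0   # assumption: empty coverage is treated as perfect accuracy
    return sum(1 for ex in covered if ex["class"] == predicted) / len(covered)

def prune_rule(conditions, predicted, validation):
    conditions = dict(conditions)
    improved = True
    while improved and conditions:
        improved = False
        base = rule_accuracy(conditions, predicted, validation)
        for attr in list(conditions):
            trial = {a: v for a, v in conditions.items() if a != attr}
            if rule_accuracy(trial, predicted, validation) >= base:
                conditions = trial     # dropping this precondition did not hurt
                improved = True
                break
    return conditions

# Example rule from the slide: IF outlook = sunny & humidity = high THEN no
# pruned = prune_rule({"outlook": "sunny", "humidity": "high"}, "no", validation_set)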

ID3 Decision Tree Outlook = Sunny → Humidity (High → No, Normal → Yes); Outlook = Overcast → Yes; Outlook = Rain → Wind (Strong → No, Weak → Yes).

Test Case Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong.

Party Example (Section 6.4, p. 147) Construct a decision tree based on these data:
Deadline, Party, Lazy, Activity
Urgent, Yes, Yes, Party
Urgent, No, Yes, Study
Near, Yes, Yes, Party
None, Yes, No, Party
None, No, Yes, Pub
Near, No, No, Study
Near, No, Yes, TV
Urgent, No, No, Study

Resulting decision tree: Party? (Yes → Go to party; No → Deadline?). Deadline? (Urgent → Study; None → Go to pub; Near → Lazy?). Lazy? (Yes → Watch TV; No → Study).
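For a quick sanity check (an illustrative Python sketch, separate from the Matlab homework below), the tree can be written out as nested if-statements and tested against the eight training examples.

def activity(deadline, party, lazy):
    if party == "Yes":
        return "Party"
    if deadline == "Urgent":
        return "Study"
    if deadline == "None":
        return "Pub"
    return "TV" if lazy == "Yes" else "Study"   # deadline == "Near"

data = [
    ("Urgent", "Yes", "Yes", "Party"),
    ("Urgent", "No",  "Yes", "Study"),
    ("Near",   "Yes", "Yes", "Party"),
    ("None",   "Yes", "No",  "Party"),
    ("None",   "No",  "Yes", "Pub"),
    ("Near",   "No",  "No",  "Study"),
    ("Near",   "No",  "Yes", "TV"),
    ("Urgent", "No",  "No",  "Study"),
]
assert all(activity(d, p, l) == a for d, p, l, a in data)   # the tree fits all 8 rows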

Measure of Impurity: GINI Gini index for a given node t: GINI(t) = 1 – Σj [p(j | t)]^2, where p(j | t) is the relative frequency of class j at node t. The maximum is 1 – 1/nc, reached when the records are equally distributed among all classes, implying the least interesting information; the minimum is 0.0, reached when all records belong to one class, implying the most interesting information.

Examples for computing GINI P(C1) = 0/6 = 0, P(C2) = 6/6 = 1: Gini = 1 – P(C1)^2 – P(C2)^2 = 1 – 0 – 1 = 0. P(C1) = 1/6, P(C2) = 5/6: Gini = 1 – (1/6)^2 – (5/6)^2 = 0.278. P(C1) = 2/6, P(C2) = 4/6: Gini = 1 – (2/6)^2 – (4/6)^2 = 0.444.
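As with entropy, these values are easy to verify; the helper below (an illustrative Python sketch, not from the slides) computes the Gini index of a class-count vector.

def gini(counts):
    # Gini index of a class-count vector.
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

print(gini([0, 6]))            # 0.0
print(round(gini([1, 5]), 3))  # 0.278
print(round(gini([2, 4]), 3))  # 0.444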

Splitting Based on GINI Used in CART, SLIQ, and SPRINT. When a node p is split into k partitions (children), the quality of the split is computed as GINIsplit = Σi=1..k (ni/n) GINI(i), where ni = number of records at child i and n = number of records at node p.

Binary Attributes: Computing the GINI Index The attribute B splits the records into two partitions, node N1 (C1 = 5, C2 = 2) and node N2 (C1 = 1, C2 = 4); larger and purer partitions are sought. Gini(N1) = 1 – (5/7)^2 – (2/7)^2 = 0.408. Gini(N2) = 1 – (1/5)^2 – (4/5)^2 = 0.320. Gini(Children) = 7/12 × 0.408 + 5/12 × 0.320 = 0.371.
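The computation can be checked as follows (an illustrative Python sketch; the class counts C1 = 5, C2 = 2 at N1 and C1 = 1, C2 = 4 at N2 are those implied by the 7/12 and 5/12 weights above).

def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

n1 = [5, 2]   # node N1: 5 records of class C1, 2 of class C2
n2 = [1, 4]   # node N2: 1 record of class C1, 4 of class C2
total = sum(n1) + sum(n2)

weighted = sum(n1) / total * gini(n1) + sum(n2) / total * gini(n2)
print(round(gini(n1), 3))   # 0.408
print(round(gini(n2), 3))   # 0.32
print(round(weighted, 3))   # 0.371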

Categorical Attributes: Computing the Gini Index For each distinct value, gather the counts for each class in the dataset, and use the resulting count matrix to make decisions. The split can be a multi-way split or a two-way split (find the best partition of the values).

Homework Written homework for everyone, due on Oct. 5: Problem 6.3, p. 151. Matlab Homework: Demonstrate decision tree learning and classification on the party example.