

Lecture 7

Outline
1. Overview of Classification and Decision Trees
2. Algorithm to build a Decision Tree
3. Formulas to measure information
4. Weka, data preparation and visualization

1. Illustration of the Classification Task (courtesy of Professor David Mease for the next 10 slides)
[Figure: a learning algorithm is applied to the training set to induce a model]

Classification: Definition
- Given a collection of records (the training set):
  - Each record contains a set of attributes (x), with one additional attribute which is the class (y).
- Find a model to predict the class as a function of the values of the other attributes.
- Goal: previously unseen records should be assigned a class as accurately as possible.
  - A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets; the training set is used to build the model and the test set to validate it.
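As a minimal sketch of this train/test workflow (using scikit-learn and its built-in iris data, which are not part of this lecture's toolset; Weka is covered later), assuming scikit-learn is installed:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)          # attributes x and class y

# Divide the records into a training set (to build the model)
# and a test set (to estimate its accuracy on unseen records).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(criterion="entropy").fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))   # accuracy on the test set
```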

Classification Examples
- Classifying credit card transactions as legitimate or fraudulent
- Classifying secondary structures of proteins as alpha-helix, beta-sheet, or random coil
- Categorizing news stories as finance, weather, entertainment, sports, etc.
- Predicting tumor cells as benign or malignant

Classification Techniques
- There are many techniques/algorithms for carrying out classification
- In this chapter we will study only decision trees
- In Chapter 5 we will study other techniques, including some very modern and effective ones

An Example of a Decision Tree
[Figure: training data (categorical, continuous, and class attributes) alongside the model, a decision tree. Splitting attributes: Refund (Yes -> NO; No -> MarSt), MarSt (Married -> NO; Single, Divorced -> TaxInc), TaxInc (< 80K -> NO; >= 80K -> YES)]

Applying the Tree Model to Predict the Class for a New Observation
[Figure sequence: starting from the root of the tree, the test record is routed down one split at a time – Refund, then MarSt, then (if needed) TaxInc – until it reaches a leaf. The leaf reached is NO, so Cheat is assigned to "No"]
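The traversal above can be written as a small function. This is only a sketch of the example tree from the previous slides; the test record below is hypothetical, since the slide's actual record values are not reproduced in the transcript.

```python
def classify(record):
    """Route one record down the example tree and return the predicted Cheat value."""
    if record["Refund"] == "Yes":
        return "No"                      # Refund = Yes   -> Cheat = No
    if record["MarSt"] == "Married":
        return "No"                      # Married        -> Cheat = No
    if record["TaxInc"] < 80:            # taxable income in thousands (K)
        return "No"                      # TaxInc < 80K   -> Cheat = No
    return "Yes"                         # TaxInc >= 80K  -> Cheat = Yes

# Hypothetical test record:
print(classify({"Refund": "No", "MarSt": "Married", "TaxInc": 95}))   # "No"
```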

DECISION TREE CHARACTERISTICS
- Easy to understand: similar to the human decision process
- Deals with both discrete and continuous features
- Simple, nonparametric classifier: no assumptions regarding probability distribution types
- Finding an optimal tree is NP-complete, but the greedy algorithms used in practice are computationally inexpensive
- Can represent arbitrarily complex decision boundaries
- Overfitting can be a problem

2. Algorithm to Build Decision Trees
- Defined recursively:
  - Select an attribute for the root node and, using a greedy algorithm, create a branch for each possible value of the selected attribute
  - Repeat recursively until either we run out of instances or attributes, or a predefined purity threshold is reached
  - Use only branches that are reached by instances

WEATHER EXAMPLE, 9 yes/5 no

Split on Outlook (Sunny: 2 yes / 3 no, Overcast: 4 yes, Rainy: 3 yes / 2 no)

Sunny:
  Temp  Humidity  Windy  Play
  Hot   high      FALSE  No
  Hot   high      TRUE   No
  Mild  high      FALSE  No
  Cool  normal    FALSE  Yes
  Mild  normal    TRUE   Yes

Overcast:
  Temp  Humidity  Windy  Play
  Hot   high      FALSE  Yes
  Cool  high      TRUE   Yes
  Mild  high      TRUE   Yes
  Hot   normal    FALSE  Yes

Rainy:
  Temp  Humidity  Windy  Play
  Mild  high      FALSE  Yes
  Cool  normal    FALSE  Yes
  Cool  normal    TRUE   No
  Mild  normal    FALSE  Yes
  Mild  high      TRUE   No

DECISION TREES: Weather Example – candidate splits
[Figures: one-level trees splitting the data on Temperature (Hot / Mild / Cool), on Humidity (High / Normal), and on Windy (True / False), each branch showing its mix of Yes and No records]

3. Information Measures
- To select the attribute upon which to split, we need a measure of "purity"/information:
  - Entropy
  - Gini index
  - Classification error

A Graphical Comparison

Entropy
- Measures purity, similar to the Gini index
- Used in C4.5
- After the entropy is computed in each node, the overall value of the entropy is computed as the weighted average of the entropy in each node, as with the Gini index
- The decrease in entropy is called "information gain" (page 160)

Entropy Examples for a Single Node
- P(C1) = 0/6 = 0, P(C2) = 6/6 = 1
  Entropy = – 0 log2(0) – 1 log2(1) = – 0 – 0 = 0
- P(C1) = 1/6, P(C2) = 5/6
  Entropy = – (1/6) log2(1/6) – (5/6) log2(5/6) = 0.65
- P(C1) = 2/6, P(C2) = 4/6
  Entropy = – (2/6) log2(2/6) – (4/6) log2(4/6) = 0.92
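A minimal sketch (the helper name is ours) that reproduces the three single-node values above from class counts:

```python
import math

def entropy(counts):
    """Entropy of a node, in bits, from its class counts (0 * log2(0) := 0)."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

print(entropy([0, 6]))             # 0.0
print(round(entropy([1, 5]), 2))   # 0.65
print(round(entropy([2, 4]), 2))   # 0.92
```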

3. Entropy: Calculating Information
- All three measures are consistent with each other; we will use entropy as the example
- The less pure a node, the more bits are needed to encode its class labels, and the less information is gained
- Outlook as root:
  Info([2, 3]) = 0.971 bits
  Info([4, 0]) = 0.0 bits
  Info([3, 2]) = 0.971 bits
  Total info = 5/14 × 0.971 + 4/14 × 0.0 + 5/14 × 0.971 = 0.693 bits

3. Selecting the Root Attribute
- Initial info = Info([9, 5]) = 0.940 bits
- Gain(outlook) = 0.940 – 0.693 = 0.247 bits
- Gain(temperature) = 0.029 bits
- Gain(humidity) = 0.152 bits
- Gain(windy) = 0.048 bits
- So, select outlook as the root attribute for splitting
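A short sketch (our own helper) reproducing the Outlook numbers above from class counts alone, assuming the 14-record weather data with subset counts [2, 3], [4, 0], and [3, 2]:

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

before = entropy([9, 5])                                 # 0.940 bits for the full data set
subsets = [[2, 3], [4, 0], [3, 2]]                       # Sunny, Overcast, Rainy
n = sum(sum(s) for s in subsets)                         # 14 records
after = sum(sum(s) / n * entropy(s) for s in subsets)    # 0.693 bits (weighted average)

print(round(before, 3), round(after, 3), round(before - after, 3))   # gain(outlook) = 0.247
```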

[Figure: the tree after splitting on Outlook – the Sunny branch (2 yes / 3 no) is split on Humidity (High -> No, Normal -> Yes), the Overcast branch is a pure leaf (4 Yes), and the Rainy branch (3 yes / 2 no) is split on Windy (FALSE -> Yes, TRUE -> No)]

[Figure: the same tree, highlighting contradicted training examples (e.g. Cool normal -> Yes, Cool normal -> No, Mild high -> No)]

3. Hunt's Algorithm
Many algorithms use a version of a "top-down" or "divide-and-conquer" approach known as Hunt's algorithm (page 152). Let D_t be the set of training records that reach a node t:
- If D_t contains records that all belong to the same class y_t, then t is a leaf node labeled y_t
- If D_t contains records that belong to more than one class, use an attribute test to split the data into smaller subsets, and recursively apply the procedure to each subset
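A minimal sketch of the two cases above; the function and data-structure choices are ours, and the naive "take the first attribute" test stands in for choosing the test by information gain or the Gini index:

```python
from collections import Counter

def hunt(records, attributes, target):
    """Grow a tree from `records` (a list of dicts) following Hunt's algorithm.
    Returns a class label (leaf) or a nested dict {attribute: {value: subtree}}."""
    classes = [r[target] for r in records]
    # Case 1: every record in D_t belongs to the same class y_t -> leaf node.
    if len(set(classes)) == 1:
        return classes[0]
    # No attributes left to test: fall back to the majority class.
    if not attributes:
        return Counter(classes).most_common(1)[0][0]
    # Case 2: apply an attribute test to split D_t into smaller subsets and recurse.
    attr = attributes[0]                 # a real implementation would choose by gain/Gini
    remaining = [a for a in attributes if a != attr]
    tree = {attr: {}}
    for value in sorted({r[attr] for r in records}):
        subset = [r for r in records if r[attr] == value]   # never empty by construction
        tree[attr][value] = hunt(subset, remaining, target)
    return tree

# Tiny usage example with three hypothetical weather records:
data = [
    {"Outlook": "Sunny", "Windy": "FALSE", "Play": "No"},
    {"Outlook": "Sunny", "Windy": "TRUE", "Play": "No"},
    {"Outlook": "Overcast", "Windy": "FALSE", "Play": "Yes"},
]
print(hunt(data, ["Outlook", "Windy"], target="Play"))
# {'Outlook': {'Overcast': 'Yes', 'Sunny': 'No'}}
```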

An Example of Hunt's Algorithm
[Figure: the tree grows in stages – first a single leaf labeled Don't Cheat; then a split on Refund (Yes -> Don't Cheat); then the No branch is split on Marital Status (Married -> Don't Cheat); finally the Single, Divorced branch is split on Taxable Income (< 80K -> Don't Cheat, >= 80K -> Cheat)]

How to Apply Hunt's Algorithm
Usually it is done in a "greedy" fashion. "Greedy" means that the optimal split is chosen at each stage according to some criterion. This may not be optimal at the end, even for the same criterion. However, the greedy approach is computationally efficient, so it is popular.

Using the greedy approach we still have to decide three things:
#1) What attribute test conditions to consider
#2) What criterion to use to select the "best" split
#3) When to stop splitting
For #1 we will consider only binary splits for both numeric and categorical predictors, as discussed on the next slide. For #2 we will consider misclassification error, the Gini index, and entropy. #3 is a subtle business involving model selection; it is tricky because we don't want to overfit or underfit.

Misclassification Error
- Misclassification error is usually the final metric we want to minimize on the test set, so there is a logical argument for using it as the split criterion
- It is simply the fraction of total cases misclassified
- 1 – misclassification error = "accuracy" (page 149)
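As a small sketch (the helper name is ours), the error at a node is just one minus the fraction of records in the majority class:

```python
def misclassification_error(counts):
    """Fraction of records at a node not belonging to the majority class."""
    total = sum(counts)
    return 1.0 - max(counts) / total

# e.g. a node with 7 records of one class and 3 of another:
print(misclassification_error([7, 3]))   # 0.3 -> accuracy = 0.7
```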

Gini Index
- This is commonly used in many algorithms, like CART and the rpart() function in R
- After the Gini index is computed in each node, the overall value of the Gini index is computed as the weighted average of the Gini index in each node

Gini Examples for a Single Node
- P(C1) = 0/6 = 0, P(C2) = 6/6 = 1
  Gini = 1 – P(C1)^2 – P(C2)^2 = 1 – 0 – 1 = 0
- P(C1) = 1/6, P(C2) = 5/6
  Gini = 1 – (1/6)^2 – (5/6)^2 = 0.278
- P(C1) = 2/6, P(C2) = 4/6
  Gini = 1 – (2/6)^2 – (4/6)^2 = 0.444
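A sketch (our own helper) reproducing these single-node Gini values from class counts:

```python
def gini(counts):
    """Gini index of a node from its class counts: 1 - sum_i p_i^2."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini([0, 6]))            # 0.0
print(round(gini([1, 5]), 3))  # 0.278
print(round(gini([2, 4]), 3))  # 0.444
```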

Misclassification Error vs. Gini Index
[Figure: a binary split A? sends the parent node's records to nodes N1 and N2]
Gini(N1) = 1 – (3/3)^2 – (0/3)^2 = 0
Gini(N2) = 1 – (4/7)^2 – (3/7)^2 = 0.490
Gini(Children) = 3/10 × 0 + 7/10 × 0.490 = 0.343
The Gini index decreases from 0.42 to 0.343 while the misclassification error stays at 30%. This illustrates why we often want to use a surrogate loss function like the Gini index even if we really only care about misclassification.
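A sketch reproducing the comparison above, assuming (inferred from the N1 and N2 counts on the slide) that the parent node holds 7 records of class C1 and 3 of class C2:

```python
def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

parent_gini = gini([7, 3])                                  # 0.42
children_gini = 3/10 * gini([3, 0]) + 7/10 * gini([4, 3])   # 0.343 (weighted average)
parent_error = 3/10                                         # 30% misclassified before split
children_error = 3/10 * 0 + 7/10 * (3/7)                    # still 30% after split
print(round(parent_gini, 3), round(children_gini, 3), parent_error, children_error)
```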

5. Discretization of Numeric Data