MIS2502: Data Analytics Classification using Decision Trees


MIS2502: Data Analytics Classification using Decision Trees Jaclyn Hansberry jaclyn.hansberry@temple.edu

What is classification? Determining to what group a data element belongs, based on the "attributes" of that "entity".
Examples:
- Determining whether a customer should be given a loan
- Flagging a credit card transaction as a fraudulent charge
- Categorizing a news story as finance, entertainment, or sports

How classification works
1. Split the data set into training and validation subsets
2. Choose a discrete outcome variable
3. Find a model that predicts the outcome as a function of the other attributes
4. Apply the model to the validation set to check accuracy
5. Apply the final model to future cases
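A minimal R sketch of step 1, assuming a data frame called transactions with one row per record (the name, the 70/30 proportion, and the seed are illustrative, not from the slides):

    # Split the data set into training and validation subsets (step 1)
    set.seed(42)                                     # make the random split reproducible
    n <- nrow(transactions)
    train_rows <- sample(n, size = round(0.7 * n))   # 70% of rows go to training
    training   <- transactions[train_rows, ]
    validation <- transactions[-train_rows, ]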

Decision Tree Learning
The slide shows a training set of credit card transactions with the columns Trans. ID, Charge Amount, Avg. Charge 6 months, Item, Same state as billing, and Classification (Fraudulent or Legitimate). For example, transaction 1 is an $800 electronics charge (six-month average charge $100) made outside the billing state and labeled Fraudulent, while transaction 2, a $60 gas purchase in the billing state, is labeled Legitimate. Classification software derives a model from this training set. The model is then applied to a validation set with the same columns (transactions 101-104), whose Classification is still unknown ("?").

Goals
- The trained model should assign new cases to the right category. It won't be 100% accurate, but it should be as close as possible.
- The model's rules can be applied to new records as they come along: an automated, reliable way to predict the outcome.

Classification Method: The Decision Tree
A model to predict membership of cases or values of a dependent variable based on one or more predictor variables (Tan, Steinbach, and Kumar 2004).

Example: Credit Card Default
We create the tree from a set of training data. Each unique combination of predictors is associated with an outcome. This set was "rigged" so that every combination is accounted for and has an outcome.
The training data table has the columns TID, Income, Debt (as a percentage of income), Owns/Rents, and Outcome. Income, Debt, and Owns/Rents are the predictors; Outcome (Default or No default) is the classification. For example, record 1 is a customer with $25k income and 35% debt who owns a home and defaulted; the incomes in the sample range from $25k to $85k and the debt ratios from 10% to 40%.

Example: Credit Card Default
From the training data we build a tree. The diagram labels Credit Approval as the root node, the intermediate splits as child nodes, and the Default / No Default boxes as leaf nodes. The tree splits first on Income (<40k vs. >40k), then on Debt (>20% vs. <20%), then on Owns house vs. Rents, with Default or No Default at each leaf.
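A decision tree is just a set of if-rules over the predictors. A hedged R sketch of some rules from this example (the function name is illustrative, and only the predictor combinations spelled out later in these slides are covered):

    # A few rules from the example tree, written explicitly
    predict_default <- function(income, debt_ratio, owns_rents) {
      if (income < 40000 && debt_ratio > 0.20 && owns_rents == "Owns")  return("Default")
      if (income < 40000 && debt_ratio < 0.20 && owns_rents == "Owns")  return("No Default")
      if (income < 40000 && debt_ratio < 0.20 && owns_rents == "Rents") return("Default")
      if (income > 40000 && debt_ratio > 0.20 && owns_rents == "Owns")  return("No Default")
      NA  # remaining combinations are not spelled out in the transcript
    }

    predict_default(30000, 0.25, "Owns")   # "Default"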

Same Data, Different Tree
Using the same training data but applying the predictors in a different order gives a different tree: here the root splits first on Owns vs. Rents, then on Income (<40k vs. >40k), then on Debt (>20% vs. <20%), with the same Default / No Default outcomes at the leaves. We just changed the order of the predictors.

Apply to new (validation) data
The tree is now applied to validation records it has not seen before. The validation table has the columns TID, Income, Debt, Owns/Rents, Decision (Predicted), and Decision (Actual); for example, record 1 (income $80k, debt 35%, rents) is predicted to Default but actually did not default. How well did the decision tree do in predicting the outcome? When it's "good enough," we've got our model for future decisions.
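A hedged R sketch of this check. It assumes the tree is fit with the rpart package (one common R choice, not named on the slides) and that training and validation are the data frames from the earlier split; the column names Outcome, Income, Debt, and OwnsRents are illustrative:

    library(rpart)

    # Fit a classification tree on the training subset
    fit <- rpart(Outcome ~ Income + Debt + OwnsRents,
                 data = training, method = "class")

    # Predict the validation subset and compare against the actual outcomes
    predicted <- predict(fit, newdata = validation, type = "class")
    confusion <- table(Predicted = predicted, Actual = validation$Outcome)
    print(confusion)
    cat("Accuracy:", sum(diag(confusion)) / sum(confusion), "\n")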

In a real situation…
The tree induction software has to deal with instances where:
- The same set of predictors results in different outcomes
- Multiple paths result in the same outcome
- Not every combination of predictors appears in the training set

Tree Induction Algorithms
Tree induction algorithms take large sets of data and compute the tree. Similar cases may have different outcomes, so the probability of an outcome is computed for each node. For instance, in the credit approval tree you may find that when income > 40k, debt < 20%, and the customer rents, no default occurs 80% of the time (a probability of 0.8 on that leaf).
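That leaf probability is simply the proportion of matching training cases with each outcome. A minimal sketch, assuming the training data frame and the illustrative column names used above (with Debt stored as a fraction of income):

    # Share of "No Default" among training cases that fall into one leaf
    leaf_cases <- subset(training,
                         Income > 40000 & Debt < 0.20 & OwnsRents == "Rents")
    mean(leaf_cases$Outcome == "No Default")   # e.g. 0.8 on the slide's data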

How the induction algorithm works
1. Start with a single node containing all the training data.
2. Are the samples in the node all of the same classification? If yes, the node is finished.
3. If not, are there predictor(s) that will split the data? If yes, partition the node into child nodes according to those predictor(s).
4. Are there more nodes (i.e., new child nodes) to process? If yes, go to the next node and repeat from step 2; if no, DONE!
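The "are there predictors that will split the data?" step is where the statistics on the next slides come in. A rough R sketch of just that step, assuming the candidate predictors have already been binned into groups (e.g., Income < 40k vs. > 40k); real packages such as rpart handle this, and much more, internally:

    # For each candidate predictor, test whether the outcome distribution differs
    # across its groups; return the predictor with the smallest p-value, if any
    best_split <- function(node_data, outcome, predictors) {
      pvals <- sapply(predictors, function(p) {
        chisq.test(table(node_data[[p]], node_data[[outcome]]))$p.value
      })
      if (min(pvals) < 0.05) names(which.min(pvals)) else NA  # NA = no useful split
    }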

Start with root node
At the root node (Credit Approval) there are both "defaults" and "no defaults" in the set, so we need to look for predictors to split the data.

Split on income
Income is a factor: (Income < 40k, Debt > 20%, Owns) = Default, but (Income > 40k, Debt > 20%, Owns) = No default. So the root is partitioned into Income < 40k and Income > 40k child nodes. But there is still a mix of defaults and no defaults within each income group, so we look for another split.

Split on debt
Debt is a factor: (Income < 40k, Debt < 20%, Owns) = No default, but (Income < 40k, Debt < 20%, Rents) = Default. So each income node is partitioned into Debt > 20% and Debt < 20% child nodes. But there is still a mix of defaults and no defaults within some debt groups, so we look for another split.

Split on Owns/Rents
Owns/Rents is a factor. For some cases it doesn't matter, but for some it does, so you group similar branches. …And we stop because we're out of predictors! The result is the finished credit approval tree, with splits on Income, then Debt, then Owns/Rents, and Default / No Default at the leaves.

How does it know when and how to split?
There are statistics that show:
- When a predictor variable maximizes distinct outcomes (if age is a predictor, we should see that older people buy and younger people don't)
- When a predictor variable separates outcomes (if age is a predictor, we should not see older people who buy mixed up with older people who don't)

The Chi-squared test
Is the proportion of the outcome class the same in each child node? It shouldn't be, or the classification isn't very helpful. Suppose the root node holds 1,500 cases (750 Default, 750 No Default) and splits into Owns (n=850: 300 Default, 550 No Default) and Rents (n=650: 450 Default, 200 No Default).

Observed       Owns   Rents   Total
Default         300     450     750
No Default      550     200     750
Total           850     650   1,500

Expected       Owns   Rents   Total
Default         425     325     750
No Default      425     325     750
Total           850     650   1,500

If owning or renting had no influence on the default rate, you'd expect the even split shown in the Expected table (it's 50/50 in either case). But we can see the observed counts aren't distributed that way. Is the difference enough (i.e., statistically significant)?
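The expected counts come from assuming the split makes no difference: each cell is its row total times its column total, divided by the grand total. For the Default row, for instance:

    E_{\text{Default, Owns}} = \frac{750 \times 850}{1500} = 425, \qquad
    E_{\text{Default, Rents}} = \frac{750 \times 650}{1500} = 325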

Chi-squared test
Asks: how different is the observed distribution from the expected distribution? With this 2x2 table you've got four cells, and therefore four distance calculations between observed and expected counts; the more difference in each cell, the higher the test statistic. Small p-values (i.e., less than 0.05) mean it's very unlikely the groups are the same, so Owns/Rents is a predictor that creates two different groups.
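A minimal check of this example in base R (correct = FALSE skips the continuity correction so the statistic matches the plain formula):

    # Observed counts from the Owns/Rents split
    observed <- matrix(c(300, 550, 450, 200), nrow = 2,
                       dimnames = list(Outcome = c("Default", "No Default"),
                                       Housing = c("Owns", "Rents")))
    test <- chisq.test(observed, correct = FALSE)
    test$expected   # the Expected table: 425 / 325 in each row
    test$statistic  # chi-squared distance between observed and expected
    test$p.value    # tiny here, so the split creates genuinely different groups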

Bottom line: Interpreting the Chi-Squared Test
A high statistic (low p-value) from the chi-squared test means the groups are different. R calculates the logworth value behind the scenes, which is -log(p-value). It's a way to compare split variables (bigger logworths mean better splits), and lower p-values give bigger logworths:
-log(0.05) ≈ 2.99
-log(0.0001) ≈ 9.21
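In R the two example values are just (R's log() is the natural log, which matches the numbers above):

    -log(0.05)     # about 2.996
    -log(0.0001)   # about 9.210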

Reading the R Decision Tree
Outcome cases are not 100% certain; there are probabilities attached to each outcome in a node. So let's code "Default" as 1 and "No Default" as 0. Each node in the fitted tree then shows the share of each class: for example, the Income < 40k, Debt > 20% node shows 0: 15% / 1: 85%; under Income < 40k and Debt < 20%, the Owns house node shows 0: 80% / 1: 20% and the Rents node shows 0: 70% / 1: 30%; the nodes under Income > 40k show 0: 78% / 1: 22%, 0: 40% / 1: 60%, and 0: 74% / 1: 26%.
So what is the chance that:
- A renter making more than $40,000 with debt more than 20% of income will default?
- A home owner making less than $40,000 with debt more than 20% of income will default?
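A hedged sketch of answering such questions with the fitted rpart model from earlier (fit); predict(..., type = "prob") returns the class probabilities for each case. The column names are the illustrative ones used above:

    # Class probabilities for two hypothetical applicants
    new_cases <- data.frame(Income    = c(50000, 30000),
                            Debt      = c(0.25, 0.25),
                            OwnsRents = c("Rents", "Owns"))
    predict(fit, newdata = new_cases, type = "prob")
    # one row per applicant, one column per class ("Default", "No Default")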