# Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation. Part 1: Classification with Decision Trees

Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation. Part 1: Classification with Decision Trees

Bab 4.1 - 2/44 Classification: Definition

Bab 4.1 - 3/44 Example of Classification Task

Bab 4.1 - 4/44 General Approach for Building Classification Model

Bab 4.1 - 5/44 Classification Techniques

Bab 4.1 - 6/44 Example of Decision Tree

Bab 4.1 - 7/44 Another Example of Decision Tree

Bab 4.1 - 8/44 Decision Tree Classification Task

Bab 4.1 - 9/44 Apply Model to Test Data

Bab 4.1 - 10/44 Decision Tree Classification Task

Bab 4.1 - 11/44 Decision Tree Induction

Bab 4.1 - 12/44 General Structure of Hunt’s Algorithm

Bab 4.1 - 13/44 Hunt’s Algorithm
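As a minimal sketch of the recursive idea behind Hunt's algorithm: if all records at a node share one class, the node becomes a leaf; otherwise the records are partitioned on an attribute test and the procedure recurses on each partition. The function name `hunt`, the toy records, the use of weighted Gini to pick the attribute, and the majority-class fallback when attributes run out are all assumptions made for illustration, not details taken from the slides.

```python
from collections import Counter

def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def hunt(records, labels, attributes):
    """Grow a tree over categorical attributes; leaves are class labels."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure or no attribute is left to test.
    if len(set(labels)) == 1 or not attributes:
        return majority

    def split_quality(attr):
        # Weighted Gini of the partition induced by attr (lower is better).
        groups = {}
        for rec, y in zip(records, labels):
            groups.setdefault(rec[attr], []).append(y)
        return sum(len(g) / len(labels) * gini(g) for g in groups.values())

    best = min(attributes, key=split_quality)
    partitions = {}
    for rec, y in zip(records, labels):
        recs, ys = partitions.setdefault(rec[best], ([], []))
        recs.append(rec)
        ys.append(y)
    rest = [a for a in attributes if a != best]
    return {best: {v: hunt(r, ys, rest) for v, (r, ys) in partitions.items()}}

# Hypothetical toy data, invented for this sketch.
records = [
    {"refund": "yes", "married": "no"},
    {"refund": "no", "married": "yes"},
    {"refund": "no", "married": "no"},
    {"refund": "yes", "married": "yes"},
]
labels = ["honest", "honest", "cheat", "honest"]
tree = hunt(records, labels, ["refund", "married"])
print(tree)
```

The result is a nested dictionary in which each inner node maps an attribute name to its outcomes and each leaf is a class label.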

Bab 4.1 - 14/44 Design Issues of Decision Tree Induction

Bab 4.1 - 15/44 Methods for Expressing Test Conditions

Bab 4.1 - 16/44 Test Condition for Nominal Attributes

Bab 4.1 - 17/44 Test Condition for Ordinal Attributes

Bab 4.1 - 18/44 Test Condition for Continuous Attributes

Bab 4.1 - 19/44 Splitting Based on Continuous Attributes

Bab 4.1 - 20/44 How to Determine the Best Split / 1

Bab 4.1 - 21/44 How to Determine the Best Split / 2

Bab 4.1 - 22/44 Measures of Node Impurity

Bab 4.1 - 23/44 Finding the Best Split / 1

Bab 4.1 - 24/44 Finding the Best Split / 2

Bab 4.1 - 25/44 Measure of Impurity: GINI

Bab 4.1 - 26/44 Computing GINI Index of a Single Node

Bab 4.1 - 27/44 Computing GINI Index for a Collection of Nodes
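A minimal sketch of both computations, assuming the standard definitions: the Gini index of a node t is 1 minus the sum over classes j of p(j|t) squared, and the Gini index of a split is the average of the child values weighted by child size. The function names and the sample labels are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini index of one node: 1 - sum_j p(j|t)^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(children):
    """Gini index of a split: sum_i (n_i / n) * gini(child_i)."""
    n = sum(len(c) for c in children)
    return sum(len(c) / n * gini(c) for c in children)

node = ["yes"] * 3 + ["no"] * 3
left, right = ["yes", "yes", "yes", "no"], ["no", "no"]

print(gini(node))                 # 0.5, the maximum for two classes
print(gini_split([left, right]))  # about 0.25, so this split reduces impurity
```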

Bab 4.1 - 28/44 Binary Attributes: Computing GINI Index

Bab 4.1 - 29/44 Categorical Attributes: Computing GINI Index

Bab 4.1 - 30/44 Continuous Attributes: Computing GINI Index / 1

Bab 4.1 - 31/44 Continuous Attributes: Computing GINI Index / 2
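For a continuous attribute, one common approach is to sort the values once and then sweep through the candidate thresholds, updating the class counts on each side incrementally rather than recounting from scratch. The sketch below assumes binary splits of the form `value <= threshold`, with midpoints between adjacent distinct values as candidates; `best_threshold` and the toy data are hypothetical.

```python
from collections import Counter

def gini_from_counts(counts):
    """Gini index computed from a mapping of class label to count."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split value <= threshold."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    left = Counter()         # class counts for records at or below the threshold
    right = Counter(labels)  # class counts for records above the threshold
    best = (None, float("inf"))
    for i in range(n - 1):
        value, label = pairs[i]
        left[label] += 1
        right[label] -= 1
        next_value = pairs[i + 1][0]
        if value == next_value:
            continue  # no threshold can separate equal values
        threshold = (value + next_value) / 2
        n_left = i + 1
        score = (n_left / n) * gini_from_counts(left) \
            + ((n - n_left) / n) * gini_from_counts(right)
        if score < best[1]:
            best = (threshold, score)
    return best

print(best_threshold([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"]))  # (2.5, 0.0)
```

After the initial sort, the sweep touches each record once, which is the reason for keeping running counts.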

Bab 4.1 - 32/44 Measure of Impurity: Entropy

Bab 4.1 - 33/44 Computing Entropy of a Single Node

Bab 4.1 - 34/44 Computing Information Gain After Splitting

Bab 4.1 - 35/44 Problems with Information Gain

Bab 4.1 - 36/44 Gain Ratio
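The sketch below illustrates, under the standard definitions, why information gain tends to favour attributes with many distinct values and how gain ratio compensates. Entropy is the sum of p log2(1/p) over classes, information gain is the parent entropy minus the size-weighted child entropy, and gain ratio divides the gain by the split information (the entropy of the partition sizes). The function names and the eight-record data set are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of one node in bits."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = len(parent)
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)

def split_info(parent, children):
    """Entropy of the partition sizes; grows with the number of partitions."""
    n = len(parent)
    return sum(len(c) / n * math.log2(n / len(c)) for c in children if c)

def gain_ratio(parent, children):
    si = split_info(parent, children)
    return information_gain(parent, children) / si if si > 0 else 0.0

parent = ["yes"] * 4 + ["no"] * 4
two_way = [["yes"] * 4, ["no"] * 4]  # a clean binary split
eight_way = [[y] for y in parent]    # one record per child, like an ID attribute

print(information_gain(parent, two_way), information_gain(parent, eight_way))  # 1.0 1.0
print(gain_ratio(parent, two_way))    # 1.0
print(gain_ratio(parent, eight_way))  # about 0.333
```

Both splits reach the same information gain, but the eight-way split is penalised by its larger split information.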

Bab 4.1 - 37/44 Measure of Impurity: Classification Error

Bab 4.1 - 38/44 Computing Error of a Single Node

Bab 4.1 - 39/44 Comparison among Impurity Measures for Binary (2-Class) Classification Problems

Bab 4.1 - 40/44 Misclassification Error vs Gini index
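A small numeric sketch of how the two measures can disagree: a split may leave the weighted misclassification error unchanged while still lowering the weighted Gini index. The class counts below are illustrative assumptions, and the helper names are hypothetical.

```python
def gini(counts):
    """Gini index from a tuple of per-class record counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def error(counts):
    """Misclassification error: 1 - max_j p(j|t)."""
    n = sum(counts)
    return 1.0 - max(counts) / n

def weighted(measure, children):
    """Size-weighted average of an impurity measure over child nodes."""
    n = sum(sum(c) for c in children)
    return sum(sum(c) / n * measure(c) for c in children)

parent = (7, 3)              # 7 records of class 1, 3 of class 2
children = [(3, 0), (4, 3)]  # a candidate binary split of the parent

print(round(error(parent), 3), round(weighted(error, children), 3))  # 0.3 0.3
print(round(gini(parent), 3), round(weighted(gini, children), 3))    # 0.42 0.343
```

With these counts the error measure sees no benefit in the split, whereas Gini records an improvement.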

Bab 4.1 - 41/44 Example: C4.5

- Simple depth-first construction.
- Uses Information Gain.
- Sorts continuous attributes at each node.
- Needs the entire data set to fit in memory.
- Unsuitable for large data sets.
  - Needs out-of-core sorting.
- You can download the software from: http://www.cse.unsw.edu.au/~quinlan/c4.5r8.tar.gz

Bab 4.1 - 42/44 Scalable Decision Tree Induction / 1

- How scalable is decision tree induction?
  - Particularly suitable for small data sets
- SLIQ (EDBT’96, Mehta et al.)
  - Builds an index for each attribute; only the class list and the current attribute list reside in memory

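Bab 4.1 - 43/44 Scalable Decision Tree Induction / 2

SLIQ: sample data for the class buys_computer

| RID | Credit_rating | Age | Buys_computer |
|-----|---------------|-----|---------------|
| 1   | excellent     | 38  | yes           |
| 2   | excellent     | 26  | yes           |
| 3   | fair          | 35  | no            |
| 4   | excellent     | 49  | no            |

Disk-resident attribute lists

| Credit_rating | RID |
|---------------|-----|
| excellent     | 1   |
| excellent     | 2   |
| excellent     | 4   |
| fair          | 3   |
| …             | …   |

| Age | RID |
|-----|-----|
| 26  | 2   |
| 35  | 3   |
| 38  | 1   |
| 49  | 4   |
| …   | …   |

Memory-resident class list

| RID | Buys_computer | Node |
|-----|---------------|------|
| 1   | yes           | 5    |
| 2   | yes           | 2    |
| 3   | no            | 3    |
| 4   | no            | 6    |
| …   | …             | …    |

[Figure: decision tree with nodes numbered 0 to 6; the Node column of the class list refers to these nodes.]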
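The two SLIQ data structures on this slide can be sketched in memory as follows. This is a simplified illustration, not the SLIQ implementation: it uses the four sample records, keeps everything in Python lists and dictionaries rather than on disk, and starts every record at node 0 instead of the node numbers shown on the slide. The helper `class_counts_below` is hypothetical.

```python
# Sample records: (RID, credit_rating, age, buys_computer).
records = [
    (1, "excellent", 38, "yes"),
    (2, "excellent", 26, "yes"),
    (3, "fair", 35, "no"),
    (4, "excellent", 49, "no"),
]

# Class list: RID -> [class label, current tree node]. SLIQ keeps this in memory.
# Assumption: every record starts at the root, node 0.
class_list = {rid: [label, 0] for rid, _, _, label in records}

# One attribute list per attribute: (value, RID) pairs, pre-sorted by value.
# In SLIQ these live on disk and are scanned one attribute at a time.
age_list = sorted((age, rid) for rid, _, age, _ in records)
credit_list = sorted((credit, rid) for rid, credit, _, _ in records)

def class_counts_below(attr_list, threshold):
    """Scan a sorted attribute list and count classes with value <= threshold."""
    counts = {}
    for value, rid in attr_list:
        if value > threshold:
            break  # the list is sorted, so no later entry can qualify
        label = class_list[rid][0]  # the RID links back to the class list
        counts[label] = counts.get(label, 0) + 1
    return counts

print(age_list)                          # [(26, 2), (35, 3), (38, 1), (49, 4)]
print(class_counts_below(age_list, 36))  # {'yes': 1, 'no': 1}
```

Because each attribute list carries the RID, a scan over one attribute needs only that list and the class list, which matches the memory layout described on the previous slide.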

Bab 4.1 - 44/44 Decision Tree Based Classification

Advantages

- Inexpensive to construct
- Extremely fast at classifying unknown records
- Easy to interpret for small-sized trees
- Accuracy is comparable to other classification techniques for many data sets

Practical Issues of Classification

- Underfitting and Overfitting
- Missing Values
- Costs of Classification
