
Slide 1: Illustration of the Classification Task: Learning Algorithm Model

Slide 2: Classification: Definition
- Given a collection of records (the training set), where each record contains a set of attributes (x) plus one additional attribute, the class (y).
- Find a model that predicts the class as a function of the values of the other attributes.
- Goal: previously unseen records should be assigned a class as accurately as possible.
  - A test set is used to determine the accuracy of the model. Usually the given data set is divided into training and test sets: the training set is used to build the model and the test set is used to validate it.
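A minimal R sketch of this split-and-validate workflow (the data frame d and class column y are assumed names, not from the slides; rpart is the tree package mentioned later in the deck):

    # Assumed: a data frame d whose column y is the class label.
    library(rpart)

    set.seed(1)                                     # reproducible split
    idx   <- sample(nrow(d), round(0.7 * nrow(d)))  # 70% of records for training
    train <- d[idx, ]                               # used to build the model
    test  <- d[-idx, ]                              # used to validate it

    fit  <- rpart(y ~ ., data = train, method = "class")
    pred <- predict(fit, newdata = test, type = "class")
    mean(pred == test$y)                            # accuracy on unseen records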

Slide 3: Classification Examples
- Classifying credit card transactions as legitimate or fraudulent
- Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
- Categorizing news stories as finance, weather, entertainment, sports, etc.
- Predicting tumor cells as benign or malignant

Slide 4: An Example of a Decision Tree
The model is a decision tree built from the training data. Refund and MarSt are categorical attributes, TaxInc is continuous, and Cheat is the class. The splitting attributes form the tree:
- Refund = Yes -> class NO
- Refund = No -> test MarSt:
  - MarSt = Married -> class NO
  - MarSt = Single or Divorced -> test TaxInc:
    - TaxInc < 80K -> class NO
    - TaxInc > 80K -> class YES
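Because the tree is just nested tests, it can be hand-coded directly; a sketch in R (the function and argument names are mine, not from the slides):

    # The slide's tree as a plain function; thresholds in dollars (80K = 80000).
    predict_cheat <- function(refund, marst, taxinc) {
      if (refund == "Yes")    return("No")   # Refund = Yes           -> leaf NO
      if (marst == "Married") return("No")   # Married                -> leaf NO
      if (taxinc < 80000)     return("No")   # Single/Divorced, < 80K -> leaf NO
      "Yes"                                  # Single/Divorced, > 80K -> leaf YES
    }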

Slides 5-10: Applying the Tree Model to Predict the Class for a New Observation
These six slides walk one test record down the tree from slide 4, starting at the root: apply the Refund test first, then MarSt, then (if needed) TaxInc, following one branch at each step until a leaf is reached. For the test record shown, the walk ends at a NO leaf, and slide 10 concludes: assign Cheat to "No". (The slide figures, which show the tree and the test record at each step, are not reproduced here.)
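The walkthrough can be reproduced with the hand-coded tree from slide 4. The actual attribute values of the test record appear only in the slide figure, so the values below are assumed ones that end at the same NO leaf:

    predict_cheat(refund = "No", marst = "Married", taxinc = 120000)
    # "No"  (Refund = No -> test MarSt; MarSt = Married -> leaf NO)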

Slide 11: How to Apply Hunt’s Algorithm
- It is usually applied in a “greedy” fashion.
- “Greedy” means that the optimal split is chosen at each stage according to some criterion.
- The result may not be optimal overall, even for that same criterion.
- However, the greedy approach is computationally efficient, which is why it is popular.

Slide 12: How to Apply Hunt’s Algorithm (continued)
- Using the greedy approach, we still have to decide three things:
  1) What attribute test conditions to consider
  2) What criterion to use to select the “best” split
  3) When to stop splitting
- For #1 we will consider only binary splits for both numeric and categorical predictors, as discussed on the next slide.
- For #2 we will consider misclassification error, the Gini index, and entropy.
- #3 is a subtle business involving model selection. It is tricky because we don’t want to overfit or underfit.

Slide 13: #1) What Attribute Test Conditions to Consider
- We will consider only binary splits for both numeric and categorical predictors as discussed, but your book also talks about multiway splits.
- Nominal: any two-way partition of the categories, e.g. CarType {Sports, Luxury} vs. {Family}.
- Ordinal: like nominal, but the split must not break the category order, e.g. Size {Small, Medium} vs. {Large}, or Size {Small} vs. {Medium, Large}.
- Numeric: often use midpoints between observed values as cut points, e.g. Taxable Income > 80K? Yes/No. (A sketch of scoring such splits follows below.)
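A sketch of the numeric case in R: score every midpoint between adjacent observed values by the weighted impurity of the two children and keep the best cut. All names are illustrative, and Gini is used as the criterion here (any of the three from the next slide would do):

    gini <- function(y) {                   # impurity of one node's labels
      p <- table(y) / length(y)
      1 - sum(p^2)
    }

    best_numeric_split <- function(x, y) {
      xs   <- sort(unique(x))
      cuts <- (head(xs, -1) + tail(xs, -1)) / 2   # midpoints between values
      score <- sapply(cuts, function(thr) {
        left  <- y[x <= thr]
        right <- y[x >  thr]
        (length(left) * gini(left) + length(right) * gini(right)) / length(y)
      })
      cuts[which.min(score)]                # cut with the lowest weighted Gini
    }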

Slide 14: #2) What Criterion to Use to Select the “Best” Split
- We will consider misclassification error, the Gini index, and entropy. For a node $t$ with class proportions $p(i \mid t)$:
- Misclassification error: $Error(t) = 1 - \max_i \, p(i \mid t)$
- Gini index: $Gini(t) = 1 - \sum_i [p(i \mid t)]^2$
- Entropy: $Entropy(t) = -\sum_i p(i \mid t) \log_2 p(i \mid t)$
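The three criteria are easy to compare side by side; a small R sketch, taking a node's class-proportion vector as input:

    misclass <- function(p) 1 - max(p)      # 1 - max_i p(i|t)
    gini     <- function(p) 1 - sum(p^2)    # 1 - sum_i p(i|t)^2
    entropy  <- function(p) {               # -sum_i p(i|t) log2 p(i|t)
      p <- p[p > 0]                         # treat 0 * log(0) as 0
      -sum(p * log2(p))
    }

    p <- c(1/6, 5/6)                        # the middle node from slide 18
    c(misclass(p), gini(p), entropy(p))     # 0.167, 0.278, 0.650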

Slide 15: Misclassification Error
- Misclassification error is usually the final metric we want to minimize on the test set, so there is a logical argument for using it as the split criterion.
- It is simply the fraction of total cases misclassified.
- 1 - Misclassification error = “Accuracy”.
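In R, both quantities fall out of comparing predicted and true labels; the vectors here are hypothetical, for illustration only:

    pred  <- c("No", "Yes", "No", "No")   # hypothetical model predictions
    truth <- c("No", "Yes", "Yes", "No")  # hypothetical true classes
    err <- mean(pred != truth)            # fraction misclassified: 0.25
    acc <- 1 - err                        # accuracy: 0.75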

Slide 16: In-class exercise

Slide 17: Gini Index
- The Gini index is commonly used in many algorithms, such as CART and the rpart() function in R.
- After the Gini index is computed in each child node, the overall value for the split is computed as the average of the Gini index across the nodes, weighted by the number of records in each node.
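rpart exposes the split criterion through its parms argument; a sketch, assuming a data frame train with class column y as before:

    library(rpart)
    # CART-style Gini splitting (rpart's default for classification)
    fit_gini <- rpart(y ~ ., data = train, method = "class",
                      parms = list(split = "gini"))
    # Entropy-based ("information") splitting instead
    fit_info <- rpart(y ~ ., data = train, method = "class",
                      parms = list(split = "information"))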

Slide 18: Gini Examples for a Single Node
- Node with 0 records of C1 and 6 of C2: P(C1) = 0/6 = 0, P(C2) = 6/6 = 1; Gini = 1 - P(C1)^2 - P(C2)^2 = 1 - 0 - 1 = 0
- Node with 1 record of C1 and 5 of C2: P(C1) = 1/6, P(C2) = 5/6; Gini = 1 - (1/6)^2 - (5/6)^2 = 0.278
- Node with 2 records of C1 and 4 of C2: P(C1) = 2/6, P(C2) = 4/6; Gini = 1 - (2/6)^2 - (4/6)^2 = 0.444
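These three examples can be checked with a one-line function that takes the class counts in a node:

    gini_node <- function(counts) {     # counts of each class in the node
      p <- counts / sum(counts)
      1 - sum(p^2)
    }
    gini_node(c(0, 6))   # 0
    gini_node(c(1, 5))   # 0.2777778
    gini_node(c(2, 4))   # 0.4444444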

Slide 19: In-class exercise

Slide 20: Misclassification Error vs. Gini Index
Consider a binary split on attribute A (Yes -> Node N1, No -> Node N2), where N1 holds 3 records of C1 and 0 of C2, and N2 holds 4 of C1 and 3 of C2:
- Gini(N1) = 1 - (3/3)^2 - (0/3)^2 = 0
- Gini(N2) = 1 - (4/7)^2 - (3/7)^2 = 0.490
- Gini(Children) = 3/10 * 0 + 7/10 * 0.490 = 0.343
The Gini index decreases from 0.42 (the parent node) to 0.343, while the misclassification error stays at 30%. This illustrates why we often want to use a surrogate loss function like the Gini index even if we really only care about misclassification.
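The slide's numbers can be verified directly from the class counts (C1, C2) in the parent and the two children:

    gini_node <- function(counts) { p <- counts / sum(counts); 1 - sum(p^2) }
    err_node  <- function(counts) 1 - max(counts) / sum(counts)

    parent <- c(7, 3); n1 <- c(3, 0); n2 <- c(4, 3)

    gini_node(parent)                                  # 0.42
    (3/10) * gini_node(n1) + (7/10) * gini_node(n2)    # 0.343
    err_node(parent)                                   # 0.3
    (3/10) * err_node(n1) + (7/10) * err_node(n2)      # 0.3 (no change)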

