CSci 8980: Data Mining (Fall 2002)
Vipin Kumar
Army High Performance Computing Research Center
Department of Computer Science, University of Minnesota
© Vipin Kumar, CSci 8980, Fall 2002

Splitting Criterion
• Gini Index
• Entropy and Information Gain
• Misclassification error

Splitting Criterion: GINI
• Gini Index for a given node t:

    GINI(t) = 1 - \sum_{j} [p(j \mid t)]^2

  (NOTE: p(j | t) is the relative frequency of class j at node t.)
  – Measures the impurity of a node:
    · Maximum (1 - 1/n_c) when records are equally distributed among all classes, implying the least interesting information.
    · Minimum (0.0) when all records belong to one class, implying the most interesting information.

Examples for computing GINI
• P(C1) = 0/6 = 0, P(C2) = 6/6 = 1:
  Gini = 1 - P(C1)² - P(C2)² = 1 - 0 - 1 = 0
• P(C1) = 1/6, P(C2) = 5/6:
  Gini = 1 - (1/6)² - (5/6)² = 0.278
• P(C1) = 2/6, P(C2) = 4/6:
  Gini = 1 - (2/6)² - (4/6)² = 0.444
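A minimal Python sketch of this computation (the helper name `gini` is mine, not from the slides); it reproduces the three results above:

```python
def gini(counts):
    """Gini index of a node, given per-class record counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)

print(gini([0, 6]))  # 0.0   (all records in one class)
print(gini([1, 5]))  # ~0.278
print(gini([2, 4]))  # ~0.444
```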

Splitting Based on GINI
• Used in CART, SLIQ, SPRINT.
• Splitting criterion: minimize the Gini index of the split.
• When a node p is split into k partitions (children), the quality of the split is computed as

    GINI_split = \sum_{i=1}^{k} (n_i / n) \, GINI(i)

  where n_i = number of records at child i, and n = number of records at node p.

Binary Attributes: Computing GINI Index
• Splits into two partitions.
• Effect of weighting partitions: larger and purer partitions are sought.

  Split on B? (Yes → Node N1, No → Node N2), with N1 = (C1: 5, C2: 1) and N2 = (C1: 2, C2: 4):
  Gini(N1) = 1 - (5/6)² - (1/6)² = 0.278
  Gini(N2) = 1 - (2/6)² - (4/6)² = 0.444
  Gini(Children) = 6/12 × 0.278 + 6/12 × 0.444 = 0.361
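A sketch of the weighted split quality (function names are mine); it reproduces Gini(Children) = 0.361:

```python
def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts) if n else 0.0

def gini_split(children):
    """Weighted Gini of a split; children is a list of per-class count lists."""
    n = sum(sum(c) for c in children)
    return sum(sum(c) / n * gini(c) for c in children)

# Node N1 = (C1: 5, C2: 1), Node N2 = (C1: 2, C2: 4), as above.
print(gini_split([[5, 1], [2, 4]]))  # ~0.361
```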

Categorical Attributes: Computing Gini Index
• For each distinct value, gather counts for each class in the dataset.
• Use the count matrix to make decisions.
• Multi-way split: one partition per distinct value.
• Two-way split: find the best binary partition of the values.
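For the two-way split, one straightforward brute-force approach is to enumerate the binary partitions of the attribute's values and score each by weighted Gini. A hypothetical sketch (the example counts are illustrative, not the slide's; feasible only for a small number of distinct values):

```python
from itertools import combinations

def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts) if n else 0.0

def best_two_way_split(count_matrix):
    """count_matrix maps attribute value -> per-class counts."""
    values = list(count_matrix)
    n = sum(sum(c) for c in count_matrix.values())
    best = None
    # Enumerate one side of each binary partition of the values.
    for r in range(1, len(values) // 2 + 1):
        for left in combinations(values, r):
            right = tuple(v for v in values if v not in left)
            quality = 0.0
            for group in (left, right):
                counts = [sum(col) for col in zip(*(count_matrix[v] for v in group))]
                quality += sum(counts) / n * gini(counts)
            if best is None or quality < best[0]:
                best = (quality, left, right)
    return best  # (weighted Gini, left group, right group)

# Illustrative counts only (CarType -> [C0, C1]); not taken from the slides.
print(best_two_way_split({'Family': [1, 3], 'Sports': [8, 0], 'Luxury': [1, 7]}))
```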

Continuous Attributes: Computing Gini Index
• Use binary decisions based on one value.
• Several choices for the splitting value:
  – Number of possible splitting values = number of distinct values.
• Each splitting value v has a count matrix associated with it:
  – Class counts in each of the partitions, A < v and A ≥ v.
• Simple method to choose the best v:
  – For each v, scan the database to gather the count matrix and compute its Gini index.
  – Computationally inefficient! Repeats work.

Continuous Attributes: Computing Gini Index...
• For efficient computation, for each attribute:
  – Sort the attribute on values.
  – Linearly scan these values, each time updating the count matrix and computing the Gini index.
  – Choose the split position that has the least Gini index.
[The slide shows the sorted values with the candidate split positions between them.]
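A sketch of this single-pass method (the interface and names are my assumptions, not the slides' code): sort once, then move records across the boundary one at a time, updating class counts incrementally instead of rescanning the database for every candidate v:

```python
def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts) if n else 0.0

def best_numeric_split(values, labels, classes=(0, 1)):
    data = sorted(zip(values, labels))          # sort once on attribute value
    left = {c: 0 for c in classes}              # class counts for A < v
    right = {c: 0 for c in classes}             # class counts for A >= v
    for _, y in data:
        right[y] += 1
    n, best = len(data), None
    for i in range(1, n):
        y = data[i - 1][1]
        left[y] += 1                            # move one record across the boundary
        right[y] -= 1
        if data[i - 1][0] == data[i][0]:
            continue                            # no split between equal values
        v = (data[i - 1][0] + data[i][0]) / 2   # midpoint candidate
        q = (i / n) * gini(list(left.values())) \
            + ((n - i) / n) * gini(list(right.values()))
        if best is None or q < best[0]:
            best = (q, v)
    return best                                 # (weighted Gini, split value v)

# Illustrative data (attribute value vs. class label), not the slide's table.
print(best_numeric_split([60, 70, 75, 85, 90, 95], [0, 0, 0, 1, 1, 0]))
```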

Alternative Splitting Criteria based on INFO
• Entropy at a given node t:

    Entropy(t) = - \sum_{j} p(j \mid t) \log_2 p(j \mid t)

  (NOTE: p(j | t) is the relative frequency of class j at node t.)
  – Measures the homogeneity of a node:
    · Maximum (log n_c) when records are equally distributed among all classes, implying the least information.
    · Minimum (0.0) when all records belong to one class, implying the most information.
  – Entropy-based computations are similar to the GINI index computations.

Examples for computing Entropy
• P(C1) = 0/6 = 0, P(C2) = 6/6 = 1:
  Entropy = -0 log₂ 0 - 1 log₂ 1 = -0 - 0 = 0
• P(C1) = 1/6, P(C2) = 5/6:
  Entropy = -(1/6) log₂(1/6) - (5/6) log₂(5/6) = 0.65
• P(C1) = 2/6, P(C2) = 4/6:
  Entropy = -(2/6) log₂(2/6) - (4/6) log₂(4/6) = 0.92
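The same examples as a short sketch (base-2 logs, with 0 · log 0 taken as 0; the helper name is mine):

```python
from math import log2

def entropy(counts):
    """Entropy of a node; empty classes are skipped so 0 * log 0 = 0."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

print(entropy([0, 6]))  # 0.0
print(entropy([1, 5]))  # ~0.65
print(entropy([2, 4]))  # ~0.92
```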

Splitting Based on INFO...
• Information Gain:

    GAIN_split = Entropy(p) - \sum_{i=1}^{k} (n_i / n) \, Entropy(i)

  where parent node p is split into k partitions and n_i is the number of records in partition i.
  – Measures the reduction in entropy achieved because of the split. Choose the split that achieves the most reduction (maximizes GAIN).
  – Used in ID3 and C4.5.

Splitting Based on INFO...
• Disadvantage of Information Gain:
  – Tends to prefer splits that result in a large number of partitions, each being small but pure.
• Gain Ratio:

    GainRATIO_split = GAIN_split / SplitINFO,  where  SplitINFO = - \sum_{i=1}^{k} (n_i / n) \log_2 (n_i / n)

  and parent node p is split into k partitions with n_i records in partition i.
  – Adjusts Information Gain by the entropy of the partitioning (SplitINFO). Higher-entropy partitioning (a large number of small partitions) is penalized!
  – Used in C4.5.
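A sketch of GAIN_split and GainRATIO as defined above (helper names and example counts are mine, not the slides'):

```python
from math import log2

def entropy(counts):
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

def info_gain(parent, children):
    """Entropy of the parent minus the weighted entropy of the children."""
    n = sum(parent)
    return entropy(parent) - sum(sum(c) / n * entropy(c) for c in children)

def gain_ratio(parent, children):
    n = sum(parent)
    split_info = -sum((sum(c) / n) * log2(sum(c) / n)
                      for c in children if sum(c) > 0)
    return info_gain(parent, children) / split_info if split_info else 0.0

# Illustrative counts: a parent of (6, 6) split into two mostly-pure halves.
parent, children = [6, 6], [[5, 1], [1, 5]]
print(info_gain(parent, children))   # ~0.35
print(gain_ratio(parent, children))  # ~0.35 (SplitINFO = 1 for an even 2-way split)
```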

Splitting Criteria based on Classification Error
• Classification error at a node t:

    Error(t) = 1 - \max_{j} p(j \mid t)

• Measures the misclassification error made by a node:
  – Maximum (1 - 1/n_c) when records are equally distributed among all classes, implying the least interesting information.
  – Minimum (0.0) when all records belong to one class, implying the most interesting information.

Examples for Computing Error
• P(C1) = 0/6 = 0, P(C2) = 6/6 = 1:
  Error = 1 - max(0, 1) = 1 - 1 = 0
• P(C1) = 1/6, P(C2) = 5/6:
  Error = 1 - max(1/6, 5/6) = 1 - 5/6 = 1/6
• P(C1) = 2/6, P(C2) = 4/6:
  Error = 1 - max(2/6, 4/6) = 1 - 4/6 = 1/3
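A one-function sketch matching these examples (the helper name is mine):

```python
def classification_error(counts):
    """Error(t) = 1 - max_j p(j | t), from per-class record counts."""
    return 1.0 - max(counts) / sum(counts)

print(classification_error([0, 6]))  # 0.0
print(classification_error([1, 5]))  # ~0.167 (1/6)
print(classification_error([2, 4]))  # ~0.333 (1/3)
```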

Comparison among Splitting Criteria
For a 2-class problem:
[The slide plots entropy, Gini, and misclassification error against p, the fraction of records in one class; all three peak at p = 0.5 and vanish at p = 0 and p = 1.]
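The comparison can be regenerated with a short script (a sketch, assuming matplotlib is available; this code is not from the slides):

```python
from math import log2
import matplotlib.pyplot as plt

ps = [i / 100 for i in range(101)]   # p = fraction of records in class 1
gini = [2 * p * (1 - p) for p in ps]                       # 1 - p^2 - (1-p)^2
ent = [-(p * log2(p) + (1 - p) * log2(1 - p)) if 0 < p < 1 else 0.0 for p in ps]
err = [1 - max(p, 1 - p) for p in ps]

plt.plot(ps, ent, label="Entropy")
plt.plot(ps, gini, label="Gini")
plt.plot(ps, err, label="Misclassification error")
plt.xlabel("p")
plt.ylabel("impurity")
plt.legend()
plt.show()
```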

Misclassification Error vs Gini
Split on A? (Yes → Node N1, No → Node N2), with parent = (C1: 7, C2: 3), N1 = (C1: 3, C2: 0), N2 = (C1: 4, C2: 3):
  Gini(N1) = 1 - (3/3)² - (0/3)² = 0
  Gini(N2) = 1 - (4/7)² - (3/7)² = 0.489
  Gini(Children) = 3/10 × 0 + 7/10 × 0.489 = 0.342
Gini improves (from Gini(Parent) = 1 - 0.7² - 0.3² = 0.42), while the misclassification error stays at 0.3 for both the parent and the children!!
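A quick numeric check of this point (counts read off the computations above: parent (7, 3), children (3, 0) and (4, 3)):

```python
def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def error(counts):
    return 1.0 - max(counts) / sum(counts)

parent, n1, n2 = [7, 3], [3, 0], [4, 3]
w1, w2 = 3 / 10, 7 / 10
print(error(parent), w1 * error(n1) + w2 * error(n2))  # 0.3 vs 0.3: unchanged
print(gini(parent), w1 * gini(n1) + w2 * gini(n2))     # 0.42 vs ~0.342: improves
```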

C4.5
• Simple depth-first construction.
• Uses Information Gain.
• Sorts continuous attributes at each node.
• Needs the entire dataset to fit in memory.
• Unsuitable for large datasets:
  – Needs out-of-core sorting.

Practical Challenges of Classification
• Overfitting:
  – The model performs well on training data, but poorly on test data.
• Missing values
• Data heterogeneity
• Costs:
  – Costs of measuring attributes
  – Costs of misclassification

Overfitting (Example)
500 circular and 500 triangular data points.
Circular points: 0.5 ≤ sqrt(x₁² + x₂²) ≤ 1
Triangular points: sqrt(x₁² + x₂²) > 1 or sqrt(x₁² + x₂²) < 0.5
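A sketch of how such a dataset could be generated (numpy assumed; the slides do not give the sampling procedure, so the uniform-in-area sampling, the 250/250 split of triangular points, and the 1.5 outer bound are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_annulus(n, r_min, r_max):
    """Uniform points whose distance from the origin lies in [r_min, r_max)."""
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    # sqrt of a uniform draw over squared radii gives uniform area density
    r = np.sqrt(rng.uniform(r_min ** 2, r_max ** 2, n))
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])

circles = sample_annulus(500, 0.5, 1.0)   # 0.5 <= sqrt(x1^2 + x2^2) <= 1
inner = sample_annulus(250, 0.0, 0.5)     # sqrt(x1^2 + x2^2) < 0.5
outer = sample_annulus(250, 1.0, 1.5)     # sqrt(x1^2 + x2^2) > 1 (bounded at 1.5 here)
triangles = np.vstack([inner, outer])
```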

Overfitting...
[Figure: training vs. test error as the tree grows.]

Overfitting due to Noise
The decision boundary is distorted by a noise point.

Overfitting due to Insufficient Examples
An insufficient number of training points may cause the decision boundary to change.