
1 DATA MINING: CLASSIFICATION

2 Classification: Definition
Classification is a supervised learning task.
It uses a training set that has correct answers (a class label attribute).
A model is created by running the algorithm on the training data.
Test the model: if accuracy is low, regenerate the model after changing features or reconsidering the training samples.
Use the model to identify a class label for incoming new data.
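As a minimal sketch of this train → test → predict loop in Python (scikit-learn and the iris dataset are illustrative choices, not the slides'):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Training set with correct answers (class labels are known).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Create a model by running the algorithm on the training data.
model = DecisionTreeClassifier().fit(X_train, y_train)

# Test the model; if accuracy is low, change features/samples and retrain.
print("accuracy:", model.score(X_test, y_test))

# Identify a class label for incoming new data.
print("predicted class:", model.predict(X_test[:1]))
```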

3 Applications: Classifying credit card transactions as legitimate or fraudulent. Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil. Categorizing news stories as finance, weather, entertainment, sports, etc.

4 Classification: A two-step process. Model construction: describing a set of predetermined classes. Each sample is assumed to belong to a predefined class, as determined by the class label attribute. The set of samples used for model construction is the training set. The model is represented as classification rules, decision trees, or a mathematical formula.

5 Model usage: for classifying future or unknown objects. Estimate the accuracy of the model: the known label of each test sample is compared with the classification produced by the model, and the accuracy rate is the percentage of test set samples that are correctly classified. The test set is independent of the training set. If the accuracy is acceptable, use the model to classify data samples whose class labels are not known.

6 Classification Process (1): Model Construction. The training data is fed to a classification algorithm, which outputs the classifier (model); here the learned model is the rule: IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’.

7 Classification Process (2): Use the Model in Prediction. The classifier is first applied to the testing data to estimate accuracy, then to unseen data such as (Jeff, Professor, 4) to answer the question: Tenured?
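As a toy illustration, the learned rule can be written directly as a function and applied to the unseen record from the slide (the function name is invented for this sketch):

```python
def predict_tenured(rank: str, years: int) -> str:
    # Learned rule from model construction:
    # IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
    return "yes" if rank.lower() == "professor" or years > 6 else "no"

# Unseen record from the slide: (Jeff, Professor, 4)
print(predict_tenured("Professor", 4))  # -> "yes" (the rank condition matches)
```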

8 Classification techniques: Decision Tree based Methods, Rule-based Methods, Neural Networks, Bayesian Classification, Support Vector Machines.

9 Algorithm for decision tree induction: Basic algorithm: The tree is constructed in a top-down, recursive, divide-and-conquer manner. At the start, all the training examples are at the root. Attributes are categorical (if continuous-valued, they are discretized in advance). Examples are partitioned recursively based on selected attributes.
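A minimal sketch of this basic algorithm in Python, assuming information gain as the attribute-selection measure (the slides do not name one):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    """Pick the attribute with the highest information gain."""
    base = entropy(labels)
    def gain(attr):
        split = {}
        for row, y in zip(rows, labels):
            split.setdefault(row[attr], []).append(y)
        remainder = sum(len(ys) / len(labels) * entropy(ys) for ys in split.values())
        return base - remainder
    return max(attributes, key=gain)

def build_tree(rows, labels, attributes):
    """Top-down recursive divide-and-conquer induction (ID3-style)."""
    if len(set(labels)) == 1:      # all examples in one class -> leaf
        return labels[0]
    if not attributes:             # no attributes left -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, labels, attributes)
    remaining = [a for a in attributes if a != attr]
    partitions = {}
    for row, y in zip(rows, labels):
        partitions.setdefault(row[attr], ([], []))
        partitions[row[attr]][0].append(row)
        partitions[row[attr]][1].append(y)
    # Partition the examples recursively on the selected attribute.
    return {attr: {value: build_tree(sub_rows, sub_labels, remaining)
                   for value, (sub_rows, sub_labels) in partitions.items()}}

# Example usage (rows are dicts, e.g. from the buys_computer table):
# tree = build_tree(rows, labels, ["age", "income", "student", "credit_rating"])
```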

10 Example of Decision Tree: Training Dataset. [The slide shows a table of customer records with the attributes age, income, student, and credit_rating, plus the class label buys_computer.]

11 Output: A Decision Tree for “buys_computer”.
Root node: age?
age <= 30 → student? (student = no → no; student = yes → yes)
age 30..40 → yes
age > 40 → credit_rating? (excellent → no; fair → yes)
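Read as nested rules, the tree above amounts to the following (attribute and value names follow the slides):

```python
def buys_computer(age: int, student: str, credit_rating: str) -> str:
    # The decision tree from the slide, written as nested rules.
    if age <= 30:
        return "yes" if student == "yes" else "no"
    elif age <= 40:                 # the 30..40 branch leads straight to "yes"
        return "yes"
    else:                           # age > 40
        return "yes" if credit_rating == "fair" else "no"

print(buys_computer(28, "yes", "fair"))       # -> "yes"
print(buys_computer(45, "no", "excellent"))   # -> "no"
```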

12 Advantages of decision tree based classification: Inexpensive to construct. Extremely fast at classifying unknown records. Easy to interpret for small-sized trees. Accuracy is comparable to other classification techniques for many simple data sets.

13 Enhancements to basic decision tree induction:
Allow for continuous-valued attributes: dynamically define new discrete-valued attributes that partition the continuous attribute values into a discrete set of intervals.
Handle missing attribute values: assign the most common value of the attribute, or assign a probability to each of the possible values.
Attribute construction: create new attributes based on existing ones that are sparsely represented; this reduces fragmentation, repetition, and replication.
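A sketch of the first two enhancements using scikit-learn, assuming it is available (the data and bin count are made up for illustration):

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical continuous attribute (age) with one missing value.
age = np.array([[23.0], [35.0], [np.nan], [41.0], [52.0], [29.0]])

# Handle missing values: assign the most common value of the attribute.
age_filled = SimpleImputer(strategy="most_frequent").fit_transform(age)

# Continuous -> discrete: partition values into a discrete set of intervals.
binner = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
age_binned = binner.fit_transform(age_filled)
print(age_binned.ravel())  # interval index (0, 1, or 2) for each record
```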

14 Potential Problem: Overfitting. This is when the generated model does not generalize to new incoming data, either because the training data was too small and did not cover many cases, or because wrong assumptions were made. Overfitting results in decision trees that are more complex than necessary, and the training error no longer provides a good estimate of how well the tree will perform on previously unseen records, so new ways of estimating errors are needed.

15 How to avoid overfitting: Two ways to avoid overfitting are pre-pruning and post-pruning. Pre-pruning: stop the algorithm before it grows a full tree; stop if all instances belong to the same class, or if the number of instances is less than some user-specified threshold.

16 Post-pruning: Grow the decision tree to its entirety. Trim the nodes of the decision tree in a bottom-up fashion. If the generalization error improves after trimming, replace the sub-tree by a leaf node. The class label of the leaf node is determined from the majority class of instances in the sub-tree.
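Both strategies can be sketched with scikit-learn's DecisionTreeClassifier. Note that its post-pruning uses cost-complexity pruning rather than the generalization-error test described above, and the dataset here is just a stand-in:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Pre-pruning: stop early via a user-specified threshold on node size.
pre = DecisionTreeClassifier(min_samples_leaf=10, random_state=0).fit(X_tr, y_tr)

# Post-pruning: grow the full tree, then trim sub-trees bottom-up via
# cost-complexity pruning (a larger ccp_alpha trims more sub-trees).
post = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_tr, y_tr)

for name, model in [("pre-pruned", pre), ("post-pruned", post)]:
    print(name, "test accuracy:", round(model.score(X_te, y_te), 3))
```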

17 Bayesian Classification Algorithm: Let X be a data sample whose class label is unknown, and let H be the hypothesis that X belongs to class C. For classification problems, determine P(H|X): the probability that the hypothesis holds given the observed data sample X. P(H) is the prior probability of hypothesis H (i.e. the initial probability before we observe any data; it reflects the background knowledge). P(X) is the probability that the sample data is observed. P(X|H) is the probability of observing the sample X, given that the hypothesis holds. Bayes' theorem ties these together: P(H|X) = P(X|H) · P(H) / P(X).

18 Training dataset for Bayesian Classification: Classes: C1: buys_computer = ‘yes’; C2: buys_computer = ‘no’. Data sample X = (age <= 30, income = medium, student = yes, credit_rating = fair). [The slide shows the training table of records over these attributes with their class labels.]
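Since the training table itself is not reproduced in the transcript, here is a sketch of how a naive Bayes classifier would score the sample X, using scikit-learn's CategoricalNB on a hypothetical encoding of the slide's attributes (the rows below are invented, not the slide's data):

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Hypothetical encoding: age (0: <=30, 1: 31..40, 2: >40),
# income (0: low, 1: medium, 2: high), student (0: no, 1: yes),
# credit_rating (0: fair, 1: excellent). Rows are made-up examples.
X_train = np.array([
    [0, 2, 0, 0], [0, 2, 0, 1], [1, 2, 0, 0], [2, 1, 0, 0],
    [2, 0, 1, 0], [2, 0, 1, 1], [1, 0, 1, 1], [0, 1, 0, 0],
])
y_train = np.array(["no", "no", "yes", "yes", "yes", "no", "yes", "no"])

model = CategoricalNB().fit(X_train, y_train)

# The slide's sample: X = (age<=30, income=medium, student=yes, credit=fair)
x = np.array([[0, 1, 1, 0]])
print(model.predict(x), model.predict_proba(x))  # compares P(no|X) vs P(yes|X)
```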

19 Advantages & Disadvantages of Bayesian Classification: Advantages : Advantages : Easy to implement Easy to implement Good results obtained in most of the cases Good results obtained in most of the cases Disadvantages: Disadvantages: Due to assumption there is loss of accuracy. Due to assumption there is loss of accuracy. Practically, dependencies exist among variables Practically, dependencies exist among variables E.g., hospitals: patients: Profile: age, family history etc,Symptoms: fever, cough etc., Disease: lung cancer, diabetes etc E.g., hospitals: patients: Profile: age, family history etc,Symptoms: fever, cough etc., Disease: lung cancer, diabetes etc Dependencies among these cannot be modeled by Bayesian Classifier Dependencies among these cannot be modeled by Bayesian Classifier

20 Conclusion: Training data is an important factor in building a model with supervised algorithms. The classification results generated by the different algorithms (Naïve Bayes, Decision Tree, Neural Networks, ...) are often not considerably different from each other, but different classification algorithms can take different amounts of time to train and build models. Mechanical (automated) classification is faster than manual classification.


21 Thank you!

