
1 Data Analytics UNIT-IV: Classification

2 Chapter Sections Decision trees – overview, general algorithm, decision tree algorithms, evaluating a decision tree. Naïve Bayes – Bayes' theorem, naïve Bayes classifier, smoothing, diagnostics. Diagnostics of classifiers. Additional classification methods.

3 Classification
Classification is widely used for prediction.
Most classification methods are supervised.
This chapter focuses on two fundamental classification methods:
- Decision trees
- Naïve Bayes

4 Decision Trees
A tree structure specifies a sequence of decisions.
- Given input X = {x1, x2, …, xn}, predict output Y
- Input attributes/features can be categorical or continuous
- A tree has a root node, internal nodes, and leaf nodes; each non-leaf node tests a particular input variable, and leaf nodes return class labels
- Depth of a node = minimum number of steps required to reach the node from the root
- A branch (connecting two nodes) specifies a decision outcome
Two varieties of decision trees:
- Classification trees: categorical output, often binary
- Regression trees: numeric output

5 Decision Trees Overview of a Decision Tree
Example of a decision tree that predicts whether customers will buy a product.

6 Decision Trees Overview of a Decision Tree
Example: will a bank client subscribe to a term deposit?

7 Decision Trees The General Algorithm
Constructing a tree T from a training set S requires a measure of attribute information.
- Simplistic method (data from the previous figure): purity = probability of the corresponding class, e.g., P(no) = 1789/2000 = 89.45% and P(yes) = 10.55%
- Entropy methods: entropy measures the impurity of an attribute; information gain measures the reduction in impurity achieved by splitting on an attribute

8 Decision Trees The General Algorithm
Entropy-based measures of attribute information:
- Entropy of the output variable X: H_X = -Σ_x P(X = x) log2 P(X = x)
- Conditional entropy given an attribute Y: H_X|Y = Σ_y P(Y = y) H_X|Y=y
- Information gain of an attribute = base entropy - conditional entropy, i.e., InfoGain(Y) = H_X - H_X|Y
A small R sketch of these calculations follows.
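Entropy and information gain for a binary split, a minimal R sketch. The class counts mirror the slide's example (1789 no, 211 yes out of 2000 records); the split counts are illustrative assumptions.

entropy <- function(p) {
  p <- p[p > 0]                        # treat 0 * log2(0) as 0
  -sum(p * log2(p))
}

base <- entropy(c(1789, 211) / 2000)   # entropy of the output variable

# hypothetical binary attribute splitting the data into two subsets
left  <- c(no = 1500, yes = 50)        # assumed counts
right <- c(no = 289,  yes = 161)       # assumed counts
n <- sum(left) + sum(right)
cond <- sum(left) / n * entropy(left / sum(left)) +
        sum(right) / n * entropy(right / sum(right))

info_gain <- base - cond               # base entropy minus conditional entropy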

9 Decision Trees The General Algorithm
Construct a tree T from a training set S:
1. Choose as the root node the most informative attribute A.
2. Partition S according to A's values.
3. Construct subtrees T1, T2, … for the subsets of S recursively, until one of the following occurs:
- All leaf nodes satisfy a minimum purity threshold.
- The tree cannot be further split under the minimum purity threshold.
- Another stopping criterion is satisfied, e.g., maximum depth.

10 Decision Trees Decision Tree Algorithms
ID3 algorithm (T = training set, P = output variable, A = candidate attribute): recursively choose the attribute with the greatest information gain, split T on its values, and repeat on each subset; a sketch follows below.
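A minimal ID3 sketch in R, assuming categorical attributes and the entropy() helper defined earlier; this is an illustrative reconstruction, not the book's exact pseudocode.

id3 <- function(T, P, attrs) {
  labels <- T[[P]]
  if (length(unique(labels)) == 1) return(unique(labels))   # pure node
  if (length(attrs) == 0) return(names(which.max(table(labels))))
  # pick the attribute with the greatest information gain
  base <- entropy(table(labels) / length(labels))
  gain <- sapply(attrs, function(A) {
    cond <- sum(sapply(split(labels, T[[A]]), function(s)
      length(s) / length(labels) * entropy(table(s) / length(s))))
    base - cond
  })
  best <- attrs[which.max(gain)]
  # split on the best attribute and recurse on each subset
  subtrees <- lapply(split(T, T[[best]]), id3, P = P,
                     attrs = setdiff(attrs, best))
  list(attribute = best, branches = subtrees)
}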

11 Decision Trees Decision Tree Algorithms
C4.5 algorithm:
- Handles missing data
- Handles both categorical and continuous variables
- Uses bottom-up pruning to address overfitting
CART (Classification And Regression Trees):
- Also handles continuous variables
- Uses the Gini diversity index as the information measure

12 Decision Trees Evaluating a Decision Tree
Decision trees are greedy algorithms:
- They take the best option at each step, which may not be best overall
- This is addressed by ensemble methods such as random forest
The model might overfit the data (figure: error versus tree size, blue = training set, red = test set).
To overcome overfitting (a pruning sketch follows below):
- Stop growing the tree early, or
- Grow the full tree, then prune
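A hedged sketch of growing and pruning a tree with the rpart package (not part of the book's example); the variable names (banktrain, subscribed) follow the bank example and are assumed to be loaded.

library(rpart)

# grow a full tree; cp = 0 imposes no complexity penalty on splits
fit <- rpart(subscribed ~ ., data = banktrain, method = "class",
             control = rpart.control(cp = 0))

# choose the complexity parameter with the lowest cross-validated error
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]

# prune the full tree back to that complexity
pruned <- prune(fit, cp = best_cp)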

13 Decision Trees Evaluating a Decision Tree
Decision trees partition the input space into rectangular decision regions, one per leaf.

14 Decision Trees Evaluating a Decision Tree
Advantages of decision trees:
- Computationally inexpensive
- Outputs are easy to interpret as a sequence of tests
- Show the importance of each input variable
Decision trees handle:
- Both numerical and categorical attributes
- Categorical attributes with many distinct values
- Variables with nonlinear effects on the outcome
- Variable interactions

15 Decision Trees Evaluating a Decision Tree
Disadvantages of decision trees:
- Sensitive to small variations in the training data
- Prone to overfitting, because each split reduces the training data available to subsequent splits
- Perform poorly if the dataset contains many irrelevant variables

16 Chapter Sections Decision trees – overview, general algorithm, decision tree algorithms, evaluating a decision tree. Naïve Bayes – Bayes' theorem, naïve Bayes classifier, smoothing, diagnostics. Diagnostics of classifiers. Additional classification methods.

17 Naïve Bayes The Naïve Bayes Classifier
- Based on Bayes' theorem (or Bayes' law)
- Assumes the features contribute independently
- Features (variables) are generally categorical; discretization converts continuous variables into categorical ones
- Output is usually a class label plus a probability score
- Log probabilities are often used instead of raw probabilities

18 Naïve Bayes Bayes' Theorem
P(C|A) = P(A|C) P(C) / P(A), where C = class and A = observed attributes.
A typical medical diagnosis example is used because doctors frequently get this kind of conditional reasoning wrong; a worked sketch follows below.
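A worked base-rate example in R with illustrative (assumed) numbers: 1% disease prevalence, a test with 99% sensitivity and a 5% false-positive rate.

prevalence  <- 0.01    # P(disease)
sensitivity <- 0.99    # P(positive | disease)
false_pos   <- 0.05    # P(positive | no disease)

# Bayes' theorem: P(disease | positive)
p_positive <- sensitivity * prevalence + false_pos * (1 - prevalence)
sensitivity * prevalence / p_positive   # about 0.167

Despite the accurate test, only about 17% of positives actually have the disease, which is the intuition this kind of medical example targets.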

19 Naïve Bayes Naïve Bayes Classifier
Conditional independence assumption: P(A|cj) = P(a1|cj) P(a2|cj) ··· P(am|cj).
Dropping the common denominator P(A), we get P(cj|A) ∝ P(a1|cj) ··· P(am|cj) P(cj).
The classifier finds the cj that maximizes P(cj|A).

20 Naïve Bayes Naïve Bayes Classifier
Example: will a client subscribe to a term deposit? Given the following bank client's record, is this client likely to subscribe?

21 Naïve Bayes Naïve Bayes Classifier
Compute probabilities for this record

22 Naïve Bayes Naïve Bayes Classifier
Compute the naïve Bayes classifier outputs for yes and no:
- The client is assigned the label subscribed = yes
- The scores are small, but their ratio is what counts
- Using logarithms helps avoid numerical underflow (see the sketch below)
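Why log probabilities help, in a minimal R sketch; the conditional probabilities are illustrative assumptions.

cond_probs <- rep(0.01, 200)    # 200 small conditional probabilities
prod(cond_probs)                # 1e-400 underflows to 0 in double precision
sum(log(cond_probs))            # about -921, perfectly representable

Comparing classes by summed log scores picks the same winner that comparing the raw products would in exact arithmetic, without ever underflowing.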

23 Naïve Bayes Smoothing
A smoothing technique assigns a small nonzero probability to rare events that are missing from the training data.
- E.g., Laplace smoothing adds one to every count, as if each event occurred once more often than it actually does in the dataset (see the sketch below)
- Smoothing is essential: without it, a single zero conditional probability forces P(cj|A) = 0 regardless of the other evidence
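Laplace smoothing, a minimal R sketch; the counts are illustrative assumptions for one conditional distribution.

counts <- c(seen_a = 150, seen_b = 61, never_seen = 0)

counts / sum(counts)                           # never_seen gets probability 0
(counts + 1) / (sum(counts) + length(counts))  # small but nonzero everywhere

With e1071, the same idea is available as the laplace argument: naiveBayes(subscribed ~ ., data = banktrain, laplace = 1).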

24 Naïve Bayes Diagnostics
Naïve Bayes advantages:
- Handles missing values
- Robust to irrelevant variables
- Simple to implement
- Computationally efficient; handles high-dimensional data efficiently
- Often competitive with other learning algorithms
- Reasonably resistant to overfitting
Naïve Bayes disadvantages:
- Assumes variables are conditionally independent, and is therefore sensitive to double counting correlated variables
- In its simplest form, used only for categorical variables

25 Naïve Bayes Naïve Bayes in R
This section explores two ways of using the naïve Bayes classifier on the term-deposit example:
- Manually compute the probabilities from scratch: tedious, with many R calculations
- Use the naiveBayes function from the e1071 package: much easier (starts on page 222); a sketch follows below
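A hedged sketch of the e1071 route; the file name follows the book's bank example.

library(e1071)

banktrain <- read.table("bank-sample.csv", header = TRUE, sep = ",")

# fit the classifier: the remaining columns predict `subscribed`
nb_model <- naiveBayes(subscribed ~ ., data = banktrain)

predict(nb_model, banktrain)                      # class labels (the default)
head(predict(nb_model, banktrain, type = "raw"))  # posterior probabilities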

26 Chapter Sections Decision trees- Overview, general algorithm, decision tree algorithm, evaluating a decision tree. Naïve Bayes – Bayes‟ Algorithm, Naïve Bayes Classifier, smoothing, diagnostics. Diagnostics of classifiers, Additional classification methods.

27 Diagnostics of Classifiers
The book has covered three classifiers: logistic regression, decision trees, and naïve Bayes.
A basic tool for evaluating classifier performance is the confusion matrix, which tabulates predicted versus actual classes.

28 Diagnostics of Classifiers
Bank marketing example: a training set of 2000 records and a test set of 100 records, evaluated below (a confusion-matrix sketch follows).
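Building the confusion matrix in R, a minimal sketch continuing the bank example (nb_model as fitted above).

banktest <- read.table("bank-sample-test.csv", header = TRUE, sep = ",")
predicted <- predict(nb_model, banktest)

# rows = predicted class, columns = actual class
conf <- table(predicted = predicted, actual = banktest$subscribed)
conf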

29 Diagnostics of Classifiers
Evaluation metrics derived from the confusion matrix counts (TP, TN, FP, FN):
- Accuracy = (TP + TN) / (TP + TN + FP + FN)
- Precision = TP / (TP + FP)
- Recall (true positive rate) = TP / (TP + FN)
- False positive rate = FP / (FP + TN)
- False negative rate = FN / (TP + FN)
These are computed in R below.
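Computing the metrics from the confusion matrix conf built above, treating "yes" as the positive class.

TP <- conf["yes", "yes"]; FP <- conf["yes", "no"]
FN <- conf["no",  "yes"]; TN <- conf["no",  "no"]

c(accuracy  = (TP + TN) / (TP + TN + FP + FN),
  precision = TP / (TP + FP),
  recall    = TP / (TP + FN),    # true positive rate
  fpr       = FP / (FP + TN))    # false positive rate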

30 Diagnostics of Classifiers
Evaluation metrics on the 100-record bank marketing test set (table omitted); two of the metrics are marked as poor.

31 Diagnostics of Classifiers
ROC curve: good for evaluating binary classifiers. Bank marketing example: 2000-record training set, 100-record test set.
library(e1071)   # naiveBayes
library(ROCR)    # prediction, performance
banktrain <- read.table("bank-sample.csv", header = TRUE, sep = ",")
drops <- c("balance", "day", "campaign", "pdays", "previous", "month")
banktrain <- banktrain[, !(names(banktrain) %in% drops)]
banktest <- read.table("bank-sample-test.csv", header = TRUE, sep = ",")
banktest <- banktest[, !(names(banktest) %in% drops)]
nb_model <- naiveBayes(subscribed ~ ., data = banktrain)
nb_prediction <- predict(nb_model, banktest[, -ncol(banktest)], type = "raw")
score <- nb_nb_prediction <- nb_prediction[, "yes"]
score <- nb_prediction[, "yes"]           # posterior probability of "yes"
actual_class <- banktest$subscribed == "yes"
pred <- prediction(score, actual_class)   # ROCR prediction object
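To draw the ROC curve shown on the next slide and compute its AUC, a hedged continuation using ROCR:

perf <- performance(pred, "tpr", "fpr")    # true vs. false positive rate
plot(perf, lwd = 2)                        # the ROC curve
abline(a = 0, b = 1, lty = 2)              # chance diagonal
unlist(performance(pred, "auc")@y.values)  # area under the curve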

32 Diagnostics of Classifiers
ROC curve for the bank marketing example (2000-record training set, 100-record test set).

33 Chapter Sections Decision trees – overview, general algorithm, decision tree algorithms, evaluating a decision tree. Naïve Bayes – Bayes' theorem, naïve Bayes classifier, smoothing, diagnostics. Diagnostics of classifiers. Additional classification methods.

34 Additional Classification Methods
Ensemble methods use multiple models:
- Bagging: a bootstrap method that uses repeated sampling with replacement
- Boosting: similar to bagging, but an iterative procedure
- Random forest: an ensemble of decision trees (see the sketch below)
These models usually perform better than a single decision tree.
Support Vector Machine (SVM): a linear model whose decision boundary is determined by a small number of support vectors.
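A hedged random forest sketch on the bank example, using the randomForest package (an assumption; the slide names the method, not a package); banktrain and banktest are assumed loaded as above.

library(randomForest)

banktrain$subscribed <- as.factor(banktrain$subscribed)
rf_model <- randomForest(subscribed ~ ., data = banktrain,
                         ntree = 500,       # number of bootstrapped trees
                         importance = TRUE) # track variable importance

rf_model                      # prints the out-of-bag error estimate
importance(rf_model)          # variable importance measures
predict(rf_model, banktest)   # class predictions on the test set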

35 Summary
How to choose a suitable classifier among decision trees, naïve Bayes, and logistic regression.

