1 Data Mining: Concepts and Techniques — Chapter 7 — Classification: Ensemble Learning

2 Drawback of a Single Classifier
If we train several classifiers and pick the best one, which one is the best?
– More than one classifier may meet the selection criteria (e.g., the same training accuracy), especially in the following situations:
  - There is not sufficient training data
  - The learning algorithm easily converges to different local optima

3 Drawback of a Single Classifier (cont.)
Potentially valuable information may be lost by discarding the results of less successful classifiers
– E.g., the discarded classifiers may still classify some samples correctly
The final decision is necessarily wrong whenever the output of the selected classifier is wrong
The selected classifier may not be complex enough to handle the problem

4 Ensemble Methods
Combining a number of trained classifiers can lead to better performance than any single one
– Errors of one classifier can be compensated by the correct predictions of others
– Different classifiers have different knowledge about the problem
Ensembles also decompose a complex problem into sub-problems whose solutions are simpler to understand, implement, manage, and update

5 General Idea

6 Factors Affecting Ensemble Methods
Three factors affect the accuracy of an ensemble:
– Accuracy of the individual classifiers: how good are the individual classifiers?
– Fusion method: how are the classifiers combined? (see the sketch after this slide)
– Diversity among the classifiers: how different are the decisions of the various classifiers?
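To make the fusion step concrete, below is a minimal sketch of unweighted majority voting over already-trained classifiers; the `classifiers` list and its `predict(x)` interface are assumptions for illustration, not something specified in the slides.

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Fuse trained classifiers by unweighted majority vote on a single sample x."""
    # Each classifier is assumed to expose predict(x) returning one class label.
    votes = [clf.predict(x) for clf in classifiers]
    # most_common(1) yields [(label, count)] for the most frequent label.
    return Counter(votes).most_common(1)[0][0]
```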

7 Accuracy of an Individual Classifier
The performance of an individual classifier is affected by:
– The training dataset (samples and features)
– The learning model (type of classifier)
– The model's parameters (e.g., the number of neurons in a neural network)

8 Examples of Ensemble Methods
How to generate an ensemble of classifiers?
– Bootstrap aggregating (Bagging)
– Boosting (e.g., AdaBoost)

9 Bagging: Bootstrap Aggregation
Real-world analogy: a diagnosis based on the majority vote of multiple doctors
Training
– Given a set D of d tuples, at each iteration i a training set D_i of d tuples is sampled with replacement from D (i.e., a bootstrap sample)
– A classifier model M_i is learned from each training set D_i
Classification: to classify an unknown sample X
– Each classifier M_i returns its class prediction
– The bagged classifier M* counts the votes and assigns to X the class with the most votes
Prediction: bagging can also be applied to the prediction of continuous values by taking the average of the predictions for a given test tuple
Accuracy
– Often significantly better than a single classifier derived from D
– For noisy data: not considerably worse, and more robust
– Proven to improve accuracy in prediction
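A minimal sketch of the training and voting steps described above, assuming a scikit-learn decision tree as the base learner (the slides do not name a particular base learner):

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def bagging_train(X, y, k=10, seed=0):
    """Learn k models, each from a bootstrap sample of d tuples drawn with replacement."""
    rng = np.random.default_rng(seed)
    d = len(X)
    models = []
    for _ in range(k):
        idx = rng.integers(0, d, size=d)              # bootstrap sample D_i of size d
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, x):
    """Classify x with the bagged classifier M*: majority vote over all models."""
    votes = [m.predict(x.reshape(1, -1))[0] for m in models]
    return Counter(votes).most_common(1)[0][0]
```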

10 Bagging
Sampling with replacement
Build a classifier on each bootstrap sample

11 Boosting
Analogy: consult several doctors and combine their weighted diagnoses, where each weight is assigned based on that doctor's previous diagnosis accuracy
How boosting works:
– A weight is assigned to each training tuple
– A series of k classifiers is learned iteratively
– After a classifier M_i is learned, the weights are updated so that the subsequent classifier, M_{i+1}, pays more attention to the training tuples that M_i misclassified
– The final M* combines the votes of the individual classifiers, where the weight of each classifier's vote is a function of its accuracy
The boosting algorithm can be extended to the prediction of continuous values
Compared with bagging: boosting tends to achieve greater accuracy, but it also risks overfitting the model to the misclassified data

12 Boosting
Records that are wrongly classified have their weights increased
Records that are classified correctly have their weights decreased
Example: record 4 is hard to classify; its weight is increased, so it is more likely to be chosen again in subsequent rounds
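As a small worked illustration (the numbers are chosen for illustration, not taken from the slide's figure): suppose four records each start with weight 0.25 and record 4 is the only one misclassified, so the classifier's weighted error is 0.25. Using the AdaBoost update on the next slide, each correct record's weight is multiplied by 0.25 / 0.75 ≈ 0.33, giving about 0.083; after renormalizing so the weights sum to 1, record 4's weight rises from 0.25 to 0.5 while the others fall to about 0.167, so record 4 is twice as likely as before to be sampled in the next round.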

13 AdaBoost (Freund and Schapire, 1997)
Given a set of d class-labeled tuples (X_1, y_1), …, (X_d, y_d)
Initially, all tuple weights are set to the same value, 1/d
Generate k classifiers in k rounds. At round i:
– Tuples from D are sampled (with replacement) to form a training set D_i of the same size
– Each tuple's chance of being selected is based on its weight
– A classification model M_i is derived from D_i
– Its error rate is calculated using D_i as a test set
– If a tuple is misclassified, its weight is increased; otherwise it is decreased
Error rate: err(X_j) is the misclassification error of tuple X_j (1 if misclassified, 0 otherwise). The error rate of classifier M_i is the sum of the weights of the misclassified tuples:
  error(M_i) = sum_j w_j × err(X_j)
The weight of classifier M_i's vote is
  log( (1 − error(M_i)) / error(M_i) )
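A minimal sketch of the AdaBoost loop above. For simplicity it evaluates the weighted error on the full training set rather than on the sample D_i, uses the common rule of multiplying the weights of correctly classified tuples by error(M_i) / (1 − error(M_i)) and renormalizing, and assumes scikit-learn decision stumps as the base learner; these choices are illustrative assumptions, not part of the slide.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_train(X, y, k=10, seed=0):
    """Learn up to k weighted classifiers; decision stumps are assumed as base learners."""
    rng = np.random.default_rng(seed)
    d = len(X)
    w = np.full(d, 1.0 / d)                           # all tuple weights start at 1/d
    models, alphas = [], []
    for _ in range(k):
        idx = rng.choice(d, size=d, replace=True, p=w)        # sample by weight, with replacement
        m = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
        miss = (m.predict(X) != y)                            # err(X_j): True if misclassified
        error = np.sum(w[miss])                               # error(M_i) = sum of weights of misses
        if error == 0 or error >= 0.5:
            continue                                          # skip degenerate rounds (simplification)
        w[~miss] *= error / (1.0 - error)                     # shrink weights of correct tuples
        w /= w.sum()                                          # renormalize so weights sum to 1
        models.append(m)
        alphas.append(np.log((1.0 - error) / error))          # weight of this classifier's vote
    return models, alphas

def adaboost_predict(models, alphas, x):
    """Return the class whose total weighted vote is largest."""
    scores = {}
    for m, a in zip(models, alphas):
        c = m.predict(x.reshape(1, -1))[0]
        scores[c] = scores.get(c, 0.0) + a
    return max(scores, key=scores.get)
```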

