Classification. Heejune Ahn, SeoulTech. Last updated May 03.
Outline: Introduction (purpose, types, and an example); Classification design (design flow); Simple classifiers (linear discriminant functions, Mahalanobis distance); Bayesian classification; K-means clustering (unsupervised learning)
1. Purpose. Purpose: decision making. A topic of pattern recognition (in artificial intelligence). Design concerns: the model, the balance of automation and human intervention, task specification (what classes, what features), and the algorithm to use. Training: tuning the algorithm parameters. The classifier (classification rules) maps features (patterns, structures) extracted from images to classes.
2. Supervised vs. unsupervised. Supervised (classification): trained by labeled examples (provided by humans). Unsupervised (clustering): uses only the feature data, exploiting the mathematical (statistical) properties of the data set.
3. An example: classifying nuts. The classifier (classification rules) separates pine nuts, lentils, and pumpkin seeds using two features: circularity and line-fit error.
Observations: What if only a single feature is used? What about the singular (outlier) points? Classification draws boundaries in feature space.
Terminology
4. Design Flow
5. Prototypes & minimum-distance classifier. Prototypes: the mean of the training samples in each class. The minimum-distance classifier assigns a sample to the class with the nearest prototype.
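The slides use Matlab; here is an equivalent Python/NumPy sketch of the minimum-distance classifier just described (function names are my own, not from the slides):

```python
import numpy as np

def fit_prototypes(X, y):
    """Compute one prototype (the mean feature vector) per class."""
    classes = np.unique(y)
    protos = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, protos

def min_distance_classify(x, classes, protos):
    """Assign x to the class whose prototype is nearest (Euclidean)."""
    d = np.linalg.norm(protos - x, axis=1)
    return classes[np.argmin(d)]
```

Note that only the class means are kept; the spread of each class is ignored, which motivates the Mahalanobis distance later.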
6. Linear discriminant. Linear discriminant function: g(x1, x2) = a*x1 + b*x2 + c; the decision boundary is g(x1, x2) = 0. Ex 11.1 & Fig 11.6.
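A minimal Python sketch of this two-feature linear discriminant (the class labels 1 and 2 and the sign convention are illustrative, not fixed by the slides):

```python
def linear_discriminant(x1, x2, a, b, c):
    """g(x1, x2) = a*x1 + b*x2 + c; the decision boundary is g = 0."""
    return a * x1 + b * x2 + c

def classify_linear(x1, x2, a, b, c):
    """Class 1 if g > 0, class 2 otherwise (points on g = 0 are ambiguous)."""
    return 1 if linear_discriminant(x1, x2, a, b, c) > 0 else 2
```

For example, with a = 1, b = -1, c = 0 the boundary is the line x1 = x2.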
8. Mahalanobis distance. Problem with the minimum-distance classifier: only the mean value is used; the distribution is not considered, e.g. (right figure) std(class 1) << std(class 2). The Mahalanobis distance, d^2(x) = (x - mu)' * inv(Sigma) * (x - mu), takes the variance into account (the larger the variance, the smaller the distance).
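A NumPy sketch of the squared Mahalanobis distance (my own helper, assuming an invertible covariance matrix):

```python
import numpy as np

def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance (x - mu)' inv(cov) (x - mu)."""
    diff = x - mu
    return float(diff @ np.linalg.inv(cov) @ diff)
```

With cov = diag([1, 4]), the point [0, 2] is Euclidean distance 2 from the mean but Mahalanobis distance 1, the same as [1, 0]: the direction with larger variance counts for less, exactly the effect described above.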
9. Bayesian classification. Idea: assign each sample to the most probable class, based on a priori known probabilities. Assumption: the priors (class probabilities) are known. Based on Bayes' theorem.
10. Bayes decision rule. Classification rule: assign x to class w_i if P(w_i | x) > P(w_j | x) for all j != i. Bayes' theorem: P(w_i | x) = p(x | w_i) P(w_i) / p(x), where p(x | w_i) is the class-conditional probability density function, P(w_i) is the prior probability, and p(x) is the total probability, which is the same for every class and therefore not used in the classification decision.
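A tiny numeric sketch of the rule (the likelihood and prior values are invented for illustration): it computes the posteriors and shows that dividing by the total probability p(x) only rescales them, so the argmax is unchanged.

```python
def posteriors(likelihoods, priors):
    """P(w_i | x) = p(x | w_i) P(w_i) / p(x), with p(x) = sum_j p(x | w_j) P(w_j)."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    total = sum(joint)          # total probability p(x); common to all classes
    return [j / total for j in joint]
```

For likelihoods [0.6, 0.1] and priors [0.3, 0.7], the joint terms are [0.18, 0.07], so the posteriors are [0.72, 0.28] and class 1 wins even though class 2 has the larger prior.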
Interpretation: we need to know the priors and the class-conditional pdfs, which are often not available. The MVN (multivariate normal) distribution model is practically quite a good approximation: an N-dimensional normal distribution with mean vector mu and covariance matrix Sigma.
12. Bayesian classifier for M variates. Taking log() of the discriminant (log is a monotonically increasing function, so the argmax is unchanged) gives g_i(x) = -(1/2)(x - mu_i)' inv(Sigma_i) (x - mu_i) - (1/2) ln|Sigma_i| + ln P(w_i), after dropping terms common to all classes.
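A Python/NumPy sketch of this log-discriminant Bayesian classifier (my own function names; assumes invertible covariance matrices):

```python
import numpy as np

def gaussian_log_discriminant(x, mu, cov, prior):
    """g_i(x) = -0.5 (x-mu)' inv(cov) (x-mu) - 0.5 ln|cov| + ln P(w_i)."""
    diff = x - mu
    maha = diff @ np.linalg.inv(cov) @ diff
    return -0.5 * maha - 0.5 * np.log(np.linalg.det(cov)) + np.log(prior)

def bayes_classify(x, mus, covs, priors):
    """Pick the class index with the largest log-discriminant."""
    scores = [gaussian_log_discriminant(x, m, c, p)
              for m, c, p in zip(mus, covs, priors)]
    return int(np.argmax(scores))
```

With identical covariances the |Sigma| term cancels across classes, which leads directly to the linear-machine cases below.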
Case 1: identical, independent covariances (Sigma_i = sigma^2 I). Linear machine: the decision regions are separated by hyperplanes (linear boundaries). Note: with equal priors P(w_i), this reduces to the minimum-distance criterion.
Case 2: all covariances are the same. Matlab: [class, err] = classify(test, training, group [, type, prior]), where training and test are feature matrices. Use type 'DiagLinear' for naïve Bayes.
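For readers without Matlab, a Python/NumPy sketch in the spirit of classify with 'DiagLinear': per-class means, a pooled per-feature (diagonal) variance shared by all classes, and a linear discriminant. This is my own minimal implementation, not Matlab's.

```python
import numpy as np

def fit_diaglinear(X, y):
    """Per-class means, pooled diagonal variance, and empirical priors."""
    classes = np.unique(y)
    mus = np.array([X[y == c].mean(axis=0) for c in classes])
    centered = np.concatenate([X[y == c] - X[y == c].mean(axis=0)
                               for c in classes])
    var = centered.var(axis=0)          # one pooled variance per feature
    priors = np.array([(y == c).mean() for c in classes])
    return classes, mus, var, priors

def classify_diaglinear(x, classes, mus, var, priors):
    """g_i(x) = -0.5 * sum_d (x_d - mu_id)^2 / var_d + ln P(w_i)."""
    g = -0.5 * np.sum((x - mus) ** 2 / var, axis=1) + np.log(priors)
    return classes[np.argmax(g)]
```

Because the covariance (here diagonal) is shared, the quadratic term in x cancels between classes and the boundaries are linear, as in Case 2.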
Ex 11.3: wrong priors vs. correct priors.
13. Ensemble classifier. Combining multiple classifiers utilizes diversity, similar to asking multiple experts for a decision. AdaBoost: a weak classifier need only beat chance, (1/2) < accuracy << 1.0; misclassified training data are weighted more heavily for the next classifier. (Figure: weak classifiers H_1(x), ..., H_T(x) trained under re-weighted distributions D_1(x), ..., D_T(x), starting from uniform weights, combined with coefficients a_t.)
AdaBoost in detail. Given: training samples (x_1, y_1), ..., (x_m, y_m) with y_i in {-1, +1}. Initialize weights: D_1(i) = 1/m. For t = 1, ..., T: 1. Run WeakLearn, which returns the weak classifier h_t with minimum error e_t w.r.t. the distribution D_t. 2. Choose a_t = (1/2) ln((1 - e_t) / e_t). 3. Update D_{t+1}(i) = D_t(i) exp(-a_t y_i h_t(x_i)) / Z_t, where Z_t is a normalization factor chosen so that D_{t+1} is a distribution. Output the strong classifier: H(x) = sign(sum_t a_t h_t(x)).
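The steps above can be sketched in Python with decision stumps as the weak learner (the stump learner and all names are my own illustrative choices; the slides do not fix a weak classifier):

```python
import numpy as np

def best_stump(X, y, D):
    """WeakLearn: one-feature threshold minimizing weighted error w.r.t. D."""
    best = None                                  # (err, feature, thresh, polarity)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - t) >= 0, 1, -1)
                err = D[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, t, pol)
    return best

def adaboost(X, y, T=10):
    """y in {-1,+1}. Returns a list of (a_t, feature, thresh, polarity)."""
    m = len(y)
    D = np.full(m, 1.0 / m)                      # D_1(i) = 1/m
    H = []
    for _ in range(T):
        err, j, t, pol = best_stump(X, y, D)
        err = max(err, 1e-12)                    # avoid log(inf) on zero error
        alpha = 0.5 * np.log((1 - err) / err)    # a_t = (1/2) ln((1-e)/e)
        pred = np.where(pol * (X[:, j] - t) >= 0, 1, -1)
        D = D * np.exp(-alpha * y * pred)
        D /= D.sum()                             # Z_t normalization
        H.append((alpha, j, t, pol))
    return H

def strong_classify(H, x):
    """H(x) = sign(sum_t a_t h_t(x))."""
    s = sum(a * (1 if pol * (x[j] - t) >= 0 else -1) for a, j, t, pol in H)
    return 1 if s >= 0 else -1
```

Correctly classified samples shrink in weight and misclassified ones grow, so each round focuses the next weak learner on the hard cases.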
14. K-means clustering. K-means: unsupervised classification. Groups the data to minimize the within-cluster sum of squared distances, J = sum_i sum_{x in class i} ||x - c_i||^2. Iterative algorithm: (re-)assign each x_i to the class with the nearest center c_i, then (re-)calculate each c_i as the mean of its assigned samples. Demo.
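The iteration above (Lloyd's algorithm) as a Python/NumPy sketch, separate from the Matlab kmeans used later in the slides (the seeded random initialization is an illustrative choice):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # (re-)assign each x_i to the nearest center c_i
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # (re-)calculate each c_i as the mean of its assigned samples
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):            # converged
            break
        centers = new
    return labels, centers
```

Each step can only decrease the objective J, so the iteration converges, though only to a local minimum, which is why the initialization issue below matters.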
Issues: Sensitive to the initial centroid values; multiple trials are needed, choosing the best result. K (the number of clusters) must be given: there is a trade-off between K (bigger) and the objective function (smaller), and no optimal algorithm to determine it. Nevertheless, K-means is used in most unsupervised clustering today.
Ex 11.4 & Fig 11.10. Matlab kmeans: [classIndexes, centers] = kmeans(data, k, options), where k is the number of clusters and options include 'Replicates' and 'Display'.