
1 Chapter 6. Classification and Prediction
- Classification by decision tree induction
- Bayesian classification
- Rule-based classification
- Classification by back propagation
- Support Vector Machines (SVM)
- Associative classification

2 SVM—History and Applications
- Introduced by Vapnik and colleagues (1992); groundwork from Vapnik and Chervonenkis' statistical learning theory in the 1960s
- Features: training can be slow, but accuracy is high owing to the ability to model complex nonlinear decision boundaries (margin maximization)

3 SVM—History and Applications
- Used for both classification and prediction
- Applications: handwritten digit recognition, object recognition, speaker identification, benchmarking time-series prediction tests

4 Linear Classification
- Binary classification problem
- The data above the red line belongs to class 'x'; the data below the red line belongs to class 'o'
- Examples: SVM, Perceptron, probabilistic classifiers (a minimal sketch follows this slide)
[Figure omitted: scatter of 'x' and 'o' points separated by a red line]
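Not from the original slides: a minimal sketch of a linear binary classifier on made-up 2-D data, using scikit-learn's Perceptron (one of the linear classifiers the slide names). The data values and variable names are illustrative assumptions.

    # Minimal sketch: linear binary classification of 2-D points (toy data).
    import numpy as np
    from sklearn.linear_model import Perceptron

    # Class 'x' (label 1) lies above the separating line, class 'o' (label 0) below it.
    X = np.array([[1.0, 2.0], [2.0, 3.5], [3.0, 4.0],   # class 'x'
                  [2.0, 0.5], [3.0, 1.0], [4.0, 2.0]])  # class 'o'
    y = np.array([1, 1, 1, 0, 0, 0])

    clf = Perceptron(max_iter=1000, tol=1e-3).fit(X, y)

    # The learned decision boundary is the line w . x + b = 0.
    print("weights:", clf.coef_, "bias:", clf.intercept_)
    print("predictions:", clf.predict([[2.5, 3.5], [3.5, 1.0]]))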

5 SVM—Support Vector Machines
- A classification method for both linear and nonlinear data
- It uses a nonlinear mapping to transform the original training data into a higher-dimensional space
- In that higher-dimensional space, it searches for the linear optimal separating hyperplane (i.e., the "decision boundary")

6 SVM—Support Vector Machines
- With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyperplane
- SVM finds this hyperplane using support vectors ("essential" training tuples) and margins (defined by the support vectors); see the sketch below
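Not part of the original slides: a minimal sketch of this idea using scikit-learn's SVC (which wraps LIBSVM). The RBF kernel plays the role of the nonlinear mapping; the concentric-circles data set is synthetic and purely illustrative.

    # Sketch: a nonlinear kernel (RBF) implicitly maps the data to a higher-
    # dimensional space, where a linear separating hyperplane is sought.
    from sklearn.datasets import make_circles
    from sklearn.svm import SVC

    # Two classes arranged as concentric circles: not linearly separable in 2-D.
    X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

    linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)
    rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

    print("linear kernel accuracy:", linear_svm.score(X, y))  # poor: no separating line in 2-D
    print("RBF kernel accuracy:", rbf_svm.score(X, y))        # close to 1.0 after the implicit mapping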

7 Which Separating Hyperplane to Use?
[Figure omitted; axes: Var 1, Var 2]

8 Maximizing the Margin
- IDEA 1: Select the separating hyperplane that maximizes the margin! (The standard formulation follows this slide.)
[Figure omitted; axes: Var 1, Var 2; annotation: Margin Width]
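Not spelled out on the slide, but the standard formulation behind "maximize the margin": for a separating hyperplane w·x + b = 0 with class labels y_i in {+1, -1}, the margin width is 2/||w||, so maximizing it is equivalent to

    \min_{\mathbf{w},\, b} \ \tfrac{1}{2}\lVert \mathbf{w} \rVert^2
    \quad \text{subject to} \quad
    y_i\,(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1, \qquad i = 1, \dots, n

The training tuples for which the constraint holds with equality are exactly the support vectors sitting on the margin.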

9 Support Vectors
[Figure omitted; axes: Var 1, Var 2; annotations: Margin Width, Support Vectors]

10 Why Is SVM Effective on High-Dimensional Data?
- The complexity of the trained classifier is characterized by the number of support vectors rather than by the dimensionality of the data
- The support vectors are the essential or critical training examples: they lie closest to the decision boundary (the maximum-marginal hyperplane, MMH)
- If all training examples other than the support vectors were removed and training were repeated, the same separating hyperplane would be found (see the sketch below)
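Not in the slides: a small illustration of the last point, using scikit-learn's SVC on synthetic two-class data. Retraining on the support vectors alone should recover (approximately, up to numerical tolerance) the same hyperplane; the data and names are assumptions for the sketch.

    # Sketch: the trained SVM is determined by its support vectors alone.
    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    X, y = make_blobs(n_samples=200, centers=2, random_state=0)  # synthetic 2-class data

    full = SVC(kernel="linear", C=1.0).fit(X, y)
    print("number of support vectors:", len(full.support_))

    # Retrain using only the support vectors of the first model.
    sv = full.support_
    reduced = SVC(kernel="linear", C=1.0).fit(X[sv], y[sv])

    # The separating hyperplanes w . x + b = 0 should (approximately) coincide.
    print("w, b (all data):        ", full.coef_, full.intercept_)
    print("w, b (support vectors): ", reduced.coef_, reduced.intercept_)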

11 Disadvantages of Linear Decision Surfaces
[Figure omitted; axes: Var 1, Var 2]

12 Advantages of Non-Linear Surfaces
[Figure omitted; axes: Var 1, Var 2]

13 SVM
Advantages:
- prediction accuracy is generally high
- robust: works when training examples contain errors
- fast evaluation of the learned target function
Criticism:
- long training time
- difficult to understand the learned function (weights)
- not easy to incorporate domain knowledge

14 SVM Related Links
Representative implementations:
- LIBSVM: an efficient implementation of SVM; supports multi-class classification, nu-SVM, and one-class SVM, and includes interfaces for Java, Python, etc.
- SVM-light: simpler, but performance is not better than LIBSVM; supports only binary classification and only the C language
- SVM-torch: another implementation, also written in C
A minimal usage sketch for the LIBSVM Python interface follows.
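Not from the slides: a minimal usage sketch for LIBSVM's Python interface (assuming the libsvm package is installed; the import path differs between LIBSVM distributions). The tiny data set is made up for illustration.

    # Sketch: training and prediction with LIBSVM's Python interface.
    from libsvm.svmutil import svm_train, svm_predict  # older releases: from svmutil import ...

    # Toy 2-D data: class labels plus sparse feature dicts {index: value}.
    y_train = [1, 1, -1, -1]
    x_train = [{1: 1.0, 2: 1.0}, {1: 0.9, 2: 1.2},
               {1: -1.0, 2: -1.0}, {1: -1.1, 2: -0.8}]

    # '-t 2' selects the RBF kernel, '-c 1' sets the soft-margin parameter C, '-q' is quiet mode.
    model = svm_train(y_train, x_train, '-t 2 -c 1 -q')

    y_test = [1, -1]
    x_test = [{1: 0.8, 2: 1.1}, {1: -0.9, 2: -1.0}]
    p_labels, p_acc, p_vals = svm_predict(y_test, x_test, model)
    print("predicted labels:", p_labels)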

