Presenter: Wen-Feng Hsiao (蕭文峰), 2009/8/31



Outline: Introduction, Related Work, Proposed Method, Experiments and Results, Conclusions

Introduction
Ordinal classification tasks: grading (student performance), rating (credit ratings, customer ratings of products), ranking (query results).
Properties of ordinal classification: like multi-class classification tasks, the class values are discrete, but they are ordered; like regression prediction tasks, the class values are ordered, but they are neither equally spaced nor continuous.

Introduction (cont'd)
SVM has been shown to be a powerful and efficient method for multi-class classification tasks. It has also been extended to the regression domain as SVR, short for Support Vector Regression. Several researchers have therefore investigated whether SVM can be applied to ordinal classification tasks as well.

Introduction (cont'd)
Most existing methods for ordinal classification do not exploit the ordinal structure of the data. oSVM, recently proposed by Cardoso and da Costa (2007), does exploit this structure and has been shown to outperform traditional methods in predicting ordinal classes.

Introduction (cont'd)
However, our empirical experiments showed that oSVM suffers from two problems: (1) it cannot handle datasets with noisy data, and (2) it often misclassifies instances near class boundaries. We propose applying fuzzy sets to the ordinal support vector machine in the hope of resolving both problems simultaneously.

Introduction (cont'd)
The challenge is devising a reasonable membership function (mf) that assigns membership degrees to instances. We propose two mf's. The experiments show that the proposed mf's are promising, though they still need further verification.

Outline: Introduction, Related Work, Proposed Method, Experiments and Results, Conclusions

Related Work – preprocessing and postprocessing
Kramer et al. (2001) proposed using a preprocessing step to translate ordinal values into metric values, then applying a regression model to the task. Problem: in most tasks the true metric distances between the ordinal scales are unknown.

Related Work – converting to binary classification tasks
Frank and Hall (2001) converted an ordinal classification problem into nested binary classification problems that encode the ordering of the original ranks. The prediction for an instance is obtained by combining the outputs of these binary classifiers.
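The Frank–Hall reduction can be sketched in a few lines. This is a minimal sketch, not the authors' code: it assumes scikit-learn's LogisticRegression as the base probabilistic classifier, and the class name and synthetic data are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class FrankHallOrdinal:
    """Frank & Hall (2001): reduce a K-class ordinal problem to K-1
    nested binary problems "is y > k?" and recombine the estimates via
    P(y = k) = P(y > k-1) - P(y > k)."""

    def __init__(self, n_classes):
        self.n_classes = n_classes
        self.models = []

    def fit(self, X, y):
        # one binary classifier per ordered threshold k = 0..K-2
        self.models = [LogisticRegression().fit(X, (y > k).astype(int))
                       for k in range(self.n_classes - 1)]
        return self

    def predict(self, X):
        # P(y > k) for each threshold, shape (n_samples, K-1)
        pg = np.column_stack([m.predict_proba(X)[:, 1] for m in self.models])
        n = X.shape[0]
        upper = np.hstack([np.ones((n, 1)), pg])   # P(y > k-1)
        lower = np.hstack([pg, np.zeros((n, 1))])  # P(y > k)
        probs = upper - lower  # these differences can come out negative
        return probs.argmax(axis=1)

X = np.arange(30, dtype=float).reshape(-1, 1)
y = np.arange(30) // 10          # three ordered classes: 0, 1, 2
clf = FrankHallOrdinal(3).fit(X, y)
```

Note that the recombined differences are not guaranteed to be nonnegative, which is exactly the weakness of the scheme pointed out on the next slide.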

Frank and Hall (2001) (cont'd)
Limitations: 1. It can be applied only to methods that output class probability estimates. 2. It can go wrong when a derived probability is negative.

Related Work – Ordinal Support Vector Machine
SVM is a supervised learning method with versions for both classification and regression tasks: Support Vector Machine for Classification (SVC) and Support Vector Machine for Regression (SVR). Without further specification, SVM stands for SVC.

Related Work – SVM ABC
[Figure: the maximum-margin hyperplane is the perpendicular bisector of the shortest line connecting the two classes' convex hulls.]
The instances closest to the maximum-margin hyperplane are called support vectors.

Related Work – SVM, linear case
Primal: $\min_{w,b} \frac{1}{2}\|w\|^2$ subject to $y_i(w \cdot x_i + b) \ge 1$ for all $i$.
Dual: $\max_{\alpha} \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$ subject to $\alpha_i \ge 0$ and $\sum_i \alpha_i y_i = 0$.
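The hard-margin solution can be checked numerically. A minimal sketch with illustrative toy data, using scikit-learn's SVC with a very large C to approximate the hard margin: every training point should satisfy $y_i(w \cdot x_i + b) \ge 1$, with the support vectors sitting exactly on the margin.

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable point clouds (illustrative toy data)
X = np.array([[0., 0.], [1., 0.], [0., 1.],
              [3., 3.], [4., 3.], [3., 4.]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin primal
clf = SVC(kernel='linear', C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
margins = y * (X @ w + b)      # y_i (w . x_i + b), one per training point

print(np.round(margins, 3))    # all >= 1; support vectors are at 1
print(clf.support_)            # indices of the support vectors
```

Only the points with margin exactly 1 carry nonzero dual multipliers; all the other instances could be removed without changing the hyperplane.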

Related Work – SVM, soft margin
Primal: $\min_{w,b,\xi} \frac{1}{2}\|w\|^2 + C\sum_i \xi_i$ subject to $y_i(w \cdot x_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$.
Dual: the same as the hard-margin dual, except the multipliers are box-constrained: $0 \le \alpha_i \le C$, $\sum_i \alpha_i y_i = 0$.

Related Work – SVM's kernel trick
Sometimes problems that seem linearly unsolvable can be solved linearly by mapping instances into a higher-dimensional space. But SVM involves inner products between pairs of instances, and mapping to a higher dimension before computing those inner products can be computationally intensive.

Related Work – SVM, nonlinear case
Fortunately, a kernel function can be used to relieve this burden. Common kernel functions:
Polynomial: $K(x, z) = (x \cdot z + 1)^d$
Radial basis function: $K(x, z) = \exp(-\|x - z\|^2 / (2\sigma^2))$
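The saving can be made concrete. For a degree-2 polynomial kernel (without the +1 term, purely for illustration), evaluating $(x \cdot z)^2$ directly in the input space yields the same number as first mapping through an explicit quadratic feature map and then taking the inner product:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for 2-D input, chosen so that
    phi(x) . phi(z) = (x . z)^2."""
    return np.array([x[0] ** 2,
                     np.sqrt(2) * x[0] * x[1],
                     x[1] ** 2])

def rbf(x, z, sigma=1.0):
    """Radial basis function kernel exp(-||x - z||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

k_direct = (x @ z) ** 2      # O(d) work in the input space
k_mapped = phi(x) @ phi(z)   # same value via the explicit mapping
print(k_direct, k_mapped)    # both equal 16.0
```

For higher degrees or the RBF kernel, the explicit feature space grows huge or infinite-dimensional, so evaluating the kernel directly is the only practical option.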

Related Work – oSVM (Cardoso and da Costa, 2007)

Related Work – oSVM (cont'd)

Related Work – oSVM (cont'd)

Related Work – Fuzzy SVM (Lin and Wang, 2002)
Define $m_i$ as the membership of instance $i$; the larger an instance's $m_i$, the more important the instance.
Primal: $\min_{w,b,\xi} \frac{1}{2}\|w\|^2 + C\sum_i m_i \xi_i$ subject to $y_i(w \cdot x_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$.
Dual: the hard-margin dual with per-instance box constraints $0 \le \alpha_i \le m_i C$, $\sum_i \alpha_i y_i = 0$.

Outline: Introduction, Related Work, Proposed Method, Experiments and Results, Conclusions

oSVM's error for grade dataset
Is oSVM good enough? No! It is easily influenced by noise.

oSVM's error for grade dataset (cont'd)

oSVM's error for ERA dataset
It goes wrong when the class boundaries are vague.

oSVM's error for ERA dataset (cont'd)

oSVM's error for ERA dataset (cont'd)
[Table: oSVM output intervals (lower, upper), e.g. (0.97, 1.06), per class 1–9 for the 1st through 8th binary subproblems; the intervals of adjacent classes overlap.]

Proposed Membership Function 1
[Figure: classes 2 and 3, showing an instance's distance d_11 and the minimum distance (min d) to a point A.]
For any instance x_ij that belongs to class i, its membership degree can be defined as: [equation from slide]

Proposed Membership Function 2
[Figure: a hyperplane separating classes 2 and 3, with distances d_1 through d_4 and per-instance distances d_1i through d_4i marked.]
For any instance x_ij that belongs to class i, its membership degree can be defined as: [equation from slide]
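The exact formulas for MF1 and MF2 were given graphically on the slides. As an illustrative stand-in only, here is the classic class-center membership from Lin and Wang (2002), which likewise down-weights instances far from the bulk of their class; the talk's mf's use distances to class boundaries and hyperplanes instead.

```python
import numpy as np

def center_distance_membership(X, y, delta=1e-6):
    """Lin & Wang (2002)-style membership: m_i = 1 - d_i / (d_max + delta),
    where d_i is the distance of instance i from its class mean.
    Distant instances (likely noise) get memberships near 0.
    NOTE: an illustrative stand-in, not the talk's MF1/MF2."""
    m = np.empty(len(y), dtype=float)
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        center = X[idx].mean(axis=0)
        d = np.linalg.norm(X[idx] - center, axis=1)
        m[idx] = 1.0 - d / (d.max() + delta)
    return m

# A tight cluster plus one far-away outlier, all labeled class 0
X = np.array([[0., 0.], [.1, 0.], [0., .1], [5., 5.]])
y = np.zeros(4, dtype=int)
print(np.round(center_distance_membership(X, y), 3))
```

Whatever geometric quantity the mf is built from, the design goal is the same: assign near-zero membership to likely noise and near-one membership to instances deep inside their class.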

Outline: Introduction, Related Work, Proposed Method, Experiments and Results, Conclusions

Experiments
The oSVM code, in MATLAB, is from Cardoso and da Costa (2007); we build on their code to develop our FoSVM. ordinalClassifier is from Weka (with C4.5 as the base classifier). SVR is LIBSVM's SVR (the better-performing of its two variants, ν-SVR and ε-SVR). Ten datasets are used to compare these five classifiers. Two measures are employed to compare performance: mean zero-one error and mean absolute error. The experiments use 10-fold cross-validation to obtain overall performance.

10 Datasets
[Table: the ten datasets used in the experiments.]

Classifier comparison – mean 0-1 error
Mean zero-one error assigns an error of 1 to every incorrect prediction; that is, it is the fraction of incorrect predictions.

dataset | oSVM | FoSVM (MF1) | FoSVM (MF2) | ordinalClassifier | libsvm
Auto MPG | 47.14%±3.46% | 44.79%±3.97% | 51.3%±4.35% | 53.12%±2.83% | *83.44%±16.61%
Diabetes | 84.23%±1.72% | *84.62%±10.59% | 75.77%±12.04% | 78.85%±14.09% | 71.92%±9.75%
ERA | 84.65%±1.63% | 85.55%±4.59% | 84.23%±1.33% | 87.31%±5.6% | 85.42%±5.84%
ESL | 38.98%±9.54% | 44.84%±12.35% | 42.94%±8.93% | *55.9%±8.98% | 53.28%±8.35%
LEV | 66.57%±10.9% | 67.82%±7.03% | 72.88%±12.09% | 73.43%±9.37% | *74.5%±12.19%
SWD | 79.73%±8.89% | *82.48%±5.68% | 73.99%±10.5% | 64.27%±3.3% | 72.45%±10.41%
Grade | 12.57%±9.35% | 8.09%±4.68% | 9.74%±8.66% | 41.7%±20.43% | *43.65%±9.31%
wpbc | 84.67%±8.92% | 86.75%±8.71% | 83.91%±6.64% | 87.37%±5.12% | 86.54%±3.33%
machine | 34.2%±13.88% | 32.78%±7.38% | 32.87%±12.92% | 28.82%±14.7% | *43.34%±14.52%
stock | 25.82%±4.2% | 22.5%±3.78% | 29.07%±5.2% | 21.61%±5.02% | *53.9%±4.88%

Classifier comparison – MAE
Mean absolute error is the average deviation of the prediction from the true target, i.e., $\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} |\hat{y}_i - y_i|$, in which we treat the ordinal scales as consecutive integers.
[Table: MAE (±std) of oSVM, FoSVM (MF1), FoSVM (MF2), ordinalClassifier, and libsvm on the ten datasets.]
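The two measures can be stated in a few lines (a minimal sketch; labels are treated as consecutive integers, as the slide describes):

```python
import numpy as np

def mean_zero_one_error(y_true, y_pred):
    """Fraction of incorrect predictions."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))

def mean_absolute_error_ordinal(y_true, y_pred):
    """Average |prediction - target|, treating the ordinal scales
    as consecutive integers."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

y_true = np.array([1, 2, 3, 4])
y_pred = np.array([1, 2, 4, 2])
print(mean_zero_one_error(y_true, y_pred))          # 0.5  (2 of 4 wrong)
print(mean_absolute_error_ordinal(y_true, y_pred))  # 0.75 ((0+0+1+2)/4)
```

Note how MAE rewards near-misses on an ordinal scale: predicting 4 for a true 3 costs only 1, whereas zero-one error charges every mistake equally.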

Outline: Introduction, Related Work, Proposed Method, Experiments and Results, Conclusions

Membership Function 3
[Figure: a hyperplane with a fuzzy width band; classes 2 and 3 with distances d_1 through d_4 and per-instance distances d_1i through d_4i marked.]
For any instance x_ij that belongs to class i, its membership degree can be defined as: [equation from slide]

References
Cardoso, J. S., and da Costa, J. F. P. (2007), "Learning to classify ordinal data: The data replication method," Journal of Machine Learning Research, Vol. 8, pp. 1393–1429.
Frank, E., and Hall, M. (2001), "A simple approach to ordinal classification," Proceedings of the European Conference on Machine Learning, pp. 145–165.
Kramer, S., Widmer, G., Pfahringer, B., and DeGroeve, M. (2001), "Prediction of ordinal classes using regression trees," Fundamenta Informaticae, Vol. 47, pp. 1–13.
Lin, C.-F., and Wang, S.-D. (2002), "Fuzzy Support Vector Machines," IEEE Transactions on Neural Networks, Vol. 13, No. 2.