
1 Presenter: Wen-Feng Hsiao (蕭文峰) 2009/8/31

2 Outline Introduction Related Work Proposed Method Experiments and Results Conclusions

3 Introduction Ordinal classification tasks: grading (student performance), rating (credit rating, customer ratings of products), ranking (query results). Properties of ordinal classification: Like multi-class classification tasks, the class values are discrete, but they are ordered. Like regression prediction tasks, the class values are ordered, but they are not equally spaced or continuous.

4 Introduction (cont’d) SVM has been shown to be a very powerful and efficient method for multi-class classification tasks. SVM has also been extended to the regression domain as SVR, short for Support Vector Regression. Therefore, several researchers have tried to determine whether SVM can be further applied to ordinal classification tasks.

5 Introduction (cont’d) Most existing methods for ordinal classification do not make use of the ordinal characteristics of the data. oSVM, proposed recently by Cardoso and da Costa (2007), does exploit these characteristics, and it has been shown to outperform traditional methods in predicting ordinal classes.

6 Introduction (cont’d) However, our empirical experiments showed that oSVM suffers from two problems: (1) it cannot handle datasets with noisy data; (2) it often misclassifies instances near class boundaries. We propose applying fuzzy sets to the ordinal support vector machine in the hope of resolving both problems simultaneously.

7 Introduction (cont’d) The challenge is how to devise a reasonable membership function (mf) that assigns membership degrees to instances. We propose two mf’s. The experiments show that the proposed mf’s are promising, though they still need further verification.

8 Outline Introduction Related Work Proposed Method Experiments and Results Conclusions

9 Related Work – preprocess & postprocess Kramer et al. (2001) proposed pre-processing mechanisms that translate ordinal values into metric values, then applying a regression model to the task. Problem: in most tasks, the true metric distances between the ordinal scales are unknown.

10 Related Work – converting to binary classification tasks Frank and Hall (2001) converted an ordinal classification problem into nested binary classification problems that encode the ordering of the original ranks. The prediction for an instance is obtained by combining the results of these binary classifiers.
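Frank and Hall’s encoding turns a K-class ordinal problem into K-1 nested binary questions of the form “is the rank greater than k?”. A minimal Python sketch of the combination step follows; the function and variable names are ours, not from the paper:

```python
# Frank & Hall (2001): given the K-1 binary estimates P(rank > k),
# recover a per-rank distribution by taking differences.

def combine_binary_probs(p_greater):
    """p_greater[k] = P(rank > k+1) for k = 0..K-2; returns P(rank = r) for r = 1..K."""
    probs = [1.0 - p_greater[0]]                       # P(rank = 1) = 1 - P(rank > 1)
    for k in range(1, len(p_greater)):
        probs.append(p_greater[k - 1] - p_greater[k])  # intermediate ranks
    probs.append(p_greater[-1])                        # P(rank = K) = P(rank > K-1)
    return probs

# Well-calibrated binary estimates give a valid distribution:
print(combine_binary_probs([0.8, 0.5, 0.1]))   # approximately [0.2, 0.3, 0.4, 0.1]
# Uncalibrated estimates can yield a negative "probability" for a middle rank,
# the failure mode noted on the next slide:
print(combine_binary_probs([0.4, 0.5, 0.1]))   # second entry is negative
```

This also shows why the method only applies to base learners that output class probability estimates.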

11 Frank and Hall (2001) (cont’d) 1. Applicable only to methods that can output class probability estimates. 2. Can go wrong when a calculated probability is negative.

12 Related Work – Ordinal Support Vector Machine SVM is a supervised learning method with versions for both classification and regression tasks: Support Vector Machine for Classification (SVC) and Support Vector Machine for Regression (SVR). Without further specification, SVM stands for SVC.

13 Related Work – SVM ABC The maximum-margin hyperplane is the perpendicular bisector of the shortest line connecting the convex hulls of the two classes. The instances closest to the maximum-margin hyperplane are called support vectors.

14 Related Work – SVM linear Primal and dual formulations (shown as equations on the slide).
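The primal and dual problems on this slide were images in the original deck; for reference, the standard hard-margin linear SVM formulations are:

```latex
% Primal (hard margin, linear)
\min_{\mathbf{w},\,b}\ \frac{1}{2}\|\mathbf{w}\|^2
\quad \text{s.t.}\quad y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1,\quad i = 1,\dots,n

% Dual
\max_{\boldsymbol{\alpha}}\ \sum_{i}\alpha_i
  - \frac{1}{2}\sum_{i}\sum_{j}\alpha_i\alpha_j\,y_i y_j\,\mathbf{x}_i\cdot\mathbf{x}_j
\quad \text{s.t.}\quad \alpha_i \ge 0,\quad \sum_{i}\alpha_i y_i = 0
```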

15 Related Work – SVM soft margin Primal and dual formulations (shown as equations on the slide).
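The soft-margin equations were likewise images; the standard formulations add slack variables ξ and a cost parameter C, which in the dual only changes the box constraint on α:

```latex
% Primal (soft margin)
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\ \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i}\xi_i
\quad \text{s.t.}\quad y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0

% Dual
\max_{\boldsymbol{\alpha}}\ \sum_{i}\alpha_i
  - \frac{1}{2}\sum_{i}\sum_{j}\alpha_i\alpha_j\,y_i y_j\,\mathbf{x}_i\cdot\mathbf{x}_j
\quad \text{s.t.}\quad 0 \le \alpha_i \le C,\quad \sum_{i}\alpha_i y_i = 0
```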

16 Related Work – SVM’s kernel trick Sometimes, seemingly linearly unsolvable problems can be solved linearly by mapping instances into a higher dimension. But SVM involves inner products between instances, and mapping to a higher dimension before computing inner products can be computationally intensive.

17 Related Work – SVM nonlinear Fortunately, a kernel function can be used to relieve this burden. Common kernel functions: Polynomial, Radial Basis Function (RBF).
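The two kernels named on the slide can be sketched in a few lines of Python; the degree, coef0, and gamma values below are illustrative defaults of ours, not parameters from the presentation:

```python
import math

def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """K(x, z) = (x . z + coef0)^degree"""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + coef0) ** degree

def rbf_kernel(x, z, gamma=0.5):
    """K(x, z) = exp(-gamma * ||x - z||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Each kernel equals an inner product in a higher-dimensional feature
# space, computed without ever mapping the points explicitly.
print(polynomial_kernel([1.0, 2.0], [3.0, 1.0]))  # (5 + 1)^2 = 36.0
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))         # 1.0 at zero distance
```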

18 Related Work – oSVM (Cardoso and da Costa, 2007)

19 Related Work – oSVM (cont’d)

20 Related Work – oSVM (cont’d)

21 Related Work – Fuzzy SVM (Lin and Wang, 2002) Define m_i as the membership of instance i; the larger an instance’s m_i, the more important that instance is. Primal and dual formulations (shown as equations on the slide).
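The fuzzy SVM formulations were images in the deck; in Lin and Wang (2002) the membership m_i simply weights each instance’s slack penalty, which in the dual rescales the upper bound on α:

```latex
% Primal (fuzzy SVM): membership-weighted slack penalties
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\ \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i} m_i\,\xi_i
\quad \text{s.t.}\quad y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0

% Dual: only the box constraint changes relative to the soft-margin SVM
\max_{\boldsymbol{\alpha}}\ \sum_{i}\alpha_i
  - \frac{1}{2}\sum_{i}\sum_{j}\alpha_i\alpha_j\,y_i y_j\,\mathbf{x}_i\cdot\mathbf{x}_j
\quad \text{s.t.}\quad 0 \le \alpha_i \le m_i\,C,\quad \sum_{i}\alpha_i y_i = 0
```

Instances with small m_i thus get a small cap on their multiplier, limiting the influence of noisy points.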

22 Outline Introduction Related Work Proposed Method Experiments and Results Conclusions

23 oSVM’s error for the grade dataset Is oSVM good enough? No! It is easily influenced by noise.

24 oSVM’s error for the grade dataset (cont’d)

25 oSVM’s error for the ERA dataset It goes wrong when the class boundaries are vague.

26 oSVM’s error for the ERA dataset (cont’d)

27 oSVM’s error for the ERA dataset (cont’d)
(Table: output intervals of the eight binary sub-problems across classes 1–9; the original column alignment was lost in extraction.)
1st: (0.97,1.06) (0.97,1.16) (0.97,1.35)
2nd: (0.91,1) (0.91,1.1) (0.91,1.29) (0.91,1.1)
3rd: (-1,-0.81) (-1,-0.62) (-1,-0.81) (-1,-0.86)
4th: (-1,-0.62) (-1,-0.81) (-1,-0.86) (-1,-0.62)
5th: (-1,-0.81) (-1,-0.86) (-1,-0.62) (-0.98,-0.62)
6th: (-1.03,-0.89) (-1.03,-0.66) (-1.01,-0.66) (-1,-0.66)
7th: (-1.1,-0.73) (-1.08,-0.73) (-1.07,-0.73) (-1.06,-0.73)
8th: (-1.17,-0.81) (-1.16,-0.81) (-1.15,-0.81)

28 Proposed Membership Function 1 (Figure: classes 2 and 3 with an instance A and labeled distances, including the minimum distance.) For any instance x_ij, which belongs to class i, its membership degree can be defined as shown on the slide.

29 Proposed Membership Function 2 (Figure: hyperplane between classes 2 and 3; instance A with distances d_1..d_4 and d_1i..d_4i.) For any instance x_ij, which belongs to class i, its membership degree can be defined as shown on the slide.

30 Outline Introduction Related Work Proposed Method Experiments and Results Conclusions

31 Experiments The oSVM code is from Cardoso and da Costa (2007), written in MATLAB; we built our FoSVM on top of it. ordinalClassifier is from WEKA (with C4.5 as the base classifier). SVR is libsvm’s ν-SVR (which performed better than ε-SVR). 10 datasets are used to compare these five classifiers. Two measures are employed to compare performance: mean zero-one error and mean absolute error. All results are obtained with 10-fold cross-validation.
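The two evaluation measures can be stated in a few lines of Python; the sample labels below are made up for illustration:

```python
# Mean zero-one error: the fraction of predictions that miss the true rank.
# Mean absolute error: average |true - predicted|, treating the ordinal
# ranks as consecutive integers.

def mean_zero_one_error(y_true, y_pred):
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1, 2, 3, 4, 5]
y_pred = [1, 3, 3, 2, 5]
print(mean_zero_one_error(y_true, y_pred))  # 0.4  (2 of 5 wrong)
print(mean_absolute_error(y_true, y_pred))  # 0.6  ((0+1+0+2+0)/5)
```

MAE rewards near-misses (predicting an adjacent rank) more than zero-one error does, which is why both are reported for ordinal tasks.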

32 10 Datasets

33 Classifier comparison – mean zero-one error
Mean zero-one error assigns an error of 1 to every incorrect prediction; i.e., it is the fraction of incorrect predictions.

Dataset   | oSVM          | FoSVM (MF1)    | FoSVM (MF2)    | ordinalClassifier | libsvm
Auto MPG  | 47.14%±3.46%  | 44.79%±3.97%   | 51.3%±4.35%    | 53.12%±2.83%      | *83.44%±16.61%
Diabetes  | 84.23%±1.72%  | *84.62%±10.59% | 75.77%±12.04%  | 78.85%±14.09%     | 71.92%±9.75%
ERA       | 84.65%±1.63%  | 85.55%±4.59%   | 84.23%±1.33%   | 87.31%±5.6%       | 85.42%±5.84%
ESL       | 38.98%±9.54%  | 44.84%±12.35%  | 42.94%±8.93%   | *55.9%±8.98%      | 53.28%±8.35%
LEV       | 66.57%±10.9%  | 67.82%±7.03%   | 72.88%±12.09%  | 73.43%±9.37%      | *74.5%±12.19%
SWD       | 79.73%±8.89%  | *82.48%±5.68%  | 73.99%±10.5%   | 64.27%±3.3%       | 72.45%±10.41%
Grade     | 12.57%±9.35%  | 8.09%±4.68%    | 9.74%±8.66%    | 41.7%±20.43%      | *43.65%±9.31%
wpbc      | 84.67%±8.92%  | 86.75%±8.71%   | 83.91%±6.64%   | 87.37%±5.12%      | 86.54%±3.33%
machine   | 34.2%±13.88%  | 32.78%±7.38%   | 32.87%±12.92%  | 28.82%±14.7%      | *43.34%±14.52%
stock     | 25.82%±4.2%   | 22.5%±3.78%    | 29.07%±5.2%    | 21.61%±5.02%      | *53.9%±4.88%

34 Classifier comparison – MAE
Mean absolute error is the average deviation of the prediction from the true target, treating the ordinal scales as consecutive integers.

Dataset   | oSVM           | FoSVM (MF1)     | FoSVM (MF2)    | ordinalClassifier | libsvm
Auto MPG  | 0.5654±0.4326  | 0.5305±0.0502   | 0.6427±0.0704  | 0.6726±0.0451     | *1.4313±0.0294
Diabetes  | 1.8193±0.0875  | 1.8577±0.4156   | 1.5±0.4254     | *1.8923±0.71      | 1.0808±0.1444
ERA       | 1.9394±0.1477  | *2.0575±0.3377  | 1.9577±0.1474  | 1.7755±0.1049     | 1.7451±0.2711
ESL       | 0.4191±0.1234  | 0.4736±0.1364   | 0.4509±0.1086  | 0.6146±0.3535     | *0.657±0.0424
LEV       | 0.8065±0.1495  | 0.826±0.139     | 0.9447±0.2057  | 0.9023±0.1102     | *1.1290±0.1964
SWD       | 0.8807±0.1308  | 0.9113±0.0768   | 0.9239±0.146   | 0.6984±0.0859     | *1.0403±0.1694
Grade     | 0.1257±0.0935  | 0.0809±0.0468   | 0.0974±0.0866  | 0.426±0.2037      | *0.4365±0.0931
wpbc      | 2.3984±0.5331  | 2.6403±0.5797   | 2.6582±0.2525  | 2.7002±0.7106     | 2.3092±0.1139
machine   | 0.4817±0.2577  | 0.4435±0.1725   | 0.4816±0.2444  | 0.3928±0.2092     | *0.6636±0.3419
stock     | 0.2582±0.042   | 0.225±0.0378    | 0.2927±0.0597  | 0.2256±0.0587     | *0.6216±0.0765

35 Outline Introduction Related Work Proposed Method Experiments and Results Conclusions

36 Membership Function 3 (Figure: hyperplane between classes 2 and 3; instance A with distances d_1..d_4 and d_1i..d_4i, plus a fuzzy width.) For any instance x_ij, which belongs to class i, its membership degree can be defined as shown on the slide.

37 References
Cardoso, J. S., and da Costa, J. F. P. (2007), "Learning to classify ordinal data: The data replication method," Journal of Machine Learning Research, Vol. 8, pp. 1393–1429.
Frank, E., and Hall, M. (2001), "A simple approach to ordinal classification," Proceedings of the European Conference on Machine Learning, pp. 145–165.
Kramer, S., Widmer, G., Pfahringer, B., and De Groeve, M. (2001), "Prediction of ordinal classes using regression trees," Fundamenta Informaticae, Vol. 47, pp. 1–13.
Lin, C.-F., and Wang, S.-D. (2002), "Fuzzy Support Vector Machines," IEEE Transactions on Neural Networks, Vol. 13, No. 2, pp. 464–471.

