# A linear least squares framework for learning ordinal classes

Ioannis Mariolis, PhD

Outline
- Introduction to Ordinal Data Modeling
- Generalized Linear Models
  - Ordinary Least Squares (OLS) Regression
  - Ordinal Logistic Regression (OLR)
- Linear Classifier of Ordinal Classes
  - learns a linear model
  - modifies OLS regression
- Experimental Results
  - synthetic datasets
  - real datasets
    - visual features
    - textile seam quality control
- Conclusions

Ordinal Data Modeling
- Collection of measurements, called data
- Building a model to fit the data
- The term ordinal refers to the scale of measurement of the data

Scales of Measurement
- Measurement is the assignment of numbers to objects or events in a systematic fashion
- Four levels of measurement scales are commonly distinguished
  - Nominal
  - Ordinal
  - Interval
  - Ratio

Nominal Scale
- Nominal measurement consists of assigning items to groups or categories
- No quantitative information is conveyed and no ordering of the items is implied
  - qualitative rather than quantitative
- Variables measured on a nominal scale are often referred to as categorical or qualitative variables

Ordinal Scale
- Measurements with ordinal scales are ordered
  - higher numbers represent higher values
- The intervals between the numbers are not necessarily equal
- There is no "true" zero point for ordinal scales
  - the zero point is chosen arbitrarily

Interval Scale
- On interval scales, one unit represents the same magnitude across the whole range of the scale
- Interval scales do not have a "true" zero point
- It is not possible to make statements about how many times higher one score on that scale is than another
- e.g. the Celsius scale for temperature
  - equal differences on this scale represent equal differences in temperature
  - but a temperature of 30 degrees is not twice as warm as one of 15 degrees

Ratio Scale
- Ratio scales are like interval scales, except that they have true zero points
- e.g. the Kelvin scale of temperature
  - this scale has an absolute zero
  - a temperature of 300 K is twice as high as a temperature of 150 K
- Example: Earth's mean temperature is about 14 °C (287 K), and it falls in inverse proportion to the square root of the earth-sun distance. Doubling the distance therefore lowers the temperature by a factor of about 1.4. The calculation must be made in Kelvin: 287 / 1.4 = 205 K, a drop of 82 degrees. The new temperature would be -68 °C, and not 14 / 1.4 = 10 °C.
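
The arithmetic in the example above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the inverse-square-root dependence on distance is taken from the slide as given. With the exact values 273.15 and sqrt(2) the result is about -70 °C, slightly different from the slide's rounded -68 °C.

```python
def celsius_to_kelvin(t_c):
    """Convert a temperature from degrees Celsius to Kelvin."""
    return t_c + 273.15


def kelvin_to_celsius(t_k):
    """Convert a temperature from Kelvin to degrees Celsius."""
    return t_k - 273.15


def scaled_temperature_celsius(t_c, distance_factor):
    """Temperature after multiplying the distance by distance_factor.

    Assumes (as on the slide) that the temperature, expressed on the
    Kelvin ratio scale, is proportional to 1 / sqrt(distance).
    """
    t_k = celsius_to_kelvin(t_c)
    return kelvin_to_celsius(t_k / distance_factor ** 0.5)


# Correct: divide on the ratio scale (Kelvin), then convert back.
new_temp_c = scaled_temperature_celsius(14.0, 2.0)

# Incorrect: dividing a Celsius (interval scale) value directly.
wrong_temp_c = 14.0 / 2.0 ** 0.5
```

The two results differ by roughly 80 degrees, which is the point of the example: ratios are only meaningful on a scale with a true zero.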

Classification to Ordinal Classes
- Pattern classification addresses the issue of assigning objects to different categories called classes
- Most often those classes are of nominal scale
  - discrete classes with no established relationship among them
- In some cases, additional information regarding the arrangement of the classes is available
  - e.g. an order among the classes is exhibited
  - in that case the predicted classes are of ordinal scale
  - classification is bridged to metric regression in a setting called ranking learning or ordinal regression
- Metric regression is applied to variables measured on interval or ratio scales

State of the Art
Ordinal regression problems have been addressed in both the machine learning and the statistics domains. The approaches fall into four groups: extending binary classifiers, extending SVM classifiers, explicitly ordinal approaches, and treating ordinal data as numeric.
- In Frank (2001) the classes' ordering was encoded by a set of nested binary classifiers
  - the classification results were organized for prediction accordingly
- A constrained classification approach, based on binary classifiers, was proposed in Har-Peled (2003)
- A loss function between pairs of ranks was used in Herbrich (2000)
  - employing distribution independent methods
- Modifications of support vector machines have been proposed in Shashua (2003), Chu (2005), Pelckmans (2006)
  - incorporating in the design of SVMs information regarding the order of the classes
- A probabilistic kernel approach to ordinal regression was proposed by Chu (2005)
- In McCullagh (1980) multinomial logistic regression is extended to apply to ordinal data by using cumulative probabilities
  - proportional odds model
  - proportional hazards model
- In Tutz (2003) generalized additive models were extended into a semi-parametric approach based on the maximization of penalized log likelihood
  - the choice of the parameters used is based on minimization of the Akaike criterion
- In Johnson (1999) sampling techniques were employed in order to apply Bayesian inference on parametric models for ordinal data
- In Kramer (2001) and Torra (2006) the ordinal values are transformed into numeric ones, and then standard metric regression analysis is performed
  - Ordinary Least Squares will be implied when referring to metric regression

Generalized Linear Models
- GLMs are a generalization of OLS regression
  - they were formulated as a way of unifying linear regression, logistic regression, and Poisson regression under one framework
  - a general algorithm for maximum likelihood estimation in all these models has been developed
- According to GLM theory
  - a linear predictor is related to the distribution function of the dependent variables through a link function
  - each outcome of the dependent variables, Y, is assumed to be generated from a particular exponential-type probability density function (Normal, Binomial, Poisson distributions, etc.)
- The mean, μ, of the distribution depends on the independent variables, x, through $E\{Y\} = \mu = g^{-1}(xb)$, where E{Y} is the expected value of Y; g is the link function; b are the unknown weights of the linear model
- The unknown weights b, also called regression coefficients, are typically estimated with maximum likelihood or Bayesian techniques
- In case Y follows the Normal distribution and g is the identity function, the GLM is the standard linear regression model
- In the context of this presentation, x corresponds to feature vectors and Y to classes

Ordinary Least Squares
- The simplest and a very popular GLM
- The distribution function is the normal distribution with constant variance, and the link function is the identity
- Unlike most other GLMs, the maximum likelihood estimates of the linear weights are provided in a closed form solution
  - X is the matrix consisting of all available feature vectors x
  - Y is the vector consisting of the observed values of the dependent variable
- The model's linear weights b are given by $b = (X^T X)^{-1} X^T Y$
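
As an illustration of the closed form solution, the following sketch solves the normal equations with NumPy and compares the result with a generic least squares solver. The data, the weight values, and the function name `ols_weights` are invented for the example.

```python
import numpy as np


def ols_weights(X, y):
    """Closed-form OLS weights b = (X^T X)^{-1} X^T y.

    The normal equations are solved directly instead of forming the
    inverse explicitly, which is numerically preferable.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)


# Hypothetical data: 50 samples, 3 features plus a constant unit column.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(size=(50, 3)), np.ones(50)])
true_b = np.array([2.0, -1.0, 0.5, 3.0])
y = X @ true_b  # noise-free linear targets

b_hat = ols_weights(X, y)
b_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]  # reference solution
```

On noise-free targets both solutions recover the generating weights up to rounding error.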

Ordinary Least Squares (cont.)
- OLS is designed to process interval or ratio variables
- OLS estimates are likely to be satisfactory from a statistical perspective when an ordinal level variable is examined
  - if it is measured in a relatively high number of ascending categories
  - if it can be assumed that the interval each category represents is the same as the prior interval
- Thus, OLS can be applied to ordinal measurements treated as if they were interval
  - in that case, however, it is most likely that some of the assumptions of the Gauss-Markov theorem are not met, and the regression is not the Best Linear Unbiased Estimator

Ordinal Logistic Regression
- Explicitly takes into account an ordered categorical dependent variable and does not assume any specific distance among the categories
- Different regression models that can be applied in case of ordinal measurements have been proposed
  - here the proportional odds model is assumed
- Like in multinomial logistic regression (MLR), in OLR
  - a multinomial distribution is assumed
  - the logit is selected as the link function
- The main difference between MLR and OLR is that, rather than estimating the probability of a single category, OLR estimates a cumulative probability
  - i.e. the probability that the outcome is equal to or less than the category of interest c
  - c denotes the integer values used to label the classes

Ordinal Logistic Regression (cont.)
- Using the logit equation, the probabilities for each instance belonging to each class can be estimated
  - the proportional odds model employs the cumulative probability's logit equation
- The threshold values are different for each category
- The weights of the linear model contained in vector b are assumed to remain constant for every category
- A Log-Likelihood function (LL) is created, and the parameter values that maximize that function are estimated using computational methods
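
The slide's own equation is not reproduced in this transcript. As a hedged illustration, the sketch below uses the common textbook form of the proportional odds model, logit P(Y ≤ c | x) = θ_c − x·b, with one threshold θ_c per category boundary and a single weight vector b shared by all categories. The sign convention, the function name, and all numeric values are assumptions made for the example, not taken from the presentation.

```python
import numpy as np


def class_probabilities(x, b, thresholds):
    """Class probabilities P(Y = c | x), c = 1..K, under proportional odds.

    thresholds holds K-1 increasing values theta_1 < ... < theta_{K-1};
    the same weights b are used for every category.
    """
    eta = np.asarray(thresholds) - x @ b       # logit of P(Y <= c | x)
    cum = 1.0 / (1.0 + np.exp(-eta))           # cumulative probabilities
    cum = np.concatenate([[0.0], cum, [1.0]])  # P(Y <= 0) = 0, P(Y <= K) = 1
    return np.diff(cum)                        # successive differences


# Hypothetical parameters for K = 4 ordered classes.
x = np.array([0.5, -1.0])
b = np.array([1.0, 0.5])
thresholds = np.array([-1.0, 0.0, 1.5])
probs = class_probabilities(x, b, thresholds)
```

Fitting θ and b requires iterative maximization of the log-likelihood; only the prediction step is shown here.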

Linear Classifier of Ordinal Classes
- Performs numerical mapping of the K ordered classes into real numbers
- Classification is based on the assumption of a linear relationship between the numerical input vectors and the numerical values assigned to the ordered classes
- A linear output y is produced as the dot product of input vector x and vector b containing the weights of the linear model
- The output o is the class $\omega_j$ whose assigned numerical value is the nearest to the linear output y, i.e. $j = \arg\min_{k} |y - z_k|$, where $z_k$ is the numerical value assigned to class $\omega_k$
- In case of metric regression a numerical mapping is needed, and the results do not correspond to probabilities
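
The decision rule described above (pick the class whose numerical value lies nearest to the linear output) can be sketched as follows. The function name, the 0-based class indices, and the example values are illustrative assumptions.

```python
import numpy as np


def predict_classes(X, b, z):
    """Assign each row of X to the class whose value z[k] is nearest to x.b.

    X: (n_samples, n_features) inputs, b: (n_features,) weights,
    z: (K,) numerical values assigned to the K ordered classes.
    Returns 0-based class indices.
    """
    y = X @ b                                      # linear outputs
    distances = np.abs(y[:, None] - z[None, :])    # |y_i - z_k| for all pairs
    return np.argmin(distances, axis=1)


# Hypothetical example: three classes with unequally spaced values.
X = np.array([[0.2, 1.0], [1.4, 1.0], [2.9, 1.0]])
b = np.array([1.0, 0.0])
z = np.array([0.0, 1.0, 3.0])
pred = predict_classes(X, b, z)
```

Because the class values need not be equally spaced, the decision boundaries fall at the midpoints between neighbouring values of z.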

Training LCOC-the naïve case
- Arbitrary consecutive numbers are assigned to the ordered classes, e.g. class $\omega_k$ is mapped to $k$, for $k = 1, \dots, K$
- The linear output of the classifier is $xb$, where vector $b$ has been estimated by minimizing the Sum of Squared Errors, $\mathrm{SSE} = (Xb - t)^T (Xb - t)$
- Matrix $X$ is the design matrix consisting of all available input vectors; $t$ denotes the vector of the corresponding targets
- Then $b = (X^T X)^{-1} X^T t$
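
A minimal sketch of the naïve case, assuming the classes are mapped to the integers 1 to K: the weights come from an ordinary least squares fit, and prediction rounds the linear output to the nearest valid integer. The function names and the toy data are hypothetical.

```python
import numpy as np


def fit_naive(X, labels):
    """OLS fit where the 0-based class index k is mapped to the target k + 1."""
    t = labels.astype(float) + 1.0
    b, *_ = np.linalg.lstsq(X, t, rcond=None)
    return b


def predict_naive(X, b, K):
    """Round the linear output to the nearest integer in 1..K.

    Returns 0-based class indices so they can be compared with labels.
    """
    y = X @ b
    return np.clip(np.rint(y), 1, K).astype(int) - 1


# Toy data: one feature plus a unit column, three ordered classes.
X = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
labels = np.array([0, 1, 2])
b = fit_naive(X, labels)
pred = predict_naive(X, b, K=3)
```

With equally spaced integer targets, rounding is equivalent to choosing the nearest class value.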

Training LCOC-the proposed case
- Target vector $t$ is decomposed into the product $t = Sz$ of a known matrix $S$, coding the target classes of the training samples, and a parameter vector $z$ whose elements are the unknown numerical values assigned to the K classes
- SSE becomes $\mathrm{SSE} = (Xb - Sz)^T (Xb - Sz)$
- SSE minimization is revisited, now with respect to both $b$ and $z$: Least Squares Ordinal Classification (LSOC)
- The selection of the bounding values $A_1$ and $A_K$ does not affect the classification results

Training LCOC-the proposed case (cont.)
- Since SSE is quadratic with respect to $b$ and $z$, setting the partial derivatives of SSE to zero results in a system of linear equations, the first of which is $b = (X^T X)^{-1} X^T S z$
  - if the estimated $z$ parameters were also employed by OLS, the same $b$ parameters would have been estimated by both training methods
  - the estimated $\zeta$ values (the unknown elements of $z$) are in fact the intra-class average values of the linear outputs
- By substituting the $b$ vector given by the first equation into the second, the system of linear equations is expressed in terms of the unknown class values alone
- Least Squares Ordinal Classification (LSOC)
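
The slide's equations are missing from this transcript, so the following is only a reconstruction from the verbal description, not the author's published formulation. It assumes that the first and last class values are fixed to the bounding values A1 and AK, that S is a one-hot indicator matrix, and that X contains a constant column. Eliminating b from SSE = ||Xb − Sz||² leaves the quadratic form zᵀQz with Q = Sᵀ(I − H)S, where H is the projection onto the column space of X; setting its derivative with respect to the free class values to zero gives a small linear system. All names are hypothetical.

```python
import numpy as np


def fit_lsoc(X, labels, K, A1=1.0, AK=None):
    """Jointly estimate weights b and class values z (sketch, K >= 3).

    labels are 0-based class indices; z[0] and z[K-1] are held at the
    bounding values A1 and AK, the interior values are estimated.
    """
    if K < 3:
        raise ValueError("at least 3 classes are needed for free class values")
    AK = float(K) if AK is None else AK
    n = X.shape[0]
    S = np.zeros((n, K))
    S[np.arange(n), labels] = 1.0          # one-hot class coding matrix

    X_pinv = np.linalg.pinv(X)
    R = S - X @ (X_pinv @ S)               # (I - H) S, residual of S on X
    Q = S.T @ R                            # S^T (I - H) S

    free = np.arange(1, K - 1)             # interior class values (unknown)
    fixed = np.array([0, K - 1])           # bounding values (known)
    z = np.empty(K)
    z[fixed] = [A1, AK]
    # Stationarity for the free values: Q_ff z_f = -Q_fx z_x
    z[free] = np.linalg.solve(Q[np.ix_(free, free)],
                              -Q[np.ix_(free, fixed)] @ z[fixed])
    b = X_pinv @ (S @ z)                   # OLS weights for targets S z
    return b, z


# Hypothetical data: 4 ordered classes of unequal width.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(size=(300, 3)), np.ones(300)])
score = X @ np.array([1.0, 2.0, -1.0, 0.0]) + rng.normal(scale=0.1, size=300)
labels = np.digitize(score, np.quantile(score, [0.5, 0.8, 0.9]))
b, z = fit_lsoc(X, labels, K=4)
```

At the solution, each interior class value equals the average linear output of the samples in that class, which matches the property stated on the slide. Classification then proceeds by assigning the class whose value in z is nearest to the linear output.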

Invariant Error Measure
- When the numerical values of the classes are not fixed, the classification results do not depend only on the magnitude of the error, but also on the distance among the classes
- A measure is proposed that is also minimized by the LSOC training method
- However, unlike SSE, this measure
  - takes into account the distance between the classes
  - is invariant to the selection of the bounding values A1 and AK

Experimental Evaluation
- Both synthetic and real datasets are examined
- Synthetic input vectors were produced by means of a random number generator
  - an arbitrary linear model produces linear targets
  - quantizing the linear targets produces class targets
  - the quantization levels correspond to ordered classes
  - initially, error is introduced into the linear model only by quantization
  - the performance of the proposed training method was also assessed in case of weaker linear dependency: Additive White Gaussian Noise (AWGN) has been introduced into the linear model before quantization
- Real datasets involve visual inspection of seam specimens classified into five grades of quality
  - the critical assumption of linear dependency is unverified
  - if it is not valid, the classification accuracy of LSOC is anticipated to be as poor as that of OLS, or even worse
  - the produced results were also compared to those of Ordinal Logistic Regression (OLR)
  - OLR is a good choice for comparison, since its model employs the same number of parameters as LSOC
  - however, OLR relies on computational methods to estimate these parameters, whereas LSOC employs a closed form solution

Synthetic Datasets
- Using a uniform random generator, 1000 five-dimensional input vectors were artificially generated
  - the vectors were augmented by adding an extra unit element
  - and grouped into a design matrix of size 1000×6
- 6 arbitrary values were randomly selected as the weights of the linear model
  - the design matrix was multiplied with the weights' vector to create the vector of the linear targets, consisting of 1000 values linearly dependent on the corresponding input vectors
  - the elements of the linear targets' vector were positioned in monotonically increasing order by rearranging accordingly the rows of the design matrix
- The 1st synthetic dataset contains 10 ordered classes with 100 input vectors in each class
  - the 1000 input vectors were grouped together in hundreds
  - the first 100 input vectors of the design matrix were classified to the first class, and so on until the 10th class
- The 2nd synthetic dataset used the same design matrix and vector of linear weights
  - the 1st and the 2nd class were assigned 300 input vectors each
  - the 8 remaining classes were assigned 50 vectors each
  - the class targets of the input vectors are therefore different for the second dataset
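
The construction described above can be sketched as follows. The seed, the function name, and the choice of a uniform distribution for the weights are assumptions; the slide only says the weights were arbitrary and randomly selected.

```python
import numpy as np


def make_synthetic(class_sizes, n_features=5, seed=0):
    """Build a sorted design matrix and ordinal labels from a linear model.

    class_sizes lists how many consecutive (sorted) samples fall into
    each ordered class; their sum is the number of samples.
    """
    rng = np.random.default_rng(seed)
    n = sum(class_sizes)
    # Uniform random inputs, augmented with a unit element per vector.
    X = np.column_stack([rng.uniform(size=(n, n_features)), np.ones(n)])
    weights = rng.uniform(size=n_features + 1)    # arbitrary linear model
    targets = X @ weights                         # linear targets
    order = np.argsort(targets)                   # sort rows by target value
    X, targets = X[order], targets[order]
    # Quantize by rank: consecutive blocks of rows form the ordered classes.
    labels = np.repeat(np.arange(len(class_sizes)), class_sizes)
    return X, targets, labels


# 1st dataset: 10 classes of 100; 2nd dataset: 300, 300, then 8 classes of 50.
X1, t1, y1 = make_synthetic([100] * 10)
X2, t2, y2 = make_synthetic([300, 300] + [50] * 8)
```

Using the same seed for both calls reproduces the same design matrix and weights, so the two datasets differ only in their class targets, as on the slide.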

Synthetic Datasets
Euclidean distance of the z values from the norm. centers
- 1st dataset: LSOC: 0.05, OLS:
- 2nd dataset: LSOC: 0.54, OLS:

Synthetic Datasets R2 denotes the coefficient of determination
CA denotes Classification Accuracy V denotes 10-fold Cross-Validation

Synthetic Datasets
[Figure panels: 1st synthetic dataset, 2nd synthetic dataset]
- AWGN has been introduced into the estimation of the linear targets
- The Mean Distance (MD) among the classes has been calculated
- The standard deviation of the added noise was varied from 5% of MD to 100% of MD, in increments of 5% of MD
- Thus, for each dataset, 20 different cases with increasing noise ratios were constructed and tested

Real Datasets
- Image database of 325 seam specimens, belonging to three different types of fabric
  - specimen size approximately 20×4 cm
- A committee of three experts labelled each specimen by assigning a grade denoting the quality of the seam
  - from 1 (worst) to 5 (best)
- For each specimen three ratings are assigned
  - the median is selected as the actual grade
  - the average agreement of each expert with the median ratings was 80.3% ± 1.8%
- 3 different feature sets, all based on intensity curves
  - Roughness features
  - FFT features
  - Fractal features
- 4 different features in each set
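
The grading rule (median of three expert ratings) and the agreement figure can be illustrated on a small made-up rating table. The ratings below are hypothetical and are not taken from the seam database.

```python
import numpy as np

# Hypothetical ratings: one row per specimen, one column per expert.
ratings = np.array([
    [3, 4, 4],
    [2, 2, 5],
    [1, 1, 1],
    [5, 4, 3],
])

# The median of three integer ratings is always one of the ratings,
# so converting to int loses nothing.
grades = np.median(ratings, axis=1).astype(int)

# Fraction of specimens on which each expert agrees with the median grade.
agreement = (ratings == grades[:, None]).mean(axis=0)
```

Using the median rather than the mean keeps the final grade on the original ordinal scale and makes it robust to a single outlying rating.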

Textile Seam Quality Control
ISO 7700 Standard

[Figure: pre-processing of seam specimen images, panels (a)-(e)]

Intensity Curves
[Figure: intensity curves I(1) to I(4) taken along image lines, and curves S(1), S(2)]
Mean intensity values (column-wise)

Roughness Features
Feature Extraction
[Diagram: Moving Average filter, Intensity Deviation]

FFT Features
Feature Extraction
- Using the first 40 FFT coefficients produced from each intensity curve
- Applying averaging using different window centers and sizes
- Selecting the window settings that present the highest correlation with the quality grades
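
The three steps above might look roughly like the sketch below. Several details are assumptions, since the slide does not specify them: coefficient magnitudes are used, windows are symmetric with odd sizes, and the absolute Pearson correlation is the selection criterion. The function name and the synthetic curves are hypothetical.

```python
import numpy as np


def fft_window_feature(curves, grades, n_coeffs=40, sizes=(1, 3, 5)):
    """Pick the FFT window whose averaged magnitude best tracks the grades.

    curves: (n_specimens, n_samples) intensity curves.
    Returns (center, size, abs_correlation, feature_vector).
    """
    # Magnitudes of the first n_coeffs FFT coefficients of each curve.
    spectrum = np.abs(np.fft.rfft(curves, axis=1))[:, :n_coeffs]
    best = None
    for size in sizes:
        half = size // 2
        for center in range(half, spectrum.shape[1] - half):
            window = spectrum[:, center - half:center + half + 1]
            feature = window.mean(axis=1)             # window average
            corr = abs(np.corrcoef(feature, grades)[0, 1])
            if not np.isfinite(corr):                 # constant feature
                continue
            if best is None or corr > best[2]:
                best = (center, size, corr, feature)
    return best


# Synthetic check: the grade controls the amplitude of frequency bin 5.
rng = np.random.default_rng(0)
grades = rng.integers(1, 6, size=30)
t = np.arange(128)
curves = grades[:, None] * np.sin(2 * np.pi * 5 * t / 128)
curves = curves + rng.normal(scale=0.05, size=curves.shape)
center, size, corr, feature = fft_window_feature(curves, grades)
```

In this toy setting the selected window contains bin 5, the only coefficient that depends on the grade.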

Fractal Features Feature Extraction
Modified Pixel Dilation method (MPD) is applied to an intensity curve estimating its fractal dimensions Each intensity curve is treated as binary image n successive dilation operations are performed The area S(n) occupied by the produced curves and the area E(n) occupied by a single pixel that has been dilated by the same morphological operator are calculated for different values of n The relationship among the fractal dimension D, S(n), and E(n), is given by

Roughness Results
- LSOC improves the results of the naïve case
  - it outperforms OLS if more than 20 training samples are used
- LSOC generalizes better than OLR on limited training sets
  - it outperforms OLR if fewer than 45 training samples are used

FFT Results Similar to RF results
LSOC’s performance is even closer to OLR’s indicates stronger linear relationship between FFT features and quality grades

Fractal Results
- Different from the RF or FFT results
- Both metric methods are outperformed by OLR, even for limited training sets
- LSOC's performance is slightly worse than that of OLS
- This indicates a weak linear relationship between the Fractal features and the quality grades

Summarizing Results
- In case of the synthetic datasets
  - the linear dependency between feature vectors and class values is established
  - the proposed method produces significantly better results than the naïve approach
  - the difference in performance is even greater in case of the 2nd synthetic dataset, where the intervals between the classes are less uniform
- In case of the real datasets
  - OLR presents the highest performance for all feature sets, provided a large number of training samples is available
  - LSOC presents, in almost every case, higher classification accuracy than OLS
- If the linear relation between the inputs and the outputs is not very strong, the proposed method is not likely to outperform the naïve approach
  - in such cases, however, the performance of both classifiers is very poor anyway, so other approaches, like OLR, should be considered

Conclusion
- A common strategy for selecting an appropriate classification method for a specific task is to
  - start with the simplest one and check its performance
  - consider more complex methods if the performance is not adequate
- The OLS regression approach is by far the simplest of all ordinal classification methods, offering
  - computational efficiency
  - ease of implementation
- In the naïve case arbitrary numerical values are assigned to the ordered classes
  - inappropriate numerical mapping can result in poor classification performance
- LSOC estimates an optimal mapping using a novel goodness of fit measure
  - like in OLS, a linear model is employed
  - the model's parameters derive through a closed form expression
  - the computational efficiency of the naïve approach is retained

Conclusion (cont.)
- In the experimental evaluation it was demonstrated that, if LSOC is used instead of OLS, the classification accuracy can be significantly increased
  - the accuracy of 76% and 39% presented by OLS in case of the 1st and 2nd synthetic dataset was increased to 93% and 83%, respectively, in case of LSOC
  - a similar trend was present both when Gaussian noise was added to the synthetic datasets and in case of the real datasets
- LSOC was also compared to OLR, a more sophisticated method explicitly designed to handle ordinal data
  - even though OLR achieves higher accuracy when a large number of training samples is employed, it is outperformed by LSOC when this number decreases
  - LSOC can be an attractive choice in case a limited number of training samples is available
  - due to its computational simplicity, LSOC is also an attractive choice if speed of calculations is an issue
- In future work the performance of LSOC can be further investigated in case non-linear kernels are applied to the original input vectors, transferring them into a higher-dimensional space where linearity holds

“Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful.” George E.P. Box

References
- [1] P. McCullagh, “Regression models for ordinal data,” Journal of the Royal Statistical Society B, vol. 42, no. 2, pp. 109–142, 1980.
- [2] G. Tutz, “Generalized semiparametrically structured ordinal models,” Biometrics, vol. 59, no. 2, pp. 263–273, June 2003.
- [3] T. Hastie and R. Tibshirani, Generalized Additive Models, Chapman and Hall, London, 1990.
- [4] V. Johnson and J. Albert, Ordinal Data Modeling, Springer-Verlag, 1999.
- [5] E. Frank and M. Hall, “A simple approach to ordinal classification,” Proc. European Conf. on Machine Learning, pp. 145–165, 2001.
- [6] S. Har-Peled, D. Roth, and D. Zimak, “Constraint classification: A new approach to multiclass classification and ranking,” Advances in Neural Information Processing Systems 15, S. Thrun, S. Becker, and K. Obermayer, eds., MIT Press, pp. 785–792, 2003.
- [7] R. Herbrich, T. Graepel, and K. Obermayer, “Large margin rank boundaries for ordinal regression,” Advances in Large Margin Classifiers, A. Smola, P. Bartlett, B. Schölkopf, and D. Schuurmans, eds., MIT Press, 2000.
- [8] V. Vapnik, The Nature of Statistical Learning Theory, New York, Springer-Verlag, 1995.
- [9] B. Schölkopf, C. Burges, and A. Smola, eds., Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, MA, 1999.
- [10] A. Shashua and A. Levin, “Ranking with large margin principle: two approaches,” Advances in Neural Information Processing Systems 15, S. Thrun, S. Becker, and K. Obermayer, eds., MIT Press, pp. 937–944, 2003.
- [11] W. Chu and S. Keerthi, “New approaches to support vector ordinal regression,” Technical Report, Yahoo! Research Labs, 2005.
- [12] K. Pelckmans, P. Karsmakers, J. Suykens, and B. De Moor, “Ordinal Least Squares Support Vector Machines - a Discriminant Analysis Approach,” Proc. of the Machine Learning for Signal Processing (MLSP 2006), Maynooth, Ireland, Sep.
- [13] W. Chu and Z. Ghahramani, “Gaussian processes for ordinal regression,” Journal of Machine Learning Research, vol. 6, July.
- [14] S. Kramer, G. Widmer, B. Pfahringer, and M. DeGroeve, “Prediction of ordinal classes using regression trees,” Fundamenta Informaticae, vol. 47, pp. 1–13, 2001.
- [15] J. Fox, Applied Regression Analysis, Linear Models, and Related Methods, Thousand Oaks, CA: Sage Publications, 1997.
- [16] V. Torra, J. Domingo-Ferrer, J. Mateo-Sanz, and M. Ng, “Regression for ordinal variables without underlying continuous variables,” Information Sciences, vol. 176, 2006.
- [17] R. Kohavi, “A study of cross-validation and bootstrap for accuracy estimation and model selection,” Proc. of the 14th Int. Joint Conf. on Artificial Intelligence, pp. 1137–1143, 1995.
- [18] I.G. Mariolis and E.S. Dermatas, “Automated Assessment of Textile Seam Quality based on Surface Roughness Estimation,” Journal of the Textile Institute, vol. 101, no. 7, July 2010.
- [19] C. Bahlmann, G. Heidemann, and H. Ritter, “Artificial Neural Networks for Automated Quality Control of Textile Seams,” Pattern Recognition, vol. 32, no. 6, 1999.
- [20] T. Mabuchi and T. Aibara, “Automatic assessment of the appearance of seam puckers using fractal dimensions,” Trans. IEE of Japan, vol. 119-C, no. 4, pp. 523–528, 1999.
- [21] E. Honda, M. Domon, and T. Sasaki, “A method for the determination of fractal dimensions of Sialographic images,” Invest. Radiology, vol. 26, 1991.
- [22] I.G. Mariolis and E.S. Dermatas, “Automatic classification of seam pucker images based on ordinal quality grades,” Pattern Analysis and Applications, published online 04 October 2011.