[Opening slides: paintings by Kazuya Akimoto and Piet Mondriaan, and by Salvador Dalí]
Non-parametric non-linear classifiers
Willem Melssen (W.Melssen@science.ru.nl)
Institute for Molecules and Materials, Analytical Chemistry & Chemometrics
www.cac.science.ru.nl, Radboud University Nijmegen
Non-parametric non-linear classifiers:
- no assumptions regarding the mean, the variance/covariance, or the normality of the distribution of the input data;
- non-linear relationship between the input data and the corresponding output (class membership);
- supervised techniques (based on both input and output).
Parametric and linear… LDA assumes equal (co-)variance.
Parametric and linear… LDA assumes linearly separable classes.
…versus a non-parametric, non-linear problem: LDA???
Some powerful classifiers:
- K Nearest Neighbours;
- Artificial Neural Networks;
- Support Vector Machines.
K Nearest Neighbours (KNN):
- non-parametric classifier (no assumptions regarding normality);
- similarity based (Euclidean distance, 1 - correlation);
- matching to a set of classified objects (decision based on a consensus criterion).
KNN modelling procedure (a minimal code sketch follows this list):
1. use appropriate scaling of the selected training set;
2. select a similarity measure (e.g. Euclidean distance);
3. set the number of neighbours (K);
4. construct the similarity matrix for a new object (unknown class) and the objects in the training set;
5. rank all similarity values in ascending order;
6. generate the class membership list;
7. let the consensus criterion determine the class (e.g. the majority takes all);
8. validate the value of K (cross-validation, test set).
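A minimal sketch of steps 4 - 7 in plain NumPy (the data are assumed to be scaled already; all names are illustrative):

```python
import numpy as np

def knn_classify(X_train, y_train, x_new, k=3):
    """Classify one new object by consensus among its k nearest
    training objects (Euclidean distance, majority takes all)."""
    # similarity of the new object to every object in the training set
    dist = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # rank all similarity values in ascending order, keep the first k
    neighbours = np.argsort(dist)[:k]
    # class membership list -> consensus criterion
    labels, votes = np.unique(y_train[neighbours], return_counts=True)
    return labels[np.argmax(votes)]
```

The value of k itself would still be validated by cross-validation or a test set (step 8).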
KNN illustrated on a two-class example (scatter plots, axes X1 vs X2):
- select a representative training set;
- label the data points (supervised): class A and class B;
- classify a new object;
- one neighbour (K = 1): the class of the single nearest object;
- K = 3: class A;
- K = 2: class A or B, undecided;
- K = 11: 5 A's and 6 B's, so class B, but with what confidence?
Classification of brain tumours:
- collaboration with the department of radiology, UMCN, Nijmegen; EC project eTumour;
- magnetic resonance imaging;
- voxel-wise in-vivo NMR spectroscopy;
- goal of the project: determination of the type and grading of various brain tumours.
Magnetic Resonance Imaging [four image panels: T1 weighted, T2 weighted, proton density, gadolinium; annotated regions: ventricles (CSF), tumour, grey + white matter, skull]
Construction of the data set:

Tumour class          Nr. of voxels per patient        Total
Healthy (volunteer)   30 32 37 43                        142
Healthy (patient)     20 20 22 14                         76
CSF                   18 9 26 3 17 17 1 9                100
Meningioma            15 4 29                             48
Grade II              13 17 19 30 15 49 5 2 10 16        176
Grade III             20 5 28 4                            57
Grade IV              7 6 14 29 10 1 3                     70
All classes                                              669
MRI combined with MRS: image variables + quantitated values.
Average spectrum per tissue type [PCA score plot: PC1 (42.2%), PC2 (19.5%)]
Results: 10 random divisions of the data in a balanced way, training set (2/3) and test set (1/3), giving 10 different models.
LDA: 90.0% ± 2.0 [87.0 - 92.8], 0.1 sec
KNN: 95.4% ± 1.0 [92.2 - 97.2], 1.4 sec
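A sketch of this evaluation protocol with scikit-learn (X and y stand for the feature matrix and class labels, which are not part of this transcript; the stratified split approximates the balanced division, and K = 3 is illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

accuracies = []
for seed in range(10):                      # 10 random balanced divisions
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=1/3, stratify=y, random_state=seed)
    knn = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr)
    accuracies.append(accuracy_score(y_te, knn.predict(X_te)))

print(f"{100 * np.mean(accuracies):.1f}% ± {100 * np.std(accuracies):.1f}")
```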
Artificial Neural Networks (ANN):
- non-parametric, non-linear, adaptive;
- weights trained by an iterative learning procedure.
ANN architecture [diagram]
Neuron or 'unit': the weighted input (dendrites, synapses) enters the neuron body (soma), where it is summed (net) and passed through a transfer function f(net); the output is then distributed further (axon).
Transfer functions: exponential, linear, compressive.
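In code, a unit reduces to a weighted sum followed by a transfer function; a sketch, with tanh standing in for the compressive curve:

```python
import numpy as np

def unit(x, w, transfer=np.tanh):
    """One neuron: weighted input -> summation (net) -> f(net)."""
    net = np.dot(w, x)      # weighted input, summed over the 'dendrites'
    return transfer(net)    # output distributed via the 'axon'

linear      = lambda net: net                           # identity
compressive = lambda net: 1.0 / (1.0 + np.exp(-net))    # logistic sigmoid
```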
An easy one: the 'and' problem.

X1  X2 | Y
 0   0 | 0
 1   0 | 0
 0   1 | 0
 1   1 | 1

[plot: a single decision line separates (1, 1) from the other points]
Two layer network (perceptron):
sign(x1·w1 + x2·w2 - t) < 0 : class 0
sign(x1·w1 + x2·w2 - t) > 0 : class 1
Hey, this looks like LDA…
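For example, w1 = w2 = 1 with threshold t = 1.5 is one hand-picked (by no means unique) weight set that solves the 'and' problem:

```python
import numpy as np

w, t = np.array([1.0, 1.0]), 1.5
for x in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    y = int(np.dot(w, x) - t > 0)   # only (1, 1) exceeds the threshold
    print(x, '->', y)               # prints 0, 0, 0, 1
```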
Logical 'exclusive-or' problem:

X1  X2 | Y
 0   0 | 0
 1   0 | 1
 0   1 | 1
 1   1 | 0
No single decision line possible…
… but two lines will do
Multi-layer feed-forward ANN
Upper decision line
Lower decision line
Solution
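One hand-set realisation of this two-line solution (the weights are illustrative, not unique):

```python
def step(net):
    return int(net > 0)

def xor_net(x1, x2):
    """Two hidden units implement the two decision lines; the output
    unit combines the linear sub-solutions into the non-linear XOR."""
    h_lower = step(x1 + x2 - 0.5)   # fires above the lower decision line
    h_upper = step(x1 + x2 - 1.5)   # fires above the upper decision line
    return step(h_lower - h_upper - 0.5)

for x in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x, '->', xor_net(*x))     # prints 0, 1, 1, 0
```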
How to get the weights: by learning (a bare-bones sketch follows this list).
1. set the network parameters (learning rate, number of hidden layers/units, transfer functions, etc.);
2. initialise the network weights randomly;
3. present an object;
4. calculate the ANN output;
5. adapt the network weights to minimise the output error;
6. repeat 3 - 5 for all training objects;
7. iterate until the network converges or a stop criterion is met;
8. evaluate the network performance with an independent test set or a cross-validation procedure.
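A bare-bones sketch of steps 2 - 7 (batch gradient descent with logistic units; bias terms and step 8, the independent evaluation, are omitted for brevity):

```python
import numpy as np

def train_ann(X, Y, n_hidden=4, learning_rate=0.5, n_iter=5000, seed=0):
    """Train a one-hidden-layer feed-forward ANN by error back-propagation.
    X: objects x variables; Y: objects x outputs (e.g. one-hot classes)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden))   # step 2: random init
    W2 = rng.normal(0.0, 0.5, (n_hidden, Y.shape[1]))
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    for _ in range(n_iter):                  # step 7: iterate until stop criterion
        H = sigmoid(X @ W1)                  # steps 3-4: present objects, get output
        out = sigmoid(H @ W2)
        d_out = (out - Y) * out * (1 - out)      # step 5: output error gradient
        d_hid = (d_out @ W2.T) * H * (1 - H)     # error back-propagation
        W2 -= learning_rate * H.T @ d_out        # adapt the weights
        W1 -= learning_rate * X.T @ d_hid
    return W1, W2
```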
Adapting the weights:
- adapt the weights to minimise the output error E;
- weight changes are controlled by the learning rate;
- error back-propagation (from the output to the input layer);
- alternatives: Newton-Raphson, Levenberg-Marquardt, etc.
[error surface: the descent can get stuck in a local minimum instead of the global minimum]
Function of the hidden layer??? (x, y) points on the [0, 1] x [0, 1] grid, with a specified output for each grid point (white: 0, black: 1).
Output of the hidden layer units: the output of each unit over the [0, 1] x [0, 1] grid. Combining linear sub-solutions yields a non-linear classifier…
When to stop training? [plot: error versus iteration number for the training and test sets, running from 'not converged' to 'over-fitting'] An external validation set is required to estimate the accuracy.
Many solutions possible: not unique
Classification of brain tumours: 10 random divisions of the data in a balanced way, training set (2/3) and test set (1/3), giving 10 different models.
LDA: 90.0% ± 2.0 [87.0 - 92.8], 0.1 sec
KNN: 95.4% ± 1.0 [92.2 - 97.2], 1.4 sec
ANN: 93.2% ± 3.5 [86.4 - 97.7], 316 sec
Support Vector Machines (SVMs):
- kernel-based classifier;
- transforms the input space to a high-dimensional feature space;
- exploits the Lagrange formalism for the best solution;
- binary (two-class) classifier.
A linearly separable problem [scatter plot: X1 vs X2, classes A and B]. Goal: to find the optimal separating hyperplane.
Optimal hyperplane: no objects are allowed between the boundaries, and maximisation of the distance between them gives a unique solution!
Support vectors [scatter plot: the boundary objects of classes A and B are the support vectors]
Crossing the borderlines… some objects end up on the wrong side of the boundaries; solution: penalise these objects.
Lagrange equation (standard hard-margin form):
Target: $\min_{\mathbf{w},b}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}$
Constraints: $y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1$ for all training objects $i$.
Oops! Objects cross the borderlines, so slack variables $\xi_i$ are added (soft margin):
Target: $\min_{\mathbf{w},b,\xi}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_i \xi_i$
Constraints: $y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$.
Minimisation: only the support vectors end up with a non-zero Lagrange multiplier $\alpha_i$.
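The minimisation is usually carried out on the dual problem; in its standard soft-margin form (presumably what the slide displayed):

```latex
\max_{\alpha} \sum_{i} \alpha_i
  - \frac{1}{2} \sum_{i} \sum_{j} \alpha_i \alpha_j\, y_i y_j\,
    \mathbf{x}_i \cdot \mathbf{x}_j
\qquad \text{subject to} \qquad
\sum_{i} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C
```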
SVM properties. Discriminant function (linear case): $f(\mathbf{x}) = \operatorname{sign}\left(\sum_{i \in SV} \alpha_i y_i\, \mathbf{x}_i \cdot \mathbf{x} + b\right)$.
Properties:
- sparse solution;
- the number of variables is irrelevant (dual formalism);
- global and unique model;
- extension to a non-linear binary classifier.
A non-linear 2D classification problem [scatter plot: X1 vs X2, classes A and B]
Going from 2D to 3D: in the 2D space of the original variables the classes cannot be separated linearly; a transformation maps them to a 3D feature space (one common choice of map is shown below).
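A common choice for such a 2D-to-3D map is the degree-2 monomial map, whose dot product in feature space equals the squared dot product in the original space:

```latex
\phi : (x_1, x_2) \mapsto \left( x_1^{2},\ \sqrt{2}\, x_1 x_2,\ x_2^{2} \right),
\qquad
\phi(\mathbf{u}) \cdot \phi(\mathbf{v}) = (\mathbf{u} \cdot \mathbf{v})^{2}
```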
Data representation in feature space
In the feature space the non-linear problem becomes a linear one [3D plot with the separating plane]. From: Belousov et al., Chemometrics and Intelligent Laboratory Systems 64 (2002) 15-25.
Classifiers in feature space (standard forms):
linear: $f(\mathbf{x}) = \operatorname{sign}\left(\sum_i \alpha_i y_i\, \mathbf{x}_i \cdot \mathbf{x} + b\right)$
a priori transformation: $f(\mathbf{x}) = \operatorname{sign}\left(\sum_i \alpha_i y_i\, \phi(\mathbf{x}_i) \cdot \phi(\mathbf{x}) + b\right)$
general (kernel trick): $f(\mathbf{x}) = \operatorname{sign}\left(\sum_i \alpha_i y_i\, K(\mathbf{x}_i, \mathbf{x}) + b\right)$
Kernel functions:
- polynomial function;
- radial basis function (RBF);
- Pearson VII universal kernel function (PUK).
Not every function is a valid kernel function (Mercer's conditions).
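In their standard forms, the first two read:

```latex
K_{\mathrm{poly}}(\mathbf{x}_i, \mathbf{x}_j) = \left( \mathbf{x}_i \cdot \mathbf{x}_j + 1 \right)^{d},
\qquad
K_{\mathrm{RBF}}(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\left( - \frac{\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^{2}}{2 \sigma^{2}} \right)
```

The PUK expression is defined by Üstün and co-workers (Chemometrics and Intelligent Laboratory Systems, 2006).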
How to construct an SVM model? (a scikit-learn sketch follows this list)
1. make a training, test and validation set;
2. set the C value (regularisation constant);
3. select a kernel function and its parameters;
4. construct the kernel matrix for the training set;
5. make the SVM model by quadratic programming;
6. evaluate the performance on the test set;
7. repeat steps 2 - 6 (e.g. grid search, GA, Simplex);
8. determine the accuracy of the best model (validation).
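A sketch of this loop with scikit-learn (X and y are again assumed to be available; the grid values are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

# step 1: training / test split (a separate validation set is assumed held out)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, stratify=y,
                                          random_state=0)

# steps 2-3 and 7: grid search over C and the RBF kernel width
grid = GridSearchCV(
    SVC(kernel='rbf'),                       # steps 4-5 happen inside fit()
    param_grid={'C': 10.0 ** np.arange(-1, 4),
                'gamma': 10.0 ** np.arange(-4, 1)},
    cv=5)
grid.fit(X_tr, y_tr)

# step 6 / 8: performance of the best model on the held-out data
print(grid.best_params_, grid.score(X_te, y_te))
```

Note that scikit-learn's SVC extends the binary SVM to the multi-class case internally, via one-versus-one voting.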
Classification of brain tumours: 10 random divisions of the data in a balanced way, training set (2/3) and test set (1/3), giving 10 different models.
LDA: 90.0% ± 2.0 [87.0 - 92.8], 0.1 sec
KNN: 95.4% ± 1.0 [92.2 - 97.2], 1.4 sec
ANN: 93.2% ± 3.5 [86.4 - 97.7], 316 sec
SVM: 96.9% ± 0.9 [95.8 - 98.6], 8 hours
Best SVM model (98.6%, test):

true / predicted   H   CSF  G II  G III  G IV  Men   α-error (%)
Healthy            70    0    0     0     0     0     0
CSF                 0   32    0     0     0     0     0
Grade II            0    0   57     0     0     0     0
Grade III           1    0    0    18     0     0     5
Grade IV            1    0    0     1    21     0     9
Meningioma          0    0    0     0     0    16     0
β-error (%)         3    0    0     5     0     0
SVM is the best one, but a bit confused (same confusion matrix as above).
Here are the real problems (overlap), as the confusion matrix above shows; the same holds for KNN, ANN and other classifiers.
The Consumentenbond ratings… [comparison table: KNN, ANN and SVM rated on simplicity, uniqueness, performance / multi-class, outliers, # objects, # variables, and speed]
Acknowledgements: Bülent Üstün (SVMs), Patrick Krooshof (eTumour examples).