[Opening slides: paintings by Kazuya Akimoto and Piet Mondriaan, and by Salvador Dalí]
Non-parametric non-linear classifiers
Willem Melssen (W.Melssen@science.ru.nl)
Institute for Molecules and Materials, Analytical Chemistry & Chemometrics
www.cac.science.ru.nl, Radboud University Nijmegen
Non-parametric non-linear classifiers:
- no assumptions regarding the mean, the variance/covariance, or the normality of the distribution of the input data;
- non-linear relationship between the input data and the corresponding output (class membership);
- supervised techniques (based on both input and output).
Parametric and linear… LDA assumes equal (co-)variance.
Parametric and linear… LDA assumes linearly separable classes.
…versus a non-parametric, non-linear problem: LDA???
Some powerful classifiers:
- K Nearest Neighbours;
- Artificial Neural Networks;
- Support Vector Machines.
K Nearest Neighbours (KNN):
- non-parametric classifier (no assumptions regarding normality);
- similarity based (Euclidean distance, 1 - correlation);
- matching to a set of classified objects (decision based on a consensus criterion).
KNN modelling procedure (a minimal code sketch follows this list):
1. use appropriate scaling of the selected training set;
2. select a similarity measure (e.g. Euclidean distance);
3. set the number of neighbours (K);
4. construct the similarity matrix for a new object (unknown class) and the objects in the training set;
5. rank all similarity values in ascending order;
6. generate the class membership list;
7. let the consensus criterion determine the class (e.g. the majority takes all);
8. validate the value of K (cross-validation, test set).
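A minimal sketch of steps 4 - 7 in plain NumPy (the data are assumed to be scaled already; all names are illustrative):

```python
import numpy as np

def knn_classify(X_train, y_train, x_new, k=3):
    """Classify one new object by consensus among its k nearest
    training objects (Euclidean distance, majority takes all)."""
    # similarity of the new object to every object in the training set
    dist = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # rank all similarity values in ascending order, keep the first k
    neighbours = np.argsort(dist)[:k]
    # class membership list -> consensus criterion
    labels, votes = np.unique(y_train[neighbours], return_counts=True)
    return labels[np.argmax(votes)]
```

The value of k itself would still be validated by cross-validation or a test set (step 8).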
KNN illustrated on a two-class example (scatter plots, axes X1 vs X2):
- select a representative training set;
- label the data points (supervised): class A and class B;
- classify a new object;
- one neighbour (K = 1): the class of the single nearest object;
- K = 3: class A;
- K = 2: class A or B, undecided;
- K = 11: 5 A's and 6 B's, so class B, but with what confidence?
Classification of brain tumours:
- collaboration with the department of radiology, UMCN, Nijmegen; EC project eTumour;
- magnetic resonance imaging;
- voxel-wise in-vivo NMR spectroscopy;
- goal of the project: determination of the type and grading of various brain tumours.
Magnetic Resonance Imaging [four image panels: T1 weighted, T2 weighted, proton density, gadolinium; annotated regions: ventricles (CSF), tumour, grey + white matter, skull]
Construction of the data set:

Tumour class          Nr. of voxels per patient        Total
Healthy (volunteer)   30 32 37 43                        142
Healthy (patient)     20 20 22 14                         76
CSF                   18 9 26 3 17 17 1 9                100
Meningioma            15 4 29                             48
Grade II              13 17 19 30 15 49 5 2 10 16        176
Grade III             20 5 28 4                            57
Grade IV              7 6 14 29 10 1 3                     70
All classes                                              669
MRI combined with MRS: image variables + quantitated values.
Average spectrum per tissue type [PCA score plot: PC1 (42.2%), PC2 (19.5%)]
Results: 10 random divisions of the data in a balanced way, training set (2/3) and test set (1/3), giving 10 different models.
LDA: 90.0% ± 2.0 [87.0 - 92.8], 0.1 sec
KNN: 95.4% ± 1.0 [92.2 - 97.2], 1.4 sec
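A sketch of this evaluation protocol with scikit-learn (X and y stand for the feature matrix and class labels, which are not part of this transcript; the stratified split approximates the balanced division, and K = 3 is illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

accuracies = []
for seed in range(10):                      # 10 random balanced divisions
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=1/3, stratify=y, random_state=seed)
    knn = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr)
    accuracies.append(accuracy_score(y_te, knn.predict(X_te)))

print(f"{100 * np.mean(accuracies):.1f}% ± {100 * np.std(accuracies):.1f}")
```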
Artificial Neural Networks (ANN):
- non-parametric, non-linear, adaptive;
- weights trained by an iterative learning procedure.
ANN architecture [diagram]
Neuron or 'unit': the weighted input (dendrites, synapses) enters the neuron body (soma), where it is summed (net) and passed through a transfer function f(net); the output is then distributed further (axon).
Transfer functions: exponential, linear, compressive.
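In code, a unit reduces to a weighted sum followed by a transfer function; a sketch, with tanh standing in for the compressive curve:

```python
import numpy as np

def unit(x, w, transfer=np.tanh):
    """One neuron: weighted input -> summation (net) -> f(net)."""
    net = np.dot(w, x)      # weighted input, summed over the 'dendrites'
    return transfer(net)    # output distributed via the 'axon'

linear      = lambda net: net                           # identity
compressive = lambda net: 1.0 / (1.0 + np.exp(-net))    # logistic sigmoid
```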
An easy one: the 'and' problem.

X1  X2 | Y
 0   0 | 0
 1   0 | 0
 0   1 | 0
 1   1 | 1

[plot: a single decision line separates (1, 1) from the other points]
Two layer network (perceptron):
sign(x1·w1 + x2·w2 - t) < 0 : class 0
sign(x1·w1 + x2·w2 - t) > 0 : class 1
Hey, this looks like LDA…
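For example, w1 = w2 = 1 with threshold t = 1.5 is one hand-picked (by no means unique) weight set that solves the 'and' problem:

```python
import numpy as np

w, t = np.array([1.0, 1.0]), 1.5
for x in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    y = int(np.dot(w, x) - t > 0)   # only (1, 1) exceeds the threshold
    print(x, '->', y)               # prints 0, 0, 0, 1
```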
Logical 'exclusive-or' problem:

X1  X2 | Y
 0   0 | 0
 1   0 | 1
 0   1 | 1
 1   1 | 0
No single decision line possible…
… but two lines will do
Multi-layer feed-forward ANN
Upper decision line
Lower decision line
Solution
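One hand-set realisation of this two-line solution (the weights are illustrative, not unique):

```python
def step(net):
    return int(net > 0)

def xor_net(x1, x2):
    """Two hidden units implement the two decision lines; the output
    unit combines the linear sub-solutions into the non-linear XOR."""
    h_lower = step(x1 + x2 - 0.5)   # fires above the lower decision line
    h_upper = step(x1 + x2 - 1.5)   # fires above the upper decision line
    return step(h_lower - h_upper - 0.5)

for x in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x, '->', xor_net(*x))     # prints 0, 1, 1, 0
```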
How to get the weights: by learning (a bare-bones sketch follows this list).
1. set the network parameters (learning rate, number of hidden layers/units, transfer functions, etc.);
2. initialise the network weights randomly;
3. present an object;
4. calculate the ANN output;
5. adapt the network weights to minimise the output error;
6. repeat 3 - 5 for all training objects;
7. iterate until the network converges or a stop criterion is met;
8. evaluate the network performance with an independent test set or a cross-validation procedure.
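A bare-bones sketch of steps 2 - 7 (batch gradient descent with logistic units; bias terms and step 8, the independent evaluation, are omitted for brevity):

```python
import numpy as np

def train_ann(X, Y, n_hidden=4, learning_rate=0.5, n_iter=5000, seed=0):
    """Train a one-hidden-layer feed-forward ANN by error back-propagation.
    X: objects x variables; Y: objects x outputs (e.g. one-hot classes)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden))   # step 2: random init
    W2 = rng.normal(0.0, 0.5, (n_hidden, Y.shape[1]))
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    for _ in range(n_iter):                  # step 7: iterate until stop criterion
        H = sigmoid(X @ W1)                  # steps 3-4: present objects, get output
        out = sigmoid(H @ W2)
        d_out = (out - Y) * out * (1 - out)      # step 5: output error gradient
        d_hid = (d_out @ W2.T) * H * (1 - H)     # error back-propagation
        W2 -= learning_rate * H.T @ d_out        # adapt the weights
        W1 -= learning_rate * X.T @ d_hid
    return W1, W2
```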
Adapting the weights:
- adapt the weights to minimise the output error E;
- weight changes are controlled by the learning rate;
- error back-propagation (from the output to the input layer);
- alternatives: Newton-Raphson, Levenberg-Marquardt, etc.
[error surface: the descent can get stuck in a local minimum instead of the global minimum]
Function of the hidden layer??? (x, y) points on the [0, 1] x [0, 1] grid, with a specified output for each grid point (white: 0, black: 1).
Output of the hidden layer units: the output of each unit over the [0, 1] x [0, 1] grid. Combining linear sub-solutions yields a non-linear classifier…
When to stop training? [plot: error versus iteration number for the training and test sets, running from 'not converged' to 'over-fitting'] An external validation set is required to estimate the accuracy.
Many solutions possible: not unique
Classification of brain tumours: 10 random divisions of the data in a balanced way, training set (2/3) and test set (1/3), giving 10 different models.
LDA: 90.0% ± 2.0 [87.0 - 92.8], 0.1 sec
KNN: 95.4% ± 1.0 [92.2 - 97.2], 1.4 sec
ANN: 93.2% ± 3.5 [86.4 - 97.7], 316 sec
Support Vector Machines (SVMs):
- kernel-based classifier;
- transforms the input space to a high-dimensional feature space;
- exploits the Lagrange formalism for the best solution;
- binary (two-class) classifier.
A linearly separable problem [scatter plot: X1 vs X2, classes A and B]. Goal: to find the optimal separating hyperplane.
Optimal hyperplane: no objects are allowed between the boundaries, and maximisation of the distance between them gives a unique solution!
Support vectors [scatter plot: the boundary objects of classes A and B are the support vectors]
Crossing the borderlines… some objects end up on the wrong side of the boundaries; solution: penalise these objects.
Lagrange equation (standard hard-margin form):
Target: $\min_{\mathbf{w},b}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}$
Constraints: $y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1$ for all training objects $i$.
Oops! Objects cross the borderlines, so slack variables $\xi_i$ are added (soft margin):
Target: $\min_{\mathbf{w},b,\xi}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_i \xi_i$
Constraints: $y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$.
Minimisation: only the support vectors end up with a non-zero Lagrange multiplier $\alpha_i$.
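The minimisation is usually carried out on the dual problem; in its standard soft-margin form (presumably what the slide displayed):

```latex
\max_{\alpha} \sum_{i} \alpha_i
  - \frac{1}{2} \sum_{i} \sum_{j} \alpha_i \alpha_j\, y_i y_j\,
    \mathbf{x}_i \cdot \mathbf{x}_j
\qquad \text{subject to} \qquad
\sum_{i} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C
```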
SVM properties. Discriminant function (linear case): $f(\mathbf{x}) = \operatorname{sign}\left(\sum_{i \in SV} \alpha_i y_i\, \mathbf{x}_i \cdot \mathbf{x} + b\right)$.
Properties:
- sparse solution;
- the number of variables is irrelevant (dual formalism);
- global and unique model;
- extension to a non-linear binary classifier.
A non-linear 2D classification problem [scatter plot: X1 vs X2, classes A and B]
Going from 2D to 3D: in the 2D space of the original variables the classes cannot be separated linearly; a transformation maps them to a 3D feature space (one common choice of map is shown below).
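A common choice for such a 2D-to-3D map is the degree-2 monomial map, whose dot product in feature space equals the squared dot product in the original space:

```latex
\phi : (x_1, x_2) \mapsto \left( x_1^{2},\ \sqrt{2}\, x_1 x_2,\ x_2^{2} \right),
\qquad
\phi(\mathbf{u}) \cdot \phi(\mathbf{v}) = (\mathbf{u} \cdot \mathbf{v})^{2}
```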
Data representation in feature space
In the feature space the non-linear problem becomes a linear one [3D plot with the separating plane]. From: Belousov et al., Chemometrics and Intelligent Laboratory Systems 64 (2002) 15-25.
Classifiers in feature space (standard forms):
linear: $f(\mathbf{x}) = \operatorname{sign}\left(\sum_i \alpha_i y_i\, \mathbf{x}_i \cdot \mathbf{x} + b\right)$
a priori transformation: $f(\mathbf{x}) = \operatorname{sign}\left(\sum_i \alpha_i y_i\, \phi(\mathbf{x}_i) \cdot \phi(\mathbf{x}) + b\right)$
general (kernel trick): $f(\mathbf{x}) = \operatorname{sign}\left(\sum_i \alpha_i y_i\, K(\mathbf{x}_i, \mathbf{x}) + b\right)$
Kernel functions:
- polynomial function;
- radial basis function (RBF);
- Pearson VII universal kernel function (PUK).
Not every function is a valid kernel function (Mercer's conditions).
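In their standard forms, the first two read:

```latex
K_{\mathrm{poly}}(\mathbf{x}_i, \mathbf{x}_j) = \left( \mathbf{x}_i \cdot \mathbf{x}_j + 1 \right)^{d},
\qquad
K_{\mathrm{RBF}}(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\left( - \frac{\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^{2}}{2 \sigma^{2}} \right)
```

The PUK expression is defined by Üstün and co-workers (Chemometrics and Intelligent Laboratory Systems, 2006).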
How to construct an SVM model? (a scikit-learn sketch follows this list)
1. make a training, test and validation set;
2. set the C value (regularisation constant);
3. select a kernel function and its parameters;
4. construct the kernel matrix for the training set;
5. make the SVM model by quadratic programming;
6. evaluate the performance on the test set;
7. repeat steps 2 - 6 (e.g. grid search, GA, Simplex);
8. determine the accuracy of the best model (validation).
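A sketch of this loop with scikit-learn (X and y are again assumed to be available; the grid values are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

# step 1: training / test split (a separate validation set is assumed held out)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, stratify=y,
                                          random_state=0)

# steps 2-3 and 7: grid search over C and the RBF kernel width
grid = GridSearchCV(
    SVC(kernel='rbf'),                       # steps 4-5 happen inside fit()
    param_grid={'C': 10.0 ** np.arange(-1, 4),
                'gamma': 10.0 ** np.arange(-4, 1)},
    cv=5)
grid.fit(X_tr, y_tr)

# step 6 / 8: performance of the best model on the held-out data
print(grid.best_params_, grid.score(X_te, y_te))
```

Note that scikit-learn's SVC extends the binary SVM to the multi-class case internally, via one-versus-one voting.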
Classification of brain tumours: 10 random divisions of the data in a balanced way, training set (2/3) and test set (1/3), giving 10 different models.
LDA: 90.0% ± 2.0 [87.0 - 92.8], 0.1 sec
KNN: 95.4% ± 1.0 [92.2 - 97.2], 1.4 sec
ANN: 93.2% ± 3.5 [86.4 - 97.7], 316 sec
SVM: 96.9% ± 0.9 [95.8 - 98.6], 8 hours
Best SVM model (98.6%, test):

true / predicted   H   CSF  G II  G III  G IV  Men   α-error (%)
Healthy            70    0    0     0     0     0     0
CSF                 0   32    0     0     0     0     0
Grade II            0    0   57     0     0     0     0
Grade III           1    0    0    18     0     0     5
Grade IV            1    0    0     1    21     0     9
Meningioma          0    0    0     0     0    16     0
β-error (%)         3    0    0     5     0     0
SVM is the best one, but a bit confused (same confusion matrix as above).
Here are the real problems (overlap), as the confusion matrix above shows; the same holds for KNN, ANN and other classifiers.
The Consumentenbond ratings… [comparison table: KNN, ANN and SVM rated on simplicity, uniqueness, performance / multi-class, outliers, # objects, # variables, and speed]
Acknowledgements: Bülent Üstün (SVMs), Patrick Krooshof (eTumour examples).