
1 Kazuya Akimoto Piet Mondriaan

2 Salvador Dalí

3 Non-parametric non-linear classifiers. Willem Melssen, W.Melssen@science.ru.nl, Institute for Molecules and Materials, Analytical Chemistry & Chemometrics, www.cac.science.ru.nl, Radboud University Nijmegen

4 Non-parametric non-linear classifiers: no assumptions regarding the mean, the variance / covariance, or the normality of the distribution of the input data; a non-linear relationship between the input data and the corresponding output (class membership); supervised techniques (both input and output based).

5 Parametric and linear… LDA assumes equal (co-)variance.

6 Parametric and linear… LDA: linearly separable classes.

7 …versus non-parametric, non-linear LDA???

8 Some powerful classifiers K Nearest Neighbours; Artificial Neural Networks; Support Vector Machines.

9 K Nearest Neighbours (KNN): non-parametric classifier (no assumptions regarding normality); similarity based (Euclidean distance, 1 - correlation); matching to a set of classified objects (decision based on a consensus criterion).

10 KNN modelling procedure:
1. use appropriate scaling of the selected training set;
2. select a similarity measure (e.g. Euclidean distance);
3. set the number of neighbours (K);
4. construct the similarity matrix between a new object (unknown class) and the objects in the training set;
5. rank all similarity values in ascending order;
6. generate the class membership list;
7. let the consensus criterion determine the class (e.g., the majority takes all);
8. validate the K value (cross-validation, test set).
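A minimal sketch of this procedure in Python, assuming scikit-learn as the toolkit (the slides do not prescribe an implementation, and the data points here are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical labelled training set: rows are objects, columns variables.
X = np.array([[1.0, 2.0], [1.2, 1.8], [3.0, 3.5], [3.2, 3.7]])
y = np.array(["A", "A", "B", "B"])

scaler = StandardScaler().fit(X)                  # step 1: scaling
knn = KNeighborsClassifier(n_neighbors=3,         # step 3: K
                           metric="euclidean")    # step 2: similarity measure
knn.fit(scaler.transform(X), y)                   # stores the training set

# Steps 4-7: distances to the new object are ranked and the majority of
# the K nearest labelled objects decides the class.
print(knn.predict(scaler.transform([[1.1, 1.9]])))   # -> ['A']

# Step 8: validate the choice of K, e.g. by cross-validation.
print(cross_val_score(knn, scaler.transform(X), y, cv=2))
```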

11 Select a representative training set X1 X2

12 Label the data points (supervised) X1 X2 class A class B

13 Classify a new object X1 X2 class A class B

14 One neighbour: K = 1 X1 X2 class A class B

15 K = 3 X1 X2 class A class B class A

16 K = 2 X1 X2 class A class B class A or B: undecided

17 K = 11 X1 X2 class A class B 5 A’s and 6 B’s: confidence?

18 Classification of brain tumours: collaboration with the department of radiology, UMCN, Nijmegen; EC project eTumour. Data: magnetic resonance imaging and voxel-wise in-vivo NMR spectroscopy. Goal of the project: determination of the type and grading of various brain tumours.

19 Magnetic Resonance Imaging: T1-weighted, T2-weighted, proton density and gadolinium-enhanced images, showing the ventricles (CSF), tumour, grey + white matter and skull.

20 Construction of data set

Tumour class          Nr. of voxels per patient              Total
Healthy (volunteer)   30 32 37 43                            142
Healthy (patient)     20 20 22 14                             76
CSF                   18  9 26  3 17 17  1  9                100
Meningioma            15  4 29                                48
Grade II              13 17 19 30 15 49  5  2 10 16          176
Grade III             20  5 28  4                             57
Grade IV               7  6 14 29 10  1  3                    70
Total                                                        669

21 MRI combined with MRS: image variables (MRI) and quantitated values (MRS).

22 Average spectrum per tissue type; scores on PC1 (42.2%) and PC2 (19.5%).

23 Results: 10 random divisions of the data in a balanced way; training set (2/3), test set (1/3): 10 different models.
LDA: 90.0% ± 2.0 [87.0 - 92.8]   0.1 sec
KNN: 95.4% ± 1.0 [92.2 - 97.2]   1.4 sec

24 Artificial Neural Networks (ANN): non-parametric, non-linear, adaptive; weights trained by an iterative learning procedure.

25 ANN architecture

27 Neuron or 'unit': weighted input (dendrites, synapses); the neuron (soma) performs a summation (net) followed by a transfer function f(net); distribution of the output (axon).

28 Transfer functions: exponential, linear, compressive.
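As a sketch, the three shapes could be implemented as follows (the exact functional forms used in the slides are an assumption; the compressive one is taken to be the common logistic sigmoid):

```python
import numpy as np

def linear(net):             # output proportional to the net input
    return net

def exponential(net):        # exponentially growing activation
    return np.exp(net)

def compressive(net):        # logistic sigmoid: squashes net into (0, 1)
    return 1.0 / (1.0 + np.exp(-net))
```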

29 An easy one: the 'and' problem

x1  x2  y
 0   0  0
 1   0  0
 0   1  0
 1   1  1

A single decision line separates the two output classes.

30 Two-layer network (perceptron): if x1*w1 + x2*w2 - t < 0 : class 0; if x1*w1 + x2*w2 - t > 0 : class 1. Hey, this looks like LDA…
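In code, one set of weights that solves the 'and' problem (w1 = w2 = 1 and threshold t = 1.5 are an assumed choice; any line separating (1,1) from the other three points works):

```python
def and_perceptron(x1, x2, w1=1.0, w2=1.0, t=1.5):
    # class 1 if the weighted sum exceeds the threshold, class 0 otherwise
    return 1 if x1 * w1 + x2 * w2 - t > 0 else 0

for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x1, x2, "->", and_perceptron(x1, x2))   # reproduces the truth table
```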

31 Logical 'exclusive-or' problem

x1  x2  y
 0   0  0
 1   0  1
 0   1  1
 1   1  0

32 No single decision line possible…

33 … but two lines will do

34 Multi-layer feed-forward ANN

35 Upper decision line

36 Lower decision line

37 Solution

38 How to get the weights: by learning.
1. set the network parameters (learning rate, number of hidden layers / units, transfer functions, etc.);
2. initialise the network weights randomly;
3. present an object;
4. calculate the ANN output;
5. adapt the network weights to minimise the output error;
6. repeat 3 - 5 for all training objects;
7. iterate until the network converges / a stop criterion is met;
8. evaluate the network performance by an independent test set or by a cross-validation procedure.
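A minimal sketch of this loop on the 'exclusive-or' problem, assuming scikit-learn's MLPClassifier (the slides describe the procedure, not a specific toolkit; the parameter values are illustrative):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])    # XOR inputs
y = np.array([0, 1, 1, 0])                        # XOR outputs

net = MLPClassifier(hidden_layer_sizes=(2,),      # step 1: two hidden units
                    activation="logistic",        #   compressive transfer function
                    solver="lbfgs",               # step 5: weight optimisation
                    max_iter=5000,                # step 7: stop criterion
                    random_state=0)               # step 2: random initial weights
net.fit(X, y)                                     # steps 3-7: iterative learning
print(net.predict(X))                             # ideally [0 1 1 0]; a poor random
                                                  # start may end in a local minimum
```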

39 Adapting the weights: adapt the weights to minimise the output error E; the weight changes are controlled by the learning rate; error back-propagation (from the output to the input layer); alternatives: Newton-Raphson, Levenberg-Marquardt, etc. The error surface contains local minima besides the global minimum.

40 Function of the hidden layer??? (x, y) points on a [0, 1] x [0, 1] grid with a specified output for the grid: white = 0, black = 1.

41 Output of the hidden layer units: the output of each unit for the [0, 1] x [0, 1] grid. Combining linear sub-solutions yields a non-linear classifier…

42 When to stop training? Plot the error against the iteration number for the training and test sets: too few iterations and the network has not converged; too many and it over-fits the training set. An external validation set is required to estimate the accuracy.
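As a sketch, assuming scikit-learn again, this monitoring can be delegated to the early-stopping options of MLPClassifier (parameter values are illustrative):

```python
from sklearn.neural_network import MLPClassifier

net = MLPClassifier(early_stopping=True,       # hold out part of the training data
                    validation_fraction=0.2,   # internal test set for monitoring
                    n_iter_no_change=10,       # stop when the error stops improving
                    max_iter=1000)
# net.fit(X, y) then stops before over-fitting; a separate external
# validation set is still needed for the final accuracy estimate.
```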

43 Many solutions possible: not unique

44 Classification of brain tumours: 10 random divisions of the data in a balanced way; training set (2/3), test set (1/3): 10 different models.
LDA: 90.0% ± 2.0 [87.0 - 92.8]   0.1 sec
KNN: 95.4% ± 1.0 [92.2 - 97.2]   1.4 sec
ANN: 93.2% ± 3.5 [86.4 - 97.7]   316 sec

45 Support Vector Machines (SVMs): kernel-based classifier; transforms the input space to a high-dimensional feature space; exploits the Lagrange formalism for the best solution; binary (two-class) classifier.

46 A linearly separable problem X1 X2 class B class A Goal: to find the optimal separating hyperplane

47 Optimal hyperplane X1 X2 class B class A no objects are allowed between the boundaries; maximisation of the distance: a unique solution!

48 Support vectors X1 X2 class B class A (the objects on the margin boundaries are the support vectors)

49 Crossing the borderlines… X1 X2 class B class A solution: penalise these objects

50 Lagrange equation. Target: minimise ½ ||w||² (i.e. maximise the margin). Constraints: y_i (w · x_i + b) ≥ 1 for every object i in the training set.

51 Oops! Target: minimise ½ ||w||² + C Σ ξ_i (the slack variables ξ_i penalise objects crossing the borderlines). Constraints: y_i (w · x_i + b) ≥ 1 - ξ_i, with ξ_i ≥ 0.

52 Minimisation: solving the resulting quadratic programming problem yields the Lagrange multipliers α_i; only the support vectors have a non-zero α.

53 SVM properties. Discriminant function: f(x) = sign( Σ_i α_i y_i K(x_i, x) + b ). Properties: sparse solution; the number of variables is irrelevant (dual formalism); global and unique model; extension to a non-linear binary classifier.
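A sketch of the sparseness property, assuming scikit-learn's SVC as the implementation: the fitted model stores only the support vectors and their coefficients α_i·y_i, and the discriminant function can be rebuilt from them:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical linearly separable data.
X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([-1, -1, 1, 1])

svm = SVC(kernel="linear", C=1.0).fit(X, y)

# dual_coef_ holds alpha_i * y_i for the support vectors only (sparse!).
x_new = np.array([[2.5, 2.5]])
K = svm.support_vectors_ @ x_new.T             # linear kernel K(x_i, x)
f = svm.dual_coef_ @ K + svm.intercept_        # sum_i alpha_i y_i K(x_i, x) + b
print(f.item(), svm.decision_function(x_new))  # the same value twice
```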

54 A non-linear 2D classification problem X1 X2 class B class A

55 Going from 2D to 3D: in the 2D space of the original variables the classes are not linearly separable; a transformation of the two input variables to a 3D feature space can make them separable.
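A sketch of such a mapping with the standard textbook choice φ(x1, x2) = (x1², √2·x1·x2, x2²), which is an assumption here (the slide's exact φ is not recoverable); its key property is that the inner product in the 3D feature space equals the squared inner product in 2D, so φ never has to be evaluated explicitly:

```python
import numpy as np

def phi(x):
    # explicit 2D -> 3D feature map (assumed textbook choice)
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

u, v = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(phi(u) @ phi(v))    # inner product in the 3D feature space: 16.0
print((u @ v) ** 2)       # polynomial kernel (u . v)^2 in 2D: also 16.0
```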

56 Data representation in feature space

57 In the feature space the non-linear problem becomes a linear one (separating plane). From: Belousov et al., Chemometrics and Intelligent Laboratory Systems 64 (2002) 15-25.

58 Classifiers in feature space. Linear: f(x) = sign( w · φ(x) + b ). A priori: the mapping φ(x) must be known explicitly. General: only inner products are needed, so φ(x_i) · φ(x_j) can be replaced by a kernel function K(x_i, x_j).

59 Kernel functions. Polynomial function: K(x_i, x_j) = (x_i · x_j + 1)^d. Radial basis function (RBF): K(x_i, x_j) = exp( -||x_i - x_j||² / 2σ² ). Pearson VII universal kernel function (PUK). Not every function is a valid kernel function (Mercer's conditions).
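As a sketch, these kernels can be written directly (the polynomial and RBF forms are the usual ones; the PUK expression follows Üstün et al. and should be treated as an assumption here):

```python
import numpy as np

def polynomial(u, v, d=2, c=1.0):
    return (u @ v + c) ** d

def rbf(u, v, sigma=1.0):
    return np.exp(-np.sum((u - v) ** 2) / (2.0 * sigma ** 2))

def puk(u, v, sigma=1.0, omega=1.0):
    # Pearson VII universal kernel; formula as published by Ustun,
    # Melssen & Buydens (2006) -- verify against the paper before use.
    d = np.linalg.norm(u - v)
    return 1.0 / (1.0 + (2.0 * d * np.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma) ** 2) ** omega
```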

60 How to construct a SVM model?
1. make a training, test and validation set;
2. set the C value (regularisation constant);
3. select a kernel function and its parameters;
4. construct the kernel matrix for the training set;
5. make the SVM model by quadratic programming;
6. evaluate the performance on the test set;
7. repeat steps 2 - 6 (e.g. grid search, GA, Simplex);
8. determine the accuracy of the best model (validation).
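A minimal sketch of steps 1-8 with scikit-learn (an assumed toolkit; the grid search of step 7 is done by GridSearchCV, and the data are a stand-in):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Stand-in two-class data; replace with the real training set.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Step 1: split off an independent validation set.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=1/3,
                                                  random_state=0)

# Steps 2-7: grid search over C and the RBF kernel parameter, with
# internal cross-validation playing the role of the test set.
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [0.1, 1, 10, 100],
                     "gamma": [0.001, 0.01, 0.1, 1]},
                    cv=3)
grid.fit(X_train, y_train)

# Step 8: accuracy of the best model on the untouched validation set.
print(grid.best_params_, grid.score(X_val, y_val))
```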

61 Classification of brain tumours: 10 random divisions of the data in a balanced way; training set (2/3), test set (1/3): 10 different models.
LDA: 90.0% ± 2.0 [87.0 - 92.8]   0.1 sec
KNN: 95.4% ± 1.0 [92.2 - 97.2]   1.4 sec
ANN: 93.2% ± 3.5 [86.4 - 97.7]   316 sec
SVM: 96.9% ± 0.9 [95.8 - 98.6]   8 hours

62 Best SVM model (98.6%, test)

true \ predicted   H  CSF  G II  G III  G IV  Men   α-error (%)
Healthy           70    0     0      0     0    0    0
CSF                0   32     0      0     0    0    0
Grade II           0    0    57      0     0    0    0
Grade III          1    0     0     18     0    0    5
Grade IV           1    0     0      1    21    0    9
Meningioma         0    0     0      0     0   16    0
β-error (%)        3    0     0      5     0    0

63 SVM is the best one, but a bit confused (same confusion matrix as slide 62).

64 Here are the real problems (overlap): in the same confusion matrix the misclassifications sit in the Grade III and Grade IV rows, an overlap that also troubles KNN, ANN and other classifiers.

65 The Consumentenbond ratings… KNN, ANN and SVM scored against each other on: simplicity, uniqueness, performance, multi-class problems, outliers, number of objects, number of variables, and speed (the ratings themselves were shown graphically).

66 Acknowledgements Bülent Üstün (SVMs) Patrick Krooshof (eTumour examples)

