1 Support Vector Neural Training. Włodzisław Duch, Department of Informatics, Nicolaus Copernicus University, Toruń, Poland, and School of Computer Engineering, Nanyang Technological University, Singapore. Google: Duch. ICANN 2005, Warsaw, September 2005.

2 Plan Main idea. Support Vector Machines and active learning. Neural networks and support vectors. Pedagogical example. Results on real data.

3 Main idea What data should be used for training? Given conditional distributions P(X|C) for dengue fever for: the world population, ASEAN countries, Singapore only, or Choa Chu Kang only, which distribution should we use? If we know that X is from Choa Chu Kang and P(X|C) is reliable, local knowledge should be used. If X comes from a region close to the decision border, why use data from regions far away?

4

5 Learning MLP/RBF: fast MSE reduction at first, very slow later. Typical MSE(t) learning curve: after 10 iterations almost all the work is done, but final convergence is reached only after a very long process, about 1000 iterations. What is going on?

6 Learning trajectories Take weights Wi from iterations i = 1..K; PCA on the covariance matrix of the Wi captures about 95% of the variance for most data, so the error function plotted in 2D shows realistic learning trajectories. Instead of local minima, large flat valleys are seen; why? Data far from the decision border has almost no influence; the main reduction of MSE is achieved by increasing ||W||, sharpening the sigmoidal functions. See papers by M. Kordos & W. Duch.
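The trajectory projection described on this slide can be sketched as follows: collect the full weight vector after each training iteration, run PCA on the centered trajectory (via SVD), and project onto the top two principal directions. This is a minimal illustrative sketch, not code from the talk; the function name and interface are assumptions.

```python
import numpy as np

def weight_trajectory_pca(weight_history):
    """Project a sequence of MLP weight vectors onto the top-2 PCA
    directions of their covariance, giving a 2D learning trajectory.
    weight_history: array-like of shape (K iterations, n weights)."""
    W = np.asarray(weight_history, dtype=float)
    Wc = W - W.mean(axis=0)                     # center the trajectory
    # SVD of the centered matrix is equivalent to PCA of the covariance
    U, s, Vt = np.linalg.svd(Wc, full_matrices=False)
    ratio = s**2 / np.sum(s**2)                 # explained-variance ratio
    coords = Wc @ Vt[:2].T                      # 2D projection of each Wi
    return coords, ratio
```

If the trajectory really is dominated by growth of ||W|| along a few directions, the first two components should carry most of the variance, as the slide claims.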

7 Support Vectors SVM gradually focuses on the training vectors near the decision hyperplane; can we do the same with an MLP?

8 Selecting Support Vectors Active learning: if a vector's contribution to the parameter change is negligible, remove it from the training set. If the error ε(X) is sufficiently small, the pattern X will have negligible influence on the training process and may be removed from the training. Conclusion: select vectors with ε(X) > εmin for training. Two problems: possible oscillations and a strong influence of outliers. Solution: adjust εmin dynamically to avoid oscillations; also remove vectors with ε(X) > 1−εmin = εmax.
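The selection rule on this slide reduces to a simple filter: keep only the patterns whose error lies in the window [εmin, 1−εmin], dropping both near-perfectly-learned vectors and likely outliers. A minimal sketch (function name and interface are illustrative):

```python
import numpy as np

def select_support_vectors(errors, eps_min):
    """Return indices of training vectors kept as support vectors.
    errors: per-pattern errors in [0, 1].
    Keeps eps_min <= error <= 1 - eps_min: tiny errors contribute
    almost nothing to weight updates; errors near 1 suggest outliers."""
    errors = np.asarray(errors, dtype=float)
    mask = (errors >= eps_min) & (errors <= 1.0 - eps_min)
    return np.flatnonzero(mask)
```

For example, with errors [0.001, 0.3, 0.995, 0.5] and εmin = 0.01, only the second and fourth patterns survive: the first is already learned, the third behaves like an outlier.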

9 SVNT algorithm Initialize the network parameters W; set Δε = 0.01, εmin = 0, and SV = T. Until no improvement is found in the last Nlast iterations do: Optimize the network parameters for Nopt steps on the SV data. Run a feedforward step on T to determine overall accuracy and errors; take SV = {X | ε(X) ∈ [εmin, 1−εmin]}. If the accuracy increases: compare the current network with the previous best one and keep the better one as the current best; increase εmin by Δε and make a forward step selecting SVs. If the number of support vectors |SV| increases: decrease εmin by Δε and decrease Δε to Δε/1.2 to avoid large changes.
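The loop above can be sketched in a training-framework-agnostic way. This is an assumed skeleton, not the authors' implementation: `train_step(X, y, n_opt)` stands for Nopt optimization steps on the current SV set (returning any network state), `errors_on(X, y)` returns per-pattern errors in [0, 1], and the stopping and bookkeeping details are illustrative simplifications.

```python
import numpy as np

def svnt(train_step, errors_on, X, y, n_last=5, n_opt=10, max_iter=200):
    """Sketch of the SVNT loop: train on the shrinking SV set,
    re-select SV = {X | eps_min <= error <= 1 - eps_min}, tighten
    eps_min on improvement, relax it if |SV| starts growing."""
    d_eps, eps_min = 0.01, 0.0
    sv = np.arange(len(X))                    # start with SV = full set T
    best_acc, best_state, since_improve = -1.0, None, 0
    for _ in range(max_iter):
        state = train_step(X[sv], y[sv], n_opt)   # optimize on SV data
        eps = errors_on(X, y)                     # feedforward on all of T
        acc = float(np.mean(eps < 0.5))           # overall accuracy
        new_sv = np.flatnonzero((eps >= eps_min) & (eps <= 1.0 - eps_min))
        if acc > best_acc:                        # keep the better network
            best_acc, best_state, since_improve = acc, state, 0
            eps_min += d_eps                      # tighten the selection
        else:
            since_improve += 1
        if len(new_sv) > len(sv):                 # oscillation guard
            eps_min = max(0.0, eps_min - d_eps)
            d_eps /= 1.2                          # smaller future changes
        sv = new_sv if len(new_sv) else sv
        if since_improve >= n_last:
            break
    return best_state, best_acc, eps_min
```

Note how outliers (errors near 1) and already-learned patterns (errors near 0) both drop out of SV as εmin grows, so later epochs train only on vectors near the decision border.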

10 XOR solution

11 Satellite image data Multi-spectral values of pixels in 3x3 neighborhoods, from an 82x100 section of an image taken by the Landsat Multi-Spectral Scanner; intensities 0-255; the training set has 4435 samples, the test set 2000 samples. The central pixel in each neighborhood is red soil (1072), cotton crop (479), grey soil (961), damp grey soil (415), soil with vegetation stubble (470), or very damp grey soil (1038 training samples). Strong overlaps between some classes.

System and parameters             Train accuracy   Test accuracy
SVNT MLP, 36 nodes, α = 0.5       96.5             91.3
kNN, k=3, Manhattan distance      --               90.9
SVM, Gaussian kernel (optimized)  91.6             88.4
RBF, Statlog result               88.9             87.9
MLP, Statlog result               88.8             86.1
C4.5 tree                         96.0             85.0

12 Satellite image data – MDS outputs

13 Hypothyroid data Two years of real medical screening tests for thyroid diseases: 3772 cases, with 93 primary hypothyroid and 191 compensated hypothyroid; the remaining 3488 cases are healthy. 3428 test cases with a similar class distribution. 21 attributes (15 binary, 6 continuous) are given, but only two of the binary attributes (on thyroxine, and thyroid surgery) contain useful information, so the number of attributes has been reduced to 8.

Method                       % train   % test
C-MLP2LN rules               99.89     99.36
MLP+SCG, 4 neurons           99.81     99.24
SVM, Minkovsky opt. kernel   100.0     99.18
MLP+SCG, 4 neurons, 67 SV    99.95     99.01
MLP+SCG, 4 neurons, 45 SV    100.0     98.92
MLP+SCG, 12 neurons          100.0     98.83
Cascade correlation          100.0     98.5
MLP+backprop                 99.60     98.5
SVM, Gaussian kernel         99.76     98.4

14 Hypothyroid data

15 Discussion SVNT is very easy to implement; only the batch version with SCG training was used here. This is a first step only, but the results are promising. It finds smaller support-vector sets than SVM, may be useful in one-class learning, and speeds up training. Problems: possible oscillations, and the selection requires more careful analysis (but oscillations help to explore the MSE landscape); additional parameters (but they are rather easy to set). More empirical tests are needed.

16 Thank you for lending your ears...

