Support Vector Neural Training


Support Vector Neural Training Włodzisław Duch Department of Informatics Nicolaus Copernicus University, Toruń, Poland School of Computer Engineering, Nanyang Technological University, Singapore Google: Duch ICANN Warsaw, Sept. 2005

Plan Main idea. Support Vector Machines and active learning. Neural networks and support vectors. Pedagogical example. Results on real data.

Main idea What data should be used for training? Given conditional distributions P(X|C) for dengue fever for: the world population, ASEAN countries, Singapore only, or Choa Chu Kang only – which distribution should we use? If we know that X is from Choa Chu Kang and P(X|C) is reliable, local knowledge should be used. If X comes from a region close to the decision borders, why use data from regions far away?

Learning MLP/RBF training: fast MSE reduction at first, very slow later. Typical MSE(t) learning curve: after about 10 iterations almost all of the work is done, but final convergence is reached only after a very long process, roughly 1000 iterations. What is going on?

Learning trajectories Take the weight vectors Wi from iterations i=1..K; PCA on the Wi covariance matrix captures about 95% of the variance for most data, so plotting the error function in this 2D subspace shows realistic learning trajectories (papers by M. Kordos & W. Duch). Instead of local minima, large flat valleys are seen – why? Data far from the decision borders has almost no influence; the main reduction of MSE is achieved by increasing ||W||, i.e. by sharpening the sigmoidal functions.
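A minimal, self-contained sketch of this visualization idea (the toy data, the scikit-learn MLPClassifier, and the epoch-by-epoch warm_start loop are illustrative assumptions, not the setup used in the Kordos & Duch papers):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Toy two-class data (a stand-in for any of the datasets in the talk).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Train an MLP one epoch at a time and record the flattened weight vector W_i.
net = MLPClassifier(hidden_layer_sizes=(4,), solver="sgd",
                    learning_rate_init=0.1, max_iter=1, warm_start=True)
W_history = []
for i in range(100):
    net.fit(X, y)  # warm_start=True continues from the previous weights
    W_history.append(np.concatenate([w.ravel()
                                     for w in net.coefs_ + net.intercepts_]))
W = np.array(W_history)                    # shape (K, n_weights)

# PCA on the weight trajectory: the leading components capture most of the
# variance, so the trajectory can be drawn in this 2D plane over the error
# surface to see the large flat valleys rather than isolated local minima.
Wc = W - W.mean(axis=0)
U, S, Vt = np.linalg.svd(Wc, full_matrices=False)
explained = S**2 / np.sum(S**2)            # variance captured by each component
trajectory_2d = Wc @ Vt[:2].T              # (K, 2) learning trajectory
```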

Support Vectors SVM gradually focuses on the training vectors near the decision hyperplane – can we do the same with MLP?

Selecting Support Vectors Active learning: if a vector's contribution to the parameter change is negligible, remove it from the training set. If the difference is sufficiently small, the pattern X will have negligible influence on the training process and may be removed from training. Conclusion: select vectors with εW(X) > εmin for training. Two problems: possible oscillations and strong influence of outliers. Solution: adjust εmin dynamically to avoid oscillations, and also remove vectors with εW(X) > 1 − εmin = εmax.
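A minimal sketch of this selection window (assuming the per-pattern error εW(X) has been scaled to [0, 1]; the exact error definition from the paper is not reproduced here):

```python
import numpy as np

def select_support_vectors(errors, eps_min):
    """Return a boolean mask of patterns kept for training.

    errors  -- per-pattern errors eps_W(X), assumed scaled to [0, 1]
    eps_min -- current threshold: patterns below it are already well learned,
               patterns above 1 - eps_min are treated as outliers; both are dropped.
    """
    errors = np.asarray(errors)
    return (errors >= eps_min) & (errors <= 1.0 - eps_min)
```

Raising eps_min shrinks the selected set toward the decision border, which is the dynamic adjustment used in the SVNT algorithm on the next slide.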

SVNT algorithm
1. Initialize the network parameters W, set Δε = 0.01, εmin = 0, set SV = T.
2. Until no improvement is found in the last Nlast iterations do:
   - Optimize the network parameters for Nopt steps on the SV data.
   - Run a feedforward step on T to determine the overall accuracy and errors; take SV = {X | ε(X) ∈ [εmin, 1 − εmin]}.
   - If the accuracy increases: compare the current network with the previous best one and keep the better one as the current best; increase εmin = εmin + Δε and make a forward step selecting the SVs.
   - If the number of support vectors |SV| increases: decrease εmin = εmin − Δε and decrease Δε = Δε/1.2 to avoid large changes.
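A sketch of this control loop in Python; train_steps, errors_fn and accuracy_fn are hypothetical callables standing in for whatever MLP trainer is used (e.g. SCG), so this is an illustration of the slide's logic, not the original implementation:

```python
import copy
import numpy as np

def svnt(net, X, y, train_steps, errors_fn, accuracy_fn,
         n_opt=10, n_last=5, d_eps=0.01):
    """Sketch of the SVNT control loop (hypothetical helper callables).

    train_steps(net, X, y, n) -- run n optimization steps on the given data
    errors_fn(net, X, y)      -- per-pattern errors eps(X), scaled to [0, 1]
    accuracy_fn(net, X, y)    -- overall accuracy on the full training set T
    """
    eps_min = 0.0
    sv_mask = np.ones(len(X), dtype=bool)          # start with SV = T
    best_net, best_acc = copy.deepcopy(net), -np.inf
    stale = 0
    while stale < n_last:                          # stop when no recent improvement
        train_steps(net, X[sv_mask], y[sv_mask], n_opt)
        errors = errors_fn(net, X, y)              # feedforward step on all of T
        acc = accuracy_fn(net, X, y)
        new_mask = (errors >= eps_min) & (errors <= 1.0 - eps_min)
        if acc > best_acc:
            best_net, best_acc = copy.deepcopy(net), acc  # keep the better network
            eps_min += d_eps                       # tighten the selection window
            new_mask = (errors >= eps_min) & (errors <= 1.0 - eps_min)
            stale = 0
        else:
            stale += 1
        if new_mask.sum() > sv_mask.sum():         # |SV| grew: back off to damp oscillations
            eps_min -= d_eps
            d_eps /= 1.2
            new_mask = (errors >= eps_min) & (errors <= 1.0 - eps_min)
        sv_mask = new_mask
    return best_net
```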

XOR solution

Satellite image data Multi-spectral values of pixels in 3x3 neighborhoods from an 82x100 section of an image taken by the Landsat Multi-Spectral Scanner; intensities 0-255, 4435 training samples, 2000 test samples. The central pixel of each neighborhood belongs to one of six classes: red soil (1072), cotton crop (479), grey soil (961), damp grey soil (415), soil with vegetation stubble (470), and very damp grey soil (1038 training samples). Strong overlaps between some classes.

System and parameters               Train accuracy (%)   Test accuracy (%)
SVNT MLP, 36 nodes, a=0.5           96.5                 91.3
kNN, k=3, Manhattan                 --                   90.9
SVM, Gaussian kernel (optimized)    91.6                 88.4
RBF, Statlog result                 88.9                 87.9
MLP, Statlog result                 88.8                 86.1
C4.5 tree                           96.0                 85.0

Satellite image data – MDS outputs

Hypothyroid data Two years of real medical screening tests for thyroid diseases: 3772 cases, with 93 primary hypothyroid and 191 compensated hypothyroid; the remaining 3488 cases are healthy. The test set has 3428 cases with a similar class distribution. 21 attributes (15 binary, 6 continuous) are given, but only two of the binary attributes (on thyroxine, and thyroid surgery) contain useful information, so the number of attributes has been reduced to 8.

Method                          % train   % test
C-MLP2LN rules                  99.89     99.36
MLP+SCG, 4 neurons              99.81     99.24
SVM, Minkovsky opt. kernel      100.0     99.18
MLP+SCG, 4 neurons, 67 SV       99.95     99.01
MLP+SCG, 4 neurons, 45 SV       100.0     98.92
MLP+SCG, 12 neurons             100.0     98.83
Cascade correlation             100.0     98.5
MLP+backprop                    99.60     98.5
SVM, Gaussian kernel            99.76     98.4

Hypothyroid data

Discussion SVNT is very easy to implement; here only the batch version with SCG training was used. This is a first step only, but the results are promising. It found smaller support vector sets than SVM, may be useful in one-class learning, and speeds up training. Problems: possible oscillations, and the selection requires more careful analysis – but oscillations help to explore the MSE landscape; additional parameters – but they are rather easy to set. More empirical tests are needed.

Thank you for lending your ears ...