Almost Random Projection Machine with Margin Maximization and Kernel Features Tomasz Maszczyk and Włodzisław Duch Department of Informatics, Nicolaus Copernicus University, Toruń, Poland ICANN 2010

Plan: Main idea, Inspirations, Description of our approach, Influence of margin maximization, Results, Conclusions.

Main idea: The backpropagation algorithm used to train MLPs handles non-linear systems and non-separable problems, but learns slowly. MLPs are far simpler than biological neural networks, and backpropagation is hard to justify from a neurobiological perspective. Algorithms capable of solving problems of similar complexity are an important challenge, needed to create ambitious ML applications.

Inspirations: Neurons in the association cortex form strongly connected microcircuits, found in cortical minicolumns, that resonate at different frequencies. A perceptron observing these minicolumns learns to react to specific signals around a particular frequency. Resonators do not get excited when overall activity is high, but rather when specific levels of activity are reached.

Main idea: Following these inspirations, aRPM is a single-hidden-layer constructive network based on random projections, which are added only if they prove useful; it easily solves highly non-separable problems. Kernel-based features extend the hypothesis space, and Cover's theorem guarantees better separability in such an extended space.

Main idea: Linear kernels define new features t(X;W) = K_L(X,W) = X·W, based on a projection onto the direction W. Gaussian kernels g(X,W) = exp(-||X-W||²/(2σ²)) evaluate the similarity between two vectors using a weighted radial distance function. The final discriminant function is constructed as a linear combination of such kernels. aRPM focuses on margin maximization; new projections are added only if they increase the probability of correct classification for examples that are wrongly classified or lie close to the decision border.
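A minimal sketch of these two kernel feature types, written here in Python/NumPy for illustration; the function names and example data are assumptions, not part of the original slides:

```python
import numpy as np

def linear_feature(X, W):
    """Linear kernel feature t(X; W) = X·W: projection of each sample onto direction W."""
    return X @ W

def gaussian_feature(X, W, sigma=1.0):
    """Gaussian kernel feature g(X, W) = exp(-||X - W||² / (2σ²))."""
    return np.exp(-np.sum((X - W) ** 2, axis=-1) / (2.0 * sigma ** 2))

# Illustrative usage with random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))         # 5 samples, 3 input features
W = rng.normal(size=3)              # a random direction / reference vector
print(linear_feature(X, W))         # one projection value per sample
print(gaussian_feature(X, W, 0.5))  # similarity of each sample to W
```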

aRPM: Some projections may not be very useful as a whole, but the distribution of the training data along a given direction may contain a range of values with a pure cluster of projected patterns. This creates binary features b_i(X) = t(X; W_i, [t_a, t_b]) ∈ {0,1}, based on a linear projection restricted to an interval in the direction W_i. A good candidate feature should cover some minimal number η of training vectors.
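A sketch of such a restricted-projection binary feature and of the coverage test, under the assumption (made here for illustration) that a good interval must contain at least η vectors of a single class:

```python
import numpy as np

def binary_projection_feature(X, W, t_a, t_b):
    """b(X) = t(X; W, [t_a, t_b]) in {0, 1}: 1 if the projection X·W
    falls inside the interval [t_a, t_b], 0 otherwise."""
    p = X @ W
    return ((p >= t_a) & (p <= t_b)).astype(int)

def is_good_candidate(b, y, eta=5):
    """Accept the candidate only if it covers at least eta training vectors
    and the covered vectors form a pure cluster (a single class)."""
    covered = y[b == 1]
    return covered.size >= eta and np.unique(covered).size == 1
```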

aRPM: Features based on kernels: here only Gaussian kernels with several values of the dispersion σ, g(X; X_i, σ) = exp(-||X_i - X||²/(2σ²)). Local kernel features have values close to zero except around their support vectors X_i; therefore their usefulness is limited to the neighborhood O(X_i) in which g_i(X) > ε.
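A short sketch of how the effective neighborhood O(X_i) of a local kernel feature could be determined; the function name and the ε threshold value are assumptions:

```python
import numpy as np

def local_support(X, X_i, sigma, eps=1e-2):
    """Indices of the training vectors inside the neighborhood O(X_i),
    i.e. the vectors for which the Gaussian feature g_i(X) exceeds eps."""
    g = np.exp(-np.sum((X - X_i) ** 2, axis=1) / (2.0 * sigma ** 2))
    return np.where(g > eps)[0]
```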

aRPM: To create multi-resolution kernel features, start with a large σ (smooth decision borders), then use smaller σ (more localized features). A candidate feature is converted into a permanent node only if it increases the classification margin. The algorithm is incremental: the feature space is expanded until no further improvements are found, moving vectors away from the decision border.
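A rough sketch of this incremental, multi-resolution construction loop, assuming a user-supplied utility function (such as the margin-based measure defined on the later slides) that can also score an empty feature matrix; the dispersion values and the threshold α are illustrative:

```python
import numpy as np

def gaussian_features(X, centre, sigma):
    """One column of Gaussian kernel features g(X; centre, sigma)."""
    return np.exp(-np.sum((X - centre) ** 2, axis=1) / (2.0 * sigma ** 2))

def grow_feature_space(X, y, utility, sigmas=(2.0, 1.0, 0.5), alpha=0.01):
    """Incremental construction: scan dispersions from large (smooth borders)
    to small (localized features); a candidate kernel node becomes permanent
    only if it raises the utility by more than alpha."""
    H = np.zeros((len(X), 0))           # start from an empty feature space
    kept = []
    for sigma in sigmas:                # multi-resolution pass
        for i, centre in enumerate(X):  # every training vector is a candidate centre
            candidate = gaussian_features(X, centre, sigma)
            H_new = np.column_stack([H, candidate])
            if utility(H_new, y) - utility(H, y) > alpha:
                H, kept = H_new, kept + [(i, sigma)]
    return H, kept
```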

aRPM: The output uses WTA, the sum of the activations of the hidden nodes. Projections with added intervals give binary activations b_i(X), while the values of the kernel features g(X; X_i, σ) are summed, giving a total activation A(C|X) for each class. Plotting A(C|X) versus A(¬C|X) for each vector leads to scatterograms, giving an idea of how far a given vector is from the decision border.
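A sketch of this Winner-Takes-All readout; representing the hidden layer as a feature matrix H with a per-feature class label is an assumption made here for illustration:

```python
import numpy as np

def class_activations(H, feature_class):
    """H: (n_samples, n_features) hidden-node outputs; feature_class[j] is the
    class a feature j was created for. Returns A(C|X) for every class C."""
    return {c: H[:, feature_class == c].sum(axis=1)
            for c in np.unique(feature_class)}

def wta_predict(H, feature_class):
    """Winner-Takes-All: assign each sample to the class with the highest activation."""
    A = class_activations(H, feature_class)
    classes = list(A)
    scores = np.column_stack([A[c] for c in classes])
    return np.array(classes)[np.argmax(scores, axis=1)]
```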

Scatterograms for the Heart (Statlog) data: no margin maximization vs. with margin maximization.

Scatterograms for the Wine data (3 classes, pairwise): no margin maximization vs. with margin maximization.

aRPM: In the WTA scheme, |A(C|X) - A(¬C|X)| estimates the distance from the decision border. The confidence of the model for a vector X ∈ C is specified using the logistic function F(X) = 1/(1 + exp(-(A(C|X) - A(¬C|X)))), which gives values around 1 if X is on the correct side and far from the border, and goes to zero if it is on the wrong side. The total confidence in the model may then be estimated by summing over all vectors.
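A direct sketch of this confidence measure; the function names are illustrative:

```python
import numpy as np

def confidence(A_c, A_not_c):
    """F(X) = 1/(1 + exp(-(A(C|X) - A(¬C|X)))): close to 1 when X is far on the
    correct side of the border, close to 0 when it is on the wrong side."""
    return 1.0 / (1.0 + np.exp(-(A_c - A_not_c)))

def total_confidence(A_c, A_not_c):
    """Total confidence in the model: the sum of F(X) over all vectors."""
    return confidence(A_c, A_not_c).sum()
```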

aRPM: The final effect of adding a new feature h(X) to the total confidence measure is therefore U(H,h) = Σ_X (F(X; H+h) - F(X; H)). If U(H,h) > α, then the new feature is accepted, as it provides a larger margin. To make the final decision, aRPM with margin maximization uses the WTA mechanism or LD.
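A sketch of this acceptance test, assuming the per-vector confidences have already been computed with and without the candidate feature; α is an illustrative threshold:

```python
import numpy as np

def feature_utility(F_with, F_without, alpha=0.01):
    """U(H, h) = Σ_X (F(X; H+h) - F(X; H)); accept the candidate feature h
    only if U(H, h) > alpha, i.e. it widens the overall margin."""
    U = np.sum(F_with - F_without)
    return U, U > alpha
```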

Datasets

Results

Conclusions: The aRPM algorithm is improved here in two ways: by adding selection of network nodes to ensure wide margins, and by adding kernel features. Its relations to kernel SVMs and its biological plausibility make aRPM a very interesting subject of study. In contrast to typical neural networks that learn by parameter adaptation, there is no learning involved, just generation and selection of features. Shifting the focus to the generation of new features, followed by WTA, makes it a candidate for a fast-learning neural algorithm that is simpler and learns faster than MLP or RBF networks.

Conclusions: Feature selection and construction, finding interesting views on the data, is the basis of natural categorization and learning processes. The scatterogram of the WTA output shows the effect of margin optimization and allows for an estimation of confidence in the classification of given data. Further improvements: admission of impure clusters instead of binary features, use of other kernel features, selection of candidate support vectors, and optimization of the algorithm.

Thank You!