Heterogeneous adaptive systems


Heterogeneous adaptive systems Włodzisław Duch & Krzysztof Grąbczewski Department of Informatics, Nicholas Copernicus University, Torun, Poland. http://www.is.umk.pl

Why is this important? MLPs are universal approximators, but are they the best choice? A wrong bias leads to poor results and overly complex networks, and no single method achieves the best results on all datasets. Consider 2-class problems in two situations:
Class 1 inside a sphere, Class 2 outside. MLP: at least N+1 hyperplanes, O(N²) parameters. RBF: 1 Gaussian, O(N) parameters.
C1 in the corner cut off by the (1,1,...,1) hyperplane, C2 outside. MLP: 1 hyperplane, O(N) parameters. RBF: many Gaussians, O(N²) parameters, poor approximation.
A combination of the two situations needs both a hyperplane and a hypersphere!
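A minimal numpy sketch of the two situations above (the data, radius and thresholds are hypothetical choices for illustration, not from the slides): one Gaussian node matches the sphere with O(N) parameters, one sigmoidal node matches the corner.

import numpy as np

rng = np.random.default_rng(0)
N = 10                                           # dimensionality
X = rng.uniform(-1, 1, size=(2000, N))

# Situation 1: Class 1 inside a sphere of radius 0.8 around the origin.
y_sphere = np.linalg.norm(X, axis=1) < 0.8
gauss = np.exp(-np.sum(X**2, axis=1) / 0.8**2)   # one Gaussian node, O(N) parameters
print(np.mean((gauss > np.exp(-1.0)) == y_sphere))   # 1.0: exact match

# Situation 2: C1 in the corner cut off by the (1,1,...,1) hyperplane.
y_corner = X.sum(axis=1) > 2.0
sigm = 1 / (1 + np.exp(-(X.sum(axis=1) - 2.0)))      # one sigmoidal node
print(np.mean((sigm > 0.5) == y_corner))             # 1.0: exact match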

Inspirations The logical rule IF x1>0 AND x2>0 THEN Class1 ELSE Class2 is not properly represented by either MLP or RBF! As a result, decision trees and logical rules perform significantly better than MLPs on some datasets (cf. hypothyroid). Speed of learning and network complexity depend on the transfer functions (TF); fast learning requires flexible "brain modules", i.e. flexible TF. Biological inspirations: sigmoidal neurons are only a crude approximation of the lowest neural level. Interesting brain functions are performed by interacting minicolumns implementing complex functions; human categorization is never so simple. Modular networks: networks of networks. First step beyond single neurons: transfer functions providing flexible decision borders.

Heterogeneous systems Homogeneous systems use one type of "building block" and therefore one type of decision border. Ex: neural networks, SVMs, decision trees, kNN ... Committees combine many models together, but lead to complex models that are difficult to understand. Discovering the simplest class structure, with the right inductive bias, requires heterogeneous adaptive systems (HAS). Ockham's razor: simpler systems are better. HAS examples:
NN with many types of neuron transfer functions.
k-NN with different distance functions.
DT with different types of test criteria.

TF in Neural Networks Choices for selecting optimal functions:
Homogeneous NN: select the best TF by trying several types. Ex: RBF networks; SVM kernels (switching kernels may change accuracy from 50% to 80%).
Heterogeneous NN: one network, several types of TF. Ex: Adaptive Subspace SOM (Kohonen 1995), linear subspaces; projections onto a space of various basis functions.
Input enhancement: adding fi(X) to achieve separability. Ex: functional link networks (Pao 1989), tensor products of inputs; the D-MLP model.
Three ways to build heterogeneous networks (a sketch of the second follows below):
1. Start from a large network with different TF and use regularization to prune it.
2. Construct the network by adding nodes selected from a pool of candidates.
3. Use very flexible TF and force them to specialize.
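A minimal sketch of strategy 2 above, assuming random candidate parameters and a least-squares readout; all function names are hypothetical and this is not the IncNet/OTF training procedure.

import numpy as np

rng = np.random.default_rng(1)

def sigmoid_node(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def gaussian_node(X, c, s):
    return np.exp(-np.sum((X - c) ** 2, axis=1) / s ** 2)

def grow_heterogeneous_net(X, y, n_nodes=3, pool=20):
    """Greedy construction: at each step add the candidate node
    (sigmoidal or Gaussian, random parameters) that most reduces
    the least-squares error of a linear output layer."""
    H = [np.ones(len(X))]                       # bias column
    for _ in range(n_nodes):
        best, best_err = None, np.inf
        for _ in range(pool):
            if rng.random() < 0.5:              # sigmoidal candidate
                h = sigmoid_node(X, rng.normal(size=X.shape[1]), rng.normal())
            else:                               # Gaussian candidate centered on a data point
                h = gaussian_node(X, X[rng.integers(len(X))], rng.uniform(0.3, 2.0))
            Ht = np.column_stack(H + [h])
            w, *_ = np.linalg.lstsq(Ht, y, rcond=None)
            err = np.mean((Ht @ w - y) ** 2)
            if err < best_err:
                best, best_err = h, err
        H.append(best)
    return np.column_stack(H)

# Usage on circle-in-square data: a few mixed nodes suffice.
X = rng.uniform(-1, 1, size=(300, 2))
y = (np.linalg.norm(X, axis=1) < 0.8).astype(float)
H = grow_heterogeneous_net(X, y)
w, *_ = np.linalg.lstsq(H, y, rcond=None)
print(np.mean((H @ w > 0.5) == (y > 0.5)))      # high accuracy with few mixed nodes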

Taxonomy - activation functions

Taxonomy - output functions

Taxonomy - transfer functions

Most flexible TFs
Conical functions: mixed activations.
Lorentzian functions: mixed activations.
Bicentral functions: separable.
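Hedged Python sketches of the three families named above; exact parameterizations vary in the literature (cf. the Duch & Jankowski transfer-function survey), so the forms and parameter names below are illustrative assumptions.

import numpy as np

def sigma(a):
    return 1.0 / (1.0 + np.exp(-a))

def lorentzian(x, w, theta):
    # Inner-product activation, but a localized "window" output along w.
    return 1.0 / (1.0 + (x @ w - theta) ** 2)

def conical(x, w, c, alpha, theta):
    # Mixed activation: inner product plus a distance term; alpha
    # interpolates between hyperplane-like and hypersphere-like borders.
    a = (x - c) @ w + alpha * np.linalg.norm(x - c) - theta
    return sigma(a)

def bicentral(x, t, b, s):
    # Separable: per-dimension soft window = product of two sigmoids,
    # giving soft hyperrectangles (crisp logical rules in the limit s -> inf).
    return np.prod(sigma(s * (x - t + b)) * (1 - sigma(s * (x - t - b))))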

Optimal Transfer Function network OTF-NN, based on the IncNet ontogenic network architecture (N. Jankowski): statistical criteria for pruning/growth plus Kalman-filter learning. XOR solutions found with:
2 Gaussian functions,
1 Gaussian + 1 sigmoidal function,
2 sigmoidal functions,
1 Gaussian with G(W·X) activation.
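To see why a single Gaussian of a scalar projection G(W·X) suffices for XOR, here is a minimal check; the weights are chosen by hand for illustration, since the slide does not give the trained parameters.

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])                     # XOR

w = np.array([1.0, 1.0])                       # hand-picked projection direction
g = np.exp(-(X @ w - 1.0) ** 2)                # Gaussian of the projection W.X
print(g)                                       # [0.37, 1.0, 1.0, 0.37]
print((g > 0.6).astype(int) == y)              # threshold 0.6 reproduces XOR: all True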

OTF for half sphere/subspace 2D and 10D problems considered, 2000 points each. OTF starts with 3 Gaussian + 3 sigmoidal functions; 2-3 neuron solutions are found with 97.5-99% accuracy. Simplest solution: 1 Gaussian + 1 sigmoid. 3 sigmoidal functions give an acceptable solution.

Heterogeneous FSM Feature Space Mapping: a neurofuzzy ontogenic network that selects a separable localized transfer function from a pool of several function types. Rotated halfspace + Gaussian: simplest solution found is 1 Gaussian + 1 rectangular function. In 5D and 10D many training points are needed.

Similarity-based HAS Local distance functions are optimized differently in different regions of feature space. Weighted Minkowski distance functions: D(X,Y)^a = Σ_i s_i |X_i - Y_i|^a. Ex: a=20, and other types of functions, including probabilistic distance functions, change the usual piecewise-linear decision borders. The same idea applies to RBF networks with different transfer functions and to LVQ with different local functions.
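A small sketch of the weighted Minkowski distance used above (parameter names are illustrative); note how large exponents approach the Chebyshev (L∞) distance.

import numpy as np

def minkowski_w(x, y, s, a):
    """Weighted Minkowski distance D(x,y) = (sum_i s_i |x_i - y_i|^a)^(1/a)."""
    return np.sum(s * np.abs(x - y) ** a) ** (1.0 / a)

x, y = np.array([0.0, 0.0]), np.array([3.0, 4.0])
s = np.ones(2)
print(minkowski_w(x, y, s, 2))    # 5.0 (Euclidean)
print(minkowski_w(x, y, s, 20))   # ~4.001, already close to max|x_i - y_i| = 4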

HAS decision trees Decision trees select the best feature/threshold value; univariate and multivariate trees give hyperplane decision borders. Introducing tests based on the L_a Minkowski metric changes the borders: for L2 spherical decision borders are produced, for L∞ rectangular ones (see the note below). Many other choices exist, for example Fisher Linear Discriminant decision trees.
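The sphere/box shapes follow directly from the unit balls of the metrics. A distance-based test compares the distance to a reference vector $\mathbf{R}$ with a threshold $s$:

$$T_{\mathbf{R},s}(\mathbf{X}):\ \|\mathbf{X}-\mathbf{R}\|_a \le s, \qquad \|\mathbf{Z}\|_a = \Big(\sum_i |Z_i|^a\Big)^{1/a}.$$

For $a=2$ the decision border $\|\mathbf{X}-\mathbf{R}\|_2 = s$ is a hypersphere centered at $\mathbf{R}$; for $a\to\infty$, $\|\mathbf{Z}\|_\infty = \max_i |Z_i|$ and the border becomes an axis-aligned hyperrectangle of side $2s$.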

SSV HAS DT Define the left and right sides of the data split by test T with threshold s, then count how many pairs of vectors from different classes are separated and how many vectors from the same class are separated.
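The slide's formula is missing from the transcript; reconstructed (up to notation) from the published Separability of Split Value criterion of Grąbczewski & Duch:

$$LS(s,f,D) = \{\mathbf{X}\in D : f(\mathbf{X}) < s\}, \qquad RS(s,f,D) = D \setminus LS(s,f,D)$$

$$SSV(s) = 2\sum_{c\in C} |LS \cap D_c|\cdot|RS \cap (D\setminus D_c)| \;-\; \sum_{c\in C}\min\big(|LS \cap D_c|,\,|RS \cap D_c|\big)$$

where $D_c$ is the set of training vectors from class $c$. The first term counts separated pairs of vectors from different classes (to be maximized); the second penalizes splitting vectors of the same class.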

SSV HAS algorithm A compromise between complexity and flexibility (a sketch follows below):
Use training vectors as reference vectors R.
Calculate TR(X) = D(X,R) for all data vectors, i.e. the distance matrix.
Use the TR(X) as additional test conditions.
Calculate SSV(s) for each condition and select the best split.
Different distance functions lead to different decision borders; several distance functions may be used simultaneously. Example: 2000 points, a noisy 10D plane rotated 45° plus a half-sphere centered on the plane. Standard SSV tree: 44 rules, 99.7%. HAS SSV tree (Euclidean): 15 rules, 99.9%.
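A minimal sketch of the idea, assuming a simplified SSV-like pair-count score and Euclidean distances only; this is not the full SSV implementation, and all names are illustrative.

import numpy as np

def ssv_like_score(cond, y):
    """Simplified split score: separated different-class pairs minus
    a penalty for separated same-class vectors (per-class min side count)."""
    left, right = y[cond], y[~cond]
    score = 0
    for c in np.unique(y):
        l, r = np.sum(left == c), np.sum(right == c)
        score += 2 * l * (len(right) - r) - min(l, r)
    return score

def best_distance_split(X, y):
    """Try distance tests ||X - R|| < s for every training vector R and
    every candidate threshold s; return the best-scoring test found."""
    best = (-np.inf, None)
    for i, R in enumerate(X):                  # each training vector as reference
        d = np.linalg.norm(X - R, axis=1)      # one column of the distance matrix
        for s in np.unique(d)[1:]:             # candidate thresholds
            sc = ssv_like_score(d < s, y)
            if sc > best[0]:
                best = (sc, (i, s))
    return best

# Usage: labels defined by distance to X[0]; the search should recover a
# distance test close to the true one.
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 2))
y = (np.linalg.norm(X - X[0], axis=1) < 1.0).astype(int)
print(best_distance_split(X, y))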

SSV HAS Iris Iris data: 3 classes, 50 samples per class. SSV solution with the usual conditions (6 errors, 96%), or with distance tests using vectors from a given node only:
if petal length < 2.45 then class 1
if petal length > 2.45 and petal width < 1.65 then class 2
if petal length > 2.45 and petal width > 1.65 then class 3
SSV with Euclidean distance tests using all training vectors as reference (5 errors, 96.7%):
1. if petal length < 2.45 then class 1
2. if petal length > 2.45 and ||X-R15|| < 4.02 then class 2
3. if petal length > 2.45 and ||X-R15|| > 4.02 then class 3
||X-R15|| is the Euclidean distance to the reference vector R15.

SSV HAS Wisconsin Wisconsin breast cancer dataset (UCI): 699 cases, 9 features (cell parameters on a 1..10 scale). Classes: benign 458 (65.5%), malignant 241 (34.5%). A single rule gives the simplest known description of this data:
IF ||X-R303|| < 20.27 THEN malignant ELSE benign
18 errors, 97.4% accuracy; R303 is a good prototype for the malignant class! Simple thresholds are what MDs like the most. Best leave-one-out accuracy 98.3% (FSM); best 10-fold CV around 97.5% (Naive Bayes + kernel, SVM). C4.5 gives 94.7±2.0%; SSV without distance tests: 96.4±2.1%. Several simple rules of similar accuracy are created in CV tests.
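The single rule above is just a nearest-prototype threshold; a minimal sketch, where the prototype and test vectors are placeholders rather than the actual UCI cases.

import numpy as np

def prototype_rule(x, R303, threshold=20.27):
    """Single-rule classifier from the slide: the distance to one
    malignant prototype decides the class."""
    return "malignant" if np.linalg.norm(x - R303) < threshold else "benign"

# Placeholder vectors; in practice R303 is training case 303 of the UCI data.
R303 = np.full(9, 8.0)
print(prototype_rule(np.full(9, 6.0), R303))   # distance 6  < 20.27 -> malignant
print(prototype_rule(np.full(9, 1.0), R303))   # distance 21 > 20.27 -> benign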

Conclusions Heterogeneous systems are worth investigating, and the HAS approach has good biological justification. Better learning cannot repair the wrong bias of a model; the StatLog report shows large differences between RBF and MLP on many datasets. Networks, trees and kNN should select/optimize their functions; radial and sigmoidal functions in NN are not the only choice. Simple solutions may be discovered by HAS systems. Open questions:
How to train heterogeneous systems?
How to find the optimal balance between complexity and flexibility, e.g. complexity of nodes vs. interactions (weights)?
Hierarchical, modular networks: nodes that are networks themselves.

The End? Perhaps still the beginning ...