
1 The Multi-Layer Perceptron
non-linear modelling by adaptive pre-processing
Rob Harrison, AC&SE

2 [Figure: input data x feeds a pre-processing layer and then an adaptive layer to produce the output y; comparing y with the target z gives the error e]

3 Adaptive Basis Functions
“linear” models:
– fixed pre-processing
– parameter → cost mapping “benign”
– easy to optimize
but:
– combinatorial
– arbitrary choices
what is the best pre-processor to choose?

4 The Multi-Layer Perceptron
formulated from loose biological principles
popularized mid-1980s
– Rumelhart, Hinton & Williams 1986; earlier Werbos 1974, Ho 1964
“learns” the pre-processing stage from data
layered, feed-forward structure
– sigmoidal pre-processing
– task-specific output
a non-linear model

5 Generic MLP [Figure: input layer → hidden layers → output layer]

6 Two-Layer MLP
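
The network itself appears only as a figure, so here is a minimal sketch (assuming NumPy; the layer sizes are illustrative, not from the slides) of a two-layer MLP: a sigmoidal hidden layer feeding a linear output layer:

    import numpy as np

    def sigmoid(u):
        return 1.0 / (1.0 + np.exp(-u))

    rng = np.random.default_rng(0)
    n_in, n_hidden, n_out = 2, 5, 1              # illustrative sizes
    W1 = rng.standard_normal((n_hidden, n_in))   # hidden-layer weights
    b1 = np.zeros(n_hidden)                      # hidden-layer biases
    W2 = rng.standard_normal((n_out, n_hidden))  # output-layer weights
    b2 = np.zeros(n_out)                         # output-layer biases

    def mlp(x):
        h = sigmoid(W1 @ x + b1)   # adaptive pre-processing (hidden) layer
        return W2 @ h + b2         # linear output layer

    y = mlp(np.array([0.5, -1.0]))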

7 A Sigmoidal Unit
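
In symbols, a single sigmoidal unit forms a weighted sum of its inputs and squashes it through the logistic function:

    a = \sigma(\mathbf{w}^{\top}\mathbf{x} + b), \qquad \sigma(u) = \frac{1}{1 + e^{-u}}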

8 Combinations of Sigmoids

9 In 2-D [figure from Pattern Classification, 2e, by Duda, Hart & Stork, © 2001 John Wiley]

10 Universal Approximation
a linear combination of “enough” sigmoids can approximate any continuous function
– Cybenko, 1989
a single hidden layer is adequate
– more may be better
choose the hidden parameters (w, b) optimally
problem solved?
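
Cybenko's result states that, on a compact domain, a sum of the following form can approximate any continuous function to arbitrary accuracy, given enough hidden units m:

    f(\mathbf{x}) \approx \sum_{j=1}^{m} v_j \, \sigma(\mathbf{w}_j^{\top}\mathbf{x} + b_j)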

11 Pros
compactness
– potential to obtain the same accuracy with a much smaller model
cf. sparsity/complexity control in linear models

12 Compactness of Model

13 & Cons
parameter → cost mapping “malign”
– optimization difficult
– many solutions possible
– effect of hidden weights on the output is non-linear

14 Multi-Modal Cost Surface [Figure: cost surface with a global minimum, local minima, and gradient directions]

15 Heading Downhill
assume
– minimization (e.g. SSE)
– analytically intractable
step the parameters downhill: w_new = w_old + step in the right direction
backpropagation (of error)
– slow but efficient
conjugate gradients, Levenberg-Marquardt
– for preference
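
A minimal sketch of the downhill step (plain gradient descent with SSE on a single sigmoidal unit; the toy data and learning rate eta are illustrative, not from the slides):

    import numpy as np

    def sigmoid(u):
        return 1.0 / (1.0 + np.exp(-u))

    # toy data: inputs X (n samples x 2) and targets z
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    z = np.array([0., 1., 1., 1.])

    w, b, eta = np.zeros(2), 0.0, 0.5
    for _ in range(1000):
        y = sigmoid(X @ w + b)          # forward pass
        # chain rule: dE/dw = sum over samples of (y - z) * y * (1 - y) * x
        delta = (y - z) * y * (1.0 - y)
        w -= eta * (X.T @ delta)        # w_new = w_old + step downhill
        b -= eta * delta.sum()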

16–18 [Figures: example training runs comparing backpropagation and conjugate gradients]

19 Implications
the correct structure can still get the “wrong” answer
– dependency on initial conditions
– might be good enough
train/test (cross-validation) required
– is poor behaviour due to network structure? initial conditions (ICs)?
an additional dimension in development

20 Is It a Problem?
pros seem to outweigh cons
good solutions often arrived at quickly
all previous issues apply
– sample density
– sample distribution
– lack of prior knowledge

21 How to Use
to “generalize” a GLM:
– linear regression (curve-fitting): linear output + SSE
– logistic regression (classification): logistic output + cross-entropy (deviance)
– extended to multinomial, ordinal: e.g. softmax output + cross-entropy
– Poisson regression: count data
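
A rough sketch of those output/loss pairings (function names are illustrative, not from the slides):

    import numpy as np

    def linear_sse(a, z):                 # linear regression / curve-fitting
        y = a                             # identity output
        return y, 0.5 * np.sum((z - y) ** 2)

    def logistic_xent(a, z):              # two-class classification
        y = 1.0 / (1.0 + np.exp(-a))      # logistic output
        return y, -np.sum(z * np.log(y) + (1 - z) * np.log(1 - y))

    def softmax_xent(a, z):               # multinomial classification
        y = np.exp(a - a.max())           # softmax output (stabilized)
        y /= y.sum()
        return y, -np.sum(z * np.log(y))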

22 What is Learned?
the right thing, in a maximum-likelihood sense
– the theoretical conditional mean of the target data, E(z|x)
– for classification this implies the probability of class membership, P(C_i|x), is estimated
if the estimate is good then y → E(z|x)
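
The standard one-line justification: under squared error, the minimizer of the expected loss at each input is the conditional mean,

    \arg\min_{y} \, \mathbb{E}\left[(z - y)^2 \mid \mathbf{x}\right] = \mathbb{E}(z \mid \mathbf{x})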

23 Simple 1-D Function

24 More Complex 1-D Example

25 Local Solutions

26 A Dichotomy [data courtesy of B. Ripley]

27 Class-Conditional Probabilities

28 Linear Decision Boundary induced by P(C_1|x) = P(C_2|x)

29 Class-Conditional Probabilities

30 Non-Linear Decision Boundary

31 Class-Conditional Probabilities

32 Over-fitted Decision Boundary

33 Invariances
e.g. scale, rotation, translation
augment training data
design a regularizer
– tangent propagation (Simard et al., 1992)
build into the MLP structure
– convolutional networks (LeCun et al., 1989)
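
A minimal sketch of the first option, augmenting the training data with transformed copies (here one-pixel translations of an image batch; note np.roll wraps at the edges, a simplification):

    import numpy as np

    def augment_with_shifts(images, labels):
        # images: (n, height, width); append copies shifted 1 pixel each way
        copies = [images]
        for axis in (1, 2):                      # height and width axes
            for shift in (-1, 1):
                copies.append(np.roll(images, shift, axis=axis))
        return np.concatenate(copies), np.tile(labels, len(copies))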

34 Multi-Modal Outputs
the assumed distribution of the outputs is unimodal; sometimes it is not
– e.g. inverse robot kinematics: E(z|x) is not unique
use a mixture model for the output density
– Mixture Density Networks (Bishop, 1996)
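
In a mixture density network the MLP outputs the parameters of a mixture (typically Gaussian), so the predicted density can have several modes:

    p(z \mid \mathbf{x}) = \sum_{k=1}^{K} \pi_k(\mathbf{x}) \, \mathcal{N}\!\left(z ; \mu_k(\mathbf{x}), \sigma_k^2(\mathbf{x})\right)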

35 Bayesian MLPs
the deterministic treatment gives point estimates, E(z|x)
can use the MLP as an estimator in a Bayesian treatment
– non-linearity → approximate solutions
approximate estimate of spread, E((z − y)^2 | x)

36 NETTalk
introduced by T. Sejnowski & C. Rosenberg
the first realistic ANN demo: English text-to-speech
English spelling is non-phonetic and context-sensitive
– brave, gave, save, have (within word)
– read, read (within sentence)
mimicked DECTalk (rule-based)

37 Representing Text
29 “letters” (a–z plus space, comma, stop)
– 29 binary inputs indicating the letter
context
– local (within word), global (sentence)
7 letters at a time: the middle letter → articulatory features (AFs)
– 3 on each side give context, e.g. word boundaries

38 Coding
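
The coding figure is not reproduced here; a minimal sketch of the idea (assuming the 29-symbol alphabet above, so a 7-letter window becomes 7 × 29 = 203 binary inputs):

    import numpy as np

    ALPHABET = "abcdefghijklmnopqrstuvwxyz ,."   # 29 symbols: a-z, space, comma, stop

    def encode_window(window):
        # one-hot encode a 7-letter window into 7 x 29 = 203 binary inputs
        assert len(window) == 7
        code = np.zeros((7, 29))
        for i, ch in enumerate(window):
            code[i, ALPHABET.index(ch)] = 1.0
        return code.ravel()

    x = encode_window(" brave ")   # the middle letter 'a' is the one to pronounce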

39 NETTalk Performance
– 1024 words of continuous, informal speech
– the 1000 most common words
best-performing MLP:
– 7×29 inputs, 80 hidden units, 26 outputs
– 309 units, 18,320 weights
– 94% training accuracy, 85% testing accuracy

