Artificial Neural Networks

Artificial Neural Networks
Wismar Business School
Uwe Lämmel
www.wi.hs-wismar.de/~laemmel
Uwe.Laemmel@hs-wismar.de

Literature & Software
Robert Callan: The Essence of Neural Networks, Pearson Education, 2002.
JavaNNS, based on SNNS (Stuttgart Neural Network Simulator): http://www.ra.cs.uni-tuebingen.de/software/JavaNNS/

Prerequisites
- NO algorithmic solution available, or the algorithmic solution is too time-consuming
- NO knowledge-based solution
- LOTS of experience (data)
→ Try a neural network

Content
- Idea
- An artificial Neuron – Neural Network
- Supervised Learning – feed-forward networks
- Competitive Learning – Self-Organising Map
- Applications

Two different types of knowledge processing
"Traditional" AI (engineering-oriented): logical conclusion, sequential, aware, symbol and rule processing, precise.
Neural Networks / Connectionism (cognitively oriented): perception and recognition, parallel, not aware, fuzzy.

Idea
A human being learns by example, "learning by doing": seeing (perception), walking, speaking, …
Can a machine do the same?
A human being uses the brain. A brain consists of millions of single cells, and each cell is connected to tens of thousands of other cells.
Is it possible to simulate a similar structure on a computer?

Idea
Artificial Neural Network: information processing similar to the processes in a mammalian brain
- highly parallel systems, able to learn
- a great number of simple cells
Is it useful to copy nature? (wheel, aeroplane, ...)

Idea
An artificial neural network functions in a similar way to a natural neural network.
We need:
- software neurons
- software connections between neurons
- software learning algorithms

A biological Neuron
- Dendrites (input): receive the activations of other cells
- Cell nucleus (processing): evaluation of the activation
- Axon (output): forwards the activation (from 1 mm up to 1 m long)
- Synapse: transfers the activation to other cells, e.g. to the dendrites of other neurons
A cell has about 1,000 to 10,000 connections to other cells.

Abstraction
- Dendrites: weighted connections; weight = real number
- Axon: output = real number
- Synapse: --- (identity: the output is directly forwarded)
- Cell nucleus: a unit containing simple functions; input = (many) real numbers, processing = evaluation of the activation, output = real number (~ activation)

An artificial Neuron (diagram: incoming weights w_1i, w_2i, ..., w_ji carrying outputs o_i)
- net: input from the network
- w: weight of a connection
- act: activation
- f_act: activation function
- Θ: bias/threshold
- f_out: output function (mostly the identity)
- o: output

A simple switch (a1 = __, a2 = __, o = __, w1 = __, w2 = __)
net = o1·w1 + o2·w2
a = 1 if net > Θ, a = 0 otherwise
o = a
Set the parameters according to the desired function:
- input neurons 1, 2: a1, a2 form the input pattern, here o_i = a_i
- weights of the edges: w1, w2
- bias Θ
Given values for w1, w2, Θ, we can evaluate the output o.
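As an illustration (not part of the original slides), a minimal Python sketch of such a threshold switch; the AND parameters shown are one possible choice and anticipate part of the exercise on the next slide.

```python
def threshold_neuron(inputs, weights, theta):
    """Simple switch: output 1 if the weighted input sum exceeds the bias theta."""
    net = sum(o * w for o, w in zip(inputs, weights))
    return 1 if net > theta else 0

# One possible parameter choice for the logical AND of two inputs:
# both weights 1, bias 1.5 (only 1 + 1 = 2 exceeds 1.5).
for a1 in (0, 1):
    for a2 in (0, 1):
        print(a1, a2, "->", threshold_neuron((a1, a2), (1, 1), 1.5))
```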

Questions
Find values for the parameters so that a logic function is simulated:
- logical AND
- logical OR
- logical exclusive OR (XOR)
- identity
We want to process more than 2 inputs; find appropriate parameter values:
- logical AND with 3 (4) inputs
- OR, XOR
- output 1 iff 2 out of 4 inputs are 1

Mathematics in a Cell
- Propagation function: net_i(t) = Σ_j o_j·w_ji = w_1i·o_1 + w_2i·o_2 + ...
- Activation a_i(t): activation at time t
- Activation function f_act: a_i(t+1) = f_act(a_i(t), net_i(t), Θ_i), where Θ_i is the bias
- Output function f_out: o_i = f_out(a_i)

Activation functions are sigmoid functions. (Plots: threshold/bias function and identity function.)

Activation functions are sigmoid functions. (Plots: y = tanh(c·x) for c = 1, 2, 3, and the logistic function y = 1/(1 + exp(-c·x)) for c = 1, 3, 10.)
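A short sketch (added here for illustration) of the two sigmoid functions named on the slide; the sample points are arbitrary.

```python
import math

def logistic(x, c=1.0):
    """Logistic function y = 1 / (1 + exp(-c*x)); a larger c gives a steeper curve."""
    return 1.0 / (1.0 + math.exp(-c * x))

def tanh_act(x, c=1.0):
    """Hyperbolic tangent y = tanh(c*x), ranging over (-1, 1)."""
    return math.tanh(c * x)

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"x={x:5.1f}  logistic={logistic(x):.3f}  tanh={tanh_act(x):.3f}")
```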

Structure of a network
Layers:
- input layer – contains the input neurons
- output layer – contains the output neurons
- hidden layers – contain the hidden neurons
An n-layer network has:
- n layers of connections which can be trained
- n+1 neuron layers
- n-1 hidden layers

Neural Network – Definition
A neural network is characterised by many connections between many simple units (neurons); the units exchange signals via these connections.
A neural network is a connected, directed graph with weighted edges, in which each node (neuron, unit) carries a value (its activation).

Elements of a NN
- Connections/links: directed, weighted graph; weight w_ij (from cell i to cell j); weight matrix
- Propagation function: the network input of a neuron is calculated as net_i = Σ_j o_j·w_ji
- Learning algorithm

Example: XOR network (diagram: neurons 1-4, thresholds 1.5 and 0.5, one connection with weight -2)
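The diagram itself is not preserved in the transcript. A common reading of the numbers (and the classic two-input XOR example): inputs 1 and 2 feed hidden neuron 3 (threshold 1.5) and output neuron 4 (threshold 0.5) with weight 1, and neuron 3 feeds neuron 4 with weight -2. Under that assumption, a small Python sketch:

```python
def step(net, theta):
    """Threshold activation: 1 if net > theta, else 0."""
    return 1 if net > theta else 0

def xor_net(x1, x2):
    """Assumed reconstruction of the slide's XOR network (weights 1 and -2; thresholds 1.5 and 0.5)."""
    a3 = step(1 * x1 + 1 * x2, 1.5)           # hidden neuron 3: fires only if both inputs are 1
    a4 = step(1 * x1 + 1 * x2 - 2 * a3, 0.5)  # output neuron 4
    return a4

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))   # prints the XOR truth table
```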

Supervised Learning – feed-forward networks
Content:
- Idea
- An artificial Neuron – Neural Network
- Supervised Learning – feed-forward networks: Architecture, Backpropagation Learning
- Competitive Learning – Self-Organising Map
- Applications

Multi-layer feed-forward network

Feed-Forward Network

Evaluation of the net output (diagram: input layer, hidden layer(s), output layer)
A training pattern p is applied to the input layer (o_i = p_i); in the hidden layer(s) net_j is computed and o_j = act_j; in the output layer net_k is computed and o_k = act_k.
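A minimal forward-pass sketch in Python (added for illustration; the logistic activation and the weight shapes are assumptions, and thresholds are omitted for brevity):

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(pattern, w_ih, w_ho):
    """Propagate one training pattern through a network with one hidden layer.
    w_ih: weights input->hidden, w_ho: weights hidden->output."""
    o_i = np.asarray(pattern)      # input layer: o_i = p_i
    o_j = logistic(w_ih.T @ o_i)   # hidden layer: net_j, then o_j = act_j
    o_k = logistic(w_ho.T @ o_j)   # output layer: net_k, then o_k = act_k
    return o_j, o_k

# Example: 2 inputs, 3 hidden neurons, 1 output (random weights, illustrative only)
rng = np.random.default_rng(0)
w_ih = rng.uniform(-1, 1, size=(2, 3))
w_ho = rng.uniform(-1, 1, size=(3, 1))
print(forward([1.0, 0.0], w_ih, w_ho))
```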

Backpropagation Learning Algorithm
- supervised learning
- the error is a function of the weights w_i: E(W) = E(w_1, w_2, ..., w_n)
- we are looking for a minimal error; a minimal error corresponds to a hollow (valley) in the error surface
- backpropagation uses the gradient of the error function to adjust the weights

error curve

Problem (diagram: input layer, hidden layer, output layer, teaching output)
Error in the output layer: difference between the output and the teaching output.
Error in a hidden layer: ?

Mathematics
Modify the weights according to the gradient of the error function:
ΔW = -η·∇E(W)
∇E(W) is the gradient; η is a factor called the learning parameter. (Plot of an error curve.)

Mathematics
Here, modification of the weights: ΔW_j = -η·∇E(W_j)
∇E(W_j): gradient; η: learning factor (proportionality factor) for the weight vector W_j
E(W_j) = E(w_1j, w_2j, ..., w_nj)

Error Function
Modification of a weight: Δw_ij = -η·∂E/∂w_ij (1)
Error function: quadratic distance between the real and the teaching output over all patterns p:
E = ½ Σ_p Σ_j (t_pj - o_pj)², with t_j the teaching output and o_j the real output.
Now: error for one pattern only (omitting the pattern index p):
E_p = ½ Σ_j (t_j - o_j)² (2)
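As a quick worked illustration (not from the slides), the per-pattern error in Python:

```python
def pattern_error(teaching, output):
    """Quadratic error for one pattern: E_p = 1/2 * sum_j (t_j - o_j)^2."""
    return 0.5 * sum((t - o) ** 2 for t, o in zip(teaching, output))

print(pattern_error([1.0, 0.0], [0.8, 0.3]))  # 0.5 * (0.04 + 0.09) = 0.065
```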

Backpropagation rule
- multi-layer networks
- semi-linear activation function (monotone, differentiable, e.g. the logistic function)
- problem: no teaching outputs for the hidden neurons

Backpropagation Learning Rule
Start: Δw_ij = -η·∂E/∂w_ij (6.1)
Dependencies: the error depends on the output o_j, the output on the net input net_j, and the net input on the weight w_ij; f_out = Id (6.2)
(6.1) in more detail (chain rule): ∂E/∂w_ij = ∂E/∂o_j · ∂o_j/∂net_j · ∂net_j/∂w_ij (6.3)

The 3rd and 2nd Factors
3rd factor, dependency of the net input on the weights: ∂net_j/∂w_ij = ∂(Σ_k o_k·w_kj)/∂w_ij = o_i (6.4)
2nd factor, derivative of the activation function: ∂o_j/∂net_j = f'_act(net_j) (6.5)

The 1st Factor
1st factor: dependency of the error on the output, ∂E/∂o_j.
Error signal of an output neuron j: ∂E/∂o_j = -(t_j - o_j) (6.8, 6.9)
Error signal of a hidden neuron j: obtained from the error signals δ_k of its successor neurons, ∂E/∂o_j = -Σ_k δ_k·w_jk (6.10)
δ_j: error signal

Error Signal δ
Output neuron j: δ_j = f'_act(net_j)·(t_j - o_j) (6.11)
Hidden neuron j: δ_j = f'_act(net_j)·Σ_k δ_k·w_jk (6.12)

Standard Backpropagation Rule
For the logistic activation function: f'_act(net_j) = f_act(net_j)·(1 - f_act(net_j)) = o_j·(1 - o_j)
Therefore: δ_j = o_j·(1 - o_j)·(t_j - o_j) for an output neuron j, and δ_j = o_j·(1 - o_j)·Σ_k δ_k·w_jk for a hidden neuron j
and the weight update: Δw_ij = η·o_i·δ_j
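A minimal sketch of one training step with this standard rule (added for illustration; the function name, weight shapes and learning rate are assumptions, and thresholds are again omitted):

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(pattern, teaching, w_ih, w_ho, eta=0.5):
    """One backpropagation step for a network with one hidden layer,
    logistic activation, so f'(net) = o * (1 - o)."""
    o_i = np.asarray(pattern)
    o_j = logistic(w_ih.T @ o_i)                   # hidden outputs
    o_k = logistic(w_ho.T @ o_j)                   # network outputs
    t_k = np.asarray(teaching)
    delta_k = o_k * (1 - o_k) * (t_k - o_k)        # error signals, output neurons (6.11)
    delta_j = o_j * (1 - o_j) * (w_ho @ delta_k)   # error signals, hidden neurons (6.12)
    w_ho += eta * np.outer(o_j, delta_k)           # Δw_jk = η · o_j · δ_k
    w_ih += eta * np.outer(o_i, delta_j)           # Δw_ij = η · o_i · δ_j
    return w_ih, w_ho
```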

Error signal for f_act = tanh
For the activation function tanh: f'_act(net_j) = 1 - tanh²(net_j) = 1 - o_j²
Therefore: δ_j = (1 - o_j²)·(t_j - o_j) for an output neuron j, and δ_j = (1 - o_j²)·Σ_k δ_k·w_jk for a hidden neuron j

Backpropagation - Problems

Backpropagation – Problems
A: flat plateau – backpropagation proceeds very slowly, finding a minimum takes a lot of time
B: oscillation in a narrow gorge – the weight vector jumps from one side to the other and back
C: leaving a minimum – if the modification in one training step is too large, the minimum can be lost

Solutions: looking at the values
- change the parameter of the logistic function in order to get other values
- the modification of a weight depends on the output: if o_i = 0, no modification takes place
- if we use binary input we probably have a lot of zero values: change [0,1] into [-½, ½] or [-1,1]
- use another activation function, e.g. tanh, and use values in [-1, 1]

Solution: Quickprop
Assumption: the error curve is (locally) a quadratic function; calculate the vertex of that parabola.
The update is computed from the slope of the error curve, S(t) = ∂E/∂w_ij(t), at the current and the previous step.

Resilient Propagation (RPROP)
The sign and the size of the weight modification are calculated separately; b_ij(t) is the size of the modification, S(t) the slope ∂E/∂w_ij at time t.

Step size:
b_ij(t) = b_ij(t-1)·η⁺  if S(t-1)·S(t) > 0
b_ij(t) = b_ij(t-1)·η⁻  if S(t-1)·S(t) < 0
b_ij(t) = b_ij(t-1)     otherwise
with η⁺ > 1: both slopes have the same sign → "big" step; 0 < η⁻ < 1: the slopes differ → "smaller" step.

Weight change:
Δw_ij(t) = -b_ij(t)            if S(t-1) > 0 and S(t) > 0
Δw_ij(t) = +b_ij(t)            if S(t-1) < 0 and S(t) < 0
Δw_ij(t) = -Δw_ij(t-1)         if S(t-1)·S(t) < 0 (*)
Δw_ij(t) = -sgn(S(t))·b_ij(t)  otherwise
(*) In this case S(t) is set to 0 (S(t) := 0), so at time t+1 the 4th case applies.
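A sketch of this rule in Python (added for illustration; the function name and the values of η⁺ and η⁻ are assumptions, and the case structure simply mirrors the slide):

```python
import numpy as np

def rprop_update(w, b, dw_prev, s_prev, s, eta_plus=1.2, eta_minus=0.5):
    """One RPROP step for a weight matrix w.
    b: step sizes b_ij(t-1), dw_prev: previous weight changes,
    s_prev, s: slopes S(t-1) and S(t) = dE/dw."""
    prod = s_prev * s
    b = np.where(prod > 0, b * eta_plus, np.where(prod < 0, b * eta_minus, b))
    dw = -np.sign(s) * b                           # 4th (default) case
    dw = np.where((s_prev > 0) & (s > 0), -b, dw)  # 1st case
    dw = np.where((s_prev < 0) & (s < 0), +b, dw)  # 2nd case
    dw = np.where(prod < 0, -dw_prev, dw)          # 3rd case (*): take back the previous step
    s = np.where(prod < 0, 0.0, s)                 # (*) set S(t) := 0 for the next step
    return w + dw, b, dw, s
```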

Limits of the Learning Algorithm
- it is not a model of biological learning
- there is no teaching output in a natural learning process
- in a natural neural network there is no backward propagation of errors (at least none has been discovered yet)
- training an artificial neural network is rather time-consuming

Development of an NN application (workflow diagram)
1. Build a network architecture.
2. Input a training pattern and calculate the network output.
3. Compare the output with the teaching output; if the error is too high, modify the weights (and change parameters if necessary) and continue training.
4. Evaluate the output using test-set data; stop when the quality is good enough.

Possible Changes
Architecture of the NN:
- size of the network
- shortcut connections
- partially connected layers
- remove/add links
- receptive areas
Find the right parameter values:
- learning parameter
- size of the layers
- using genetic algorithms

Memory Capacity – Experiment
- the output layer is a copy of the input layer
- the training set consists of n random patterns
- error = 0: the network can store more than n patterns
- error >> 0: the network cannot store n patterns
- memory capacity: error > 0, with error = 0 for n-1 patterns and error >> 0 for n+1 patterns

Summary
- Backpropagation is a "backpropagation of error" algorithm and works like gradient descent
- activation functions: logistic, tanh
- meaning of the learning parameter
- modifications: RPROP, backpropagation with momentum, Quickprop
- finding an appropriate architecture: memory size of a network, modifications of the layer connections
- applications

Binary Coding of Nominal Values I
No order relation, n values: n neurons, each neuron represents one and only one value.
Example: red, blue, yellow, white, black → 1,0,0,0,0 / 0,1,0,0,0 / 0,0,1,0,0 / ...
Disadvantage: n neurons are necessary but only one of them is activated, so there are lots of zeros in the input.

Binary Coding of Nominal Values II
No order relation, n values: m neurons, of which k are switched on for one single value.
Requirement: (m choose k) ≥ n.
Example: red, blue, yellow, white, black → 1,1,0,0 / 1,0,1,0 / 1,0,0,1 / 0,1,1,0 / 0,1,0,1
4 neurons, 2 of them switched on, (4 choose 2) = 6 > 5.
Advantages: fewer neurons, balanced ratio of 0s and 1s.
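A small sketch of both coding schemes (added for illustration; the helper names are hypothetical):

```python
from itertools import combinations

def one_of_n(values):
    """Coding I: one neuron per value (1-of-n), e.g. red -> 1,0,0,0,0."""
    return {v: [1 if i == j else 0 for j in range(len(values))]
            for i, v in enumerate(values)}

def k_of_m(values, m, k):
    """Coding II: m neurons, k of them switched on; requires C(m, k) >= number of values."""
    codes = list(combinations(range(m), k))
    assert len(codes) >= len(values), "C(m, k) must be >= n"
    return {v: [1 if i in codes[idx] else 0 for i in range(m)]
            for idx, v in enumerate(values)}

colours = ["red", "blue", "yellow", "white", "black"]
print(one_of_n(colours)["red"])    # [1, 0, 0, 0, 0]
print(k_of_m(colours, m=4, k=2))   # 4 neurons, 2 switched on, C(4,2) = 6 >= 5
```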

Example: Credit Scoring
Attributes: A1 credit history, A2 debt, A3 collateral, A4 income.
The network architecture depends on the coding of the input and output.
How can we code values like good, bad, 1, 2, 3, ...?

Example: Credit Scoring (table of training examples with the attributes A1-A4 and the class)

Competitive Learning – Self-Organising Map
Content:
- Idea
- An artificial Neuron – Neural Network
- Supervised Learning – feed-forward networks
- Competitive Learning – Self-Organising Map: Architecture, Learning, Visualisation
- Applications

Self-Organising Maps (SOM)
- a natural brain can organise itself; now we look at the position of a neuron and its neighbourhood
- Kohonen Feature Map: a two-layer pattern associator
- the input layer is fully connected to the map layer
- the neurons of the map layer are (virtually) fully connected to each other

Clustering (diagram: a mapping f from input set A to output B, output neuron a_i)
Objective: all inputs of a class are mapped onto one and the same neuron.
Problem: the classification in the input space is unknown; the network performs a clustering.

Winner Neuron (diagram: input layer and Kohonen layer, with the winner neuron highlighted)

Learning in an SOM
1. Choose an input pattern k at random.
2. Detect the neuron z with the maximal activity (the winner).
3. Adapt the weights in the neighbourhood of z: every neuron i within a radius r of z.
4. Stop if a certain number of learning steps has been reached; otherwise decrease the learning rate and the radius and continue with step 1.

A Map Neuron
Look at a single neuron (without feedback): its activation is computed from the net input, and the output function is f_out = Id.

Centre of Activation
Idea: highly activated neurons push down the activation of the neurons in their neighbourhood.
Problem: finding the centre of activation, either as
- the neuron j with the maximal net input, or
- the neuron j whose weight vector w_j is most similar to the input vector (Euclidean distance): z such that ||x - w_z|| = min_j ||x - w_j||

Changing Weights z determines the shape of the curve: weights to neurons within a radius z will be increased: wj(t+1) = wj(t) + hjz(x(t)-wj(t)) , j  z x-input wj(t+1) = wj(t) , otherwise Amount of influence depends on the distance to the centre of activation: (amount of change wj?) Kohonen uses the function : z determines the shape of the curve: z small high + sharp z high  wide + flat

Changing Weights
The neighbourhood is simulated by a Gauss curve (alternatively by a Mexican-hat approach). (Plot of the neighbourhood function.)
The weights are changed with a learning rate η(t) that goes down to zero:
w_j(t+1) = w_j(t) + η(t)·h_jz·(x(t) - w_j(t)) for j in the neighbourhood of z
w_j(t+1) = w_j(t) otherwise
Requirements: the patterns are input in random order; η_z(t) and σ_z(t) are monotonically decreasing functions of t.

SOM Training (diagram: input pattern p and the weight vectors W_j of the Kohonen layer)
- find the winner neuron z for an input pattern p (minimal Euclidean distance)
- adapt the weights of the connections from the input neurons to the winner neuron and to its neighbours
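A minimal SOM training sketch in Python, combining the steps above (added for illustration; the Gaussian neighbourhood follows the previous slides, while the function name, initial learning rate, initial radius and decay schedules are assumptions):

```python
import numpy as np

def train_som(patterns, map_shape=(10, 10), steps=32000, eta0=0.5, sigma0=5.0):
    """Winner by Euclidean distance, Gaussian neighbourhood,
    learning rate and radius decreasing over time."""
    rng = np.random.default_rng(0)
    h, w = map_shape
    dim = patterns.shape[1]
    weights = rng.uniform(0, 1, size=(h * w, dim))           # one weight vector per map neuron
    coords = np.array([(r, c) for r in range(h) for c in range(w)], dtype=float)
    for t in range(steps):
        frac = t / steps
        eta = eta0 * (1 - frac)                               # monotonically decreasing
        sigma = sigma0 * (1 - frac) + 0.5
        x = patterns[rng.integers(len(patterns))]             # choose an input at random
        z = np.argmin(np.linalg.norm(weights - x, axis=1))    # winner: minimal Euclidean distance
        d2 = np.sum((coords - coords[z]) ** 2, axis=1)        # squared map distance to the winner
        hjz = np.exp(-d2 / (2 * sigma ** 2))                  # Gaussian neighbourhood
        weights += eta * hjz[:, None] * (x - weights)         # w_j <- w_j + eta*h_jz*(x - w_j)
    return weights.reshape(h, w, dim)
```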

Example: Credit Scoring
Attributes: A1 credit history, A2 debts, A3 collateral, A4 income.
We do not look at the classification; the SOM performs a clustering.

Credit Scoring good = {5,6,9,10,12} average = {3, 8, 13} bad = {1,2,4,7,11,14}

Credit Scoring Pascal tool box (1991) 10x10 neurons 32,000 training steps

Visualisation of a SOM
- colour reflects the Euclidean distance to the input
- weights used as the coordinates of a neuron
- colour reflects the cluster
Demos: NetDemo, ColorDemo, TSPDemo

Example: TSP – Travelling Salesman Problem
A salesman has to visit certain cities and then return home. Find an optimal route!
The problem has exponential complexity: (n-1)! routes.
31/32 states in Mexico?
Experiment: Pascal program, 1998.

Nearest Neighbour: Example
Some cities in Northern Germany (map: Kiel, Rostock, Berlin, Hamburg, Hannover, Frankfurt, Essen, Schwerin); the initial city is Hamburg.
Exercise: Put in the coordinates of the capitals of all 31 Mexican states plus Mexico City and find a solution for the TSP using a SOM!

SOM solves TSP (diagram: two input neurons X and Y, Kohonen layer)
Draw neuron i at position (x, y) = (w_1i, w_2i), i.e. its two weights to the input neurons X and Y are used as its coordinates.

SOM solves TSP
- initialisation of the weights: the weights to the inputs (x, y) are calculated so that all neurons form a circle
- during training the initial circle is expanded into a round trip
- solutions for problems with several hundred towns are possible
- the solution may not be optimal!
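A sketch of this ring-shaped SOM for the TSP (added for illustration; the function name, neuron count, step count and decay schedules are all assumptions, not values from the slides):

```python
import numpy as np

def som_tsp(cities, n_neurons=None, steps=20000, eta0=0.8):
    """Neurons are initialised on a circle around the cities and pulled towards
    them; reading the cities off in ring order gives a (not necessarily optimal) tour."""
    cities = np.asarray(cities, dtype=float)
    n = n_neurons or 3 * len(cities)
    sigma0 = n / 10
    rng = np.random.default_rng(0)
    centre, radius = cities.mean(axis=0), cities.std()
    angles = np.linspace(0, 2 * np.pi, n, endpoint=False)
    w = centre + radius * np.column_stack((np.cos(angles), np.sin(angles)))  # circle initialisation
    for t in range(steps):
        frac = t / steps
        eta, sigma = eta0 * (1 - frac), sigma0 * (1 - frac) + 0.1
        x = cities[rng.integers(len(cities))]
        z = np.argmin(np.linalg.norm(w - x, axis=1))          # winner neuron
        ring_dist = np.minimum(np.abs(np.arange(n) - z), n - np.abs(np.arange(n) - z))
        hjz = np.exp(-ring_dist ** 2 / (2 * sigma ** 2))      # neighbourhood along the ring
        w += eta * hjz[:, None] * (x - w)
    tour = [int(np.argmin(np.linalg.norm(w - c, axis=1))) for c in cities]
    return np.argsort(tour)                                   # visit the cities in ring order
```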

Applications: Data Mining – Clustering
Customer data, weblogs, ...
You have a lot of data but no teaching data available (unsupervised learning), and you have at least an idea about the result.
Can be applied as a first approach to obtain training data for supervised learning.

Applications
- pattern recognition (text, numbers, faces): number plates, access control at cash machines
- similarities between molecules
- checking the quality of a surface
- control of autonomous vehicles
- monitoring of credit card accounts
- data mining

Applications
- speech recognition
- control of artificial limbs
- classification of galaxies
- product orders (supermarket)
- forecast of energy consumption
- stock value forecast

Applications – Summary
- classification, clustering, forecasting, pattern recognition
- learning by examples, generalisation
- recognition of unknown structures in large data sets

Applications
- data mining: customer data, weblogs
- control of ...
- pattern recognition: quality of surfaces
- possible if you have training data ...

The End