1
Introduction to Neural Networks
Holger Wittel
2
Outline
Motivation
Neural Networks & Topologies
Perceptron
Multi-layered networks
Training
Time dependence
Neural Networks for Filters
Example usages
Summary
3
Motivation
Computers are fast, yet certain tasks remain difficult for them.
Your brain took ~100 cycles (0.1 s) to classify the picture as 'cat'; what can your computer do in 100 cycles?

                                 brain                 computer
No. of processing units          10^11                 10^9
Type of processing units         neurons               transistors
Type of calculation              massively parallel    usually serial
Switching time                   10^-3 sec             10^-9 sec
Possible switching operations    10^14 per sec         10^18 per sec

Table taken from: D. Kriesel, 'Neuronale Netze'
4
A real Neuron
[Diagram of neuron anatomy: dendrites, soma, nucleus, axon, myelin sheath (Schwann cells), nodes of Ranvier, axon terminals]
[SEM image of a neuron, cut open to show sacks of neurotransmitters]

Number of neurons:
roundworm    302
ant          10^4
fly          10^5
mouse        5·10^6
cat          3·10^8
human        10^11
elephant     2·10^11
5
Neuron Model
[Diagram: inputs Input_1, Input_2, ..., Input_i, each multiplied by a weight w_1, w_2, ..., w_i and summed; the sum is passed through a response function]
Out = resp(Σ_i input_i · w_i)
Most important response functions:
Linear:  resp(x) = x
Sigmoid: resp(x) = 1 / (1 + e^(-x))
ReLU:    resp(x) = max(0, x)
Sgn:     resp(x) = +1 or -1 (sign of x)
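A minimal sketch of this neuron model in Python/NumPy (not from the slides; the example inputs and weights are invented for illustration):

```python
import numpy as np

# Response ("activation") functions listed on the slide.
def linear(x):  return x
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))
def relu(x):    return np.maximum(0.0, x)
def sgn(x):     return np.where(x >= 0, 1.0, -1.0)

def neuron(inputs, weights, resp=sigmoid):
    """Out = resp(sum_i input_i * w_i)."""
    return resp(np.dot(inputs, weights))

# Example: two inputs with weights 0.5 and -1.0
print(neuron(np.array([1.0, 2.0]), np.array([0.5, -1.0])))  # sigmoid(-1.5) ≈ 0.18
```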
6
The Perceptron
[Diagram: input neurons connected via weights w_1, w_2, ... to output neurons]
Output = response(Σ_i input_i · w_i)
Sigmoid output → use for classification
The magic happens in the connections & weights!
7
The Bias Neuron
No inputs, constantly outputs its bias value (-1 in the diagrams)
Often omitted in drawings
Can be implemented as an extra input neuron
[Diagram: input neurons plus a bias neuron (-1), connected to the output neuron via weights w_1, w_2, w_3]
8
Perceptron
Categorises input vectors as being in one of two categories
A single perceptron can be trained to separate inputs into two linearly separable categories
[Plot: two linearly separable point clouds, Category 1 and Category 2]
9
Training a Perceptron
Need a training set of input/output pairs
Initialise the weights and bias (randomly or to zero)
Calculate the output
Adjust the weights and bias in proportion to the difference between the actual and expected values
Repeat until a termination criterion is reached
Rosenblatt (1962) showed that the weights and bias will converge to fixed values after a finite number of iterations (if the categories are linearly separable)
10
Perceptron Example
We want to classify points in R^2 into those points for which y ≥ x+1 and those for which y < x+1
[Plot: the line y = x+1 in the x-y plane]
11
Perceptron Example
Initialise the bias/weight vector to (0,0,0)
Input is the point (-1,-1) (below the line), expressed as (1,-1,-1)
s = 0·1 + 0·(-1) + 0·(-1) = 0
Actual output is sgn(0) = +1
Expected output is -1 (below the line)
12
Perceptron Example
Error (expected - actual) is -2
Constant learning rate of 0.25
So the new weight vector is (0,0,0) + 0.25·(-2)·(1,-1,-1) = (-0.5,0.5,0.5)
17
Perceptron Example
New bias/weight vector is (-0.5,0.5,0.5)
Input is the point (0,2) (above the line), expressed as (1,0,2)
s = -0.5·1 + 0.5·0 + 0.5·2 = 0.5
Actual output is sgn(0.5) = +1
Expected output is +1 (above the line), so no change to the weights
18
Perceptron Example
Eventually this converges to a correct answer of the form (-a,-a,a) for some a > 0
In general, we won't know the correct answer in advance!
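As an illustration (not part of the original slides), here is a minimal NumPy sketch of this training procedure on the y ≥ x+1 task; the sample data, stopping rule, and random seed are my own choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data for the worked example: classify points (x, y) in R^2
# as +1 if y >= x + 1, else -1.  Inputs are expressed as (1, x, y),
# with the leading 1 playing the role of the bias input.
points = rng.uniform(-5, 5, size=(200, 2))
X = np.hstack([np.ones((len(points), 1)), points])
t = np.where(points[:, 1] >= points[:, 0] + 1, 1.0, -1.0)

w = np.zeros(3)          # bias/weight vector, initialised to (0, 0, 0)
eta = 0.25               # constant learning rate

for epoch in range(100):
    errors = 0
    for x_i, t_i in zip(X, t):
        y_i = 1.0 if np.dot(w, x_i) >= 0 else -1.0   # sgn response
        if y_i != t_i:
            w += eta * (t_i - y_i) * x_i              # perceptron update rule
            errors += 1
    if errors == 0:       # converged: every training point classified correctly
        break

print(w)   # defines a line close to y = x + 1, i.e. roughly of the form (-a, -a, a)
```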
19
History: OR / XOR
1958: Perceptron
1969: Minsky & Papert: a perceptron can only classify linearly separable data
This killed nearly all research on NNs for the next 15 years
XOR is the popular counterexample: OR is linearly separable, XOR is not

i1  i2 | OR | XOR
 0   0 |  0 |  0
 0   1 |  1 |  1
 1   0 |  1 |  1
 1   1 |  1 |  0
20
Multi-layered Networks
Adding a middle ('hidden') layer solves the problem!
At least one layer must be nonlinear
A 3-layered perceptron can represent virtually anything!
[Diagram: 3-layer network with bias neurons]
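To illustrate the point, here is a small hand-wired 3-layer network that computes XOR, which a single perceptron cannot; the specific weights are my own choice, not from the slides:

```python
import numpy as np

def sgn(x):
    return np.where(x >= 0, 1.0, -1.0)

# Hand-picked weights for a network that computes XOR on inputs in {0, 1}.
# Hidden unit 1 fires for "i1 OR i2", hidden unit 2 fires for "i1 AND i2";
# the output fires for "OR and not AND", which is exactly XOR.
def xor_net(i1, i2):
    x = np.array([1.0, i1, i2])                          # leading 1 = bias input
    h1 = sgn(np.dot(np.array([-0.5, 1.0, 1.0]), x))      # OR:  i1 + i2 >= 0.5
    h2 = sgn(np.dot(np.array([-1.5, 1.0, 1.0]), x))      # AND: i1 + i2 >= 1.5
    h = np.array([1.0, h1, h2])                          # bias for the output layer
    return sgn(np.dot(np.array([-0.5, 1.0, -1.0]), h))   # OR and not AND

for i1, i2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(i1, i2, xor_net(i1, i2))   # -1, +1, +1, -1  (i.e. XOR)
```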
21
Network Topologies
[Diagrams: feed forward network (fully connected); feed forward network (fully connected) with shortcuts]
22
Network Topologies
[Diagram: recurrent network with lateral connections]
23
Network Topologies
[Diagram: recurrent network with direct feedback]
24
Network Topologies
[Diagram: recurrent network with indirect feedback]
25
Network Topologies
[Diagram: fully connected network (some connections omitted)]
26
Three-Layer Perceptron
We consider a 3-layer feed forward perceptron
By far the most used NN in practice
[Diagram: 3-layer feed forward network with bias neurons]
27
Training
How to determine the right connection weights?
Prepare a training data set: inputs & desired outputs
Optimize the weights, for example by:
Backpropagation – standard textbook method; gradient based; useful for adaptive networks; may get stuck in a local minimum
Genetic algorithm – biology inspired; finds the global minimum; may be slow
[Plot: error landscape with a local minimum and the global minimum]
28
Back-Propagation
Operates similarly to perceptron learning:
Inputs are fed forward through the network
The network's output is compared to the expected output
Errors are propagated back through the network
Weights are adjusted based on the errors
[Animation: signals flowing from the input layer to the output layer, then errors flowing back]
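A minimal NumPy sketch of back-propagation for a small 3-layer perceptron, assuming a sigmoid response and squared error; the task (XOR), layer sizes, learning rate, and seed are my own choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy training set: learn XOR (inputs in {0, 1}, targets in {0, 1}).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

# 3-layer perceptron: 2 inputs -> 4 hidden -> 1 output.
W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)
eta = 0.5

for epoch in range(10000):
    # Forward pass: inputs are fed through the network.
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)

    # Compare to the expected output and propagate the errors back.
    dy = (y - T) * y * (1 - y)          # output-layer error term
    dh = (dy @ W2.T) * h * (1 - h)      # hidden-layer error term

    # Adjust the weights based on the errors (gradient descent step).
    W2 -= eta * h.T @ dy; b2 -= eta * dy.sum(axis=0)
    W1 -= eta * X.T @ dh; b1 -= eta * dh.sum(axis=0)

print(np.round(y, 2))   # should approach [[0], [1], [1], [0]]
```

As the slide on training notes, this gradient-based procedure can occasionally get stuck in a local minimum, so a given run may fail to reach the target outputs.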
39
Back-Propagation Training
Weights may be updated after each pass (each training example) or only after multiple passes (batch update)
Need a comprehensive training set
The network must not be too large for the training set
There are no guarantees the network will learn
Network design and learning strategy affect the speed and effectiveness of learning
40
Excursus: Genetic Algorithms
Start with a random population
Evaluate each individual & select the best
Make the next generation:
Migration – copy the best individuals unchanged
Mutation – copy the best and randomly change some of their attributes
Crossover – copy the best and mix their attributes
Repeat with the next generation
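A toy sketch of this loop for a made-up fitness function; the population size, mutation rate, and all other parameters are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy genetic algorithm: maximise fitness(w) over a vector of "attributes" w.
def fitness(w):
    return -np.sum((w - 3.0) ** 2)      # best possible individual is w = (3, ..., 3)

pop_size, n_attr, n_best = 50, 5, 10
population = rng.normal(0, 5, (pop_size, n_attr))   # start with a random population

for generation in range(100):
    # Evaluate each individual & select the best.
    scores = np.array([fitness(w) for w in population])
    best = population[np.argsort(scores)[-n_best:]]

    next_gen = [best.copy()]                              # migration: copy best unchanged
    while sum(len(g) for g in next_gen) < pop_size:
        a, b = best[rng.integers(n_best, size=2)]
        child = np.where(rng.random(n_attr) < 0.5, a, b)  # crossover: mix attributes
        child += rng.normal(0, 0.5, n_attr) * (rng.random(n_attr) < 0.2)  # mutation
        next_gen.append(child[None, :])
    population = np.vstack(next_gen)[:pop_size]

best_w = population[np.argmax([fitness(w) for w in population])]
print(best_w)   # roughly (3, 3, 3, 3, 3)
```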
41
Uses of NNs
Pattern recognition: face recognition, character recognition, sound recognition
Predicting the future: stock market, weather
Signal processing
[Illustrations: character-sequence completion "abcdef ghijk ?" and a plot of the DAX today]
42
Example: NETTALK
Written language → spoken language
Nontrivial problem: pronouncing 'a' in have, brave, read
Simple network: a 3-layer perceptron
[Audio samples: during training, after being trained on words, reading a new text]
Ref: Sejnowski, Rosenberg: 'Parallel Networks that Learn to Pronounce English Text', Complex Systems 1 (1987)
43
Problems with Neural Networks
Interpretation of Hidden Layers
Overfitting
44
Interpretation of Hidden Layers
What are the hidden layers doing?! Feature extraction
The non-linearities in the feature extraction can make interpretation of the hidden layers very difficult
This leads to Neural Networks being treated as black boxes
45
Overfitting in Neural Networks
Neural Networks are especially prone to overfitting
Recall the perceptron: zero error is possible, but so is more extreme overfitting
[Plots: a logistic regression fit vs. a perceptron fit]
47
Perceptron vs. Logistic Regression
Logistic Regression has a hard time eliminating all errors (in the plotted example: 2 errors)
Perceptrons often do better, since errors carry increased importance (here: 0 errors)