Neural Networks. A neural network is a network of simulated neurons that can be used to recognize instances of patterns. NNs learn by searching through a space of network weights. http://www.cs.unr.edu/~sushil/class/ai/classnotes/glickman/1.pgm.txt

Neural network nodes simulate some properties of real neurons. A neuron fires when the sum of its collective inputs reaches a threshold; a real neuron is an all-or-none device. There are about 10^11 neurons per person, and each neuron may be connected to up to 10^5 other neurons, giving about 10^16 synapses (roughly 300 times the number of characters in the Library of Congress).

Simulated neurons use a weighted sum of inputs. A simulated NN node is connected to other nodes via links, and each link has an associated weight that determines the strength and nature (+/-) of one node's influence on another: influence = weight * output. The activation function can be a threshold function, in which case the node's output is a 0 or 1. Real neurons do a lot more computation: spikes, firing frequency, and so on.
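A minimal sketch of such a threshold unit in Python; the weights, inputs, and threshold below are made-up illustrations, not values from the slides.

    # Minimal simulated neuron: a weighted sum of inputs passed through a threshold.
    def threshold_neuron(inputs, weights, threshold):
        total = sum(w * x for w, x in zip(weights, inputs))
        return 1 if total >= threshold else 0          # all-or-none output

    # Hypothetical numbers, just to show influence = weight * output.
    print(threshold_neuron([1, 0, 1], [0.5, -0.3, 0.7], 1.0))   # prints 1: 0.5 + 0.7 >= 1.0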

Feed-forward NNs can model siblings and acquaintances. We present the input nodes with a pair of 1's for the people whose relationship we want to know; all other inputs are 0. Assume that the top group of three people are siblings, that the bottom group of three are siblings, and that any pair who are not siblings are acquaintances. H1 and H2 are hidden nodes: their outputs are not observable. The network is not fully connected. The number inside each node is that node's threshold (1.0 for the nodes shown).
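A sketch of how such a partially connected feed-forward net computes an answer. The shape (six person-inputs, two hidden nodes, one output) follows the slide, but every weight below is a hypothetical placeholder chosen so the output happens to fire only for cross-group (acquaintance) pairs; the slide's actual weights may differ.

    # Forward pass of a small, partially connected feed-forward net:
    # six person-inputs, two hidden nodes (H1, H2), one output node.
    def step(total, threshold):
        return 1 if total >= threshold else 0

    def forward(x, w_h1, w_h2, w_out, thr=1.0):
        h1 = step(sum(w * xi for w, xi in zip(w_h1, x)), thr)   # hidden node H1
        h2 = step(sum(w * xi for w, xi in zip(w_h2, x)), thr)   # hidden node H2
        return step(w_out[0] * h1 + w_out[1] * h2, thr)         # output node

    # Turn on exactly two inputs to ask about one pair of people.
    x = [1, 0, 0, 1, 0, 0]                                      # one person from each group
    print(forward(x, w_h1=[1, 1, 1, 0, 0, 0],                   # H1 watches the top group
                     w_h2=[0, 0, 0, 1, 1, 1],                   # H2 watches the bottom group
                     w_out=[0.6, 0.6]))                         # fires only if both H1 and H2 fire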

Search provides a method for finding correct weights. In general, the roles of individual links and nodes are obscure, because the recognition capability is diffused over a number of nodes and links. We can use a simple hill-climbing search to learn NN weights; the quality metric is the error, which we try to minimize.

Training a NN with a hill-climber:
Repeat
  Present a training example to the network
  Compute the values at the output nodes
  Error = difference between the observed and NN-computed values
  Make small changes to the weights to reduce the error
Until there are no more training examples
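A minimal sketch of one variant of that idea in Python, for a single sigmoid unit: perturb one weight at a time and keep the change only if the total error drops. All names and example data are illustrative; the trailing -1 input plays the role of the threshold (see the later slide on eliminating nonzero thresholds).

    import math
    import random

    def output(x, w):
        return 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))

    def total_error(samples, w):
        return sum((d - output(x, w)) ** 2 for x, d in samples)

    def hill_climb(samples, n_weights, steps=5000, delta=0.05):
        w = [random.uniform(-1, 1) for _ in range(n_weights)]
        for _ in range(steps):
            i = random.randrange(n_weights)
            candidate = list(w)
            candidate[i] += random.choice([-delta, delta])       # a small change to one weight
            if total_error(samples, candidate) < total_error(samples, w):
                w = candidate                                    # keep it only if the error drops
        return w

    # Hypothetical examples (logical AND); the trailing -1 input stands in for the threshold.
    samples = [([0, 0, -1], 0), ([0, 1, -1], 0), ([1, 0, -1], 0), ([1, 1, -1], 1)]
    weights = hill_climb(samples, n_weights=3)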

Back-propagation is a well-known hill-climber for NN weight adjustment. Back-propagation propagates weight changes from the output layer backwards towards the input layer. There is a theoretical guarantee of convergence for smooth error surfaces with a single optimum. We need two modifications to our neural nets.

Nonzero thresholds can be eliminated. A node with a non-zero threshold is equivalent to a node with zero threshold plus an extra link coming from a node whose output is held at -1.0, where the weight on that extra link equals the old threshold.
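A quick numeric check of that equivalence, with made-up weights and threshold:

    # A node with threshold t that fires when sum(w*x) >= t behaves exactly like a
    # zero-threshold node with one extra input held at -1.0 whose weight is t.
    def fires(inputs, weights, threshold):
        return sum(w * x for w, x in zip(weights, inputs)) >= threshold

    w, t = [0.4, 0.9], 1.0                                   # hypothetical weights and threshold
    for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
        assert fires(x, w, t) == fires(x + [-1.0], w + [t], 0.0)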

Hill-climbing benefits from a smooth threshold function. The all-or-none nature of a step threshold produces flat plains and abrupt cliffs in the space of weights, making it difficult to search. We therefore use a sigmoid function, a squashed S-shaped curve; note how its slope changes along the curve.
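For reference, the squashing function and its slope written out in code (a minimal sketch):

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))        # squashed S-shaped curve, output in (0, 1)

    def sigmoid_slope(o):
        # If o = sigmoid(x), the slope d(sigmoid)/dx equals o * (1 - o),
        # which is the form BP uses below.
        return o * (1.0 - o)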

A trainable neural net

Intuition for BP. Make the change in a weight proportional to the reduction in error at the output nodes. For each sample input combination, consider each output's desired value (d), its actual computed value (o), and the influence of a particular weight (w) on the error (d – o). Make a large change to w if it leads to a large reduction in error; make only a small change to w if changing it does little to reduce a large error.

More intuition for BP. Consider how we might change the weights of the links connecting nodes in layer i to nodes in layer j. First: a change in node j's input produces a change in node j's output that depends on the slope of the threshold function. Let us therefore make the change in wij proportional to the slope of the sigmoid function at node j, which is oj (1 – oj).

Weight change. The change in the input to node j, given a change in weight wij, depends on the output of node i. We also need to consider how beneficial it is to change the output of node j; call this benefit βj.

How beneficial is it to change the output oj of node j? It depends on how it affects the outputs at layer k. How do we analyze the effect? Suppose node j is connected to only one node, k, in layer k. Then the benefit at node j depends on the change produced at node k, and we can apply the same reasoning again one layer further on.

BP propagates changes back. In general, summing over all nodes k in layer k gives βj = Σk wjk ok (1 – ok) βk.

Stopping the recursion. Recall that the change to wij is proportional to oi oj (1 – oj) βj, and we now know the benefit βj at layer j in terms of the benefits at layer k. So where does the recursion stop? At the output layer, where the benefit is given simply by the error at the output node!

Putting it all together. The benefit at an output node z is βz = dz – oz. Let us also introduce a rate parameter, r, to give us external control of the learning rate (the size of changes to weights). The change in wij is then Δwij = r oi oj (1 – oj) βj.
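Collecting those pieces into code gives one weight update for a net with a single sigmoid hidden layer. This is a generic sketch of the rule just stated, with made-up variable names and layer shapes; it is not code from the slides.

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def backprop_step(x, d, w_hidden, w_out, r=0.5):
        """One weight update for a net with a single sigmoid hidden layer.
        w_hidden[j][i] links input i to hidden node j; w_out[k][j] links hidden j to output k."""
        # Forward pass.
        h = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
        o = [sigmoid(sum(w * hj for w, hj in zip(ws, h))) for ws in w_out]

        # Benefit at the output layer: beta_z = d_z - o_z.
        beta_out = [dz - oz for dz, oz in zip(d, o)]

        # Benefit at the hidden layer: beta_j = sum_k w_jk * o_k(1 - o_k) * beta_k.
        beta_hid = [sum(w_out[k][j] * o[k] * (1 - o[k]) * beta_out[k] for k in range(len(o)))
                    for j in range(len(h))]

        # Weight changes: delta w_ij = r * o_i * o_j(1 - o_j) * beta_j.
        for k, ws in enumerate(w_out):
            for j in range(len(ws)):
                ws[j] += r * h[j] * o[k] * (1 - o[k]) * beta_out[k]
        for j, ws in enumerate(w_hidden):
            for i in range(len(ws)):
                ws[i] += r * x[i] * h[j] * (1 - h[j]) * beta_hid[j]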

Back Propagation weights

Other issues. When do you make the changes: after every exemplar, or after all exemplars? Making the changes after all exemplars is consistent with the mathematics of BP. If an output node's output is close to 1, consider it as 1; thus we usually consider an output node's output to be 1 when it is > 0.9 (or 0.8).

Training NNs with BP

How do we train such an NN? Assume exactly two of the inputs are on. If the output node's value is > 0.9, then the people represented by the two on-inputs are acquaintances; if the output node's value is < 0.1, then they are siblings.
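That decision rule is simple enough to write down directly (a sketch):

    def classify(output_value):
        # Treat outputs near 1 and near 0 as definite answers, as on the slide.
        if output_value > 0.9:
            return "acquaintances"
        if output_value < 0.1:
            return "siblings"
        return "undecided"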

We need training examples to tell us the correct (desired) outputs d so that we can calculate the output error (d – o) for BP.

Initial weights are usually chosen randomly. Here we initialize the weights as shown on the slide for simplicity; for this simple problem, randomly chosen initial weights give the same performance.

Training takes many cycles: 225 weight changes, where each weight change comes after all sample inputs have been presented. That is 225 * 15 = 3375 input presentations!

Learning rate r: the best value for r depends on the problem being solved.

BP can be done in stages

Exemplars in the form of a table

Sequential and parallel learning of multiple concepts

NNs can make predictions: testing and training sets.

Training set versus test set. We have divided our sample into a training set and a test set; 20% of the data is the test set. The NN is trained on the training set only (80% of the data) and never sees the exemplars in the test set. The NN then performs successfully on the test set.
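A minimal sketch of such a split (the data is whatever the problem supplies; here 20% is held out):

    import random

    def split_train_test(samples, test_fraction=0.2, seed=0):
        random.seed(seed)
        shuffled = samples[:]
        random.shuffle(shuffled)
        n_test = int(len(shuffled) * test_fraction)
        return shuffled[n_test:], shuffled[:n_test]     # (training set, test set)

    # The NN is then trained on the training set only and evaluated on the test set:
    # training_set, test_set = split_train_test(all_exemplars)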

Excess weights can lead to overfitting. How many nodes should the hidden layer have? Too many and you might over-train; too few and you may not get good accuracy. How many hidden layers should there be?

Over-fitting: BP now requires fewer weight changes (about 300 versus about 450), but we get poorer performance on the test set.

Over-fitting. To avoid over-fitting, be sure that the number of trainable weights influencing any particular output is smaller than the number of training samples. First net, with two hidden nodes: 11 training samples, 12 weights → OK. Second net, with three hidden nodes: 11 training samples, 19 weights → overfitting.
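A quick way to apply that rule of thumb in code. This sketch assumes full connectivity between adjacent layers, so its counts differ from the slide's partially connected nets; the layer sizes are illustrative.

    def trainable_weights(layer_sizes):
        """Weights in a fully connected layered net, e.g. [6, 2, 1] = 6 inputs, 2 hidden, 1 output."""
        return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

    def risks_overfitting(layer_sizes, n_training_samples):
        return trainable_weights(layer_sizes) > n_training_samples

    print(trainable_weights([6, 2, 1]), trainable_weights([6, 3, 1]))   # 14 vs 21 under full connectivity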

Like GAs, using NNs is an art. How do you represent information for a neural network? How many neurons do you need (inputs, outputs, hidden)? What rate parameter should be used? Sequential or parallel training?