Artificial Neural Networks



Artificial Neural Networks Interconnected networks of simple units ("artificial neurons"). Weight wij is the weight of the ith input into unit j. The class is determined from the final output value; if there is more than one output unit, we choose the one with the greatest value. Learning takes place by adjusting the weights in the network so that the desired output is produced whenever a training instance is presented.

Single Perceptron Unit We start by looking at a simpler kind of "neural-like" unit called a perceptron. It computes a weighted sum h(x) of its inputs and, depending on the sign of h(x), outputs one class or the other.
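To make this concrete, here is a minimal sketch of a perceptron unit in Python with NumPy; the AND weights at the bottom are an illustrative assumption, not values from the slides.

```python
import numpy as np

def perceptron_output(x, w, b):
    """Perceptron unit: weighted sum h(x) = w . x + b, followed by a hard threshold."""
    h = np.dot(w, x) + b
    return 1 if h >= 0 else 0   # one class or the other, depending on h(x)

# Illustrative weights (an assumption): this unit computes logical AND of two inputs.
w = np.array([1.0, 1.0])
b = -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron_output(np.array(x, dtype=float), w, b))
```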

Beyond Linear Separability Values of the XOR boolean function cannot be separated by a single perceptron unit.

Multi-Layer Perceptron Solution: combine multiple linear separators. The introduction of "hidden" units into neural networks makes them much more powerful: they are no longer limited to linearly separable problems. Earlier layers transform the problem into more tractable sub-problems for the later layers.

Example: XOR problem Output: “class 0” or “class 1”

Example: XOR problem (Figure: a two-layer network whose hidden-unit outputs o1 and o2 feed a single output unit 3.)

Example: XOR problem The output unit's decision boundary is w23*o2 + w13*o1 + w03 = 0, where o1 and o2 are the outputs of the two hidden units. Choosing w03 = -1/2, w13 = -1, w23 = 1 gives the separating line o2 - o1 - 1/2 = 0.
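As a check of these output weights, the sketch below wires them into a two-layer network. The hidden units are assumed here to compute AND and OR of the inputs, one common choice that is consistent with the output weights above; the slides do not specify the hidden-layer weights.

```python
def step(h):
    """Hard-threshold output of a perceptron unit."""
    return 1 if h >= 0 else 0

def xor_net(x1, x2):
    # Hidden layer (assumed): o1 = AND(x1, x2), o2 = OR(x1, x2)
    o1 = step(1.0 * x1 + 1.0 * x2 - 1.5)   # AND
    o2 = step(1.0 * x1 + 1.0 * x2 - 0.5)   # OR
    # Output unit from the slide: w03 = -1/2, w13 = -1, w23 = 1
    return step(1.0 * o2 - 1.0 * o1 - 0.5)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), xor_net(x1, x2))   # prints 0, 1, 1, 0
```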

Multi-Layer Perceptron Any set of training points can be separated by a three-layer perceptron network. “Almost any” set of points is separable by a two-layer perceptron network.

Backpropagation technique High-level summary:
1. Present a training sample to the neural network.
2. Calculate the error in each output neuron. This is the local error.
3. Adjust the weights of each neuron to lower the local error.
4. Assign "blame" for the local error to neurons at the previous level, giving greater responsibility to neurons connected by stronger weights.
5. Repeat from step 3 on the neurons at the previous level, using each one's "blame" as its error.

Autonomous Land Vehicle In a Neural Network (ALVINN) ALVINN is an automatic steering system for a car based on input from a camera mounted on the vehicle. Successfully demonstrated in a cross-country trip.

ALVINN The ALVINN neural network is shown here. It has 960 inputs (a 30x32 array derived from the pixels of an image), 4 hidden units and 30 output units (each representing a steering command).

SVMs vs. ANNs Comparable in practice. One comment: "SVMs have been developed in the reverse order to the development of neural networks (NNs). SVMs evolved from the sound theory to the implementation and experiments, while the NNs followed more heuristic path, from applications and extensive experimentation to the theory." (Wang 2005)

Soft Threshold A natural question to ask is whether we could use gradient ascent/descent to train a multi-layer perceptron. The answer is that we can't as long as the output is discontinuous with respect to changes in the inputs and the weights. In a perceptron unit it doesn't matter how far a point is from the decision boundary; we still get a 0 or a 1. We need a smooth output (as a function of changes in the network weights) if we're to do gradient descent.

Sigmoid Unit Commonly used in neural nets is a "sigmoid" (S-shaped) output function; the one used here is the logistic function, y = 1/(1 + e^(-z)). The total input z is also called the "activation" of the neuron.
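A minimal sketch of a logistic unit in Python; the function names are just illustrative.

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) function: smooth, S-shaped, output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def activation(x, w, w0):
    """Total input ("activation") z of a unit: bias plus weighted sum of the inputs."""
    return w0 + np.dot(w, x)

print(logistic(0.0))    # 0.5 exactly at the decision boundary
print(logistic(1.32))   # ~0.7892, the value y1 takes in the worked example later
```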

Training The key property of the sigmoid is that it is differentiable, which means we can use gradient-based methods of minimization for training. The output of a multi-layer net of sigmoid units is a function of two vectors: the inputs (x) and the weights (w). As we train the ANN, the training instances are considered fixed, so the output of this function (y) varies smoothly with changes in the weights.

Training

Training The error on a single training example m is E(w) = ½ (y(xm, w) - ym)²; the ½ is only there to simplify the derivations.

Gradient Descent We follow gradient descent: the gradient of the training error is computed as a function of the weights, and each weight is moved a small step against that gradient, wi = wi - η ∂E/∂wi, where η is the learning rate. Online version: at each step we consider only the error for one data item. As a shorthand, we will denote y(xm, w) just by y.

Gradient Descent – Single Unit Substituting into the equation of the previous slide, we get (for the arbitrary ith element of w): ∂E/∂wi = (y - ym) y (1 - y) xi, so the weight update is wi = wi - η (y - ym) y (1 - y) xi. This is the delta rule.

Derivative of the sigmoid For the logistic function y = 1/(1 + e^(-z)), the derivative is dy/dz = y (1 - y).
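A quick numerical sanity check of this identity; the test point z = 0.7 is arbitrary.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# Central-difference estimate of d(sigmoid)/dz versus the closed form y * (1 - y)
z = 0.7
eps = 1e-6
numeric = (logistic(z + eps) - logistic(z - eps)) / (2 * eps)
analytic = logistic(z) * (1 - logistic(z))
print(numeric, analytic)   # the two values agree to roughly 1e-10
```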

Generalized Delta Rule For an output unit p we similarly have: δp = yp (1 - yp)(yp - ym), and the weight update is wjp = wjp - η δp yj. (Figure: a network with inputs x1, x2, two hidden units, and output unit 3; p = 3 in this example.)

Backpropagation Example First do forward propagation: compute the zi's and yi's. (Figure: the two-input network annotated with the formulas for δ2 and δ3; we'll see soon why δ2 and δ3 have these formulas.)

Deriving 2 and 2 y3 z3 y1 z1 y2 z2 1 2 3 x1 x2 w01 w03 w02 w11 w22 We similarly derive delta1.

Backpropagation Algorithm
1. Initialize weights to small random values.
2. Choose a random training sample, say (xm, ym).
3. Compute total input zj and output yj for each unit (forward propagation).
4. Compute δp for the output layer: δp = yp (1 - yp)(yp - ym).
5. Compute δj for all preceding layers by the backprop rule.
6. Compute the weight change by the descent rule (repeat for all weights).
Note that each expression involves only data local to a particular unit; we don't have to look around summing things over the whole network. It is for this reason, simplicity, locality and, therefore, efficiency, that backpropagation has become the dominant paradigm for training neural nets.
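The sketch below follows these six steps for a network with one hidden layer of sigmoid units and a single sigmoid output, using the squared-error delta rules from the earlier slides. The hyperparameters and the XOR demo at the bottom are assumptions for illustration; with such a tiny network, training can occasionally stall in a poor local minimum, in which case a different random seed or more hidden units usually helps.

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_backprop(X, Y, n_hidden=2, eta=0.5, epochs=20000):
    """Online backpropagation for a 1-hidden-layer sigmoid network with one output."""
    n_in = X.shape[1]
    # 1. Initialize weights to small random values (bias folded in as column 0).
    W1 = rng.uniform(-0.5, 0.5, size=(n_hidden, n_in + 1))   # hidden layer
    W2 = rng.uniform(-0.5, 0.5, size=(n_hidden + 1,))        # single output unit
    for _ in range(epochs):
        # 2. Choose a random training item (xm, ym).
        m = rng.integers(len(X))
        x = np.concatenate(([1.0], X[m]))        # x0 = 1 is the bias input
        # 3. Forward propagation: total inputs zj and outputs yj.
        h = logistic(W1 @ x)                     # hidden-unit outputs
        h1 = np.concatenate(([1.0], h))          # bias input for the output unit
        y = logistic(W2 @ h1)                    # network output
        # 4. Delta for the output unit: dp = yp (1 - yp)(yp - ym).
        d_out = y * (1 - y) * (y - Y[m])
        # 5. Deltas for the hidden layer by the backprop rule: dj = yj (1 - yj) wjp dp.
        d_hid = h * (1 - h) * W2[1:] * d_out
        # 6. Weight changes by the descent rule: w = w - eta * delta * input.
        W2 -= eta * d_out * h1
        W1 -= eta * np.outer(d_hid, x)
    return W1, W2

# Illustrative run on XOR (the data and hyperparameters are assumptions).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([0.0, 1.0, 1.0, 0.0])
W1, W2 = train_backprop(X, Y)
for x, t in zip(X, Y):
    h = logistic(W1 @ np.concatenate(([1.0], x)))
    print(x, t, round(float(logistic(W2 @ np.concatenate(([1.0], h)))), 2))
```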

Generalized Delta Rule In general, for a hidden unit j we have δj = yj (1 - yj) Σp wjp δp, where the sum runs over the units p in the next layer that unit j feeds into.

Input And Output Encoding For neural networks, all attribute values must be encoded in a standardized manner, taking values between 0 and 1, even for categorical variables. For continuous variables, we simply apply min-max normalization: X* = [X - min(X)] / [max(X) - min(X)]. For categorical variables, use indicator (flag) variables. E.g., a marital status attribute containing the values single, married, and divorced:
Records for single would have 1 for single and 0 for the rest, i.e. (1,0,0).
Records for married would have 1 for married and 0 for the rest, i.e. (0,1,0).
Records for divorced would have 1 for divorced and 0 for the rest, i.e. (0,0,1).
Records for unknown would have 0 for all, i.e. (0,0,0).
In general, a categorical attribute with k values can be translated into k - 1 indicator attributes.
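A small sketch of both encodings; the function names are illustrative.

```python
import numpy as np

def min_max(x):
    """Min-max normalization: X* = (X - min(X)) / (max(X) - min(X)), values in [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def indicator_encode(value, categories):
    """k-1 indicator (flag) variables; the left-out category maps to all zeros."""
    return tuple(1.0 if value == c else 0.0 for c in categories)

print(min_max([10, 20, 40]))                 # [0.  0.333...  1.]
cats = ["single", "married", "divorced"]     # "unknown" is the left-out category
for v in ["single", "married", "divorced", "unknown"]:
    print(v, indicator_encode(v, cats))
```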

Output Neural network output nodes always return a continuous value between 0 and 1 as output. Many classification problems have a dichotomous result, with only two possible outcomes, e.g. "meningitis: yes or no". For such problems, one option is to use a single output node, with a threshold value set a priori which separates the classes. For example, with the threshold "Yes if output ≥ 0.3", an output of 0.4 from the output node would classify that record as likely to be "Yes". Single output nodes may also be used when the classes are clearly ordered. E.g., suppose that we would like to classify patients' disease levels. We can say:
If 0 ≤ output < 0.33, classify "mild"
If 0.33 ≤ output < 0.66, classify "severe"
If 0.66 ≤ output < 1, classify "grave"
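The two options sketched in Python, reusing the thresholds from this slide:

```python
def classify_binary(output, threshold=0.3):
    """Single output node with an a-priori threshold: "Yes" if output >= threshold."""
    return "Yes" if output >= threshold else "No"

def classify_ordered(output):
    """Ordered classes carved out of the [0, 1) output range."""
    if output < 0.33:
        return "mild"
    if output < 0.66:
        return "severe"
    return "grave"

print(classify_binary(0.4))    # Yes, matching the example above
print(classify_ordered(0.5))   # severe
```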

Multiple Output Nodes If we have unordered categories for the target attribute, we create one output node for each possible category. E.g., for marital status as the target attribute, the network would have four output nodes in the output layer, one for each of: single, married, divorced, and unknown. The output node with the highest value is then chosen as the classification for that particular record.
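A one-line version of this rule, with made-up output values:

```python
# One output node per category; the node with the highest value wins.
outputs = {"single": 0.12, "married": 0.81, "divorced": 0.35, "unknown": 0.05}
predicted = max(outputs, key=outputs.get)
print(predicted)   # married
```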

NN for Estimation And Prediction Since NNs produce continuous output, they can be used for estimation and prediction. Suppose we are interested in predicting the price of a stock three months in the future. Presumably, we would have encoded the price information using min-max normalization. However, the neural network outputs a value between zero and 1, so the min-max normalization needs to be inverted. This denormalization is: prediction = output * (max - min) + min
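A sketch of the denormalization step, with made-up price bounds:

```python
def denormalize(output, lo, hi):
    """Invert min-max normalization: prediction = output * (max - min) + min."""
    return output * (hi - lo) + lo

# Made-up numbers: if training prices ranged from $20 to $70,
# a network output of 0.64 denormalizes to a predicted price of $52.
print(denormalize(0.64, 20.0, 70.0))   # 52.0
```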

ANN Example (Figure: a network with inputs x1, x2, x3 plus bias x0, two hidden units (1 and 2), and one output unit (3).)

Learning Weights For an output unit p we similarly have: δp = yp (1 - yp)(yp - ym). (Figure: the three-input network; p = 3 in this example.)

Backpropagation First do forward propagation: compute the zi's and yi's. (Figure: the three-input network with its weights.)

Backpropagation Example First do forward propagation: compute the zi's and yi's. Suppose we have initially chosen (randomly) the weights given in the table; the first column gives one training instance.
x0 = 1.0    w01 = 0.5    w02 = 0.7    w03 = 0.5
x1 = 0.4    w11 = 0.6    w12 = 0.9    w13 = 0.9
x2 = 0.2    w21 = 0.8    w22 = 0.8    w23 = 0.9
x3 = 0.7    w31 = 0.6    w32 = 0.4

Feed-Forward Example
z1 = 1.0*0.5 + 0.4*0.6 + 0.2*0.8 + 0.7*0.6 = 1.32
y1 = 1/(1+e^(-z1)) = 1/(1+e^(-1.32)) = 0.7892
z2 = 1.0*0.7 + 0.4*0.9 + 0.2*0.8 + 0.7*0.4 = 1.50
y2 = 1/(1+e^(-z2)) = 1/(1+e^(-1.5)) = 0.8175
z3 = 1.0*0.5 + 0.7892*0.9 + 0.8175*0.9 = 1.946
y3 = 1/(1+e^(-z3)) = 1/(1+e^(-1.946)) = 0.875
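The forward pass can be reproduced in a few lines of Python; the code keeps full precision for y1 and y2, so z3 comes out as 1.946 and y3 as 0.875, matching the values used on the next slide.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# Inputs and initial weights from the table above
x0, x1, x2, x3 = 1.0, 0.4, 0.2, 0.7
w01, w11, w21, w31 = 0.5, 0.6, 0.8, 0.6
w02, w12, w22, w32 = 0.7, 0.9, 0.8, 0.4
w03, w13, w23 = 0.5, 0.9, 0.9

z1 = x0*w01 + x1*w11 + x2*w21 + x3*w31; y1 = logistic(z1)   # 1.32  -> 0.7892
z2 = x0*w02 + x1*w12 + x2*w22 + x3*w32; y2 = logistic(z2)   # 1.50  -> 0.8176
z3 = x0*w03 + y1*w13 + y2*w23;          y3 = logistic(z3)   # 1.946 -> 0.8750
print(round(z1, 3), round(y1, 4), round(z2, 3), round(y2, 4), round(z3, 3), round(y3, 4))
```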

Backpropagation So, the network output for the given training example is y3 = 0.875. Assume the actual value of the target attribute is y = 0.8. Then the prediction error equals 0.8 - 0.875 = -0.075.
Now δ3 = y3(1-y3)(y3-y) = 0.875*(1-0.875)*(0.875-0.8) = 0.008
Let's use a learning rate of η = 0.01. Then we update the weights of the output unit:
w03 = w03 - η δ3 (1) = 0.5 - 0.01*0.008*1 = 0.49992
w13 = w13 - η δ3 y1 = 0.9 - 0.01*0.008*0.7892 = 0.899937
w23 = w23 - η δ3 y2 = 0.9 - 0.01*0.008*0.8175 = 0.899935

Backpropagation 2 = y2(1-y2)3w23 = 0.8175*(1-0.8175)*0.008*0.9 = 0.001 1 = y1(1-y1)3w13 = 0.7892*(1- 0.7892)*0.008*0.9 = 0.0012 Then, we update weights: w02 = w02 -  2 (1) = 0.7 - 0.01*0.001*1 = 0.6999 w12 = w12 -  2 x1 = 0.9 - 0.01*0.001* 0.4 = 0.8999 w22 = w22 -  2 x2 = 0.8 - 0.01*0.001* 0.2 = 0.7999 w32 = w32 -  2 x3 = 0.4 - 0.01*0.001* 0.7 = 0.3999 w01 = w01 -  1 (1) = 0.5 - 0.01*0.001*1 = 0.4999 w11 = w11 -  1 x1 = 0.6 - 0.01*0.001* 0.4 = 0.5999 w21 = w21 -  1 x2 = 0.8 - 0.01*0.001* 0.2 = 0.7999 w31 = w31 -  1 x3 = 0.6 - 0.01*0.001* 0.7 = 0.5999