Machine Learning: Lecture 4

Slides:



Advertisements
Similar presentations
Artificial Neural Networks
Advertisements

Multi-Layer Perceptron (MLP)
Slides from: Doug Gray, David Poole
1 Machine Learning: Lecture 4 Artificial Neural Networks (Based on Chapter 4 of Mitchell T.., Machine Learning, 1997)
1 Neural networks. Neural networks are made up of many artificial neurons. Each input into the neuron has its own weight associated with it illustrated.
Multilayer Perceptrons 1. Overview  Recap of neural network theory  The multi-layered perceptron  Back-propagation  Introduction to training  Uses.
Machine Learning Neural Networks
Simple Neural Nets For Pattern Classification
INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.
The back-propagation training algorithm
Connectionist Modeling Some material taken from cspeech.ucd.ie/~connectionism and Rich & Knight, 1991.
Back-Propagation Algorithm
Artificial Neural Networks
Artificial Neural Networks
LOGO Classification III Lecturer: Dr. Bo Yuan
CHAPTER 11 Back-Propagation Ming-Feng Yeh.
CS 4700: Foundations of Artificial Intelligence
September 28, 2010Neural Networks Lecture 7: Perceptron Modifications 1 Adaline Schematic Adjust weights i1i1i1i1 i2i2i2i2 inininin …  w 0 + w 1 i 1 +
CS 484 – Artificial Intelligence
Neural Networks. Background - Neural Networks can be : Biological - Biological models Artificial - Artificial models - Desire to produce artificial systems.
Neural networks.
Artificial neural networks:
1 Introduction to Artificial Neural Networks Andrew L. Nelson Visiting Research Faculty University of South Florida.
Neural Networks. Plan Perceptron  Linear discriminant Associative memories  Hopfield networks  Chaotic networks Multilayer perceptron  Backpropagation.
Artificial Neural Networks
Artificial Neural Networks (ANN). Output Y is 1 if at least two of the three inputs are equal to 1.
Computer Science and Engineering
Artificial Neural Networks
1 Artificial Neural Networks Sanun Srisuk EECP0720 Expert Systems – Artificial Neural Networks.
CS464 Introduction to Machine Learning1 Artificial N eural N etworks Artificial neural networks (ANNs) provide a general, practical method for learning.
Machine Learning Chapter 4. Artificial Neural Networks
Artificial Neural Network Supervised Learning دكترمحسن كاهاني
 Diagram of a Neuron  The Simple Perceptron  Multilayer Neural Network  What is Hidden Layer?  Why do we Need a Hidden Layer?  How do Multilayer.
Artificial Neural Networks. The Brain How do brains work? How do human brains differ from that of other animals? Can we base models of artificial intelligence.
CS 478 – Tools for Machine Learning and Data Mining Backpropagation.
Neural Networks and Machine Learning Applications CSC 563 Prof. Mohamed Batouche Computer Science Department CCIS – King Saud University Riyadh, Saudi.
Artificial Intelligence Chapter 3 Neural Networks Artificial Intelligence Chapter 3 Neural Networks Biointelligence Lab School of Computer Sci. & Eng.
Non-Bayes classifiers. Linear discriminants, neural networks.
Back-Propagation Algorithm AN INTRODUCTION TO LEARNING INTERNAL REPRESENTATIONS BY ERROR PROPAGATION Presented by: Kunal Parmar UHID:
Neural Networks - lecture 51 Multi-layer neural networks  Motivation  Choosing the architecture  Functioning. FORWARD algorithm  Neural networks as.
Neural Networks Teacher: Elena Marchiori R4.47 Assistant: Kees Jong S2.22
Artificial Neural Network
IE 585 History of Neural Networks & Introduction to Simple Learning Rules.
Perceptrons Michael J. Watts
Artificial Intelligence CIS 342 The College of Saint Rose David Goldschmidt, Ph.D.
Chapter 6 Neural Network.
Bab 5 Classification: Alternative Techniques Part 4 Artificial Neural Networks Based Classifer.
CSE343/543 Machine Learning Mayank Vatsa Lecture slides are prepared using several teaching resources and no authorship is claimed for any slides.
Machine Learning Supervised Learning Classification and Regression
Neural networks.
Fall 2004 Backpropagation CS478 - Machine Learning.
Artificial neural networks:
第 3 章 神经网络.
with Daniel L. Silver, Ph.D. Christian Frey, BBA April 11-12, 2017
Artificial Neural Networks
Machine Learning Today: Reading: Maria Florina Balcan
CSC 578 Neural Networks and Deep Learning
Data Mining with Neural Networks (HK: Chapter 7.5)
Synaptic DynamicsII : Supervised Learning
Artificial Intelligence Chapter 3 Neural Networks
Perceptron as one Type of Linear Discriminants
Artificial Neural Networks
CS 621 Artificial Intelligence Lecture 25 – 14/10/05
Artificial Intelligence Chapter 3 Neural Networks
Machine Learning: UNIT-2 CHAPTER-1
Artificial Intelligence Chapter 3 Neural Networks
Artificial Intelligence Chapter 3 Neural Networks
Seminar on Machine Learning Rada Mihalcea
CS621: Artificial Intelligence Lecture 22-23: Sigmoid neuron, Backpropagation (Lecture 20 and 21 taken by Anup on Graphical Models) Pushpak Bhattacharyya.
Artificial Intelligence Chapter 3 Neural Networks
Presentation transcript:

Machine Learning: Lecture 4 Artificial Neural Networks (Based on Chapter 4 of Mitchell T.., Machine Learning, 1997)

What is an Artificial Neural Network? It is a formalism for representing functions inspired from biological systems and composed of parallel computing units which each compute a simple function. Some useful computations taking place in Feedforward Multilayer Neural Networks are: Summation Multiplication Threshold (e.g., 1/(1+e ) [the sigmoidal threshold function]. Other functions are also possible -x

Biological Motivation Biological Learning Systems are built of very complex webs of interconnected neurons. Information-Processing abilities of biological neural systems must follow from highly parallel processes operating on representations that are distributed over many neurons ANNs attempt to capture this mode of computation

Multilayer Neural Network Representation Examples: Input Units Hidden Units Output units weights Heteroassociation Autoassociation

How is a function computed by a Multilayer Neural Network? hj=g(wji.xi) y1=g(wkj.hj) where g(x)= 1/(1+e ) Typically, y1=1 for positive example and y1=0 for negative example i j -x x1 x2 x3 x4 x5 x6 h1 h2 h3 y1 k j i wji’s wkj’s g (sigmoid): 1/2 1

Learning in Multilayer Neural Networks Learning consists of searching through the space of all possible matrices of weight values for a combination of weights that satisfies a database of positive and negative examples (multi-class as well as regression problems are possible). Note that a Neural Network model with a set of adjustable weights defines a restricted hypothesis space corresponding to a family of functions. The size of this hypothesis space can be increased or decreased by increasing or decreasing the number of hidden units present in the network.

Appropriate Problems for Neural Network Learning Instances are represented by many attribute-value pairs (e.g., the pixels of a picture. ALVINN [Mitchell, p. 84]). The target function output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes. The training examples may contain errors. Long training times are acceptable. Fast evaluation of the learned target function may be required. The ability for humans to understand the learned target function is not important.

History of Neural Networks 1943: McCulloch and Pitts proposed a model of a neuron --> Perceptron (read [Mitchell, section 4.4 ]) 1960s: Widrow and Hoff explored Perceptron networks (which they called “Adelines”) and the delta rule. 1962: Rosenblatt proved the convergence of the perceptron training rule. 1969: Minsky and Papert showed that the Perceptron cannot deal with nonlinearly-separable data sets---even those that represent simple function such as X-OR. 1970-1985: Very little research on Neural Nets 1986: Invention of Backpropagation [Rumelhart and McClelland, but also Parker and earlier on: Werbos] which can learn from nonlinearly-separable data sets. Since 1985: A lot of research in Neural Nets!

Backpropagation: Purpose and Implementation Purpose: To compute the weights of a feedforward multilayer neural network adaptatively, given a set of labeled training examples. Method: By minimizing the following cost function (the sum of square error): E= 1/2 n=1 k=1[yk-fk(x )] where N is the total number of training examples and K, the total number of output units (useful for multiclass problems) and fk is the function implemented by the neural net N K n 2

Backpropagation: Overview Backpropagation works by applying the gradient descent rule to a feedforward network. The algorithm is composed of two parts that get repeated over and over until a pre-set maximal number of epochs, EPmax. Part I, the feedforward pass: the activation values of the hidden and then output units are computed. Part II, the backpropagation pass: the weights of the network are updated--starting with the hidden to output weights and followed by the input to hidden weights--with respect to the sum of squares error and through a series of weight update rules called the Delta Rule.

From T. Mitchell, Machine Learning, 1997: pp. 98-99

From T. Mitchell, Machine Learning, 1997: pp. 98-99

Backpropagation: The Delta Rule I For the hidden to output connections (easy case) wkj = - E/wkj =  n=1[yk - fk(x )] g’(hk) Vj =  n=1k Vj with n n N n n N n n  corresponding to the learning rate (an extra parameter of the neural net) hk = j=0 wkj Vj Vj = g(i=0 wjixi) and k = g’(hk)(yk - fk(x )) n M d M is the number of hidden units and d the number of input units

Backpropagation: The Delta Rule II For the input to hidden connections (hard case: no pre-fixed values for the hidden units) wji = - E/wji = - n=1 E/Vj Vj/wji (Chain Rule) =  k,n[yk - fk(x )] g’(hk) wkj g’(hj)xi =  kwkjg’(hj )xi =  n=1j xi with n N hj = i=0 wjixi j = g’(hj ) k=1 wkj k and all the other quantities already defined d n K

Backpropagation: The Algorithm 1. Initialize the weights to small random values; create a random pool of all the training patterns; set EP, the number of epochs of training to 0. 2. Pick a training pattern  from the remaining pool of patterns and propagate it forward through the network. 3. Compute the deltas, k for the output layer. 4. Compute the deltas, j for the hidden layer by propagating the error backward. 5. Update all the connections such that wji = wji + wji and wkj = wkj + wkj 6. If any pattern remains in the pool, then go back to Step 2. If all the training patterns in the pool have been used, then set EP = EP+1, and if EP  EPMax, then create a random pool of patterns and go to Step 2. If EP = EPMax, then stop.   New Old New Old

Backpropagation: The Momentum To this point, Backpropagation has the disadvantage of being too slow if  is small and it can oscillate too widely if  is large. To solve this problem, we can add a momentum to give each connection some inertia, forcing it to change in the direction of the downhill “force”. New Delta Rule: wpq(t+1) = - E/wpq +  wpq(t) where p and q are any input and hidden, or, hidden and outpu units; t is a time step or epoch; and  is the momentum parameter which regulates the amount of inertia of the weights.