One-layer neural networks: Approximation problems


One-layer neural networks (Neural Networks - lecture 4)
Outline:
- Approximation problems
- Architecture and functioning (ADALINE, MADALINE)
- Learning based on error minimization
- The gradient algorithm
- The Widrow-Hoff and "delta" algorithms

Approximation problems
Approximation (regression): estimate a functional dependence between two variables. The training set contains pairs of corresponding values. The dependence may be linear (linear approximation) or nonlinear (nonlinear approximation).

Architecture
A one-layer NN consists of one layer of N input units and one layer of M functional units (the output units), with total connectivity between the two layers described by the weight matrix W. A fictive input unit with the constant value -1 carries the threshold weights. X denotes the input vector and Y the output vector.

Functioning
Computing the output signal: usually the activation function is linear.
Examples: ADALINE (ADAptive LINear Element), MADALINE (Multiple ADAptive LINear Element).
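
The computation formula itself is not reproduced in the transcript; a standard form consistent with the architecture described above (fictive unit -1 carrying the threshold weights w_{i0}) would be:

y_i = f\left(\sum_{j=1}^{N} w_{ij} x_j - w_{i0}\right), \quad i = 1, \dots, M

and, for a linear activation f(u) = u, simply Y = W X (with the fictive -1 component included in X).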

Learning based on error minimization
Training set: {(X^1, d^1), ..., (X^L, d^L)}, where X^l is a vector from R^N and d^l is a vector from R^M.
Error function: a measure of the "distance" between the output produced by the network and the desired output.
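
The error-function formula is not in the transcript; the usual sum-of-squared-errors form consistent with these notations (up to a constant factor) is:

E(W) = \frac{1}{2} \sum_{l=1}^{L} \sum_{i=1}^{M} \left(d_i^l - y_i^l\right)^2

where y_i^l is the output of unit i for the input X^l.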

Learning based on error minimization
Learning is an optimization task: find the W that minimizes E(W). Two variants:
- for linear activation functions, W can be computed directly using tools from linear algebra;
- for nonlinear activation functions, the minimum can be estimated with a numerical method.

Learning based on error minimization
First variant. Particular case: M = 1 (one output unit with a linear activation function) and L = 1 (one training example).
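
The derivation on this slide is not reproduced in the transcript; a sketch of what this particular case looks like, using the notations introduced earlier, is:

E(w) = \frac{1}{2}\left(d - \sum_{j=1}^{N} w_j x_j\right)^2

Setting the derivatives \partial E / \partial w_j = -(d - y)\, x_j to zero gives y = d, i.e. the single example is fitted exactly.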

Learning based on error minimization
First variant (continued):
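
The general-case formulas were not captured either; a standard linear-algebra solution consistent with the first variant (an assumption, not taken from the slide) is the least-squares one. Let X be the N x L matrix whose columns are the training inputs and D the M x L matrix whose columns are the desired outputs; for a linear network Y = W X,

E(W) = \frac{1}{2}\,\|D - W X\|_F^2, \qquad \nabla_W E = -(D - W X)\, X^T

so setting the gradient to zero gives the normal equations W X X^T = D X^T and, when X X^T is invertible,

W = D X^T (X X^T)^{-1}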

Learning based on error minimization
Second variant: use a numerical minimization method.
Gradient method: an iterative method based on the fact that the gradient of a function points in the direction in which the function increases. To estimate a minimum of a function, the current position is moved in the direction opposite to the gradient.

Learning based on error minimization
Gradient method (illustration):
[Figure: one-dimensional illustration of the gradient method - successive points x_0, x_1, ..., x_{k-1} always move in the direction opposite to the gradient: to the left where f'(x) > 0 and to the right where f'(x) < 0.]

Learning based on error minimization
Algorithm to minimize E(W) based on the gradient method:
Initialization: W(0) := initial values, k := 0 (iteration counter)
Iterative process:
REPEAT
  W(k+1) := W(k) - eta * grad E(W(k))
  k := k + 1
UNTIL a stopping condition is satisfied
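
A minimal Python sketch of this loop (the names gradient_descent and grad_E, the learning rate eta, the tolerance tol and the iteration bound k_max are illustrative assumptions, not part of the original slide):

import numpy as np

def gradient_descent(grad_E, W0, eta=0.01, tol=1e-6, k_max=10000):
    """Generic gradient descent: W(k+1) = W(k) - eta * grad E(W(k))."""
    W = W0.copy()
    for k in range(k_max):
        g = grad_E(W)                  # gradient of the error at the current weights
        W = W - eta * g                # move opposite to the gradient
        if np.linalg.norm(g) < tol:    # stopping condition: gradient close to zero
            break
    return W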

Learning based on error minimization
Remark: the gradient method is a local optimization method, so it can easily be trapped in local minima.

Widrow-Hoff algorithm
A learning algorithm for a linear network: it minimizes E(W) by applying a gradient-like adjustment for each example of the training set.
Gradient computation:
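
The gradient formula is not reproduced in the transcript; for a linear unit and the per-example error E_l(W) = \frac{1}{2}\sum_i (d_i^l - y_i^l)^2, the standard computation is:

\frac{\partial E_l}{\partial w_{ij}} = -\left(d_i^l - y_i^l\right) x_j^l = -\delta_i^l\, x_j^l

which leads to the adjustment w_{ij} := w_{ij} + \eta\, \delta_i^l\, x_j^l used in the algorithm on the next slide.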

Widrow-Hoff algorithm
Algorithm's structure:
Initialization: w_ij(0) := rand(-1,1) (the weights are randomly initialized in [-1,1]), k := 0 (iteration counter)
Iterative process:
REPEAT
  FOR l := 1, L DO
    Compute y_i(l) and delta_i(l) = d_i(l) - y_i(l), i = 1..M
    Adjust the weights: w_ij := w_ij + eta * delta_i(l) * x_j(l)
  Compute E(W) for the new values of the weights
  k := k + 1
UNTIL E(W) < E* OR k > k_max
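
A minimal Python sketch of this loop, assuming training inputs X of shape (L, N) (with the fictive -1 component already appended), desired outputs D of shape (L, M), and illustrative values for eta, E* and k_max:

import numpy as np

def widrow_hoff(X, D, eta=0.01, E_star=1e-3, k_max=1000, seed=0):
    """Widrow-Hoff training of a one-layer linear network.

    X: (L, N) training inputs, D: (L, M) desired outputs.
    Returns the (M, N) weight matrix W.
    """
    rng = np.random.default_rng(seed)
    M = D.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(M, X.shape[1]))  # w_ij initialized in [-1, 1]
    for k in range(k_max):
        for l in range(X.shape[0]):                # one adjustment per example
            y = W @ X[l]                           # linear output y_i(l)
            delta = D[l] - y                       # delta_i(l) = d_i(l) - y_i(l)
            W += eta * np.outer(delta, X[l])       # w_ij := w_ij + eta*delta_i*x_j
        E = 0.5 * np.sum((D - X @ W.T) ** 2)       # error after processing all examples
        if E < E_star:
            break
    return W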

Widrow-Hoff algorithm
Remarks:
- If the error function has a single optimum, the algorithm converges (though not in a finite number of steps) to the optimal values of W.
- The convergence speed is influenced by the value of the learning rate (eta).
- The value E* expresses the accuracy we expect to obtain.
- It is one of the simplest learning algorithms, but it can be applied only to one-layer networks with linear activation functions.

Delta algorithm
An algorithm similar to Widrow-Hoff, but for networks with nonlinear activation functions; the only difference lies in the gradient computation.
Gradient computation:
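
Again the formula is not in the transcript; for y_i = f(net_i) with net_i = \sum_j w_{ij} x_j, the per-example gradient becomes:

\frac{\partial E_l}{\partial w_{ij}} = -\left(d_i^l - y_i^l\right) f'(net_i^l)\, x_j^l

so the weight adjustment is w_{ij} := w_{ij} + \eta\, (d_i^l - y_i^l)\, f'(net_i^l)\, x_j^l.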

Delta algorithm
Particularities:
1. The error function can have many minima, so the algorithm can be trapped in one of them (meaning that the learning is not complete).
2. For sigmoidal functions the derivatives can be computed efficiently by using the following relations.
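
The relations themselves were not captured in the transcript; the standard ones for the two common sigmoidal functions are:

f(x) = \frac{1}{1 + e^{-x}} \;\Rightarrow\; f'(x) = f(x)\,(1 - f(x)), \qquad f(x) = \tanh(x) \;\Rightarrow\; f'(x) = 1 - f(x)^2

so the derivative is obtained directly from the already-computed output value.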

Limits of one-layer networks
One-layer networks have limited capabilities, being able only to:
- solve simple (e.g. linearly separable) classification problems;
- approximate simple (e.g. linear) dependences.
Solution: include hidden layers.
Remark: the hidden units should have nonlinear activation functions.