CSE343/543 Machine Learning Mayank Vatsa Lecture slides are prepared using several teaching resources and no authorship is claimed for any slides.

The Perceptron
- Binary classifier functions
- Threshold activation function

What is here to learn? The Perceptron Training Rule

The Perceptron: Threshold Activation Function
- Binary classifier functions
- Threshold activation function
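An illustrative sketch (not from the slides; the function and variable names are our own) of a perceptron with a threshold activation function:

```python
import numpy as np

def perceptron_predict(w, x, threshold=0.0):
    # Threshold activation: the binary classifier outputs +1 if the weighted
    # sum w.x exceeds the threshold, and -1 otherwise.
    return 1 if np.dot(w, x) > threshold else -1
```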

The Perceptron Training Rule
- One way to learn an acceptable weight vector is to begin with random weights.
- Then iteratively apply the perceptron to each training example, modifying the weights whenever it misclassifies an example.
- This process is repeated, iterating through the training examples as many times as needed, until the perceptron classifies all training examples correctly.
- Weights are modified at each step according to the perceptron training rule, which revises the weight associated with each input.

The Perceptron Training Rule
Assuming the problem is linearly separable, there is a learning rule that converges in finite time.
Motivation: a new (unseen) input pattern that is similar to an old (seen) input pattern is likely to be classified correctly.

The Perceptron Training Rule
- Basic idea: go over all existing data patterns, whose labeling is known, and check their classification with the current weight vector.
- If correct, continue.
- If not, add to the weights a quantity that is proportional to the product of the input pattern with the desired output Z (1 or -1).

Weight Update Rule
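The update equation itself did not survive the transcript; the standard perceptron training rule (as in Mitchell, Ch. 4) is

$$w_i \leftarrow w_i + \Delta w_i, \qquad \Delta w_i = \eta\,(t - o)\,x_i$$

where t is the target output, o is the perceptron output, and η is the learning rate. A minimal training-loop sketch under that assumption (names are our own):

```python
import numpy as np

def train_perceptron(X, t, eta=0.1, epochs=50):
    # X: (N, n) training inputs, t: (N,) targets in {-1, +1}
    w = np.random.randn(X.shape[1]) * 0.01       # begin with small random weights
    for _ in range(epochs):
        for x_i, t_i in zip(X, t):
            o = 1 if np.dot(w, x_i) > 0 else -1  # thresholded output
            w += eta * (t_i - o) * x_i           # update only when the example is misclassified
    return w
```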

Gradient Descent and Delta Rule
The delta training rule is best understood by considering the task of training an unthresholded perceptron; that is, a linear unit whose output o is given by o = w · x.

In order to derive a weight learning rule for linear units, let us begin by specifying a measure for the training error of a hypothesis (weight vector), relative to the training examples.
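The error measure itself is not in the transcript; the standard choice for this derivation (Mitchell, Ch. 4) is the sum of squared errors over the training set D:

$$E(\vec{w}) \equiv \frac{1}{2}\sum_{d \in D}(t_d - o_d)^2$$

where t_d is the target output and o_d is the linear unit's output for training example d.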

Derivation of GDR
The vector of partial derivatives, ∇E(w) = [∂E/∂w_0, ∂E/∂w_1, ..., ∂E/∂w_n], is called the gradient of E with respect to w. The gradient specifies the direction that produces the steepest increase in E; the negative of this vector therefore gives the direction of steepest decrease. The training rule for gradient descent is w ← w + Δw, with Δw = -η ∇E(w).

Derivation of GDR (continued)
The negative sign is present because we want to move the weight vector in the direction that decreases E. This training rule can also be written in its component form, Δw_i = -η ∂E/∂w_i, which makes it clear that steepest descent is achieved by altering each component w_i of w in proportion to ∂E/∂w_i.

Derivation of GDR (continued)
The vector of derivatives that forms the gradient can be obtained by differentiating E, giving ∂E/∂w_i = Σ_d (t_d - o_d)(-x_id). The weight update rule for standard gradient descent can therefore be summarized as Δw_i = η Σ_d (t_d - o_d) x_id.
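A minimal sketch of one batch gradient-descent (delta rule) step for a linear unit, following the update just derived (function and variable names are our own):

```python
import numpy as np

def delta_rule_step(w, X, t, eta=0.05):
    # X: (N, n) training inputs, t: (N,) targets, w: (n,) weights of a linear unit o = w.x
    o = X @ w                    # outputs for all training examples
    grad = -(t - o) @ X          # dE/dw for E = 0.5 * sum((t - o)**2)
    return w - eta * grad        # move the weights against the gradient
```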

AB problem

XOR problem

(Figure slides: two decision boundaries, labeled 1 and 2, combined through an AND unit in the XOR construction.)

Three-layer networks
(Diagram: inputs x1, x2, ..., xn; hidden layers; output.)

Feed-forward layered network
(Diagram, bottom to top: input layer, 1st hidden layer, 2nd hidden layer, output layer.)
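A minimal sketch (our own, using sigmoid units, with biases omitted for brevity) of the forward pass through such a feed-forward layered network:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights):
    # `weights` is a list of weight matrices, one per layer,
    # ordered from the first hidden layer to the output layer.
    a = x
    for W in weights:
        a = sigmoid(W @ a)   # each layer: weighted sum followed by a non-linearity
    return a
```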

Different Non-Linearly Separable Problems
- Single-Layer: the most general decision regions are half planes bounded by a hyperplane.
- Two-Layer: convex open or closed decision regions.
- Three-Layer: arbitrary decision regions (complexity limited by the number of nodes).
(The original table also illustrates each structure on the Exclusive-OR problem and on separating two meshed classes A and B.)

In the perceptron/single-layer nets, we used gradient descent on the error function to find the correct weights: Δw_ji = η (t_j - y_j) x_i. We see that errors/updates are local to the node, i.e. the change in the weight from node i to output j (w_ji) is controlled by the input that travels along the connection and the error signal (t_j - y_j) from output j.

But with more layers, how are the weights for the first two layers found when the error is computed for layer 3 only? There is no direct error signal for the first layers!

Objective of Multilayer NNet
(Network diagram: inputs x1, ..., xn with weights w1, ..., wm.) Given a training set of input/target pairs, the goal is for the network output to match the desired output for all k.

Learn the Optimal Weight Vector
(Same network diagram and training set; the goal is again to match the desired output for all k, now by learning the optimal weights.)

First Complex NNet Algorithm: Multilayer feedforward NNet

Training: Backprop algorithm
- Searches for weight values that minimize the total error of the network over the set of training examples.
- Repeats the following two passes:
  - Forward pass: compute the outputs of all units in the network, and the error of the output layer.
  - Backward pass: the network error is used for updating the weights (credit assignment problem). Starting at the output layer, the error is propagated backwards through the network, layer by layer, by recursively computing the local gradient of each neuron.

Back-propagation training algorithm illustrated: Backprop adjusts the weights of the NN in order to minimize the network's total mean squared error. (Diagram: forward step = network activation and error computation; backward step = error propagation.)

Learning Algorithm: Backpropagation
The pictures below illustrate how the signal propagates through the network. Symbols w_(xm)n represent the weights of the connections between network input x_m and neuron n in the input layer. Symbol y_n represents the output signal of neuron n.

Learning Algorithm: Backpropagation

Propagation of signals through the hidden layer. Symbols w_mn represent the weights of connections between the output of neuron m and the input of neuron n in the next layer.

Learning Algorithm: Backpropagation

Propagation of signals through the output layer.

Learning Algorithm: Backpropagation
In the next algorithm step, the output signal of the network y is compared with the desired output value (the target), which is found in the training data set. The difference is called the error signal d of the output-layer neuron.

Learning Algorithm: Backpropagation
The idea is to propagate the error signal d (computed in a single teaching step) back to all neurons whose output signals were inputs to the neuron under discussion.

Learning Algorithm: Backpropagation
(As on the previous slide: the error signal d is propagated back to all neurons whose output signals were inputs to the neuron under discussion.)

Learning Algorithm: Backpropagation
The weight coefficients w_mn used to propagate errors back are equal to those used when computing the output value; only the direction of data flow is changed (signals are propagated from outputs to inputs, one layer after another). This technique is used for all network layers. If the propagated errors come from several neurons, they are added. The illustration is below:

Learning Algorithm: Backpropagation
When the error signal for each neuron has been computed, the weight coefficients of each neuron's input connections may be modified. In the formulas below, df(e)/de represents the derivative of the activation function of the neuron whose weights are being modified. (This caption accompanies the successive weight-update diagrams, one per layer.)
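Putting the walkthrough together, here is a compact sketch of one backpropagation step for a network with a single hidden layer and sigmoid activations (the structure, names, and learning rate are our own; the slides' diagrams use a more general layered network):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, target, W1, W2, eta=0.1):
    # Forward pass
    h = sigmoid(W1 @ x)                      # hidden-layer outputs
    y = sigmoid(W2 @ h)                      # network output
    # Error signal d at the output; df(e)/de for the sigmoid is y * (1 - y)
    d_out = (target - y) * y * (1 - y)
    # Propagate errors back with the same weights; errors from several neurons are added
    d_hid = (W2.T @ d_out) * h * (1 - h)
    # Each weight changes in proportion to its error signal and its input signal
    W2 += eta * np.outer(d_out, h)
    W1 += eta * np.outer(d_hid, x)
    return W1, W2
```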

Single-Hidden Layer NNet
(Diagram: inputs x1, ..., xn feed m hidden units; their outputs are combined with weights w1, ..., wm to produce the output y.)

Radial Basis Function Networks
(Diagram: inputs x1, ..., xn feed m radial basis function units; their outputs are combined with weights w1, ..., wm to produce the output y.)

Non-Linear Models
(Formula slide: the model output is a weighted sum of non-linear basis functions; the weights are the parameters adjusted by the learning process.)
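The formula on this slide is not in the transcript; the standard RBF model it refers to has the form

$$y(\mathbf{x}) = \sum_{j=1}^{m} w_j\,\phi_j\!\left(\lVert \mathbf{x} - \boldsymbol{\mu}_j \rVert\right)$$

where the φ_j are fixed radial basis functions with centers μ_j, and the weights w_j are adjusted by the learning process.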

Typical Radial Functions
- Gaussian
- Hardy multiquadric
- Inverse multiquadric
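The formulas were lost in the transcript; the usual forms of these radial functions, written in terms of r = ||x - μ||, are

$$\phi(r) = e^{-r^2/(2\sigma^2)} \;\text{(Gaussian)}, \qquad \phi(r) = \sqrt{r^2 + c^2} \;\text{(Hardy multiquadric)}, \qquad \phi(r) = \frac{1}{\sqrt{r^2 + c^2}} \;\text{(inverse multiquadric)}.$$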

Gaussian Basis Function (σ = 0.5, 1.0, 1.5)
(Plot of the Gaussian basis function for the three widths.)
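The slide's plot can be reproduced with a few lines (a sketch; the axis range is our own choice):

```python
import numpy as np
import matplotlib.pyplot as plt

r = np.linspace(-4, 4, 400)
for sigma in (0.5, 1.0, 1.5):
    plt.plot(r, np.exp(-r**2 / (2 * sigma**2)), label=f"sigma = {sigma}")
plt.legend()
plt.title("Gaussian basis function")
plt.show()
```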

Most General RBF
(Figure of example data points labeled +.)

The Topology of RBF NNet
(Diagram: feature vectors x1, ..., xn enter the inputs; hidden units correspond to subclasses; output units y1, ..., ym correspond to classes.)

Radial Basis Function Networks
(Network diagram as before, with a training set of input/target pairs; the goal is for the network output to match the desired output for all k.)

Learn the Optimal Weight Vector
(Same diagram, training set, and goal; the task is now to learn the optimal output weights w1, ..., wm.)

Regularization
Training set and goal as before. If regularization is not needed, set the regularization coefficient to zero.

Learn the Optimal Weight Vector Minimize
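The criterion being minimized did not survive the transcript; the standard regularized least-squares objective for the RBF output weights is

$$E(\mathbf{w}) = \sum_{k}\bigl(d_k - y(\mathbf{x}_k)\bigr)^2 + \lambda\,\lVert \mathbf{w} \rVert^2$$

whose minimizer, writing Φ for the design matrix with entries Φ_kj = φ_j(x_k), is w = (ΦᵀΦ + λI)⁻¹ Φᵀ d.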

Learning Kernel Parameters
(Diagram: inputs x1, ..., xn, kernel units φ1, ..., φl, and weights w_11, ..., w_ml to outputs y1, ..., ym; the training set and kernel definitions are shown as formulas.)

What to Learn?
- Weights w_ij's
- Centers μ_j's of the φ_j's
- Widths σ_j's of the φ_j's
- Number of φ_j's
(Same network diagram.)

Two-Stage Training
- Step 1 determines the centers μ_j's, the widths σ_j's, and the number of the φ_j's.
- Step 2 determines the weights w_ij's.
(Same network diagram.)
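The slides do not prescribe particular methods for the two stages; a common concrete instantiation (our choice for illustration) uses k-means to place the centers in stage 1 and regularized least squares for the output weights in stage 2:

```python
import numpy as np
from sklearn.cluster import KMeans

def train_rbf_two_stage(X, d, n_centers=10, reg=0.0):
    # Stage 1: choose centers and a shared width from the inputs alone
    km = KMeans(n_clusters=n_centers, n_init=10).fit(X)
    centers = km.cluster_centers_
    sigma = np.mean(np.linalg.norm(X - centers[km.labels_], axis=1)) + 1e-8
    # Stage 2: solve for the output weights by (regularized) least squares
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    Phi = np.exp(-d2 / (2 * sigma ** 2))
    w = np.linalg.solve(Phi.T @ Phi + reg * np.eye(n_centers), Phi.T @ d)
    return centers, sigma, w
```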

Learning Rule
The backpropagation learning rule also applies to RBF networks.

Three-layer RBF neural network

Deep NNet

Deep Supervised Learning

Convolutional NNet – an example of a Deep NNet

Next Class – Deep Learning (CNN)