Artificial Neural Networks


Overview
• Motivation & Goals
• Perceptron Learning
• Gradient Descent & the Δ-Rule
• Multi-Layer Nets
• The Backpropagation Algorithm
• Example Application: Recognition of Faces
• More Network Architectures
• Application Areas of ANNs

Model: The Brain
A complex learning system built from simple learning units: the neurons.
• A network of on the order of 10^11 neurons, each with on the order of 10^4 connections.
• Switching time of a neuron: on the order of 10^-3 s (speed versus flexibility).
• Observation: face recognition takes on the order of 10^-1 s → massive parallelism.

Goals of ANNs
• Learning instead of programming.
• Learning complex functions with simple learning units.
• Parallel computation (e.g. the layer model).
• Network parameters are found automatically by a learning algorithm.
• An ANN behaves like a black box mapping inputs to outputs.

When are ANNs used?
• Input instances are described as vectors of discrete or real values.
• The output of the target function is a single value or a vector of discrete or real-valued attributes.
• The input data may contain noise.
• The target function is unknown or difficult to describe.

The Perceptron (as a NN Unit) (1/2)
A linear unit with a threshold: the perceptron forms the weighted sum of its inputs and outputs +1 if the sum exceeds the threshold, −1 otherwise.

The Perceptron (as a NN Unit) (2/2)
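In the notation of Mitchell (Ch. 4), on which these slides are based, the perceptron computes a thresholded weighted sum of its inputs:

    o(x_1, …, x_n) =  1   if  w_0 + w_1 x_1 + … + w_n x_n > 0
                     −1   otherwise

With a constant input x_0 = 1 absorbing the threshold w_0, this can be written compactly as o(x) = sgn(w · x).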

Geometrical Classification (Decision Surface)
A perceptron can classify only linearly separable training data → we need networks of these units.
Linearly separable example: the OR function. Not linearly separable example: the XOR function.

The Perceptron Learning Rule (1/2)
Training a perceptron = learning the hypothesis that best classifies the training data.
A hypothesis = a vector of weights.

The Perceptron Learning Rule (2/2)
Idea:
1. Initialise the weights with small random values.
2. Apply the perceptron to each training example and modify the weights according to the learning rule
       w_i ← w_i + Δw_i,   Δw_i = η (t − o) x_i
   where:
   • t: target output
   • o: actual output
   • η: the learning rate
Step 2 is repeated for all training examples until all of them are correctly classified.
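A minimal sketch of this training loop in Python (the function name and parameters are illustrative; the inputs are assumed to already contain the constant x_0 = 1):

    import numpy as np

    def train_perceptron(X, t, eta=0.1, max_epochs=100):
        """Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i.
        X: (n_examples, n_features) with the first column equal to 1 (bias input x_0).
        t: targets in {-1, +1}."""
        w = np.random.uniform(-0.05, 0.05, X.shape[1])   # small random initial weights
        for _ in range(max_epochs):
            errors = 0
            for x, target in zip(X, t):
                o = 1 if np.dot(w, x) > 0 else -1         # thresholded linear unit
                if o != target:
                    w += eta * (target - o) * x           # update only on misclassified examples
                    errors += 1
            if errors == 0:                               # all examples classified correctly
                break
        return w

For example, learning the (linearly separable) OR function:

    X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])   # first column is x_0 = 1
    t = np.array([-1, 1, 1, 1])
    w = train_perceptron(X, t)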

The Perceptron Learning Rule: Convergence
The perceptron learning rule converges if the training examples are linearly separable and η is chosen small enough (e.g. 0.1).
Intuitive explanation: whenever an example is misclassified, (t − o) is +2 or −2, so the update shifts each weight in the direction that moves the output towards the target; for linearly separable data a finite number of such corrections suffices.

The Gradient Descent Algorithm & the Δ-Rule (1/5)
Better: the Δ-rule converges even if the training examples are not linearly separable.
Idea: Use gradient descent to search the hypothesis space for the best hypothesis, i.e. the one that minimises the squared error.
⇒ This is the basis of the backpropagation algorithm.

The Gradient Descent Algorithm & the Δ-Rule (2/5)
For reasons of continuity (differentiability), the Δ learning rule is applied to an unthresholded linear unit rather than to the perceptron.
Linear unit:  o = w · x
The squared error to be minimised:
    E(w) = 1/2 Σ_{d ∈ D} (t_d − o_d)^2
where:
• D: set of training examples
• t_d: target output of example d
• o_d: computed output of example d

The Gradient Descent Algorithm & the Δ-Rule (3/5)
Geometric interpretation: the error function forms a surface over the hypothesis (weight) space, e.g. over two weights; gradient descent moves downhill on this surface towards the minimum.

The Gradient Descent Algorithm & the Δ-Rule (4/5)
Derivation of the gradient:
    ∇E(w) = [∂E/∂w_0, ∂E/∂w_1, …, ∂E/∂w_n]
    ∂E/∂w_i = Σ_{d ∈ D} (t_d − o_d)(−x_{id})
Learning rule:
    Δw_i = −η ∂E/∂w_i = η Σ_{d ∈ D} (t_d − o_d) x_{id}

The Gradient Descent Algorithm & the Δ-Rule (5/5)
Standard (batch) method:
Do until the termination criterion is satisfied:
• Initialise each Δw_i to zero.
• For each training example d: compute o_d; for each weight w_i: Δw_i ← Δw_i + η (t_d − o_d) x_{id}.
• For each weight w_i: w_i ← w_i + Δw_i.
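A compact sketch of the batch version for a linear unit in Python (function and variable names are illustrative):

    import numpy as np

    def gradient_descent(X, t, eta=0.01, epochs=1000):
        """Batch gradient descent for a linear unit o = w . x.
        Accumulates the weight updates over all examples before applying them."""
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            o = X @ w                       # outputs for all training examples
            delta_w = eta * X.T @ (t - o)   # sum_d eta * (t_d - o_d) * x_d
            w += delta_w                    # apply the accumulated update once per pass
        return w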

The Δ-Rule
Stochastic (incremental) method:
Do until the termination criterion is satisfied:
• Initialise the weights.
• For each training example d: compute o_d; for each weight w_i: w_i ← w_i + η (t_d − o_d) x_{id}   ⇐ the Δ rule.
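The stochastic variant updates the weights immediately after each example; a minimal sketch (same illustrative names as above):

    def stochastic_delta_rule(X, t, eta=0.01, epochs=100):
        """Incremental (stochastic) gradient descent: update after every example."""
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            for x, target in zip(X, t):
                o = np.dot(w, x)                 # output of the linear unit
                w += eta * (target - o) * x      # delta rule applied per example
        return w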

Remarks
Advantages of the stochastic approximation of the gradient:
⇒ faster convergence (incremental update of the weights);
⇒ less likely to get stuck in a local minimum.

Remarks
Single perceptrons can learn only linearly separable training data.
⇒ We need multi-layer networks of several 'neurons'.
Example: the XOR problem is not linearly separable — no single line in the (x1, x2) plane separates the positive from the negative examples.

XOR Function
[Figure: a two-layer network of threshold units computing XOR, with weights of 1 and −1 and unit thresholds of 0.5.]
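As a check, here is one such network sketched in Python; the particular weights and thresholds below are an illustrative choice (not necessarily those of the original figure), using XOR(x1, x2) = (x1 OR x2) AND NOT (x1 AND x2):

    def step(s, threshold=0.5):
        """Threshold unit: fire (1) if the weighted sum exceeds the threshold."""
        return 1 if s > threshold else 0

    def xor_net(x1, x2):
        # Hidden layer: h1 computes OR, h2 computes AND (higher threshold)
        h1 = step(1.0 * x1 + 1.0 * x2, threshold=0.5)
        h2 = step(1.0 * x1 + 1.0 * x2, threshold=1.5)
        # Output unit: h1 AND NOT h2, i.e. XOR
        return step(1.0 * h1 - 1.0 * h2, threshold=0.5)

    for a in (0, 1):
        for b in (0, 1):
            assert xor_net(a, b) == (a ^ b)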

Supervised Learning: Backpropagation NNs
Since 1985 the backpropagation (BP) algorithm has become one of the most widely used and successful learning algorithms for NNs.
Idea: The minimum of the network's error function is sought by descending in the direction of the gradient. The weight vector that minimises the error of the network is regarded as the solution of the learning problem. Hence the gradient of the error function must exist at every point of the weight space, i.e. the error function must be differentiable.

Learning in Backpropagation Networks
The sigmoid unit computes a smooth, differentiable threshold function of its net input:
    o = σ(net) = 1 / (1 + e^(−net)),   with   net = Σ_i w_i x_i
Properties of the sigmoid unit:
    dσ(net)/d net = σ(net) (1 − σ(net))

Definitions used by the BP Algorithm
• x_{ji}: input from node i to unit j
• w_{ji}: weight of the i-th input to unit j
• outputs: the set of output units
• o_i: output of unit i
• t_i: target output of unit i
• δ_n: error term of unit n
The network consists of input units, hidden units and output units; the error terms are propagated backwards from the output units through the hidden units.

The Backpropagation Algorithm
• Initialise all weights to small random numbers.
• Until the termination criterion is satisfied, for each training example do:
  1. Compute the network's output.
  2. For each output unit k:   δ_k ← o_k (1 − o_k) (t_k − o_k)
  3. For each hidden unit h:   δ_h ← o_h (1 − o_h) Σ_{k ∈ outputs} w_{kh} δ_k
  4. Update each network weight:   w_{ji} ← w_{ji} + Δw_{ji},   where   Δw_{ji} = η δ_j x_{ji}
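A self-contained sketch of this loop for a network with one hidden layer of sigmoid units (array shapes and names are illustrative and bias inputs are omitted for brevity; this is not the original slide's code):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def backprop_epoch(X, T, W_hidden, W_out, eta=0.05):
        """One pass of stochastic backpropagation over the training set.
        X: (n_examples, n_inputs), T: (n_examples, n_outputs),
        W_hidden: (n_hidden, n_inputs), W_out: (n_outputs, n_hidden)."""
        for x, t in zip(X, T):
            # Forward pass
            h = sigmoid(W_hidden @ x)                           # hidden-unit outputs
            o = sigmoid(W_out @ h)                              # output-unit outputs
            # Backward pass: error terms
            delta_out = o * (1 - o) * (t - o)                   # delta_k for output units
            delta_hidden = h * (1 - h) * (W_out.T @ delta_out)  # delta_h for hidden units
            # Weight updates: Delta w_ji = eta * delta_j * x_ji
            W_out += eta * np.outer(delta_out, h)
            W_hidden += eta * np.outer(delta_hidden, x)
        return W_hidden, W_out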

Derivation of the BP Algorithm
For each training example d the error is
    E_d(w) = 1/2 Σ_{k ∈ outputs} (t_k − o_k)^2
and each weight is changed in the direction of steepest descent,
    Δw_{ji} = −η ∂E_d/∂w_{ji}
where net_j = Σ_i w_{ji} x_{ji} is the weighted sum of the inputs of unit j.

Derivation of the BP Algorithm
Output layer:   δ_k = o_k (1 − o_k) (t_k − o_k)
Hidden layer:   δ_h = o_h (1 − o_h) Σ_{k ∈ Downstream(h)} w_{kh} δ_k
And therefore   Δw_{ji} = η δ_j x_{ji}
Downstream(j): the set of units whose immediate inputs include the output of unit j.

Derivation of the BP Algorithm (Explanation)
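The explanation rests on the chain rule; a compact sketch of the standard derivation, in the notation above:

    ∂E_d/∂w_{ji} = (∂E_d/∂net_j) (∂net_j/∂w_{ji}) = (∂E_d/∂net_j) x_{ji},   so define   δ_j = −∂E_d/∂net_j.

    For an output unit k:
    δ_k = −(∂E_d/∂o_k)(∂o_k/∂net_k) = (t_k − o_k) · o_k (1 − o_k).

    For a hidden unit h, E_d depends on net_h only through the units in Downstream(h):
    δ_h = Σ_{k ∈ Downstream(h)} δ_k (∂net_k/∂o_h)(∂o_h/∂net_h) = o_h (1 − o_h) Σ_{k ∈ Downstream(h)} w_{kh} δ_k.

    In both cases   Δw_{ji} = −η ∂E_d/∂w_{ji} = η δ_j x_{ji}.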

Convergence of the BP Algorithm
• Generalisation to arbitrary acyclic directed network architectures is straightforward.
• In practice it works well, but it can get stuck in a local rather than the global minimum ⇒ introduction of a momentum term α ("escape routes"). Disadvantage: this "jumping" can also carry the search past a global minimum.
• Training can take thousands of iterations → slow (accelerated by momentum).
• Over-fitting versus adaptability of the NN.
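With momentum, the weight update at iteration n carries along a fraction α of the previous update; in the usual formulation:

    Δw_{ji}(n) = η δ_j x_{ji} + α Δw_{ji}(n − 1),   with 0 ≤ α < 1.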

Example: Recognition of Faces
Given: 32 photos of each of 20 persons, in different poses:
→ direction of view: right, left, up or straight;
→ with and without sunglasses;
→ expression: happy, sad, neutral...

Example: Recognition of Faces
Goal: classification of the photos by direction of view.
Preparation of the input:
• Rastering (downsampling) the photos → acceleration of the learning process.
• Input vector = the grayscale values of the 30 × 32 pixels.
• Output vector = (left, straight, right, up); the answer is the direction with the maximum output value.
e.g. o = (0.9, 0.1, 0.1, 0.1) → looking to the left
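A small illustration of this 1-of-n output encoding (the label order follows the slide; the function name is illustrative):

    import numpy as np

    DIRECTIONS = ("left", "straight", "right", "up")

    def decode_direction(output_vector):
        """Return the direction whose output unit has the highest activation."""
        return DIRECTIONS[int(np.argmax(output_vector))]

    decode_direction([0.9, 0.1, 0.1, 0.1])   # -> "left"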

Recognition of the direction of view

Recurrent Neural Networks
They are directed cyclic networks "with memory":
→ outputs at time t become inputs at time t+1;
→ the cycles allow results to be fed back into the network.
(+) They are more expressive than acyclic networks.
(−) Training of recurrent networks is expensive.
In some cases recurrent networks can be trained using a variant of the backpropagation algorithm.
Example: forecasting the next stock market value y(t+1) from the current indicator x(t) and the previous indicator x(t−1).

Recurrent NNs
[Figure: a feedforward network with inputs x(t) and context c(t) producing y(t+1), a recurrent network in which the context is fed back, and the same recurrent network unfolded in time over x(t−1), x(t−2), c(t−1), c(t−2).]