Introduction to Artificial Neural Network Models
Angshuman Saha
Image Source: ww.physiol.ucl.ac.uk/fedwards/ca1%20neuron.jpg


Definition
Neural network: a broad class of models that mimic the functioning of the human brain. There are various classes of NN models, which differ from each other in:
- the type of problem addressed (prediction, classification, clustering)
- the structure of the model
- the model-building algorithm
For this discussion we focus on the feed-forward back-propagation neural network, used for prediction and classification problems.

A bit of biology...
The most important functional unit in the human brain is a class of cells called NEURONS.
- Dendrites: receive information
- Cell body: processes information
- Axon: carries processed information to other neurons
- Synapse: junction between an axon end and the dendrites of other neurons
[Figure: schematic neuron showing dendrites, cell body, axon and synapse; photo of hippocampal neurons, source: heart.cbl.utoronto.ca/~berj/projects.html]

An Artificial Neuron
- Receives inputs X_1, X_2, ..., X_p from other neurons or from the environment.
- Inputs are fed in through connections with 'weights'.
- Total input = weighted sum of the inputs from all sources: I = w_1 X_1 + w_2 X_2 + w_3 X_3 + ... + w_p X_p.
- A transfer function (activation function) converts the total input into the output: V = f(I).
- The output goes to other neurons or to the environment.
[Figure: inputs X_1 ... X_p with weights w_1 ... w_p feeding a neuron that computes I and V = f(I); the parts play the roles of dendrites, cell body and axon, with information flowing left to right]
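A minimal Python sketch of this computation. The logistic transfer function is one of the choices listed on the next slide; the example inputs and weights are taken from the prediction example later in the deck, but any values would do:

    import math

    def neuron_output(inputs, weights, transfer):
        """Weighted sum of the inputs followed by the transfer function: V = f(I)."""
        total_input = sum(w * x for w, x in zip(weights, inputs))  # I = w1*X1 + ... + wp*Xp
        return transfer(total_input)

    def logistic(x):
        return 1.0 / (1.0 + math.exp(-x))   # same as e^x / (1 + e^x)

    print(neuron_output([1.0, -1.0, 2.0], [0.5, -0.1, -0.2], logistic))  # about 0.55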

Transfer Functions
There are various choices for the transfer / activation function:
- Tanh: f(x) = (e^x - e^-x) / (e^x + e^-x)
- Logistic: f(x) = e^x / (1 + e^x)
- Threshold: f(x) = 0 if x < 0, 1 if x >= 0
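A short Python sketch of these three functions (the threshold version uses the corrected cutoff at 0):

    import math

    def tanh(x):
        return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

    def logistic(x):
        return math.exp(x) / (1.0 + math.exp(x))   # equals 1 / (1 + e^-x)

    def threshold(x):
        return 0.0 if x < 0 else 1.0

    for f in (tanh, logistic, threshold):
        print(f.__name__, f(0.2))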

ANN: Feed-forward Network
A collection of neurons forms a 'layer'.
- Input layer: each neuron gets ONLY one input, directly from outside.
- Hidden layer: connects the input and output layers.
- Output layer: the output of each neuron goes directly to the outside.
[Figure: a network with inputs X_1 ... X_4, one hidden layer, and outputs y_1, y_2; information flows from the input layer through the hidden layer to the output layer]

ANN: Feed-forward Network
The number of hidden layers can be none, one, or more than one.

ANN: Feed-forward Network
A couple of things to note:
- Within a layer, neurons are NOT connected to each other.
- A neuron in one layer is connected ONLY to neurons in the NEXT layer (feed-forward).
- Skipping a layer is NOT allowed.
[Figure: the same network with inputs X_1 ... X_4 and outputs y_1, y_2]

One Particular ANN Model
What do we mean by 'a particular model'?
Input: X_1, X_2, X_3; Output: Y; Model: Y = f(X_1, X_2, X_3).
For an ANN, the algebraic form of f(.) is too complicated to write down. However, it is characterized by:
- # input neurons
- # hidden layers
- # neurons in each hidden layer
- # output neurons
- the WEIGHTS of all the connections
'Fitting' an ANN model = specifying values for all of these parameters.

One Particular Model: an Example
Input: X_1, X_2, X_3; Output: Y; Model: Y = f(X_1, X_2, X_3).
The number of input and output neurons is decided by the structure of the problem (# input neurons = # of X's, # output neurons = # of Y's); the rest are free parameters.

Parameter             Example
# Input Neurons       3
# Hidden Layers       1
Hidden Layer Size     3
# Output Neurons      1
Weights               Specified

[Figure: a network with inputs X_1, X_2, X_3, one hidden layer, and output Y]

Prediction Using a Particular ANN Model
Input: X_1, X_2, X_3; Output: Y; Model: Y = f(X_1, X_2, X_3). Suppose the inputs are X_1 = 1, X_2 = -1, X_3 = 2 and the transfer function is the logistic f(x) = e^x / (1 + e^x).
- The first hidden neuron receives total input 0.5*1 - 0.1*(-1) - 0.2*2 = 0.2 and outputs f(0.2) = e^0.2 / (1 + e^0.2) = 0.55.
- The other hidden neuron receives total input 0.9 and outputs f(0.9) ≈ 0.71.
- The output neuron receives total input -0.087 and outputs f(-0.087) ≈ 0.478, so Predicted Y ≈ 0.478.
Suppose the actual Y = 2. Then the prediction error = (2 - 0.478) = 1.522.
[Figure: the network with the weights and these intermediate values written on the connections]
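A hedged Python sketch of this forward pass. The first hidden neuron's weights (0.5, -0.1, -0.2) are the ones visible on the slide; the remaining weights are hypothetical placeholders chosen only so that the other total inputs match the values shown (0.9 and -0.087):

    import math

    def logistic(x):
        return math.exp(x) / (1.0 + math.exp(x))

    def forward(x, hidden_weights, output_weights):
        """Forward pass of a one-hidden-layer feed-forward network with logistic units."""
        hidden = [logistic(sum(w * xi for w, xi in zip(ws, x))) for ws in hidden_weights]
        return logistic(sum(w * h for w, h in zip(output_weights, hidden)))

    x = [1.0, -1.0, 2.0]
    hidden_weights = [
        [0.5, -0.1, -0.2],   # first hidden neuron: total input 0.2 (from the slide)
        [0.3, -0.2, 0.2],    # hypothetical: chosen so its total input is 0.9
    ]
    output_weights = [-0.546, 0.3]   # hypothetical: chosen so the output neuron's input is about -0.087

    v = forward(x, hidden_weights, output_weights)
    print(round(v, 3), round(2.0 - v, 3))   # predicted Y ~ 0.478, prediction error ~ 1.522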

Building the ANN Model
How do we build the model? Input: X_1, X_2, X_3; Output: Y; Model: Y = f(X_1, X_2, X_3).
- # input neurons = # inputs = 3
- # output neurons = # outputs = 1
- # hidden layers = ??? Try 1
- # neurons in the hidden layer = ??? Try 2
There is no fixed strategy; these are chosen by trial and error. The architecture is now defined. How do we get the weights?
Given this architecture there are 8 weights to decide: W = (W_1, W_2, ..., W_8). Given training data (Y_i, X_1i, X_2i, ..., X_pi), i = 1, 2, ..., n, a particular choice of W produces predicted Y's (V_1, V_2, ..., V_n), which are functions of W. Choose W such that the overall prediction error E = Σ_i (Y_i - V_i)^2 is minimized.
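A small sketch that counts those 8 weights (assuming, as the slides do, no bias terms) and evaluates E for a candidate set of predictions; the numbers passed in are placeholders only:

    n_inputs, n_hidden, n_outputs = 3, 2, 1
    n_weights = n_inputs * n_hidden + n_hidden * n_outputs   # 3*2 + 2*1 = 8

    def total_error(Y, V):
        """E = sum over observations of (Yi - Vi)^2."""
        return sum((y - v) ** 2 for y, v in zip(Y, V))

    print(n_weights)                                # 8
    print(total_error([2.0, 1.0], [0.478, 0.9]))    # illustrative numbers only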

Training the Model: Feed Forward / Back Propagation
How do we train the model so that E = Σ_i (Y_i - V_i)^2 is small?
- Start with a random set of weights.
- Feed the first observation forward through the net: X_1 → Network → V_1; error = (Y_1 - V_1).
- Adjust the weights so that this error is reduced (the network fits the first observation well).
- Feed the second observation forward; adjust the weights to fit the second observation well.
- Keep repeating until you reach the last observation. This finishes one CYCLE through the data.
- Perform many such training cycles until the overall prediction error E is small.

Back Propagation
A bit more detail on back propagation. Each weight 'shares the blame' for the prediction error E = Σ_i (Y_i - V_i)^2 with the other weights. The back propagation algorithm decides how to distribute the blame among all the weights and adjusts each weight accordingly: a small portion of the blame leads to a small adjustment, a large portion of the blame leads to a large adjustment.

Weight Adjustment During Back Propagation
E(W) = Σ_i [Y_i - V_i(W)]^2. V_i, the prediction for the i-th observation, is a function of the network weight vector W = (W_1, W_2, ...); hence E, the total prediction error, is also a function of W.
Gradient descent method: for every individual weight W_i, the update formula is

    W_new = W_old - η * (∂E/∂W) |_{W = W_old}

where η is the learning parameter (between 0 and 1). A slight variation, with a momentum term, is also used sometimes:

    W(t+1) = W(t) - η * (∂E/∂W) |_{W = W(t)} + μ * (W(t) - W(t-1))

where μ is the momentum (between 0 and 1).
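A minimal sketch of these two update rules for a single weight; the learning parameter, momentum, and the one-dimensional toy error function below are illustrative assumptions, not values from the slides:

    def gradient_descent_step(w_old, grad, eta=0.1):
        """W_new = W_old - eta * dE/dW evaluated at W_old."""
        return w_old - eta * grad

    def momentum_step(w_t, w_prev, grad, eta=0.1, mu=0.5):
        """W(t+1) = W(t) - eta * dE/dW|W(t) + mu * (W(t) - W(t-1))."""
        return w_t - eta * grad + mu * (w_t - w_prev)

    # Toy error surface in one weight: E(w) = (w - 3)^2, minimized at w = 3
    dE = lambda w: 2.0 * (w - 3.0)

    w_prev, w = 0.0, 0.0
    for _ in range(50):
        w, w_prev = momentum_step(w, w_prev, dE(w)), w
    print(round(w, 3))   # close to 3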

Geometric Interpretation of the Weight Adjustment
Consider a very simple network with 2 inputs, 1 output and no hidden layer. There are only two weights, w_1 and w_2, whose values need to be specified, and E(w_1, w_2) = Σ_i [Y_i - V_i(w_1, w_2)]^2.
A pair (w_1, w_2) is a point on the 2-D plane, and for any such point we can get a value of E. Plotting E vs (w_1, w_2) gives a 3-D surface, the 'error surface'. The aim is to identify the pair for which E is minimum, that is, the pair for which the height of the error surface is lowest.
Gradient descent algorithm: start with a random point (w_1, w_2); move to a 'better' point (w'_1, w'_2) where the height of the error surface is lower; keep moving until you reach (w*_1, w*_2), where the error is minimum.

Crawling the Error Surface
[Figure: an error surface over the weight space, showing the starting point w_0, a local minimum, and the global minimum w*]

Training Algorithm
Decide the network architecture (# hidden layers, # neurons in each hidden layer).
Decide the learning parameter and momentum.
Initialize the network with random weights.
Do until the convergence criterion is met:
    For I = 1 to # training data points:
        Feed the I-th observation forward through the net
        Compute the prediction error on the I-th observation
        Back propagate the error and adjust the weights
    Next I
    Check for convergence (E = Σ_i (Y_i - V_i)^2)
End Do
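A compact Python sketch of this training algorithm for a one-hidden-layer network with logistic activations and squared error. The architecture, learning parameter, and toy data are illustrative assumptions, and the gradient formulas follow the standard back-propagation derivation for this setup rather than anything specific to these slides:

    import numpy as np

    rng = np.random.default_rng(0)

    def logistic(x):
        return 1.0 / (1.0 + np.exp(-x))        # same as e^x / (1 + e^x)

    # Illustrative architecture: 3 inputs, 2 hidden neurons, 1 output (8 weights, no biases)
    W1 = rng.normal(scale=0.5, size=(2, 3))    # hidden-layer weights
    w2 = rng.normal(scale=0.5, size=2)         # output-layer weights
    eta = 0.1                                  # learning parameter

    # Toy training data (placeholders, not from the slides)
    X = np.array([[1.0, -1.0, 2.0], [0.5, 0.5, -1.0], [2.0, 1.0, 0.0]])
    Y = np.array([1.0, 0.0, 1.0])

    for cycle in range(1000):                  # training cycles
        for x, y in zip(X, Y):                 # one pass through the data = one cycle
            # Feed forward
            h = logistic(W1 @ x)               # hidden-layer outputs
            v = logistic(w2 @ h)               # network prediction V
            # Back propagate the error (y - v)^2 and adjust the weights
            delta_out = -2.0 * (y - v) * v * (1.0 - v)       # dE/d(output neuron's total input)
            grad_w2 = delta_out * h                          # dE/dw2
            delta_hidden = delta_out * w2 * h * (1.0 - h)    # dE/d(hidden neurons' total inputs)
            grad_W1 = np.outer(delta_hidden, x)              # dE/dW1
            w2 -= eta * grad_w2
            W1 -= eta * grad_W1
        E = sum((y - logistic(w2 @ logistic(W1 @ x))) ** 2 for x, y in zip(X, Y))
        if E < 1e-3:                           # simple convergence criterion
            break

    print(cycle, round(float(E), 4))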

When to Stop Training the Network? Convergence Criterion
Ideally, when we reach the global minimum of the error surface. How do we know we have reached it? We don't.
Suggestion:
1. Stop if the decrease in total prediction error (since the last cycle) is small.
2. Stop if the overall changes in the weights (since the last cycle) are small.
Drawback: the error keeps on decreasing and we get a very good fit to the training data, BUT the network thus obtained has poor generalizing power on unseen data. This phenomenon is known as overfitting of the training data: the network is said to memorize the training data, so that when an X from the training set is given, it faithfully produces the corresponding Y, but for X's the network has not seen before, it predicts poorly.

Convergence Criterion
Modified suggestion: partition the training data into a training set and a validation set.
- Training set: used to build the model.
- Validation set: used to test the performance of the model on unseen data.
Typically, as we run more and more training cycles, the error on the training set keeps decreasing, while the error on the validation set first decreases and then increases. Stop training when the error on the validation set starts increasing.
[Figure: error vs training cycle; the training curve keeps falling while the validation curve turns upward]
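A small sketch of this early-stopping rule; train_one_cycle and validation_error are hypothetical stand-ins for the model's actual training step and validation-error computation:

    def train_with_early_stopping(model, train_one_cycle, validation_error, max_cycles=1000):
        """Keep training until the error on the validation set starts increasing."""
        best_error = float("inf")
        for cycle in range(max_cycles):
            train_one_cycle(model)            # one pass through the training set
            err = validation_error(model)     # error on held-out validation data
            if err > best_error:              # validation error started increasing: stop
                break
            best_error = err
        return model, best_error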

Choice of Training Parameters
The learning parameter and momentum need to be supplied by the user and should be between 0 and 1.
Learning parameter:
- Too big: large leaps in weight space, with the risk of missing the global minimum.
- Too small: takes a long time to converge to the global minimum, and once stuck in a local minimum it is difficult to get out.
What should the optimal values of these training parameters be? There is no clear consensus on any fixed strategy, but the effects of specifying them badly are well studied.
Suggestion: trial and error. Try various choices of learning parameter and momentum and see which choice leads to the minimum prediction error, as in the sketch below.
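A hedged sketch of that trial-and-error search; train_and_evaluate is a hypothetical function that trains the network with the given parameters and returns its prediction error on held-out data:

    from itertools import product

    def pick_training_parameters(train_and_evaluate,
                                 learning_rates=(0.01, 0.05, 0.1, 0.5),
                                 momentums=(0.0, 0.3, 0.6, 0.9)):
        """Try each (learning parameter, momentum) pair and keep the one with the lowest error."""
        best = None
        for eta, mu in product(learning_rates, momentums):
            error = train_and_evaluate(eta, mu)
            if best is None or error < best[0]:
                best = (error, eta, mu)
        return best   # (lowest error, best learning parameter, best momentum)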

Wrap Up
- Artificial neural network (ANN): a class of models inspired by biological neurons.
- Used for various modeling problems: prediction, classification, clustering, ...
- One particular subclass of ANNs: feed-forward back-propagation networks.
- Organized in layers: input, hidden, output. Each layer is a collection of artificial neurons, and neurons in one layer are connected to neurons in the next layer.
- Connections have weights. Fitting an ANN model means finding the values of these weights.
- Given a training data set, the weights are found by the feed-forward back-propagation algorithm, which is a form of the gradient descent method, a popular technique for function minimization.
- The network architecture as well as the training parameters are decided by trial and error: try various choices and pick the one that gives the lowest prediction error.