
1 RECENT DEVELOPMENTS IN MULTILAYER PERCEPTRON NEURAL NETWORKS. Walter H. Delashmit, Lockheed Martin Missiles and Fire Control, Dallas, TX 75265; Michael T. Manry, The University of Texas at Arlington, Arlington, TX. Memphis Area Engineering and Science Conference 2005, May 11, 2005.

2 Outline of Presentation
- Review of Multilayer Perceptron Neural Networks
- Network Initial Types and Training Problems
- Common Starting Point Initialized Networks
- Dependently Initialized Networks
- Separating Mean Processing
- Summary

3 Review of Multilayer Perceptron Neural Networks

4 Typical 3 Layer MLP
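To make the figure concrete, here is a minimal NumPy sketch of the forward pass of a fully connected three-layer MLP (sigmoid hidden units, linear outputs, thresholds handled as an extra constant input); the layer sizes and variable names are illustrative and not taken from the slides.

import numpy as np

def mlp_forward(x, W_hi, W_oh):
    # x    : (N,)        input vector
    # W_hi : (Nh, N+1)   hidden weights, last column is the hidden-unit threshold
    # W_oh : (M, Nh+1)   output weights, last column is the output threshold
    xa = np.append(x, 1.0)                 # augment input with constant 1 for the thresholds
    net = W_hi @ xa                        # hidden-unit net functions
    O = 1.0 / (1.0 + np.exp(-net))         # sigmoid activations
    Oa = np.append(O, 1.0)                 # augment hidden outputs with constant 1
    y = W_oh @ Oa                          # linear output layer
    return y

# Example: 4 inputs, 3 hidden units, 2 outputs (sizes are arbitrary)
rng = np.random.default_rng(0)
W_hi = rng.normal(size=(3, 5))
W_oh = rng.normal(size=(2, 4))
print(mlp_forward(rng.normal(size=4), W_hi, W_oh))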

5 MLP Performance Equations: Mean Square Error (MSE), Output, and Net Function.
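The equations themselves appear only as images in the original slide. In the notation used later in the presentation (N inputs, N_h hidden units, M outputs, N_v training patterns, weights w_hi(j,i) and w_oh(k,j)), their standard forms are approximately:

\[
net_p(j) = \sum_{i=1}^{N+1} w_{hi}(j,i)\, x_p(i), \qquad x_p(N+1) = 1
\]
\[
y_p(k) = \sum_{j} w_{oh}(k,j)\, O_p(j), \qquad O_p(j) = \frac{1}{1 + e^{-net_p(j)}}
\]
\[
E = \frac{1}{N_v} \sum_{p=1}^{N_v} \sum_{k=1}^{M} \bigl[ t_p(k) - y_p(k) \bigr]^2
\]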

6 Net Control: scales and shifts all net functions so that they do not generate small gradients and do not allow large inputs to mask the potential effects of small inputs.
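The slides do not give the net-control formulas. One way such scaling and shifting is commonly realized is to drive each hidden unit's net function toward a chosen mean and standard deviation over the training set; the target values below are assumptions, not taken from the slides.

import numpy as np

def net_control(W_hi, X, target_mean=0.5, target_std=1.0):
    # Rescale each hidden unit's weights and threshold so its net function has a chosen
    # mean and standard deviation over the training inputs X (rows = patterns).
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])   # augmented inputs (constant 1 for thresholds)
    nets = Xa @ W_hi.T                              # net functions, shape (Nv, Nh)
    mean = nets.mean(axis=0)
    std = nets.std(axis=0) + 1e-12                  # avoid division by zero
    scale = target_std / std
    W_new = W_hi * scale[:, None]                   # scale every weight of hidden unit j
    W_new[:, -1] += target_mean - scale * mean      # shift the threshold to hit the target mean
    return W_new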

7 Neural Network Training Algorithms
- Backpropagation Training
- Output Weight Optimization – Hidden Weight Optimization (OWO-HWO)
- Full Conjugate Gradient

8 Output Weight Optimization – Hidden Weight Optimization (OWO-HWO)
- Used in this development
- Linear equations are used to solve for the output weights in OWO
- Separate error functions for each hidden unit are used, and multiple sets of linear equations are solved to determine the weights connecting to the hidden units in HWO
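As an illustration of the OWO half of the algorithm (the HWO half is omitted), the output weights can be obtained by solving a linear least-squares problem once the hidden-unit activations are fixed. This is a hedged sketch, not the exact procedure from the dissertation.

import numpy as np

def owo(O, T):
    # Output Weight Optimization step: solve linear equations for the output weights.
    # O : (Nv, Nh) hidden-unit activations for all training patterns
    # T : (Nv, M)  desired outputs
    # Returns W_oh of shape (M, Nh + 1); the last column is the output threshold.
    Oa = np.hstack([O, np.ones((O.shape[0], 1))])   # constant basis function for the threshold
    W, *_ = np.linalg.lstsq(Oa, T, rcond=None)      # minimizes ||Oa @ W - T||^2
    return W.T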

9 Network Initial Types and Training Problems

10 Problem Definition
- Assume that a set of MLPs of different sizes is to be designed for a given training data set.
- Let S_Nh be the set of all MLPs for that training data having N_h hidden units, and let E_int(N_h) denote the corresponding training error of an initial network that belongs to S_Nh.
- Let E_f(N_h) denote the corresponding training error of a well-trained network.
- Let N_hmax denote the maximum number of hidden units for which networks are to be designed.
- Goal: choose a set of initial networks from {S_0, S_1, S_2, …} such that E_int(0) ≥ E_int(1) ≥ E_int(2) ≥ … ≥ E_int(N_hmax), and train the networks to minimize E_f(N_h) such that E_f(0) ≥ E_f(1) ≥ E_f(2) ≥ … ≥ E_f(N_hmax).
- Axiom 3.1: If E_f(N_h) ≥ E_f(N_h − 1), then the network having N_h hidden units is useless, since the training resulted in a larger, more complex network with a larger or equal training error.

11 Network Design Methodologies
- Design Methodology One (DM-1): a well-organized researcher may design a set of different-size networks in an orderly fashion, each with one or more hidden units more than the previous network.
  o Thorough design approach
  o May take a longer time to design
  o Allows a trade-off between network performance and size
- Design Methodology Two (DM-2): a researcher may design different-size networks in no particular order.
  o May be pursued quickly for only a few networks
  o The design could possibly be improved significantly with a bit more attention to network design

12 Three Types of Networks Defined
- Randomly Initialized (RI) Networks: No members of this set of networks have any initial weights and thresholds in common. Practically, this means that the initial random number seeds (IRNS) are widely separated. Useful when the goal is to quickly design one or more networks of the same or different sizes whose weights are statistically independent of each other. Can be designed using DM-1 or DM-2 (see the seed-handling sketch after this list).
- Common Starting Points Initialized (CSPI) Networks: When a set of networks is CSPI, each one starts with the same IRNS. These networks are useful when it is desired to compare the performance of networks that have the same IRNS as the starting point. Can be designed using DM-1 or DM-2.
- Dependently Initialized (DI) Networks: A series of networks is designed, with each subsequent network having one or more hidden units more than the previous network. Larger networks are initialized using the final weights and thresholds from training a smaller network as the values of the common weights and thresholds. DI networks are useful when the goal is a thorough analysis of network performance versus size and are most relevant to DM-1.
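The seed-handling difference between RI and CSPI networks can be sketched as follows; the sizes, seeds and function name are illustrative only.

import numpy as np

def init_mlp(N, Nh, M, seed):
    # Draw initial hidden and output weights from a given initial random number seed (IRNS).
    rng = np.random.default_rng(seed)
    return rng.normal(size=(Nh, N + 1)), rng.normal(size=(M, Nh + 1))

# RI networks: widely separated seeds, so the initial weights are statistically independent.
ri_net_a = init_mlp(8, 5, 2, seed=12345)
ri_net_b = init_mlp(8, 5, 2, seed=987654321)

# CSPI networks: every network in the set starts from the same IRNS, so same-size
# CSPI networks are identical before training (Theorem 3.2 on the next slide).
cspi_small = init_mlp(8, 5, 2, seed=777)
cspi_large = init_mlp(8, 7, 2, seed=777)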

13 Network Properties
- Theorem 3.1: If two initial RI networks (1) are the same size, (2) have the same training data set, and (3) the training data set has more than one unique input vector, then the hidden unit basis functions are different for the two networks.
- Theorem 3.2: If two CSPI networks (1) are the same size and (2) use the same algorithm for processing random numbers into weights, then they are identical.
- Corollary 3.2: If two initial CSPI networks are the same size and use the same algorithm for processing random numbers into weights, then they have all common basis functions.

14 Problems with MLP Training
- Non-monotonic E_f(N_h)
- No standard way to initialize and train additional hidden units
- Net control parameters are arbitrary
- No procedure to initialize and train DI networks
- Network linear and nonlinear component interference

15 Mapping Error Examples

16 Tasks Performed in this Research
- Analysis of RI networks
- Improved initialization in CSPI networks
- Improved initialization of new hidden units in DI networks
- Analysis of separating mean training approaches

17 CSPI and CSPI-SWI Networks
- Improvement to RI networks
- Each CSPI network starts with the same IRNS
- Extended to CSPI-SWI (Structured Weight Initialization) networks:
  o Every hidden unit of the larger network has the same initial weights and threshold values as the corresponding units of the smaller network
  o Input-to-output weights and thresholds are also identical
- Theorem 5.1: If two CSPI networks are designed with structured weight initialization, the common subset of the hidden unit basis functions is identical.
- Corollary 5.1: If two CSPI networks are designed using structured weight initialization, the only initial basis functions that are not the same are the hidden unit basis functions for the additional hidden units in the larger network.
- A detailed flow chart for CSPI-SWI initialization is given in the dissertation.
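A minimal sketch of one way structured weight initialization could be realized: each hidden unit draws its initial weights from its own stream keyed by (seed, unit index), so the common units of a larger network start identical to those of the smaller one. The keying scheme is an assumption, not the dissertation's flow chart.

import numpy as np

def swi_hidden_init(N, Nh, seed):
    # Structured weight initialization (hidden layer only): per-unit random streams
    # guarantee that the first Nh_small units of any larger network match the smaller one.
    W_hi = np.empty((Nh, N + 1))
    for j in range(Nh):
        W_hi[j] = np.random.default_rng([seed, j]).normal(size=N + 1)
    return W_hi

small = swi_hidden_init(8, 5, seed=777)
large = swi_hidden_init(8, 7, seed=777)
assert np.allclose(small, large[:5])   # common basis functions start identical (Theorem 5.1)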

18 CSPI-SWI Examples (plots for the fm and twod data sets)

19 DI Network Development and Evaluation
- Improvement over RI, CSPI and CSPI-SWI networks
- The values of the common subset of the initial weights and thresholds for the larger network are initialized with the final weights and thresholds from a previously well-trained smaller network
- Designed with DM-1
- Single network designs → the networks are implementable
- After training, testing is feasible on a different data set

20 Basic DI Network Flowgraph
1. Create an initial network with N_h hidden units.
2. Train this initial network.
3. Set N_h ← N_h + p.
4. If N_h > N_hmax, stop.
5. Initialize the new hidden units, N_h − p + 1 ≤ j ≤ N_h: set w_oh(k, j) ← 0 for 1 ≤ k ≤ M; set w_hi(j, i) ← RN(ind+) for 1 ≤ i ≤ N + 1; apply net control to w_hi(j, i), 1 ≤ i ≤ N + 1.
6. Train the new network.
7. Return to step 3.
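Read as code, the flowgraph amounts to the growth loop below; train() stands in for the OWO-HWO training described earlier, and net control of the new hidden weights is omitted for brevity.

import numpy as np

def grow_di_network(W_hi, W_oh, X, T, p, Nh_max, train, rng):
    # Dependently initialized design: repeatedly add p hidden units, keep all previously
    # trained weights, zero the new units' output weights, and retrain.
    W_hi, W_oh = train(W_hi, W_oh, X, T)                    # train the initial network
    Nh = W_hi.shape[0]
    while Nh + p <= Nh_max:                                 # stop once Nh would exceed Nh_max
        Nh += p
        new_rows = rng.normal(size=(p, W_hi.shape[1]))      # w_hi(j, i) <- random numbers
        W_hi = np.vstack([W_hi, new_rows])
        new_cols = np.zeros((W_oh.shape[0], p))             # w_oh(k, j) <- 0 for the new units
        W_oh = np.hstack([W_oh[:, :-1], new_cols, W_oh[:, -1:]])   # keep the output threshold last
        W_hi, W_oh = train(W_hi, W_oh, X, T)                # train the enlarged network
    return W_hi, W_oh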

21 Properties of DI Networks
- E_int(N_h) < E_int(N_h − p)
- The E_f(N_h) curve is monotonic non-increasing (i.e., E_f(N_h) ≤ E_f(N_h − p))
- E_int(N_h) = E_f(N_h − p)

22 Performance Results for DI Networks with Fixed Iterations (plots for the fm, twod, F24 and F17 data sets)

23 RI Network and DI Network Comparison
(1) DI network: standard DI network design for N_h hidden units.
(2) RI type 1: RI networks were designed using a single network for each value of N_h, and every network of size N_h was trained using the same value of N_iter with which the corresponding DI network was trained.
(3) RI type 2: RI networks were designed using a single network for each value of N_h, and every network was trained using the total number of iterations N_iter used for the entire sequence of DI networks; this can be expressed by the sum below. As a result, the RI type 2 network actually has a larger value of N_iter than the DI network.
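The summation itself appears only as an image in the slide; a plausible reconstruction, consistent with the surrounding text, is that each RI type 2 network is trained for the total iteration count accumulated over the whole DI sequence:

\[
N_{iter}^{\,RI\,type\,2} \;=\; \sum_{k \,\in\, \text{DI sequence}} N_{iter}(k)
\]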

24 RI Network and DI Network Comparison Results (plots for the fm and twod data sets)

25 Separating Mean Processing Techniques
- Bottom-Up Separating Mean
- Top-Down Separating Mean

26 Bottom-Up Separating Mean
Basic idea: a linear mapping is removed from the training data; the nonlinear fit to the resulting data may perform better.
- Generate the linear mapping results.
- Generate a new desired output vector by subtracting the linear estimate from the original desired output.
- Train the MLP using the new data.
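A hedged sketch of the bottom-up step: fit and remove a linear mapping from the desired outputs, then train the MLP on the residual targets. Ordinary least squares is used here for the linear mapping, which is an assumption about the exact fit employed.

import numpy as np

def bottom_up_separating_mean(X, T):
    # X : (Nv, N) inputs, T : (Nv, M) desired outputs.
    # Returns the linear mapping weights and the new desired outputs T' = T - linear estimate.
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])   # augmented inputs
    W_lin, *_ = np.linalg.lstsq(Xa, T, rcond=None)  # generate the linear mapping results
    T_prime = T - Xa @ W_lin                        # generate the new desired output vectors
    return W_lin, T_prime

# The MLP is then trained on (X, T_prime); at run time the linear and MLP outputs are added back together.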

27 Bottom-Up Separating Mean Results (plots for the fm, power12 and single2 data sets)

28 Top-Down Separating Mean
Basic idea: if we know which subsets of inputs and outputs have the same means in Signal Models 2 and 3, we can estimate and remove these means. Network performance is then more robust.
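A minimal sketch of the top-down idea, assuming the grouping of inputs and outputs that share a mean is already known (finding that grouping is the separate technique mentioned in the conclusions).

import numpy as np

def remove_group_means(X, T, input_groups, output_groups):
    # Estimate one shared mean for each group of columns assumed to have the same mean,
    # subtract it, and return mean-removed copies of the inputs and desired outputs.
    X, T = X.copy(), T.copy()
    means = []
    for cols in input_groups:
        m = X[:, cols].mean()
        X[:, cols] -= m
        means.append(m)
    for cols in output_groups:
        m = T[:, cols].mean()
        T[:, cols] -= m
        means.append(m)
    return X, T, means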

29 Separating Mean Results (plot for the power12 data set)

30 Conclusions
- On average, CSPI-SWI networks have MSE-versus-N_h curves that are closer to monotonic non-increasing than those of RI networks.
- MSE-versus-N_h curves are always monotonic non-increasing for DI networks.
- DI network training was improved by calculating the number of training iterations and limiting the amount of training applied to previously trained units.
- DI networks always produce more consistent MSE-versus-N_h curves than RI, CSPI and CSPI-SWI networks.
- Separating mean processing, in both its bottom-up and top-down forms, often produces improved performance results.
- A new technique was developed to determine which inputs and outputs are similar, for use in top-down separating mean processing.