# Theory and Application of Artificial Neural Networks, by M. Eftekhari


Seminar Outline
- Historical Review
- Learning Methods of Artificial Neural Networks (ANNs)
- Types of ANNs

From Biological to Artificial Neurons

The Neuron - A Biological Information Processor
- dendrites - the receivers (sum input signals)
- soma - the neuron cell body
- axon - the transmitter
- synapse - the point of transmission
- the neuron activates after a certain threshold is met

Learning occurs via electro-chemical changes in the effectiveness of the synaptic junction.

From Biological to Artificial Neurons: An Artificial Neuron - The Perceptron
- simulated in hardware or by software
- input connections - the receivers
- node, unit, or PE (processing element) - simulates the neuron body
- output connection - the transmitter
- activation function - employs a threshold or bias
- connection weights - act as synaptic junctions

Learning occurs via changes in the values of the connection weights.

From Biological to Artificial Neurons: An Artificial Neuron - The Perceptron

The basic function of a neuron is to sum its inputs and produce an output if that sum is greater than a threshold. An ANN node produces an output as follows:

1. Multiply each component of the input pattern by the weight of its connection.
2. Sum all weighted inputs and subtract the threshold value, giving the total weighted input.
3. Transform the total weighted input into the output using the activation function.
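In symbols, the three steps compute $y = f\left(\sum_i w_i x_i - \theta\right)$, where $\theta$ is the threshold. A minimal Python sketch, assuming a step activation that outputs 1 when the total weighted input is positive and 0 otherwise; the function name `neuron_output` and the AND-gate weights are illustrative choices, not taken from the slides.

```python
def neuron_output(x, w, threshold):
    """Threshold unit: weighted sum minus threshold, then a step activation."""
    # Step 1: multiply each input component by the weight of its connection.
    weighted = [xi * wi for xi, wi in zip(x, w)]
    # Step 2: sum the weighted inputs and subtract the threshold.
    total = sum(weighted) - threshold
    # Step 3: transform the total weighted input with a step activation function.
    return 1 if total > 0 else 0

# With weights 0.6 and threshold 1.0 the unit behaves like a logical AND gate.
print([neuron_output(x, [0.6, 0.6], 1.0) for x in [[0, 0], [0, 1], [1, 0], [1, 1]]])  # [0, 0, 0, 1]
```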

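A Simple Artificial Neuron

[Figure: inputs x1 and x2 enter through weights w1 and w2, together with a bias weight w0; the weighted sum passes through the activation function f]

- The activation function plays the role of the events that occur in a real neuron of the brain.
- Weights are similar to synapses.
- The sum simulates the dendrites.
- Learning is the process of updating the weights.
- Output connections are similar to axons.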

From Biological to Artificial Neurons

The behavior of an artificial neural network for any particular input depends upon:
- the structure of each node (activation function)
- the structure of the network (architecture)
- the weights on each of the connections: these must be learned!

Historical Review

- The 1940s: the beginning of neural nets
- The 1950s and 1960s: the first Golden Age of neural nets
- The 1970s: the quiet years
- The 1980s: renewed enthusiasm

Overview of ANN Learning Methods

Three Types of Learning
- Supervised learning → classification
- Unsupervised learning → clustering
- Reinforcement learning → both of the above

Popular Forms of Learning Methods Based on the Three Types Above

Hebbian or Correlative Learning
- Donald Hebb, 1949.
- Weights are adjusted according to the correlation between the input pattern and the corresponding desired output.
- Numerous variants of the Hebb rule exist (e.g., based on minimizing an entropy function).
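A minimal sketch of the plain Hebb rule, $\Delta w_i = \eta\, x_i\, t$, in which each weight grows with the correlation between an input component and the desired output $t$. Assumptions not stated in the slides: bipolar (+1/-1) encoding, learning rate 1, and a bias treated as a weight on a constant input of 1; `hebb_train` is a hypothetical name.

```python
def hebb_train(patterns, targets, lr=1.0):
    """Plain Hebb rule: each weight grows by lr * x_i * t (input times desired output)."""
    w = [0.0] * len(patterns[0])
    b = 0.0
    for x, t in zip(patterns, targets):
        w = [wi + lr * xi * t for wi, xi in zip(w, x)]
        b += lr * t  # bias = weight on a constant input of 1
    return w, b

# Bipolar AND: a case the plain Hebb rule can learn in a single pass.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
T = [1, -1, -1, -1]
w, b = hebb_train(X, T)
preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1 for x in X]
print(w, b, preds)  # [2.0, 2.0] -2.0 [1, -1, -1, -1]
```

With binary (0/1) rather than bipolar targets, the plain rule would never decrease a weight, which is one reason bipolar encoding is assumed here.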

Competitive Learning
- When an input pattern is presented, one of the units in the layer responds more strongly than the others; only this unit's weights are changed, and all other weights remain unchanged ("winner-takes-all").
- Weight adjustment is typically based on a modified form of the Hebb rule (instar and outstar rules).
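Winner-takes-all learning can be sketched as follows, assuming the winner is the unit whose weight vector is closest (in Euclidean distance) to the input, and that only the winner moves toward the input with an instar-style update $w \leftarrow w + \eta\,(x - w)$. The function name and learning rate are illustrative; `math.dist` needs Python 3.8 or later.

```python
import math

def competitive_step(weights, x, lr=0.5):
    """One winner-takes-all update; only the winning unit's weights change."""
    # The unit whose weight vector is nearest to x responds most strongly: the winner.
    dists = [math.dist(w, x) for w in weights]
    winner = dists.index(min(dists))
    # Instar-style update for the winner only: w <- w + lr * (x - w).
    weights[winner] = [wi + lr * (xi - wi) for wi, xi in zip(weights[winner], x)]
    return winner

weights = [[0.0, 0.0], [1.0, 1.0]]
winner = competitive_step(weights, [0.2, 0.0])
print(winner, weights)  # 0 [[0.1, 0.0], [1.0, 1.0]]
```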

Stochastic Learning
- Accomplished by adjusting the weights in a probabilistic manner.
- Uses simulated annealing, as applied in the Boltzmann and Cauchy machines.
- The network runs in clamped and unclamped modes until it reaches "thermal" equilibrium; then the weights are updated.
- The equilibrium point is reached when the energy function is minimized.

Error-Correction (Gradient Descent) Learning
- Minimizes an error or cost function $E$ through gradient descent, $\Delta w_{ij} = -\eta\,\partial E/\partial w_{ij}$, where $\eta$ is the learning rate (used in several learning paradigms).
- Examples: the popular back-propagation algorithm and the Widrow-Hoff delta rule.
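The Widrow-Hoff delta rule is this gradient descent applied to $E = \tfrac{1}{2}(t - y)^2$ for a linear unit $y = w \cdot x + b$; since $\partial E/\partial w = -(t - y)\,x$, the update becomes $\Delta w = \eta\,(t - y)\,x$. A minimal sketch, assuming online (per-example) updates and a toy noise-free target $y = 2x_1 - x_2 + 0.5$ chosen only for illustration.

```python
def delta_rule(X, T, lr=0.1, epochs=1000):
    """Widrow-Hoff (LMS) rule for one linear unit, updated after every example."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, T):
            y = sum(wi * xi for wi, xi in zip(w, x)) + b  # linear output (no threshold while training)
            err = t - y                                   # equals -dE/dy for E = 0.5 * (t - y)**2
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Noise-free toy data generated from y = 2*x1 - x2 + 0.5 (an illustrative assumption).
X = [[0, 0], [1, 0], [0, 1], [1, 1]]
T = [2 * x1 - x2 + 0.5 for x1, x2 in X]
w, b = delta_rule(X, T)
print([round(v, 3) for v in w], round(b, 3))  # [2.0, -1.0] 0.5
```

An Adaline applies a threshold only when classifying; training on the linear output is what makes the error differentiable.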

Gradient Descent Learning (continued)
- How should the weights of interior (hidden) layer units be adjusted?
- There is no obvious way to assign credit or blame to the weights of internal layer units.
- This is the credit assignment problem; the BP algorithm solves it and gives good generalization.
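One way to see how back propagation solves the credit assignment problem is to compute its gradients for a tiny network and check them against numerical derivatives. The sketch assumes a 2-2-1 network of sigmoid units and squared error $E = \tfrac{1}{2}(t - y)^2$; all names and weight values are hypothetical. The hidden-unit error terms `delta_hid` are where the output error is distributed back to the interior weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """2-2-1 net: W1[j][i] links input i to hidden unit j; W2[j] links hidden unit j to the output."""
    W1, b1, W2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j]) for j in range(len(W1))]
    y = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, y

def loss(params, x, t):
    return 0.5 * (t - forward(params, x)[1]) ** 2

def backprop(params, x, t):
    """Gradients of E = 0.5 * (t - y)**2 for every weight and bias, via the chain rule."""
    W1, b1, W2, b2 = params
    h, y = forward(params, x)
    delta_out = (y - t) * y * (1 - y)  # dE/d(net input of the output unit)
    # Credit assignment: each hidden unit receives its share of the output error through W2.
    delta_hid = [delta_out * W2[j] * h[j] * (1 - h[j]) for j in range(len(h))]
    gW1 = [[d * xi for xi in x] for d in delta_hid]
    gW2 = [delta_out * hj for hj in h]
    return gW1, delta_hid, gW2, delta_out  # weight gradients and bias gradients

# Compare backprop gradients with central finite differences.
W1, b1, W2, b2 = [[0.1, -0.2], [0.4, 0.3]], [0.0, 0.1], [0.5, -0.6], 0.2
params = (W1, b1, W2, b2)
x, t, eps = [1.0, 0.5], 1.0, 1e-6
gW1, gb1, gW2, gb2 = backprop(params, x, t)

max_err = 0.0
for vec, grads in [(W1[0], gW1[0]), (W1[1], gW1[1]), (W2, gW2), (b1, gb1)]:
    for k in range(len(vec)):
        orig = vec[k]
        vec[k] = orig + eps
        up = loss(params, x, t)
        vec[k] = orig - eps
        down = loss(params, x, t)
        vec[k] = orig  # restore the weight
        max_err = max(max_err, abs((up - down) / (2 * eps) - grads[k]))
print(f"max |backprop - numerical| = {max_err:.1e}")
```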

More Learning Methods…
- Genetic algorithms
- PSO (particle swarm optimization) algorithms
- Various other methods

Learning Strategy
- Supervised: delta rule, back propagation, Hebbian, stochastic
- Reinforcement: learning automata
- Unsupervised: competitive, Hebbian

Neural Network Taxonomies based on Learning methods and their abilities

ANNs for Pattern Classification (Using Error-Correction Learning)
- Perceptron net
- Adaline net
- Multi-layer nets (Madaline, multi-layer perceptron)
- Back-propagation net
- Radial basis function (RBF) net
- Cascade-correlation net (CCN)
- Others…

Supervised learning (see previous section).
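The perceptron net at the top of this list is trained by error correction: $w \leftarrow w + \eta\,(t - y)\,x$, applied whenever the thresholded output $y$ differs from the target $t$. A minimal sketch on logical AND, assuming 0/1 targets, zero initial weights and $\eta = 1$; for linearly separable data like this, the perceptron convergence theorem guarantees that the loop stops. Names are illustrative.

```python
def train_perceptron(X, T, lr=1.0, max_epochs=100):
    """Perceptron rule: on each mistake, w <- w + lr * (t - y) * x (bias likewise)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in zip(X, T):
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            if y != t:
                mistakes += 1
                w = [wi + lr * (t - y) * xi for wi, xi in zip(w, x)]
                b += lr * (t - y)
        if mistakes == 0:  # every training pattern is classified correctly
            break
    return w, b

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
T = [0, 0, 0, 1]  # logical AND: linearly separable
w, b = train_perceptron(X, T)
preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0 for x in X]
print(preds)  # [0, 0, 0, 1]
```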

ANNs for Pattern Association (Using Hebbian or Delta-Rule Learning)
- Hetero-associative (different numbers of inputs and outputs)
- Auto-associative (the same number of inputs and outputs)
- Iterative auto-associative (discrete Hopfield)
- Bidirectional associative memory (BAM)

Supervised learning (see previous section).

ANNs for Pattern Association (continued)
- Aristotle observed that human memory connects items (ideas) that are similar, that are contrary, or that occur in close proximity.
- Learning is the process of forming associations between related patterns.
- Hebb rule for pattern association
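The Hebb rule for pattern association stores bipolar patterns $s$ in a weight matrix $W = \sum_p s_p s_p^{\mathsf T}$. A minimal auto-associative sketch, assuming bipolar patterns, a zeroed diagonal (no self-connections, as in a discrete Hopfield net) and a single synchronous recall step; the function names are hypothetical. Flipping one component of a stored 6-element pattern still recalls the original.

```python
def hebb_autoassociative(patterns):
    """W = sum over patterns of s s^T, with the diagonal kept at zero (no self-connections)."""
    n = len(patterns[0])
    W = [[0] * n for _ in range(n)]
    for s in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += s[i] * s[j]
    return W

def recall(W, x):
    """One synchronous update: y_i = sign(sum_j W[i][j] * x_j), ties mapped to +1."""
    n = len(x)
    return [1 if sum(W[i][j] * x[j] for j in range(n)) >= 0 else -1 for i in range(n)]

stored = [1, -1, 1, 1, -1, -1]
W = hebb_autoassociative([stored])
noisy = stored.copy()
noisy[0] = -noisy[0]  # corrupt one component
print(recall(W, noisy) == stored)  # True
```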

ANNs for Clustering (Competitive Learning)
- Fixed-weight competitive nets (e.g., Maxnet, Mexican Hat)
- Kohonen self-organizing maps (feature map nets)
- Learning vector quantization (e.g., LVQ1, LVQ2.1, LVQ3); also used for classification!
- Adaptive resonance theory (ART)
- Counter-propagation net (CPN)

Unsupervised learning (see previous section).

ANNs for Optimization and Pattern Association
- Boltzmann machine (both approaches)
- Hopfield net (both approaches)
- Cauchy machine (uses the Cauchy probability distribution instead of the Boltzmann distribution); consequently, a faster annealing schedule can be used.
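The difference in annealing speed can be made concrete. Classical (Boltzmann) annealing is usually paired with a logarithmic schedule $T(t) = T_0/\ln(1+t)$, while Cauchy-distributed weight steps permit the faster $T(t) = T_0/(1+t)$. The sketch assumes these textbook schedules, Gaussian versus Cauchy weight perturbations, and the Metropolis acceptance rule; the slides do not specify them, and all names are hypothetical.

```python
import math
import random

def boltzmann_temperature(t, T0):
    """Logarithmic cooling, T0 / ln(1 + t) for t >= 1 (classical Boltzmann annealing)."""
    return T0 / math.log(1 + t)

def cauchy_temperature(t, T0):
    """Faster cooling, T0 / (1 + t), permitted by Cauchy-distributed steps."""
    return T0 / (1 + t)

def weight_step(T, rng, cauchy=False):
    """Random weight change whose scale is set by the temperature T."""
    if cauchy:
        # Heavy-tailed Cauchy sample: occasional long jumps help escape local minima.
        return T * math.tan(math.pi * (rng.random() - 0.5))
    return rng.gauss(0.0, T)  # Gaussian sample used with Boltzmann annealing

def accept(delta_E, T, rng):
    """Metropolis rule: keep improvements; keep a worse change with probability exp(-dE/T)."""
    return delta_E <= 0 or rng.random() < math.exp(-delta_E / T)

# Iterations needed to cool from T0 = 10 down to T = 0.5 under each schedule.
T0, T_end = 10.0, 0.5
steps_cauchy = math.ceil(T0 / T_end - 1)               # solve T0 / (1 + t) = T_end
steps_boltzmann = math.ceil(math.exp(T0 / T_end) - 1)  # solve T0 / ln(1 + t) = T_end
print(steps_cauchy, steps_boltzmann)  # 19 485165195
```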

Other Extensions of the Nets Above
- Modified Hopfield (robust to noisy patterns)
- Back propagation for recurrent nets
- Many other nets exist.

Neural Network Taxonomies based on Architecture

Architectures
- Single-layer feed-forward (SLFF)
- Multi-layer feed-forward (MLFF)
- Recurrent
- Adaptive configuration of one of the general architectures above (self-growing nets)

Limitations of Simple Neural Networks

The Limitations of Perceptrons (Minsky and Papert, 1969)
- Most functions are more complex; i.e., they are non-linear or not linearly separable, so a single-layer perceptron cannot represent them (XOR is the classic example).
- This crippled research in neural net theory for 15 years.

Multi-layer Feed-forward ANNs

Over the 15 years (1969-1984), some research continued:
- a hidden layer of nodes allowed combinations of linear functions
- non-linear activation functions displayed properties closer to real neurons:
  - output varies continuously but not linearly
  - differentiable (e.g., sigmoid)
- a non-linear ANN classifier was possible
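A hidden layer is exactly what makes XOR, the standard non-linearly-separable example, representable. In this sketch the weights are hand-picked rather than learned (an assumption for illustration): one hidden sigmoid unit approximates OR, the other NAND, and the output unit ANDs them. The `gain` parameter is hypothetical; it steepens the sigmoid so that it behaves almost like a threshold while staying differentiable.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def xor_net(x1, x2, gain=10.0):
    """2-2-1 network with hand-chosen weights; the hidden units act as OR and NAND."""
    h_or = sigmoid(gain * (x1 + x2 - 0.5))
    h_nand = sigmoid(gain * (-x1 - x2 + 1.5))
    # The output unit behaves like AND of the two hidden units, which gives XOR overall.
    return sigmoid(gain * (h_or + h_nand - 1.5))

print([round(xor_net(a, b)) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```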

Generalization

Generalization
- The objective of learning is to achieve good generalization to new cases; otherwise, just use a look-up table.
- Generalization can be defined as a mathematical interpolation or regression over a set of training points.

[Figure: a curve f(x) fitted through a set of training points, plotted against x]

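Generalization: A Probabilistic Guarantee

Let $N$ = number of hidden nodes, $m$ = number of training cases, $W$ = number of weights, and $\varepsilon$ = error tolerance ($\varepsilon < 1/8$). The network will generalize with 95% confidence if:

1. the error on the training set is less than $\varepsilon/2$, and
2. $m \ge O\left(\frac{W}{\varepsilon}\log_2\frac{N}{\varepsilon}\right)$.

This result is based on PAC (probably approximately correct) learning theory and provides a good rule of practice: $m \ge W/\varepsilon$.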

Generalization: The 20-Bit Parity Problem
- A 20-20-1 net has 441 weights.
- For 95% confidence that the net will predict with error tolerance $\varepsilon$, the rule of thumb calls for $m \ge W/\varepsilon = 441/\varepsilon$ training examples (for instance, 4,410 examples when $\varepsilon = 0.1$).
- Not bad, considering that there are $2^{20} = 1{,}048{,}576$ possible input patterns.
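The arithmetic of the parity example can be checked directly. The sketch counts the weights of a fully connected 20-20-1 net, including one bias per hidden and output unit, and applies the rule of thumb $m \ge W/\varepsilon$; the tolerance $\varepsilon = 0.1$ is an assumed value, and `count_weights` is a hypothetical helper.

```python
import math
from fractions import Fraction

def count_weights(layers):
    """Weights in a fully connected feed-forward net, including one bias per non-input unit."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layers, layers[1:]))

W = count_weights([20, 20, 1])  # 20-20-1 net: 21*20 + 21*1
eps = Fraction(1, 10)           # assumed error tolerance of 0.1 (exact arithmetic)
m = math.ceil(W / eps)          # rule of thumb: m >= W / eps
print(W, m, 2 ** 20)            # 441 4410 1048576
```

`Fraction` keeps the division exact, so rounding up cannot be thrown off by floating-point error.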

Generalization: Training Sample & Network Complexity

Based on the rule of thumb $m \ge W/\varepsilon$:
- a smaller $W$ reduces the required size of the training sample;
- a larger $W$ supplies the freedom to construct the desired function.

Optimum $W$ => optimum number of hidden nodes.

Generalization: How can we control the number of effective weights?
- Manually select the optimum number of hidden nodes and connections.
- Prevent over-fitting (over-training).
- Add a weight-cost term to the BP error equation.

Generalization: Over-Training
- Over-training is the equivalent of over-fitting a set of data points with a curve that is too complex.
- Occam's Razor (1300s): "plurality should not be assumed without necessity."
- The simplest model that explains the majority of the data is usually the best.

Generalization: Preventing Over-Training
- Use a separate test or tuning set of examples.
- Monitor the error on the test set as the network trains.
- Stop training just before over-fitting error starts to occur (early stopping or tuning).
- The number of effective weights is thereby reduced.
- Most new systems have automated early-stopping methods.
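Early stopping can be sketched as a scan over the error measured on the separate tuning set after each epoch. The sketch assumes a `patience` parameter (a hypothetical name): scanning stops once the tuning-set error has failed to improve for that many epochs, and the epoch with the lowest error is reported, so the weights saved at that epoch would be the ones kept. The error values are made up for illustration.

```python
def early_stopping_epoch(val_errors, patience=3):
    """Index of the best epoch; stop scanning after `patience` epochs without improvement."""
    best_epoch, best_err, waited = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_epoch, best_err, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:  # no improvement for a while: assume over-training has begun
                break
    return best_epoch

# Made-up tuning-set errors recorded after each training epoch.
errors = [0.9, 0.6, 0.45, 0.40, 0.41, 0.43, 0.47, 0.39]
print(early_stopping_epoch(errors))  # 3
```

The later dip to 0.39 is never reached, which shows the trade-off in choosing `patience`: a small value stops sooner but may miss a later improvement.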

Network Training

How do you ensure that a network has been well trained?
- Objective: to achieve good generalization accuracy on new examples/cases.
- Establish a maximum acceptable error rate.
- Train the network, using a validation test set to tune it.
- Validate the trained network against a separate test set, usually referred to as a production test set.

Network Training: Approach #1, Large Sample

[Figure: the available examples are divided randomly into a training set (70%) and a test set (30%); a production set is also shown]

When the amount of available data is large:
- the 70% training portion is used to develop one ANN model;
- the test error is computed on the 30% test portion;
- generalization error = test error.

Network Training: Approach #2, Cross-Validation

[Figure: the available examples are split into a 90% training portion and a 10% test portion, repeated 10 times; a production set is also shown]

When the amount of available data is small:
- repeat the 90%/10% split 10 times, developing 10 different ANN models;
- accumulate the test errors;
- the generalization error is determined by the mean test error and its standard deviation.
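A minimal sketch of 10-fold cross-validation, assuming the examples are identified by index and that `train_and_test` is a caller-supplied (hypothetical) function that trains a fresh model on one index list and returns its error on the other. The toy majority-label "model" and its labels stand in for a real ANN.

```python
import random
import statistics

def k_fold_indices(n, k=10, seed=0):
    """Shuffle example indices and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(n, train_and_test, k=10):
    """Each fold is the test set once (10% for k=10) while the rest (90%) trains a new model."""
    errors = []
    for test_idx in k_fold_indices(n, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        errors.append(train_and_test(train_idx, test_idx))
    # Generalization error is summarized by the mean and standard deviation of the k test errors.
    return statistics.mean(errors), statistics.stdev(errors)

# Hypothetical toy data and "model": predict the majority label seen in the training folds.
labels = [1 if i % 4 == 0 else 0 for i in range(100)]

def majority_model(train_idx, test_idx):
    majority = round(sum(labels[i] for i in train_idx) / len(train_idx))
    return sum(labels[i] != majority for i in test_idx) / len(test_idx)

mean_err, sd_err = cross_validate(len(labels), majority_model)
print(f"generalization error ~ {mean_err:.3f} +/- {sd_err:.3f}")
```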

Network Training: How do you select between two ANN designs?
- A statistical test of hypothesis is required to ensure that a significant difference exists between the error rates of two ANN models.
- Testing methods have been developed for both large and small data sets.

Network Training: Mastering ANN Parameters

| Parameter | Typical value | Range |
|---|---|---|
| learning rate | 0.1 | 0.01 - 0.99 |
| momentum | 0.8 | 0.1 - 0.9 |
| weight-cost | 0.1 | 0.001 - 0.5 |

Fine tuning:
- adjust individual parameters at each node and/or connection weight
- automatic adjustment during training
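The three parameters in the table meet in a single weight update. A minimal sketch, assuming the common formulation in which the weight-cost term adds $\lambda w$ to the gradient (from a penalty $\tfrac{\lambda}{2} w^2$ on the error) and momentum adds a fraction $\alpha$ of the previous change: $\Delta w(t) = -\eta\,(\partial E/\partial w + \lambda w) + \alpha\,\Delta w(t-1)$. The defaults are the typical values from the table; `sgd_update` is a hypothetical name and the gradient value is made up.

```python
def sgd_update(w, grad, prev_delta, lr=0.1, momentum=0.8, weight_cost=0.1):
    """Return (new_w, delta) with delta = -lr * (grad + weight_cost * w) + momentum * prev_delta."""
    # weight_cost * w is the gradient of the penalty 0.5 * weight_cost * w**2 added to the error.
    delta = -lr * (grad + weight_cost * w) + momentum * prev_delta
    return w + delta, delta

# Two updates of a single weight with a made-up, constant error gradient of 0.5.
w, delta = 1.0, 0.0
for _ in range(2):
    w, delta = sgd_update(w, grad=0.5, prev_delta=delta)
print(round(w, 4))  # 0.8326
```

The second step (-0.1074) is larger than the first (-0.06) because momentum carries part of the previous change forward.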

Network Training: Network Weight Initialization
- Random initial values within +/- some range.
- Smaller weight values for nodes with many incoming connections.
- Rule of thumb: the initial weight range should be approximately $\pm 1/\sqrt{w}$, where $w$ is the number of weights coming into a node.
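A minimal initialization sketch, assuming uniform sampling within $\pm 1/\sqrt{w}$, where $w$ is the number of weights entering the node (its fan-in); the function name and seed are illustrative.

```python
import math
import random

def init_weights(fan_in, fan_out, seed=0):
    """W[j][i]: weight from input i into node j, drawn uniformly from +/- 1/sqrt(fan_in)."""
    rng = random.Random(seed)
    r = 1.0 / math.sqrt(fan_in)  # more incoming connections -> smaller initial range
    return [[rng.uniform(-r, r) for _ in range(fan_in)] for _ in range(fan_out)]

W = init_weights(fan_in=25, fan_out=3)
print(max(abs(w) for row in W for w in row) <= 0.2)  # True
```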

Network Training: Typical Problems During Training

[Figure: plots of total error E against the number of iterations]

- Would like: a steady, rapid decline in total error.
- But sometimes:
  - Seldom a local minimum: reduce the learning or momentum parameter.
  - Reduce the learning parameters: this may indicate that the data is not learnable.

Data Preparation

The three steps of data preparation:
1. Consolidation and cleaning
2. Selection and preprocessing
3. Transformation and encoding

Data Preparation: Data Types and ANNs

Three basic data types:
- nominal discrete symbolic (A, yes, small)
- ordinal discrete numeric (-5, 3, 24)
- continuous numeric (0.23, -45.2, 500.43)

BP ANNs accept only continuous numeric values (typically in the 0-1 range).
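Two common ways to meet the 0-1 numeric requirement are sketched below, as assumptions rather than the slides' prescribed method: one-hot encoding for nominal symbols (one input node per category) and min-max scaling for numeric attributes. The function names are illustrative.

```python
def one_hot(value, categories):
    """Encode a nominal symbol as a 0/1 vector, one input node per category."""
    return [1.0 if value == c else 0.0 for c in categories]

def min_max_scale(values):
    """Linearly rescale numeric values into the 0-1 range expected by BP networks."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a constant attribute carries no information
    return [(v - lo) / (hi - lo) for v in values]

print(one_hot("small", ["small", "medium", "large"]))  # [1.0, 0.0, 0.0]
print(min_max_scale([-5, 3, 24]))
```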

Data Preparation: Consolidation and Cleaning
- Determine appropriate input attributes.
- Consolidate data into a working database.
- Eliminate or estimate missing values.
- Remove outliers (obvious exceptions).
- Deal with volume bias.

Data Preparation: Selection and Preprocessing
- Select examples by random sampling; consider how many training examples are needed.
- Reduce attribute dimensionality:
  - remove redundant and/or correlated attributes
  - combine attributes (sum, multiply, difference)
- Reduce attribute value ranges:
  - group symbolic discrete values
  - quantize continuous numeric values

Post-Training Analysis

Examining the neural net model:
- Visualizing the constructed model
- Detailed network analysis

Sensitivity analysis of input attributes:
- Analytical techniques
- Attribute elimination

Post-Training Analysis: Visualizing the Constructed Model
- Graphical tools can be used to display the output response as selected input variables are changed.

[Figure: output Response plotted as a surface over two input variables, Size and Temp]

Post-Training Analysis: Detailed Network Analysis
- Hidden nodes form an internal representation.
- Manual analysis of weight values is often difficult; graphics are very helpful.
- Conversion to an equation or executable code.
- Automated conversion of ANNs to symbolic logic is an active area of research.

Post-Training Analysis: Sensitivity Analysis of Input Attributes
- Analytical techniques:
  - network weight analysis
- Feature (attribute) elimination:
  - forward feature elimination
  - backward feature elimination

Example Applications
- Pattern recognition (reading zip codes)
- Signal filtering (reduction of radio noise)
- Data segmentation (detection of seismic onsets)
- Data compression (TV image transmission)
- Database mining (marketing, finance analysis)
- Adaptive control (vehicle guidance)

Example Applications (continued)
- Online applications (fault detection systems)
- Off-line control (modeling a process or controller)
- Hybrids with fuzzy systems (ANFIS), GAs, and rough set theory

THE END Thanks for your participation!
