1 Hybrid Intelligent Systems Lecture 4 - Part A: Evolutionary Neural Networks and Evolving Fuzzy Systems
2 Artificial Neural Networks - Features Typically, the structure of a neural network is established first, and one of a variety of mathematical algorithms is used to determine the interconnection weights that maximize the accuracy of the outputs produced. This process, by which the synaptic weights of a neural network are adapted according to the problem environment, is popularly known as learning. There are broadly three types of learning: supervised learning, unsupervised learning and reinforcement learning.
3 Node Activation Functions Logistic function Hyperbolic tangent function Sigmoidal function Gaussian function Identity function Others
4 Different Neural Network Architectures Multi layered feedforward network Recurrent network Competitive network Jordan network
5 Backpropagation Algorithm Weight update with momentum: Δw_ij(n) = -η ∂E/∂w_ij + α Δw_ij(n-1), where E is the error criterion to be minimized, w_ij is the weight from the i-th input unit to the j-th output unit, and η and α are the learning rate and momentum.
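As an illustration of the update rule above, here is a minimal sketch (not the lecture's code) of one backpropagation step with momentum for a single logistic-output layer; the function name and the mean-squared-error criterion are assumptions.

```python
import numpy as np

def backprop_step(W, x, target, prev_dW, eta=0.1, alpha=0.9):
    """One gradient-descent step with momentum for a single logistic layer.

    W       : weight matrix (outputs x inputs)
    x       : input vector
    target  : desired output vector
    prev_dW : weight change from the previous step (the momentum term)
    eta, alpha : learning rate and momentum
    """
    y = 1.0 / (1.0 + np.exp(-W @ x))       # logistic activation
    error = y - target                     # dE/dy for E = 0.5 * ||y - target||^2
    delta = error * y * (1.0 - y)          # local gradient at the output units
    grad = np.outer(delta, x)              # dE/dW
    dW = -eta * grad + alpha * prev_dW     # step = learning-rate term + momentum term
    return W + dW, dW
```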
6 Designing Neural Networks In the conventional design, the user has to specify: Number of neurons Distribution of layers Interconnection between neurons and layers Topological optimization algorithms (limitations?) * Network Pruning * Network Growing Tiling (Mezard et al., 1989) Upstart (Frean et al., 1990) Cascade Correlation (Fahlman et al., 1990) Extentron (Baffes et al., 1992)
7 Choosing Hidden Neurons A large number of hidden neurons will ensure correct learning, and the network will be able to predict the data it has been trained on, but its performance on new data - its ability to generalise - is compromised. With too few hidden neurons, the network may be unable to learn the relationships in the data and the error will fail to fall below an acceptable level. Selection of the number of hidden neurons is a crucial decision. Often a trial-and-error approach is taken.
8 Choosing Initial Weights The learning algorithm uses a steepest descent technique, which rolls straight downhill in weight space until the first valley is reached. This valley may not correspond to a zero error for the resulting network. This makes the choice of the initial starting point in the multidimensional weight space critical. However, there are no recommended rules for this selection except trying several different starting weight values to see if the network results are improved.
9 Use of Momentum Helps to get out of local minima Smooths out the variations
10 Choosing the learning rate The learning rate controls the size of the step taken in multidimensional weight space when each weight is modified. If the selected learning rate is too large, the local minimum may be overstepped constantly, resulting in oscillations and slow convergence to the lower error state. If the learning rate is too low, the number of iterations required may be too large, resulting in slow performance.
11 Effects of Different Learning Rates
12 Gradient Descent Performance Desired behavior: convergence towards the global minimum. Undesired behavior: trapped in local minima.
13 Gradient Descent Technique - Drawbacks Always goes “downhill” Cannot always find the global minimum if local minima exist Poor generalization after prolonged training The solution found will depend on the starting location For complicated problems it is hard to find a starting location that will guarantee a global minimum Solution: other search techniques and global optimization algorithms
14 Conjugate Gradient Algorithms Search is performed in conjugate directions Start with the steepest descent direction (first iteration) Line search to move along the current direction New search direction is conjugate to the previous direction (new steepest descent direction + previous search direction) Fletcher-Reeves update Polak-Ribiere update Powell-Beale restarts Scaled conjugate gradient algorithm
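A compact sketch of a nonlinear conjugate gradient loop with the Polak-Ribiere update listed above; the backtracking loop is a crude stand-in for a proper line search, and the function names are placeholders rather than the lecture's implementation.

```python
import numpy as np

def conjugate_gradient(f, grad, w, iters=100, tol=1e-6):
    """Minimize f(w) by nonlinear conjugate gradient (Polak-Ribiere variant)."""
    g = grad(w)
    d = -g                                     # first iteration: steepest descent
    for _ in range(iters):
        step = 1.0                             # crude backtracking line search along d
        while f(w + step * d) > f(w) and step > 1e-12:
            step *= 0.5
        w = w + step * d
        g_new = grad(w)
        if np.linalg.norm(g_new) < tol:
            break
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))  # Polak-Ribiere coefficient
        d = -g_new + beta * d                  # new direction: steepest descent + previous direction
        g = g_new
    return w
```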
15 Scaled Conjugate Gradient Algorithm SCGA avoids the complicated line search procedure of the conventional conjugate gradient algorithm. The product of the Hessian matrix with the search direction is approximated as s_k = [E'(w_k + σ_k p_k) - E'(w_k)] / σ_k + λ_k p_k, where E' and E'' are the first and second derivative information, p_k is the search direction, σ_k is the change in weight for the second derivative approximation, and λ_k regulates the indefiniteness of the Hessian. For a good quadratic approximation of E, a mechanism to raise and lower λ_k is needed when the Hessian is not positive definite. The initial values of σ_k and λ_k are important.
16 Quasi-Newton Algorithm Using only the first derivative information of E, a sequence of matrices G^(k) is built up that represents increasingly accurate approximations to the inverse Hessian (H^-1); a direct Newton method would be computationally expensive. The weight vector is updated using w^(k+1) = w^(k) - α^(k) G^(k) ∇E(w^(k)).
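The inverse-Hessian approximation G^(k) described above is commonly built with the BFGS formula; the sketch below is a generic illustration under that assumption, not necessarily the exact quasi-Newton variant used in the lecture.

```python
import numpy as np

def bfgs_update(G, s, y):
    """BFGS update of the inverse-Hessian approximation G, using only gradients.

    s = w_new - w_old (weight step), y = grad_new - grad_old (gradient change).
    The weight update itself would then be w_new = w - step * G @ grad(w).
    """
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    return (I - rho * np.outer(s, y)) @ G @ (I - rho * np.outer(y, s)) + rho * np.outer(s, s)
```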
17 Levenberg-Marquardt Algorithm The LM algorithm uses an approximation to the Hessian matrix in the following Newton-like update: w_(k+1) = w_k - [J^T J + μ I]^(-1) J^T e, where J is the Jacobian of the network errors e with respect to the weights. μ is decreased after each successful step (reduction in the performance function) and is increased only when a tentative step would increase the performance function. By doing this, the performance function is always reduced at each iteration of the algorithm. When the scalar μ is zero, this is just Newton's method, using the approximate Hessian matrix. When μ is large, this becomes gradient descent with a small step size.
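A minimal sketch of the Newton-like update above, assuming a sum-of-squares error so that J^T J approximates the Hessian and J^T e the gradient; `residuals` and `jacobian` are placeholder callables, not part of the original material.

```python
import numpy as np

def lm_step(w, residuals, jacobian, mu):
    """One Levenberg-Marquardt step for least-squares network training.

    residuals(w) : vector of output errors e
    jacobian(w)  : Jacobian J of the errors with respect to the weights
    mu           : damping factor; mu -> 0 approaches Newton's method with the
                   approximate Hessian, large mu approaches small-step gradient descent
    """
    e = residuals(w)
    J = jacobian(w)
    H_approx = J.T @ J                                        # approximate Hessian
    g = J.T @ e                                               # gradient of 0.5 * ||e||^2
    dw = np.linalg.solve(H_approx + mu * np.eye(len(w)), g)   # damped Newton step
    return w - dw
```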
18 Limitations of Conventional Design of Neural Nets What is the optimal architecture for a given problem (number of neurons and number of hidden layers)? What activation function should one choose? What is the optimal learning algorithm and its parameters? To demonstrate the difficulties in designing “optimal” neural networks we will consider three well-known time series benchmark problems (Reference 4).
19 Chaotic Time Series for Performance Analysis of Learning Algorithms Waste Water Flow Prediction: the data set is represented as [f(t), f(t-1), a(t), b(t), f(t+1)], where f(t), f(t-1) and f(t+1) are the water flows at times t, t-1 and t+1 (hours) respectively, and a(t) and b(t) are the moving averages over 12 hours and 24 hours. Mackey-Glass Chaotic Time Series: the values x(t-18), x(t-12), x(t-6) and x(t) are used to predict x(t+6). Gas Furnace Time Series Data: this time series is used to predict the CO2 concentration y(t+1); the data is represented as [u(t), y(t), y(t+1)].
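As a small illustration (not from the lecture) of how the Mackey-Glass inputs and targets above can be assembled, the helper below builds the [x(t-18), x(t-12), x(t-6), x(t)] -> x(t+6) pairs from a scalar series; the function name and defaults are assumptions.

```python
import numpy as np

def embed_mackey_glass(x, lags=(18, 12, 6, 0), horizon=6):
    """Build (inputs, targets) pairs [x(t-18), x(t-12), x(t-6), x(t)] -> x(t+6)."""
    max_lag = max(lags)
    inputs, targets = [], []
    for t in range(max_lag, len(x) - horizon):
        inputs.append([x[t - lag] for lag in lags])   # lagged input vector at time t
        targets.append(x[t + horizon])                # value to be predicted
    return np.array(inputs), np.array(targets)
```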
20 Experimentation setup Changing the number of hidden neurons: 14, 16, 18, 20 and 24 Three benchmark problems: Mackey-Glass, gas furnace and waste water time series Four learning algorithms: backpropagation (BP), scaled conjugate gradient (SCG), Quasi-Newton (QNA) and Levenberg-Marquardt (LM) Training terminated after 2500 epochs Changing the activation functions of the hidden neurons Analysis of the computational complexity of the different algorithms
21 Effect on Number of Hidden Neurons – Mackey Glass (training data) Lowest RMSE for LM = 0.0004 (24 hidden neurons)
22 Effect on Number of Hidden Neurons – Mackey Glass (test data) Lowest RMSE for LM = 0.0009 (24 hidden neurons)
23 Effect on Number of Hidden Neurons - Gas Furnace Series (training data) Lowest RMSE for LM = 0.009 (24 hidden neurons)
24 Effect on Number of Hidden Neurons - Gas Furnace Series (test data) Lowest RMSE for SCG = 0.033 (16 hidden neurons)
25 Effect on Number of Hidden Neurons - Waste Water (training data) Lowest RMSE for LM = 0.024 (24 hidden neurons)
26 Effect on Number of Hidden Neurons - Waste Water (test data) Lowest RMSE for SCG = 0.0820 (14 hidden neurons)
27 Effect on Activation Functions – Mackey Glass Series LSAF - log-sigmoidal activation function TSAF - tanh-sigmoidal activation function
28 Effect on Activation Functions – Gas Furnace LSAF - log-sigmoidal activation function TSAF - tanh-sigmoidal activation function
29 Effect on Activation Functions – Waste Water LSAF - log-sigmoidal activation function TSAF - tanh-sigmoidal activation function
30 Computational Complexity of learning algorithms
31 Difficulties to Design Optimal Neural Networks? Experiments highlight the difficulty of finding an OPTIMAL network which is smaller in size, faster in convergence and has the best generalization error. For the Mackey-Glass series, LM gave the lowest generalization RMSE of 0.0009 with 24 hidden neurons using TSAF. For the gas furnace series, the best generalization RMSE was obtained using SCG (0.033) with 16 neurons and TSAF. QNA gave a marginally better generalization error when the activation function was changed from TSAF to LSAF.
32 Difficulties to Design Optimal Neural Networks? For the waste water series, the best generalization RMSE was obtained using SCG (0.09) with TSAF; SCG's generalization error improved (0.082) when the activation function was changed from TSAF to LSAF. In spite of its computational complexity, LM performed well for Mackey-Glass; for the gas furnace and waste water series the SCG algorithm performed better. This leads us to the following questions: What is the optimal architecture for a given problem? What activation function should one choose? What is the optimal learning algorithm and its parameters? Solution: optimizing artificial neural networks using global optimization algorithms.
33 Global Optimization Algorithms No need for functional derivative information Repeated evaluations of objective functions Intuitive guidelines (simplicity) Randomness Analytic opacity Self optimization Ability to handle complicated tasks Broad applicability Disadvantage: computationally expensive (use parallel engines)
34 Popular Global Optimization Algorithms Genetic algorithms Simulated annealing Tabu search Random search Downhill simplex search GRASP Clustering methods Many others
35 Evolutionary Algorithm – Flow Chart Binary genotypes in the current generation (e.g. 10010110, 01100010, ...) are carried into the next generation through selection and reproduction, with elitism preserving the best individuals unchanged.
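A generic sketch of the loop in the flow chart: evaluation, selection, reproduction (crossover plus mutation) and elitism. The population size and generation count follow the parameter table later in the deck; the rank-based parent pool and the placeholder callables are assumptions.

```python
import random

def evolve(fitness, random_individual, crossover, mutate,
           pop_size=40, generations=50, elite=1):
    """Generic evolutionary loop: evaluate, select, reproduce, keep the elite."""
    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:elite]                              # elitism: best pass unchanged
        while len(next_gen) < pop_size:
            p1, p2 = random.sample(ranked[:pop_size // 2], 2)  # rank-based parent choice
            next_gen.append(mutate(crossover(p1, p2)))         # crossover returns one child here
        population = next_gen
    return max(population, key=fitness)
```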
36 Evolutionary Artificial Neural Networks
37 Evolutionary Neural Networks – Design Strategy Complete adaptation is achieved through three levels of evolution, i.e., the evolution of connection weights, architectures and learning rules (algorithms), which progress on different time scales.
38 Meta Learning in Evolutionary Neural Networks Hybrid Method = Global Search + Gradient descent
39 Adaptive Learning by Evolutionary Computation
40 Genotype Representation of Connection Weights Genotype: 0100 1000 0111 0011 0001 0101 Binary representation
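A sketch of the binary encoding shown above, mapping real-valued weights to fixed-point bit strings and back; the 4-bit resolution and the [-1, 1] weight range are illustrative assumptions, not values from the lecture.

```python
def encode_weights(weights, bits=4, w_min=-1.0, w_max=1.0):
    """Encode real-valued connection weights as one binary genotype string."""
    levels = 2 ** bits - 1
    genotype = ""
    for w in weights:
        level = round((w - w_min) / (w_max - w_min) * levels)
        level = min(levels, max(0, level))          # clamp weights outside the range
        genotype += format(level, f"0{bits}b")
    return genotype

def decode_weights(genotype, bits=4, w_min=-1.0, w_max=1.0):
    """Map a binary genotype back to real-valued weights."""
    levels = 2 ** bits - 1
    return [w_min + int(genotype[i:i + bits], 2) / levels * (w_max - w_min)
            for i in range(0, len(genotype), bits)]
```

For example, encode_weights([0.3, -0.7]) produces an 8-bit string that decode_weights inverts up to the quantization error of the 4-bit code.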
41 Crossover of Connection Weights
42 Crossover of Connection Weights
43 Mutation of Connection Weights
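The crossover and mutation slides above can be illustrated with standard operators on the binary weight genotype; one-point crossover and bit-flip mutation are generic choices assumed here, with the 0.1 rate taken from the parameter table later in the deck.

```python
import random

def one_point_crossover(parent1, parent2):
    """Swap the genotype tails of two parent bit strings at a random cut point."""
    point = random.randint(1, len(parent1) - 1)
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]

def bit_flip_mutation(genotype, rate=0.1):
    """Flip each bit with a small probability."""
    return "".join(("1" if bit == "0" else "0") if random.random() < rate else bit
                   for bit in genotype)
```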
44 Genotype Representation of Architectures
45 Genotype Representation of Learning Parameters For BP: the optimal learning rate and momentum; learning parameter vectors were encoded as real-valued coefficients. For SCGA: the parameter controlling the weight change for the second derivative approximation and the parameter regulating the indefiniteness of the Hessian. For QNA: scaling factors and step sizes. For LM: the adaptive learning rate and the initial values of the learning rate increasing and decreasing factors.
46 Hierarchical Representation Evolutionary Neural Networks
47 Parameters used for evolutionary design
Population size: 40
Maximum number of generations: 50
Initial number of hidden nodes: 5-16
Activation functions: tanh, logistic, sigmoidal, tanh-sigmoidal, log-sigmoidal
Output neuron: linear
Training epochs: 500
Initialization of weights: +/- 0.3
Rank-based selection: 0.50
Mutation rate: 0.1
48 EANN Convergence Using BP Learning
49 EANN Performance - Mackey Glass series
50 EANN Performance - Gas Furnace Series
51 EANN Performance – Waste Water Flow Series
52 Performance Evaluation among EANN and ANN – Mackey Glass series (RMSE and hidden layer architecture)
Learning algorithm | EANN RMSE | EANN architecture | ANN RMSE | ANN architecture
BP | 0.0077 | 7(T), 3(LS) | 0.0437 | 24(TS)
SCG | 0.0031 | 11(T) | 0.0045 | 24(TS)
QNA | 0.0027 | 6(T), 4(TS) | 0.0034 | 24(TS)
LM | 0.0004 | 8(T), 2(TS), 1(LS) | 0.0009 | 24(TS)
53 Performance Evaluation among EANN and ANN – Gas Furnace time series (RMSE and hidden layer architecture)
Learning algorithm | EANN RMSE | EANN architecture | ANN RMSE | ANN architecture
BP | 0.0358 | 8(T) | 0.0766 | 18(TS)
SCG | 0.0210 | 8(T), 2(TS) | 0.0330 | 16(TS)
QNA | 0.0256 | 7(T), 2(LS) | 0.0376 | 18(TS)
LM | 0.0223 | 6(T), 1(LS), 1(TS) | 0.0451 | 14(TS)
54 Performance Evaluation among EANN and ANN – Waste Water time series (RMSE and hidden layer architecture)
Learning algorithm | EANN RMSE | EANN architecture | ANN RMSE | ANN architecture
BP | 0.0547 | 6(T), 5(TS), 1(LS) | 0.1360 | 16(TS)
SCG | 0.0579 | 6(T), 4(LS) | 0.0820 | 14(LS)
QNA | 0.0823 | 5(T), 5(TS) | 0.1276 | 14(TS)
LM | 0.0521 | 8(T), 1(LS) | 0.0951 | 14(TS)
55 Efficiency of Evolutionary Neural Nets Choosing the architecture and the right learning algorithm is a tedious part of designing an optimal artificial neural network. For critical applications and hardware implementations, optimal design often becomes a necessity. Disadvantages of EANNs: computational complexity; success depends on the genotype representation. Empirical results show the efficiency of the EANN procedure: the average number of hidden neurons was reduced by more than 45% and the average RMSE on the test set was down by 65%. Future work: more learning algorithms, evaluation of full population information (final generation).
56 Advantages of Neural Networks Universal approximators Capture associations or discover regularities within a set of patterns Can handle a large number of variables and huge volumes of data Useful when conventional approaches can’t be used to model relationships that are vaguely understood
57 Evolutionary Fuzzy Systems
58 Fuzzy Expert System A fuzzy expert system to forecast the reactive power (P) at time t+1 from the load current (I) and voltage (V) at time t. The experiment consists of two stages: developing the fuzzy expert system, and performance evaluation using the test data. The model has two input variables (V and I) and one output variable (P). Training and testing data sets were extracted randomly from the master dataset: 60% of the data was used for training and the remaining 40% for testing.
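A minimal zero-order Takagi-Sugeno sketch of the two-input forecaster described above, assuming V and I are normalized to [0, 1]; the membership functions and the constant rule consequents are placeholders, not the values used in the experiment.

```python
def trimf(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def sugeno_forecast(v, i):
    """Zero-order Takagi-Sugeno system: product t-norm for rule firing strengths,
    weighted average of constant consequents for the forecast reactive power."""
    v_low, v_high = trimf(v, 0.0, 0.0, 1.0), trimf(v, 0.0, 1.0, 1.0)
    i_low, i_high = trimf(i, 0.0, 0.0, 1.0), trimf(i, 0.0, 1.0, 1.0)
    rules = [(v_low * i_low, 0.1), (v_low * i_high, 0.4),    # placeholder consequents
             (v_high * i_low, 0.5), (v_high * i_high, 0.9)]
    total = sum(w for w, _ in rules)
    return sum(w * p for w, p in rules) / total if total else 0.0
```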
59 Fuzzy Expert System - Some Illustrations: different numbers of membership functions (root mean squared error)
No. of MFs | Mamdani FIS training | Mamdani FIS test | Takagi-Sugeno FIS training | Takagi-Sugeno FIS test
2 | 0.401 | 0.397 | 0.024 | 0.023
3 | 0.348 | 0.334 | 0.017 | 0.016
60 Fuzzy Expert System - Some Illustrations: different shapes of membership functions (root mean squared error)
Mamdani FIS: training 0.243, test 0.240
Takagi-Sugeno FIS: training 0.021, test 0.019
61 Fuzzy Expert System - Some Illustrations: different fuzzy operators (root mean squared error)
Mamdani FIS: training 0.221, test 0.219
Takagi-Sugeno FIS: training 0.019, test 0.018
62 Fuzzy Expert System - Some Illustrations: different defuzzification operators (root mean squared error)
Mamdani FIS: Centroid - training 0.221, test 0.219; MOM - training 0.230, test 0.232; BOA - training 0.218, test 0.216; SOM - training 0.229, test 0.232
Takagi-Sugeno FIS: Weighted sum - training 0.019, test 0.018; Weighted average - training 0.085, test 0.084
63 Summary of Fuzzy Modeling Surface structure Relevant input and output variables Relevant fuzzy inference system Number of linguistic terms associated with each input / output variable If-then rules Deep structure Type of membership functions Building up the knowledge base Fine tune parameters of MFs using regression and optimization techniques
64 Evolutionary Design of Fuzzy Controllers Disadvantage of fuzzy controllers Requirement of expert knowledge to set up a system - Input-output variables, - Type (shape) of membership functions (MFs), - Quantity of MFs assigned to each variable, - Formulation of the rule base. Advantages of evolutionary design - To minimize expert (human) input - Optimization of membership functions (type and quantity) - Optimization of the rule base - Optimization / fine tuning of pre-existing fuzzy systems
65 Evolutionary Design of Fuzzy Controllers
66 Parameterization of Membership Functions: effect of changing the parameters p, q and r
67 Parameterization of T-norm Operator
68 Parameterization of T-conorm Operator
69 Learning with Evolutionary Fuzzy Systems Evolutionary algorithms are not learning algorithms; they offer a powerful and domain-independent search method for a variety of learning tasks. Three popular approaches in which evolutionary algorithms have been applied to the learning process of fuzzy systems: - Michigan approach - Pittsburgh approach - Iterative rule learning A description of these techniques follows.
70 Michigan Approach The chromosomes are individual rules and a rule set is represented by the entire population. The performance system interacts with the environment and contains the rule base and the production system. The credit assignment system develops rules by modifying the conflict-resolution parameters of the classifier (rule) set and their strengths. The classifier discovery system generates new classifiers (rules) from a classifier set by means of evolutionary techniques.
71 Pittsburgh Approach The chromosome encodes a whole rule base or knowledge base. Crossover helps to provide new combinations of rules, and mutation provides new rules. Variable-length rule bases are used in some cases, with special genetic operators for dealing with these variable-length and position-independent genomes. While the Michigan approach might be useful for online learning, the Pittsburgh approach seems better suited for batch-mode learning.
72 Iterative Rule Learning Approach The chromosome encodes individual rules as in the Michigan approach, but only the best individual is considered to form part of the solution. The procedure: 1. Use an EA to obtain a rule for the system. 2. Incorporate the rule into the final set of rules. 3. Penalize this rule. 4. If the set of rules obtained so far is an adequate solution to the problem, return the set of rules as the solution; otherwise return to step 1.
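A sketch of the four-step loop above; `evolve_rule` (the EA run of step 1) and `covers` (used for the penalization of step 3) are placeholder callables, and removing covered examples stands in for the penalty scheme.

```python
def iterative_rule_learning(evolve_rule, covers, examples, max_rules=20):
    """Iterative rule learning: evolve one rule at a time until the rule set is adequate."""
    rule_base, remaining = [], list(examples)
    while remaining and len(rule_base) < max_rules:
        rule = evolve_rule(remaining)                              # step 1: run the EA
        rule_base.append(rule)                                     # step 2: keep the best rule
        remaining = [e for e in remaining if not covers(rule, e)]  # step 3: penalize covered examples
    return rule_base                                               # step 4: stop when adequate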
73 Genotype Representation of Membership Functions “n” asymmetrical triangular membership functions Specified by the center, left base width and right base width Incorporate prior knowledge
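A small sketch of the representation above: each of the n asymmetrical triangular MFs is a (center, left base width, right base width) triple, flattened into one real-valued chromosome; the function names are illustrative, not from the lecture.

```python
def asym_triangle(x, center, left_width, right_width):
    """Membership value of an asymmetrical triangular MF (widths assumed positive)."""
    if x < center:
        return max(0.0, 1.0 - (center - x) / left_width)
    return max(0.0, 1.0 - (x - center) / right_width)

def encode_mfs(mfs):
    """Flatten (center, left_width, right_width) triples into one real-valued chromosome."""
    return [p for mf in mfs for p in mf]

def decode_mfs(chromosome):
    """Rebuild the (center, left_width, right_width) triples from the chromosome."""
    return [tuple(chromosome[i:i + 3]) for i in range(0, len(chromosome), 3)]
```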
74 Genetic Representation of Fuzzy Rules Chromosome representing “m” fuzzy rules: 1 stands for a selected and 0 for a non-selected rule. The length of the string depends on the number of input and output variables, e.g. 3 input variables composed of 3, 2 and 2 fuzzy sets and 1 output variable composed of 3 fuzzy sets. This high-level representation reduces computational complexity.
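One possible reading of the rule chromosome above, sketched below: enumerate every antecedent/consequent combination for the 3, 2 and 2 input fuzzy sets and the 3 output fuzzy sets, then let a bit string pick which rules enter the rule base. The enumeration scheme and function names are assumptions.

```python
from itertools import product

def candidate_rules(input_sets=(3, 2, 2), output_sets=3):
    """Enumerate antecedent/consequent combinations: 3 * 2 * 2 * 3 = 36 candidate rules."""
    antecedents = product(*(range(n) for n in input_sets))
    return [(a, c) for a in antecedents for c in range(output_sets)]

def selected_rules(chromosome, rules):
    """A '1' at position k keeps rule k in the rule base, a '0' drops it."""
    return [rule for bit, rule in zip(chromosome, rules) if bit == "1"]
```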
75 Evolutionary Design of Fuzzy Control Systems : A Generic Frame Work
76 Fuzzy Logic in Reality (Industrial Applications) Efficient and stable control of car engines (Nissan) Simplified control of robots (Hirota, Fuji Electric, Toshiba, Omron) Industrial control applications (Aptronix, Omron, Meiden, Micom, Mitsubishi, Nissin-Denki, Oku-Electronics) Archiving system for documents (Mitsubishi Elec.) Prediction system for early recognition of earthquakes (Bureau of Meteorology, Japan) Recognition of handwritten symbols with pocket computers (Sony) Video cameras (Canon, Minolta) Washing machines (Matsushita, Hitachi, Samsung) Recognition of handwriting, objects, voice (Hitachi, Hosai Univ., Ricoh) Efficiency of elevator control (Fujitec, Hitachi, Toshiba) Positioning of wafer-steppers in the production of semiconductors (Canon) Automatic control of dam gates for hydroelectric power plants (Tokyo Electric Power)