
1 USING GENETIC ALGORITHM WITH ADAPTIVE MUTATION MECHANISM FOR NEURAL NETWORKS DESIGN AND TRAINING
Yuri R. Tsoy, Vladimir G. Spitsyn, Department of Computer Engineering, Tomsk Polytechnic University. Good afternoon. My name is Yuri Tsoy, and I represent the Department of Computer Engineering of Tomsk Polytechnic University. Let me present the report "Using Genetic Algorithm with Adaptive Mutation Mechanism for Neural Networks Design and Training". 9th Korea-Russia International Symposium on Science and Technology (KORUS 2005), June 26 - July 2, 2005, Novosibirsk State Technical University, Novosibirsk.

2 Report contents
1. Introduction
2. Description of the algorithm
3. Adaptive mutation mechanism
4. Results of experiments
5. Implementation
6. Conclusion
My report consists of six sections; you can see their titles on the slide. So let us proceed to the first section: Introduction.

3 1. Genetic algorithms and neural networks
Genetic algorithms (GAs) use evolutionary concepts (heredity, mutability and natural selection) to solve optimization tasks. The idea of artificial neural networks (ANNs) is inspired by the functionality of the human brain. ANNs are often used to solve classification and approximation tasks.
A genetic algorithm (GA) uses evolutionary concepts to solve optimization tasks. Due to the mechanisms of heredity, mutability and selective pressure, GAs show great adaptive abilities. The idea of artificial neural networks (ANNs) is inspired by the functionality of the human brain, and ANNs are often used to solve classification and approximation problems.

4 1. Use of neural networks
Preparation of Training Data → Definition of Structure (Design) → Tuning of Weights (Training); the Genetic Algorithm is applied to the last two stages (Neuroevolution).
The use of neural networks can be roughly divided into three stages: 1. Preparation of training data. 2. Definition of the structure of the neural network (the so-called design of the neural net). 3. Tuning of the weights of connections (also known as training of the neural network). Although a genetic algorithm can be applied to all three stages, we will consider only the second and the third: design and training. The use of genetic algorithms for the design and training of neural networks is often referred to as neuroevolution; a minimal sketch of such a loop is given below.
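To make the neuroevolution idea concrete, here is a minimal sketch of such a design-and-training loop in Python. It is not the algorithm presented in this talk; init_population, evaluate, crossover and mutate are hypothetical placeholders.

```python
# A minimal neuroevolution loop (a sketch, not the NEvA implementation).
# The callables init_population, evaluate, crossover and mutate are assumed
# placeholders; mutate may change both structure and weights of a network.
import random

def neuroevolution(init_population, evaluate, crossover, mutate,
                   max_generations=1000, target_fitness=0.99):
    """Evolve a population of neural networks (one individual = one network)."""
    population = init_population()
    best = max(population, key=evaluate)
    for _ in range(max_generations):
        ranked = sorted(population, key=evaluate, reverse=True)  # fittest first
        best = ranked[0]
        if evaluate(best) >= target_fitness:
            break                                    # training succeeded
        parents = ranked[:len(ranked) // 2]          # truncation selection
        population = [mutate(crossover(random.choice(parents), random.choice(parents)))
                      for _ in range(len(population))]
    return best
```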

5 2. Description of the algorithm
NEvA - NeuroEvolutionary Algorithm. 1. Simultaneous design and tuning of the neural network. 2. One individual = one neural network. 3. Population = a set of neural networks.
The developed algorithm (NEvA) models the evolution of a population of neural networks. The genotype of each organism contains information about the connections of the corresponding network and their weights. An example is given on the [slide]; a possible data layout is sketched below.
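A possible layout of such a genotype, sketched under the assumption that it simply lists connections with their weights; the field names are illustrative, not NEvA's actual representation.

```python
# Hypothetical sketch of a NEvA-style genotype: the talk states only that a
# genotype stores the network's connections and their weights.
from dataclasses import dataclass, field

@dataclass
class Connection:
    source: int        # index of the source neuron
    target: int        # index of the target neuron
    weight: float      # connection weight (decoded from the binary genome)

@dataclass
class Genotype:
    num_inputs: int
    num_outputs: int
    hidden: list[int] = field(default_factory=list)        # hidden neuron ids
    connections: list[Connection] = field(default_factory=list)

# One individual = one network: an initial individual has no hidden neurons,
# only input-to-output connections (see the next slide).
xor_seed = Genotype(num_inputs=2, num_outputs=1,
                    connections=[Connection(0, 2, 0.1), Connection(1, 2, -0.3)])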

6 2. Description of the algorithm
Initial population: networks without hidden neurons. Weights of connections are encoded with 19 bits within the range [-26.2144; 26.2144] with precision 0.0001. The complexity of the networks grows during the process of evolution. Neurons use the log-sigmoid activation function.
There are some changes in comparison with the description in the published paper, because we found better encoding parameters during our experiments. The weight of a connection is encoded with 19 bits, and its value lies within the [-26.2144; 26.2144] range with a precision of 0.0001, so the obtained results differ from the results in the paper. In the initial population all organisms represent networks without hidden neurons. Connection weights are initialized randomly in the range [-0.5; 0.5]. During the process of evolution the networks "grow" and become more complex, that is, new neurons and new connections appear. We used neurons with the log-sigmoid activation function; a decoding sketch follows.
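A small sketch of how this encoding could be decoded. The offset-binary mapping below is an assumption; the talk gives only the bit width, range and precision.

```python
# Sketch of the weight encoding from the slide: 19 bits mapped to
# [-26.2144, 26.2144] with step 0.0001. The exact decoding rule is not given;
# a plain offset-binary mapping is assumed (note 2**19 steps of 0.0001 span
# 52.4287, so the top of the range is reached only approximately).
import math

W_MIN, STEP = -26.2144, 0.0001

def decode_weight(bits: str) -> float:
    """Decode a 19-character bit string into a connection weight."""
    assert len(bits) == 19
    return W_MIN + int(bits, 2) * STEP

def log_sigmoid(x: float) -> float:
    """Log-sigmoid activation for NEvA neurons: f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

print(decode_weight("0" * 19))   # -26.2144 (minimum weight)
print(decode_weight("1" * 19))   # ~26.2143 (maximum representable weight)
print(log_sigmoid(0.0))          # 0.5
```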

7 3. Adaptive mechanism of mutation
Types of mutation:
- Addition of a random connection
- Deletion of a random connection
- Addition of a random neuron
- Deletion of a random neuron
- Change of the weight of a random connection
The mutation mechanism allows adding and deleting neurons and connections, and also changing a connection weight by a random value. The selection of the mutation type is performed with respect to the number of neurons and the number of connections.

8 3. Adaptive mechanism of mutation
Notation: fC is the percentage of implemented connections; fN depends on the number of hidden neurons; FC = fC^2; FN = FC * fN^2; Rnd is a random number; NH is the number of hidden neurons. [Flowchart of mutation type selection: conditions "Rnd > FC", "Rnd > FN" and "Rnd > FC & NH > 0" route to the mutation types, e.g. "Add connection", "Delete random neuron".]
The algorithm for selecting the applicable mutation type is shown schematically on the [slide]. fC and fN are heuristics that allow us to control the speed of growth of the neural network. To simplify, the adaptive mutation algorithm can be divided into two branches (by the first conditional transition) according to their influence on the "linkage" degree represented by the fC coefficient: 1) the branch of fC decrease (the rightmost branch), and 2) the branch of fC increase (the leftmost branch). Since the deletion of a neuron can lead to a decrease as well as to an increase of fC (depending on the number of connections associated with that neuron), this type of mutation exists in both branches. Thus the main factor for the regulation of the network structure is its "linkage" degree. A hedged reconstruction of the selection scheme is sketched below.
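Below is a hedged Python reconstruction of the selection scheme. The branch layout is inferred so that it reproduces the probabilities worked out on the next slide; it is a sketch, not the authors' exact flowchart, and the weight-change mutation (handled elsewhere in NEvA) is omitted.

```python
# Reconstruction of the mutation-type selection, inferred from the flowchart
# and from the worked probabilities on the next slide:
#   p(add connection)  = (1-FC)^2
#   p(delete neuron)   = (1-FC)*FC + FC*(1-FN)*(1-FC)   (present in BOTH branches)
#   p(add neuron)      = FC*FN
#   p(delete connection) = FC^2*(1-FN)
# The exact branch layout is an assumption; only the resulting probabilities
# match the talk.
import random

def select_mutation_type(fC: float, fN: float, NH: int) -> str:
    FC = fC ** 2          # pressure against further "linkage" growth
    FN = FC * fN ** 2     # pressure against further neuron growth
    if random.random() > FC:                  # branch of fC increase
        if random.random() > FC or NH == 0:
            return "add connection"
        return "delete random neuron"
    else:                                     # branch of fC decrease
        if random.random() <= FN:
            return "add neuron"
        if random.random() > FC and NH > 0:
            return "delete random neuron"
        return "delete connection"
```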

9 3. Adaptive mechanism of mutation
Example: a network with two inputs (x1, x2), two hidden neurons (3, 4) and one output (y1) undergoes mutation [same selection flowchart as on the previous slide].
fC = 6 / (0.5 * (5*4 - 2*1 - 1*0)) = 0.667; fN = 3/5 = 0.6
FC = fC^2 = 0.445; FN = FC * fN^2 = 0.160
p1 = (1 - FC)^2 = 0.308; p2 = 0.454; p3 = 0.071; p4 = 0.166
Let us look at an example of mutation type selection with respect to the introduced scheme. Suppose the network on the [slide] undergoes mutation; we need to define which type of mutation will be applied to it. To estimate the probabilities of the different mutation types, the coefficients FC and FN are calculated first (results are shown on the [slide]). After the coefficients are calculated, we can estimate which type of mutation is likely to be applied. As can be seen, the probability that a random hidden neuron will be deleted (p2) is higher than the other probabilities, although the probability to add a connection (p1) is also high. In fact we do not calculate these probabilities in the implementation of our neuroevolutionary algorithm; we simply follow the scheme of mutation type selection instead. The numbers are verified in the sketch below.
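The worked numbers can be checked with the reconstruction above; the p1..p4 labels follow that assumed branch layout and are therefore an assumption too.

```python
# Verifying the worked example's numbers (a sketch). fC is the percentage of
# implemented connections: the example network has 6 connections out of
# 0.5*(5*4 - 2*1 - 1*0) = 9 possible ones; the slide rounds fC to 0.667.
fC = round(6 / (0.5 * (5 * 4 - 2 * 1 - 1 * 0)), 3)   # 0.667
fN = 3 / 5                                            # 0.6

FC = fC ** 2                                          # 0.445
FN = FC * fN ** 2                                     # 0.160

p1 = (1 - FC) ** 2                                    # add connection    -> 0.308
p2 = (1 - FC) * FC + FC * (1 - FN) * (1 - FC)         # delete neuron     -> 0.454
p3 = FC * FN                                          # add neuron        -> 0.071
p4 = FC * (1 - FN) * FC                               # delete connection -> 0.166

print(round(FC, 3), round(FN, 3))                     # 0.445 0.16
print([round(p, 3) for p in (p1, p2, p3, p4)])        # [0.308, 0.454, 0.071, 0.166]
print(round(p1 + p2 + p3 + p4, 3))                    # 1.0 (outcomes are exhaustive)
```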

10 3. Adaptive mechanism of mutation
Probability of mutation Pm: "Decrease of the probability of mutation during the evolution improves performance of the genetic algorithm" (Schaffer, Caruana, Eshelman, Das, 1989; Goldberg, 1989; Eiben, Hinterding, Michalewicz, 1999; Igel, Kreutz, 2003). NC is the number of connections in the neural network represented by an individual. The mutation event is "gambled" NI + NO times for each individual, where NI and NO are the numbers of inputs and outputs of the network, respectively. The probability and the type of mutation are defined individually for each network.
Not only the mutation type but also the probability of the mutation operator, denoted Pm, is defined adaptively. It has been noted by many researchers that a decrease of the probability of mutation during the evolution improves the performance of the genetic algorithm. Since we use a concept of "growing" neural networks, it is possible to say that the mutation effect decreases as the evolution goes: the larger the networks we evolve, the fewer mutations occur and the more the combinations of existing networks produced by crossover are used. A sketch of this gambling loop is given below.
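A sketch of the per-individual mutation "gambling", reusing the Genotype and select_mutation_type sketches above. The slide's Pm formula is an image not reproduced in this transcript, so Pm = 1/NC below is an assumed placeholder that merely has the stated property of decreasing as the network grows.

```python
# Sketch only: Pm = 1/NC is NOT the slide's formula, just an assumed stand-in
# that shrinks as the network gains connections (larger nets -> fewer mutations).
import random

def gamble_mutations(genotype, fC, fN, apply_mutation):
    NI, NO = genotype.num_inputs, genotype.num_outputs
    NH = len(genotype.hidden)                 # hidden neurons of this network
    NC = len(genotype.connections)            # connections of this network
    Pm = 1.0 / max(NC, 1)                     # placeholder adaptive probability
    for _ in range(NI + NO):                  # mutation is gambled NI + NO times
        if random.random() < Pm:
            kind = select_mutation_type(fC, fN, NH)   # per-network type choice
            apply_mutation(genotype, kind)
```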

11 4. Results of experiments
Exclusive OR:
Algorithm             | Avg. objective function calls | NH   | NC    | Population size
NEAT1                 | 4755                          | 2.35 | 7.48  | 150
NEAT2                 | 6612                          | 3.12 | 11.72 | 150
CGA                   | 13165.63                      | 4    | 17    | 50
BP                    | 5338.28                       | -    | -     | -
BPM                   | 828.32                        | -    | -     | -
NEvA (no adaptive Pm) | 6026.26                       | 4.8  | 16.22 | 13
NEvA (19-bit)         | 6063.4                        | 4.78 | 15.86 | 13
To test the introduced algorithm the following two problems were chosen: the exclusive OR problem (a classification task) and the inverted pendulum problem by R. Sutton (an approximation task). We first introduce the results for the XOR problem ([slide]). The performance of NEvA is compared with that of the canonical genetic algorithm (CGA), NEAT and two variants of the back-propagation learning algorithm: simple (BP) and with momentum (BPM). We also compared NEvA with and without adaptive mutation probability (Pm). The performance is estimated by the average number of objective function calculations, i.e. how many times the training data set was presented (the fewer, the better); a sketch of such an objective function is given after this slide. Here NH is the number of hidden neurons and NC is the number of connections in the network. Although both NEvA and NEAT lose to the gradient methods, we should note that the task was much more complex for the neuroevolutionary algorithms, since not only the weights but also the network structures were tuned. NEvA outperforms CGA by a factor of about two and shows better performance than NEAT2, although the NEAT networks are much simpler (the number of hidden neurons and the total number of connections are much smaller). NEvA (19-bit encoding) with adaptive mutation probability is slightly worse than the algorithm without this ability, but its networks are on average a little "better" than the networks obtained by NEvA without adaptive Pm.
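For reference, one "objective function calculation" for XOR could look like the following sketch; the mean-squared-error measure and the callable-network interface are assumptions, as the talk does not state them.

```python
# One objective-function call presents the full XOR training set to a
# candidate network and returns its error. `net` is any callable mapping two
# inputs to one output; MSE is an assumed error measure.
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def xor_objective(net) -> float:
    """Mean squared error of `net` over the four XOR patterns."""
    return sum((net(x1, x2) - target) ** 2
               for (x1, x2), target in XOR_DATA) / len(XOR_DATA)

# Example with a trivial (untrained) network:
print(xor_objective(lambda x1, x2: 0.5))   # 0.25
```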

12 4. Results of experiments
Single pole balancing:
Algorithm             | Trials (mean) | Trials (best) | Trials (worst) | NH   | NC
GENITOR               | 1846          | 272           | 7052           | 5    | 35
SANE                  | 535           | 70            | 1910           | 8    | 40
ESP                   | 285           | 11            | 1326           |      | 30
NEvA (no adaptive Pm) | 451           | 37            | 3872           | 0.58 | 5.22
NEvA (19-bit)         | 358.2         |               | 1670           | 1.18 | 6.5
For the single pole balancing benchmark with discrete force impact we compared NEvA with other evolutionary algorithms: GENITOR, SANE and ESP [9]. The goal is to balance the pole on a cart for approximately 30 minutes of modeling time by moving the cart horizontally forwards or backwards. The number of trials before a successful network is found was estimated (the fewer, the better); a sketch of this trial counting is given after this slide. Results averaged over 50 runs are presented on the [slide]. By the obtained results NEvA loses only to the ESP algorithm and performs better than all the other algorithms. The neural networks obtained by NEvA are much simpler than the networks of ESP, SANE and GENITOR in all cases. Note that the results of NEvA with adaptive mutation rate are more stable than those of NEvA with fixed Pm, since the difference between the best and the worst results is much smaller. This means that NEvA with the fully adaptive mutation operator showed more stable results on the 1-pole balancing problem.
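A sketch of how such a trial count could be measured; `simulate` is a hypothetical stand-in for the cart-pole simulator, and only the 30-minute success criterion comes from the talk.

```python
# Counting trials until a successful network appears (the measure reported
# in the pole-balancing tables). Everything except the 30-minute threshold
# is schematic: new_candidate yields the GA's next network, simulate returns
# the simulated time (in minutes) that a network keeps the pole balanced.
def count_trials(new_candidate, simulate, success_minutes=30.0, max_trials=100_000):
    """Return the number of networks evaluated before one balances the pole."""
    for trial in range(1, max_trials + 1):
        net = new_candidate()                   # next network proposed by the GA
        if simulate(net) >= success_minutes:
            return trial                        # success: report how many tries it took
    return max_trials
```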

13 4. Results of experiments
Two poles balancing:
Algorithm                | Avg. number of trials | Population size | NH    | NC
Evolutionary programming | 307200                | 2048            | N/A   | N/A
SANE                     | 12600                 | 200             |       |
ESP                      | 3800                  |                 | 5     | 35
NEAT                     | 3578                  | 150             | (0–4) | (6–15)
NEvA (no adaptive Pm)    | 3777                  | 59              | 1.4   | 7.24
NEvA (19-bit)            | 1338.92               | 32              | 0.58  | 7.2
For the two-pole balancing the goal is the same as in the previous problem, but now there are two poles of different length to be balanced on the cart, and the force impact takes continuous values. The number of trials before a successful network is found was again estimated (the fewer, the better). Results averaged over 50 runs, together with information about the obtained neural networks, are presented on the [slide]. We compared NEvA with evolutionary programming, SANE, ESP and NEAT. The performance of NEvA without adaptive Pm is comparable with that of ESP and NEAT, while the results of NEvA with the fully adaptive mutation mechanism surpass the results of all the other algorithms.

14 4. Future plans
"Parameterless" variant of NEvA with adaptive mutation mechanism and adaptive population sizing. Some preliminary results:
Variant                                               | XOR (NH : NC)          | 1-pole balancing   | 2-pole balancing
NEvA (adaptive mutation)                              | 6063.4 (4.78 : 15.86)  | 358.2 (1.18 : 6.5) | 1338.92 (0.58 : 7.22)
NEvA (adaptive mutation + adaptive population sizing) | 6608.84 (5.14 : 17.26) | 634.22 (1.26 : 6.74) | 1464.54 (0.62 : 7.32)
For now we have developed a "parameterless" variant of NEvA, i.e. with adaptive mutation and an adaptive population sizing strategy, whereas the selection and crossover parameters are fixed as a result of numerous experiments. The only parameter to be set by the user is the desired training error. Although the fully adaptive algorithm is very simple to use, its results are somewhat worse than the results of NEvA without adaptive sizing ([slide]); we should note, however, that the difference is not critical. We have also tested the fully adaptive NEvA on the Proben1 benchmarking set, which includes real-world medical and industrial data, and obtained encouraging results. Nevertheless, we are still working on the improvement of the sizing strategy to achieve better performance.

15 5. Implementation
The introduced algorithm is implemented to comply with the architecture of the software environment "GA Workshop" [general scheme of "GA Workshop" on the slide]. You can see the structure of this environment on the [slide]. The notion of the "GA Workshop" is to separate the solving algorithm from the problem being solved and from the processing of run data, with the use of design patterns; a sketch of such a separation is given below.
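A minimal sketch of that separation, with illustrative interface names; this is not the actual "GA Workshop" API.

```python
# Hypothetical sketch: the solving algorithm, the problem being solved, and
# the run-data processing live behind independent interfaces, so any GA
# variant can be combined with any problem and any data processor.
from abc import ABC, abstractmethod

class Problem(ABC):
    @abstractmethod
    def evaluate(self, candidate) -> float: ...       # fitness of one candidate

class Algorithm(ABC):
    @abstractmethod
    def step(self, problem: Problem) -> dict: ...     # one generation; returns stats

class RunLogger(ABC):
    @abstractmethod
    def record(self, generation: int, stats: dict) -> None: ...  # run statistics

def run(algorithm: Algorithm, problem: Problem, logger: RunLogger, generations: int):
    """Drive any algorithm against any problem, logging every generation."""
    for g in range(generations):
        stats = algorithm.step(problem)
        logger.record(g, stats)
```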

16 5. "GA Workshop"
At the moment "GA Workshop" is under construction (there is no GUI yet), although it is already possible to use it for research. The following studies have been done:
- Experiments with NEvA.
- Study of the quasi-species model by M. Eigen.
- Study of the majoring model by V.G. Red'ko.
- Investigation of simple population sizing techniques.
- Numerical optimization with the use of a genetic algorithm.
- Experiments with a compensatory genetic algorithm.
We have already used this environment for several studies of different evolutionary algorithms and problems (details on the slide).

17 5. "GA Workshop" Features:
- 3 variants of the genetic algorithm, including the "standard" GA, a compensatory GA and NEvA;
- 5 crossover operators for the binary-encoded GA and 4 crossover operators for NEvA;
- 4 selection strategies;
- 13 benchmark problems;
- 3 population sizing strategies.
The features implemented so far are presented on the slide. Most of them can be combined with the parameters of the genetic algorithm (such as population size, the types and probabilities of genetic operators, and different stop criteria). This makes it possible to examine the performance of GAs with different characteristics. Some of the components are algorithm-independent: for example, the implemented selection variants as well as the population sizing strategies can be used without any changes with different variants of the genetic algorithm.

18 5. "GA Workshop" Data available for analysis:
- Data that describes each generation of the GA (fitness distribution for each generation, averaged over multiple launches).
- Data that describes the GA behavior (dynamics of the averaged mean, best and worst fitness values for each generation; dynamics of the averaged deviation of fitness; time per launch in milliseconds).
- Data that describes the obtained solutions (number of objective function calculations until a solution is found; time until the first solution is found, in milliseconds. All solutions and some data describing their additional properties are output into a separate file for further analysis and use.)
The data available for the analysis of the results of multiple independent launches of the genetic algorithm is presented on this slide: data that characterizes the evolutionary process, fitness distributions for each generation, and statistics about solutions or candidate solutions. Some additional parameters, depending on the peculiarities of the evolutionary algorithm or the task being solved, can also be written. All data can be extracted for further statistical analysis and for the drawing of plots, charts and so on.

19 Conclusion
The results of the experiments showed that NEvA's performance is comparable with, and in some cases surpasses, the results of other algorithms on the reviewed problems (the XOR problem and full-information pole balancing). Due to the adaptive selection of the mutation type there is no need for direct global limiting variables or deterministic rules for the evolution of the network structure, because the addition or deletion of new elements is performed depending on the individual peculiarities of each phenotype. The adaptive mutation rate increased performance in comparison with NEvA with fixed Pm (up to 40% for the 2-pole balancing task), although the resulting networks were slightly worse for the 1-pole balancing problem. The use of the mutation that deletes connections can help to get rid of insignificant network inputs, thus decreasing the search-space complexity. The NEvA algorithm is implemented in the "GA Workshop" software environment.

20 Thank you for your attention!


Download ppt "Yuri R. Tsoy, Vladimir G. Spitsyn, Department of Computer Engineering"

Similar presentations


Ads by Google