1 Evolution of Complex Behavior Controllers using Genetic Algorithms

2 http://www.cs.unr.edu/~gruber
Kerry Gruber (UNR, now Intel)
Jason Baurick (UNR)
Sushil J. Louis (UNR)
Funded in part by NSF grant 9624130

3 G.A.T.O.R.S: Genetic Algorithm Training of Robot Simulations

4 General Description
Use artificial neural networks to control the simulated robots
Evolve the weights of the neural networks using a Genetic Algorithm (GA)

5 Goals
Develop controllers exhibiting generalized complex behavior
– Perform complex, spatially independent tasks
– Perform adequately under varying environmental conditions
Meet or exceed the performance of controllers designed by humans
Use a minimum of state information

6 Why use a GA to train the neural network?
Training is based on actual performance instead of expected performance
– Supervised training models rely on the designer's understanding of the environment and the expected consequences of input-to-output relationships
– The GA uses whole-run performance instead of single-step input-to-output relationships

7 Vacuum Cleaning (cover as large an area as possible in the time allotted)
Move through the environment without retracing previous steps, with no spatial information about the current or previous locations occupied
Recharge by locating and accessing energy supplies/outlets (prey-capture scenario)
– Energy supplies may be used by only one unit at a time and are inaccessible for a time afterwards (only one vacuum cleaner to an outlet)
Interact with obstacles in the environment without crashing (obstacle avoidance)
Negotiate obstacles without crashing (wall following)

8 Simulator
Robots
– Predators (vacuum cleaners)
– Prey (energy supplies/outlets)
Environment
– 300x300 spatially independent grid
– Contains obstacles
Simulation process
– 1000 time steps

9 Simulation

10 Predators
Two independent motors
– 1 on each side
– 4 possible states per motor
Battery
– Depleted as the robot moves
– May be recharged by consuming prey
Five binary touch sensors
– 4 feelers, 1 crash sensor
Two real-valued hearing sensors

11 Robot Sensor Positions

12 Prey/Battery
Stationary
Emit a sound (signal) audible to predators
– Level inversely proportional to the square of the distance
– Cut off outside the hearing range
May be consumed by predators
– Only a single predator may have access at a time
– May not be re-used for a certain period of time
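The signal rule above can be sketched as follows; the concrete hearing-range value and the unit signal constant are illustrative assumptions, since the slides specify only the inverse-square falloff and the cutoff:

```python
def prey_signal(prey_pos, sensor_pos, hearing_range=100.0):
    """Signal level heard from one prey: inverse-square falloff,
    cut off to zero beyond the hearing range.
    (Sketch: range and scale constants are assumptions.)"""
    dx = prey_pos[0] - sensor_pos[0]
    dy = prey_pos[1] - sensor_pos[1]
    d2 = dx * dx + dy * dy          # squared distance to the prey
    if d2 > hearing_range ** 2:
        return 0.0                  # outside the hearing range
    return 1.0 / max(d2, 1.0)       # inverse-square; clamp avoids div-by-zero
```

A predator's two real-valued hearing sensors would then sum this quantity over all awake prey, one sensor per side.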

13 Environment
300x300 spatially independent grid
3-10 obstacles (5-10% of the area each)
5-unit border (ensures the entire area cannot be covered)

14 Random Environment

15 Simulation Process
1000 steps; in each step:
Each predator is randomly given a chance to move
– Provided its touch and hearing sensor levels
The new position is determined and the battery level decreased in accordance with the motor settings
– If the boundary is crossed, the predator is moved back outside of it and a crash is registered (the battery is still decreased as if no crash occurred)
If in contact with a "live" prey, the battery is recharged and the prey consumed
New sensor states are determined
If the battery is depleted, the predator is considered "dead"
Sleeping prey awaken if their timer has expired

16 Neural Network
Two-layer, fully connected artificial neural network
Sigmoid activation function; each node has a bias
10 inputs:
– 5 binary touch sensors
– 2 hearing inputs (binary states dependent on side and presence)
– 2 binary hearing states (saved from the previous step)
– Battery level
1-10 hidden nodes
4 output nodes; output threshold of 0.5
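A minimal sketch of the forward pass described above, assuming a plain lists-of-weights layout (the slides fix only the topology, the sigmoid activations, the per-node biases, and the 0.5 output threshold):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_hidden, b_hidden, w_out, b_out, threshold=0.5):
    """Two-layer fully connected net with sigmoid activations and a
    bias per node; the 4 outputs are thresholded at 0.5 to give the
    binary motor-state bits. (Weight layout is an assumption.)"""
    # hidden layer: one weight row and one bias per hidden node
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    # output layer: one weight row and one bias per output node
    out = [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
           for row, b in zip(w_out, b_out)]
    return [1 if o > threshold else 0 for o in out]
```

With 10 inputs and 4 hidden nodes, `w_hidden` is a 4x10 matrix and `w_out` a 4x4 matrix; the four thresholded outputs would encode the two 4-state motor settings as two bits per motor.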

17 Virtual Prey Location

18 Hearing Sensor States
The presence of a virtual food source requires a minimum of state information in order to find and capture prey
Hearing sensor levels are used to determine whether any prey can be heard and which side they are on
Input as two binary levels
Saved and used as state information during the next step

19 GA-Encoding
16-bit binary representation of each weight (1024-bit string for 4 hidden nodes)
Input weights range from -100 to 100
Biases range from -100*N to 100*N (N = number of inputs)
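Decoding one gene can be sketched as below; the slides fix only the 16-bit width and the ranges, so the linear mapping is an assumption:

```python
def decode_weight(bits, lo=-100.0, hi=100.0):
    """Map a 16-bit gene (string of '0'/'1') linearly onto [lo, hi].
    Input weights use [-100, 100]; a bias would use [-100*N, 100*N]
    for N inputs. (Linear mapping is an assumption.)"""
    value = int(bits, 2)                            # 0 .. 65535
    return lo + (hi - lo) * value / (2 ** len(bits) - 1)
```

As a consistency check: with 4 hidden nodes the network has 4 x (10 weights + 1 bias) + 4 x (4 weights + 1 bias) = 64 parameters, which at 16 bits each gives the 1024-bit string mentioned above.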

20 GA-Initialization
Input weights randomly initialized over their full range
Operating point initialization
– Biases set to -0.5 * Σ(input weights)
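Operating point initialization can be sketched for a single node as follows (the weight range is from the encoding slide; the function name is ours):

```python
import random

def init_node(n_inputs):
    """Operating point initialization (sketch): draw the input weights
    uniformly over their full range, then set the bias to
    -0.5 * sum(weights), placing the node near its sigmoid transition
    point for typical inputs rather than saturated at 0 or 1."""
    weights = [random.uniform(-100.0, 100.0) for _ in range(n_inputs)]
    bias = -0.5 * sum(weights)
    return weights, bias
```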

21 Transition Probability Distribution
With biases set by random generation, 75% of all nodes never have a state transition during the first generation
With operating point initialization, 99.6% of nodes transition for 512 of the 1024 possible input combinations
(Measured over 10 million randomly generated nodes)

22 First Generation Coverage Probability Distribution
Operating point initialization more evenly distributes the initial coverage values used for fitness determination
Other feature measurements show similar differences
(250K random networks)

23 Fitness vs. Initialization Method
With operating point initialization, the GA progresses at a higher rate because the initial nodes are actually operational
(Differences between the random environments cause the high fluctuation of values)

24 GA-Fitness Function
Use five features:
– Area coverage
– Number of prey consumed
– Distance covered
– Number of crashes
– Number of obstacle touches
Relative fitness based on the averages and standard deviations of each generation:
F_i = Σ_f W_f * 2^((X_if - μ_f) / σ_f)
where:
F_i is the fitness of individual i
W_f is the weight of feature f
X_if is individual i's score for feature f
μ_f is the average value of feature f for the given generation
σ_f is the standard deviation of feature f for the given generation

25 GA-Fitness Function (cont.)
Only feature scores which indicate operation are included in the average and deviation calculations
– Non-functioning units tend to lower averages
– Non-functioning units increase deviations
– This leads to insufficient selection pressure
The deviation for crashes is set to the average if the deviation is greater than the average
– There is a clear cut-off for this feature
– High deviations and low averages lead to insufficient selection pressure as the GA matures
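The relative fitness computation can be sketched as below. This is a simplified version: for brevity it computes the averages and deviations over all units rather than only the operational ones, and it omits the crash-deviation clamp described above.

```python
import statistics

def relative_fitness(scores, weights):
    """Relative fitness for one generation (sketch).
    scores[f]  : list of per-individual scores for feature f
    weights[f] : weight W_f of feature f
    Implements F_i = sum_f W_f * 2**((X_if - mu_f) / sigma_f).
    Simplifications vs. the slides: uses all units for mu/sigma
    and skips the crash-deviation clamp."""
    mu = {f: statistics.mean(s) for f, s in scores.items()}
    # guard: if every score is identical, pstdev is 0; substitute 1.0
    sd = {f: statistics.pstdev(s) or 1.0 for f, s in scores.items()}
    n = len(next(iter(scores.values())))
    return [sum(weights[f] * 2 ** ((scores[f][i] - mu[f]) / sd[f])
                for f in scores)
            for i in range(n)]
```

Because scores are normalized against the current generation's own mean and deviation, selection pressure is maintained as the population improves instead of collapsing once absolute scores plateau.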

26 Training
Competitive environment; 3 human-programmed controllers
Random obstacles, prey, initial positions, and environment variables generated for each generation
The same random variables are used for each chromosome

27 Training Variables
Variables are set at the beginning of each generation and maintained for all chromosomes

28 GA Settings for Final Controller

29 Final Selection
For testing purposes, the final controller was selected by hand based on objective performance in pre-defined environments
The web demonstration uses the controller with the best sum of fitnesses among the top 20 controllers over 10 different environments

30 Implementation
8-node Beowulf cluster
– PII 400MHz machines
– Red Hat Linux
– LAM version of MPI
3 Java interface applets/applications
– Configuration
– Training
– Simulation display

31 Speed-Up

32 Results-Average Area Coverage
Averages over 100 different random simulation settings in 100 different environments

33 Results-Average Number of Touches
Averages over 100 different random simulation settings in 100 different environments

34 Results-Average Distance
The test unit covers more area but less distance, indicating a slower speed and better energy conservation

35 Results-Coverage vs. Prey Sleep Time
The final controller is less affected by variations in prey sleep time
(Sleep time incremented over 100 iterations; results averaged over 100 random environments at each setting)

36 Results-Crashes vs. Hearing Range
The final controller performs poorly with respect to crashes as the hearing range is increased
(Hearing range incremented over 100 iterations; results averaged over 100 random environments at each setting)

37 Results-Crashes vs. Noise Bias
The final controller performs adequately in the presence of noise
(Noise incremented over 100 iterations; results averaged over 100 random environments at each setting)

38 Results-Coverage for Robots Trained with/without Noise
Controllers trained in the presence of noise are less susceptible to its effects, but do not reach peak performance
(Noise incremented over 100 iterations; results averaged over 100 random environments at each setting)

39 Conclusions
The resulting controller surpassed those produced by humans in the areas of coverage and energy conservation
All stages of the GA's progression must be taken into account in the fitness function to achieve acceptable results
– Operating point initialization guarantees functional nodes during early generations and appears to increase GA performance
– The relative scoring function appears to provide good selection pressure over the life of the GA

