1
Evolutionary Robotics: A Tutorial
Stefano Nolfi, Neural Systems & Artificial Life, National Research Council, Rome, Italy
Dario Floreano, Microengineering Dept., Swiss Federal Institute of Technology, Lausanne, Switzerland
2
The Method: fitness function, genotype-to-phenotype mapping. Evolutionary Robotics is the attempt to develop robots and their sensorimotor control systems through an automatic design process involving artificial evolution. The basic idea behind Evolutionary Robotics is the following. An initial population of different "genotypes", each encoding the control system (and possibly the morphology) of a robot, is created randomly. Each genotype is then translated into the corresponding phenotype and tested in the environment for its ability to perform a desired task. The robots that have obtained the highest fitness are then allowed to reproduce by generating copies of their genotypes, with the addition of changes introduced by genetic operators such as mutation, duplication, etc. The process is repeated for a certain number of generations until, hopefully, the desired performance is achieved. The experimenter designs the fitness function, that is, the criterion used to measure how well an individual robot accomplishes the desired task. Moreover, the designer usually specifies how the genetic information (usually encoded as a sequence of binary values) is translated into the corresponding phenotype. However, the mapping between genotype and phenotype is usually task independent. As a consequence, the evolved behaviors, and the way in which behaviors are produced, are largely the result of a self-organization process.
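The loop just described fits in a few lines of code. Below is a minimal sketch, not the authors' actual implementation; the population size, truncation selection, per-bit mutation rate, and the stand-in fitness function are all assumptions for illustration.

```python
import random

GENOTYPE_LENGTH = 64   # bits encoding the control system (assumed)
POP_SIZE = 100
N_GENERATIONS = 100
MUTATION_RATE = 0.02   # per-bit flip probability (assumed)

def random_genotype():
    return [random.randint(0, 1) for _ in range(GENOTYPE_LENGTH)]

def evaluate(genotype):
    """Placeholder: decode the genotype into a controller, run the robot in
    the environment, and return a fitness score for the desired task."""
    return sum(genotype)  # stand-in fitness so the sketch runs

def mutate(genotype):
    return [1 - g if random.random() < MUTATION_RATE else g for g in genotype]

population = [random_genotype() for _ in range(POP_SIZE)]
for generation in range(N_GENERATIONS):
    ranked = sorted(population, key=evaluate, reverse=True)
    parents = ranked[:POP_SIZE // 5]  # the fittest 20% reproduce
    population = [mutate(random.choice(parents)) for _ in range(POP_SIZE)]
```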
3
Behavior-Based Robotics & ER
[Diagram: in behavior-based robotics (Brooks, 1986), control is decomposed into layers such as locomote, avoid hitting things, explore, manipulate the world, and build maps, each connecting sensors to actuators; in evolutionary robotics the decomposition between sensors and actuators is left open.] Generally speaking, this approach shares many, but not all, of its characteristics with other related approaches. One of these is Behavior-Based Robotics, which is based upon the idea of providing the robot with a hierarchically organized set of sensorimotor processes and letting the environment determine which behaviors should have control at any given time. In this approach, as in Evolutionary Robotics, the environment has a central role. In particular, the environmental conditions determine which behavior is active at any given time. Moreover, robots are usually designed in an incremental fashion through a trial-and-error process in which the designer adds a new, higher-level layer only after testing and debugging, in the environment, the behavior resulting from the lower layers. However, Evolutionary Robotics, by relying on an automatic evaluation process, usually makes a much wider use of this trial-and-error process. Moreover, while in the behavior-based approach the behavior with which the robot solves the task is at least partially designed by the experimenter, who decides its articulation into sub-components, in the Evolutionary Robotics approach it is the result of a self-organizing process. Indeed, the entire organization of the evolving system, including its eventual organization into sub-components, is the result of an adaptation process.
4
Learning Robotics & ER
[Diagram: a network mapping sensors to motors, trained with a desired output or teaching signal; developmental and phenotypical outcome of an evolved controller from Kodjabachian & Meyer, 1999.] Other related approaches are based on learning. The idea, in this case, is that a control system (typically a neural network) can be trained using incomplete data and then allowed to rely on its ability to generalize the acquired knowledge to novel circumstances. The researcher, in this case, must provide the learning network with feedback, which may be the desired state of the actuators at each time step or a reinforcement signal indicating, from time to time, how well or poorly the network is doing. Evolutionary Robotics shares the emphasis on self-organization with this approach. Indeed, artificial evolution may be described as a form of learning. However, Evolutionary Robotics differs from Learning Robotics in two respects. First, the amount of supervision required by evolution is generally much lower: only a global evaluation of how well an individual accomplishes the desired task is required. Second, learning algorithms introduce constraints on what is acquired through learning (in practice, only the connection weights are modified during the learning process). The evolutionary method, instead, does not introduce any constraint on what can be subjected to self-organization. Indeed, the characteristics of the sensors and of the actuators, the shape of the robot, and the architecture of the control system can all be subjected to the evolutionary process. In this slide, for example, you can see the developmental and phenotypical outcome of an evolved control system for a six-legged robot. This is a work of Kodjabachian and Meyer. The robot body (shown on the right side of the picture) is not under evolution in this example, but the network architecture (shown in the center) is, and it results from a complex developmental process (shown on the left side of the figure).
5
Artificial Life & ER [Floreano and Mondada, 1994]. Artificial Life represents an attempt to understand all life phenomena through their reproduction in artificial systems (typically through their simulation in a computer). Evolutionary Robotics shares many characteristics with Artificial Life, but it also stresses the importance of using a physical body, such as a robot, instead of a simulated agent. In the top part of this slide you can see a typical example of an artificial life experiment (this is a work of Menczer and Belew) in which the environment is represented by a 2-dimensional grid of cells and in which sensors have infinite precision. In the bottom part you can see an example of an evolutionary robotics experiment in which the training is accomplished by downloading the genotype of each individual of each generation into a real mobile robot. By using real robots, several additional factors (like friction, inertia, ambient light, noise, etc.) due to the physical properties of the robot and of the environment have to be taken into account. Moreover, only realistic types of sensors and actuators can be used. As a result, the outcomes are less likely to be affected by idealized experimental situations that often reflect the implicit assumptions of the experimenter. [Menczer and Belew, 1997]
6
How to Evolve Robots: evolution in the real world
Evolution in simulation + test on the real robot [Nolfi, Floreano, Miglino, Mondada 1994]. The most straightforward way to evolve robots is to do it in the real environment. In this example, two competing populations of robots (predators and prey) are evolved in the real environment. The robots consist of two Khepera robots with different sensorimotor structures. During the evolutionary process, each individual of the evolving population is embodied in the real Khepera body and tested in the real environment. For this reason, the evolutionary process can require a rather long time. Another possibility is to evolve robots in simulation and then test the evolved robots in the real environment. This can significantly reduce the time required to perform an experiment. In order to be effective, however, this approach requires that all the relevant aspects of the robot, the environment, and the robot-environment interaction be accurately modeled in the simulation; otherwise, the evolved robots will stop working once transferred into the real environment. Evolving robots in simulation only, without a validation test on the real robot, does not make sense, given that there is no way to ascertain that the simulation really captures all the complexities and opportunities of the real world. [Floreano and Nolfi, 1998]
7
Evolution in the Real World
Mechanical robustness, energy supply, analysis. [© K-Team SA] In order to evolve robots in the real environment one must consider several issues. First, mechanical robustness. During the evolutionary process, and especially in early generations, individuals may produce behaviors that can damage the robot hardware, such as high-speed collisions against objects or sustained strong currents to motors pushing against a wall. One way to mitigate this problem is to use small robots. The fundamental laws of physics, in fact, give higher mechanical robustness to miniature robots such as the Khepera. Another issue is energy supply. Evolutionary experiments last longer than the average battery duration. For this reason it is necessary to provide the robot with the ability to autonomously recharge its batteries, or to bypass the problem in other ways. One possibility is to use a thin cable with rotating contacts, like the one shown in this picture. This technique saves the time required to periodically recharge the batteries and also, by supporting data communication with a host computer, allows storing large amounts of data that can be used later to analyze the evolutionary experiment. Another important issue is the analysis of the evolved behavior and of the evolutionary course. The control system of evolved robots can be very complex and intimately related to the environment. For this reason, extracting the evolved neural controller from the robot and studying it in isolation often tells little about the behavior of the robot. Methods inspired by neuroethology can provide useful insights by correlating the behavior and the dynamics of evolved controllers. Special devices, such as the laser device shown here that computes in real time the position of the Khepera robot, can be extremely useful for performing this type of analysis. [Floreano and Mondada, 1994]
8
Evolution in Simulation
Different physical sensors and actuators may perform differently because of slight differences in their electronics or mechanics. [Graphs: activation of the 4th and the 8th infrared sensor.] To accurately model robot-environment interactions one must face several problems. A first problem is due to the fact that different physical sensors and actuators, even if apparently identical, may perform differently because of slight differences in their electronics or mechanics. By looking at the infrared sensors of a single Khepera, for example, you can easily notice that different sensors respond in significantly different ways when exposed to the same external stimulus. These two graphs, for example, display the activation state of two sensors of a single Khepera placed at different angles and distances with respect to a cylindrical obstacle. As can be seen, the second sensor is sensitive within angle and distance ranges that are nearly twice as large as those of the first sensor. This also implies that two different robots may perform very differently in identical conditions because of differences in their sensory characteristics. One way to take into account the idiosyncrasies of each sensor is to empirically sample the different classes of objects in the environment through the robot's real sensors. The resulting matrices of activation values, which will look like the values displayed in the two graphs shown here, can then be used by the simulator to set the activation levels of the infrared sensors of the simulated robot depending on its current position in the simulated environment. A second problem that should be taken into account in building a simulator is that, in real robots, physical sensors deliver uncertain values and commands to actuators have uncertain effects. Sensors do not return accurate values but just a fuzzy approximation of what they are measuring. Similarly, actuators have effects that might vary in time depending on several unpredictable causes. This problem may be alleviated by introducing noise into the simulation at all levels. Finally, since evolution can exploit unforeseen characteristics of the robot-environment interaction, the body of the robot and the characteristics of the environment should be accurately reproduced in the simulation. This means, for instance, that grid worlds, which are often used in Artificial Life simulations, cannot be used. Physical sensors deliver uncertain values and commands to actuators have uncertain effects. The body of the robot and the environment should be accurately reproduced in the simulation. [Nolfi, Floreano, Miglino and Mondada 1994; Miglino, Lund, Nolfi, 1995]
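A minimal sketch of the sampling technique just described, assuming a lookup table indexed by object class, angle step, and distance step; the table values, grid resolution, and noise level here are stand-ins, not the published data.

```python
import random

# activation_table[object_class][(angle_step, distance_step)] -> sampled value.
# Real tables are filled by recording the robot's own infrared sensors at a
# grid of positions around each object class; these entries are stand-ins.
activation_table = {
    "wall":     {(0, 0): 0.9, (0, 1): 0.4},
    "cylinder": {(0, 0): 0.8, (0, 1): 0.2},
}

def simulated_ir_reading(object_class, angle_step, distance_step, noise=0.05):
    """Return the empirically sampled activation, perturbed by uniform noise
    (noise at all levels, as recommended above), clamped to the sensor range."""
    base = activation_table[object_class].get((angle_step, distance_step), 0.0)
    value = base + random.uniform(-noise, noise)
    return min(max(value, 0.0), 1.0)
```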
9
Designing the Fitness Function
[Floreano et al., 2000] Fitness functions usually include variables and constraints that rate the performance of the expected behavior, but these variables and constraints are difficult to choose because the behavior evolved by the robot is not fully known in advance. For this reason, designing an effective fitness function is a difficult task that often requires a trial-and-error process. The dimension functional-behavioral indicates whether the fitness function takes into account specific functioning modes of the controller or the behavior of the individual. In evolving a walking behavior, for example, a functional fitness function will rate the oscillation frequency of the legs, while a behavioral fitness function will rate the distance covered by the robot. The dimension explicit-implicit reflects the number of variables and constraints included in the function. For example, an implicit function would select only robots that do not run out of energy. An explicit function, instead, would include variables and constraints that evaluate various components of the desired behavior, such as whether the robot approaches an energy-supply area, the travel velocity, the distance from obstacles, etc. The dimension external-internal indicates whether the variables and constraints included in the fitness function can be computed using information available to the evolving agent. An external fitness function, for example, might take into account the distance between the robot and its goal, which is an external variable because it cannot be computed from the robot's own sensors. Deciding how to design a fitness function depends on the circumstances. FEE fitness functions, located in the lower-left corner of Fitness Space, describe how the controller should work (functional), rate the system on the basis of several variables and constraints (explicit), and employ precise external measuring devices (external); they are appropriate for optimizing a set of parameters for a very complex, but well-defined, control problem in a well-controlled environment. On the contrary, BII (Behavioral, Implicit, Internal) fitness functions rate only the behavioral outcome of an evolved controller (behavioral), rely on few variables and constraints (implicit) that can be computed on-board (internal); they are suitable for developing adaptive robots capable of autonomous operation in partially unknown and unpredictable environments without human intervention. The diagonal line between the FEE and BII extremes in Fitness Space defines a continuum between traditional engineering optimization and the synthesis of artificial life systems, respectively.
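Two hedged examples of points near the two extremes of Fitness Space. The second is in the spirit of the well-known obstacle-avoidance fitness of Floreano and Mondada (1994); the exact formulas, names, and normalizations here are assumptions of this sketch, not the functions used in the cited work.

```python
from math import sqrt

def fee_walking_fitness(leg_oscillation_freq, distance_covered_external):
    """FEE-style: rates a functioning mode of the controller (oscillation
    frequency) and uses an external measuring device (tracked distance).
    The 50/50 weighting is an arbitrary assumption."""
    return 0.5 * leg_oscillation_freq + 0.5 * distance_covered_external

def bii_navigation_fitness(v_left, v_right, max_ir):
    """BII-style: only on-board quantities. Rewards speed (V), straight
    motion (1 - sqrt(dv)), and staying away from obstacles (1 - i).
    Wheel speeds assumed in [-1, 1]; max_ir is the highest infrared
    activation, in [0, 1]."""
    V = (abs(v_left) + abs(v_right)) / 2.0
    dv = abs(v_left - v_right) / 2.0
    return V * (1.0 - sqrt(dv)) * (1.0 - max_ir)
```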
10
Genetic Encoding: evolvability, expressive power, compactness, simplicity [Gruau, 1994; Nolfi and Floreano, 2000]. In addition to specifying a fitness function, we have to decide how to encode the robot control system (and possibly the robot body) in the genotype in a manner suitable for the application of genetic operators. In most cases, the phenotypical characteristics subjected to evolution are encoded in a uniform manner, so that the description of an individual at the level of the genotype assumes the form of a string of identical elements (such as binary numbers). The transformation of the genotype into a full-formed robot is called the genotype-to-phenotype mapping. A good genetic encoding should exhibit several properties. The first requirement is evolvability, i.e., the ability to produce improvements through the application of genetic operators. This is obviously a very important criterion. A second requirement is expressive power, that is, the possibility to encode many different phenotypical characteristics, such as the architecture of the controller, the morphology of the robot, the rules that control the plasticity of the individual, and possibly the rules that determine the genotype-to-phenotype process itself. Only the features that are encoded in the genotype, in fact, can be developed through the self-organization process instead of being pre-designed and fixed. A third requirement is compactness. The length of the genotype, in fact, can affect the dimensionality of the space to be searched by the genetic algorithm, which, along with the characteristics of the fitness surface, may affect the result of the evolutionary process. A fourth, obvious criterion is simplicity.
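As a concrete and deliberately simple example, here is a sketch of a direct, uniform encoding: a bit string decoded into the connection weights of a fixed-architecture network. The 8-bits-per-weight resolution and the [-1, 1] range are assumptions for illustration, not a prescription from the text.

```python
BITS_PER_WEIGHT = 8  # resolution of each encoded weight (assumed)

def decode(genotype, n_weights):
    """Direct genotype-to-phenotype mapping: slice the flat bit list into
    groups of 8 bits and scale each to a weight in [-1, 1]."""
    weights = []
    for i in range(n_weights):
        bits = genotype[i * BITS_PER_WEIGHT:(i + 1) * BITS_PER_WEIGHT]
        integer = int("".join(map(str, bits)), 2)  # 0 .. 255
        weights.append(integer / 127.5 - 1.0)      # scale to [-1, 1]
    return weights
```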
11
Adaptation is more Powerful than Decomposition and Integration
The main strategy followed to develop mobile robots has been that of divide and conquer: (1) divide the problem into a list of hopefully simpler sub-problems; (2) build a set of modules or layers able to solve each sub-problem; (3) integrate the modules so as to solve the whole problem. Unfortunately, it is not clear how a desired behavior should be broken down. At this point we will start to review a set of evolutionary experiments and, in doing so, will also discuss several general issues. The first issue we want to address is the fact that artificial evolution is potentially more powerful than approaches based on decomposition and integration. Classical approaches to robotics have often assumed a primary breakdown into perception, planning, and action. However, this way of dividing the problem has produced limited results, and several researchers, following Brooks, considered such a division inappropriate. For these reasons, Brooks proposed a radically different approach in which the division is accomplished at the level of behavior. The desired behavior is broken down into a set of simpler behaviors, which are activated or suppressed through a coordination mechanism. This approach has proven to be more successful. However, it still leaves to the designer the decision of how to break the desired behavior down into simple behaviors. Unfortunately, it is not clear how a desired behavior should be broken down.
12
Proximal and Distal Descriptions of Behaviors
The main reason why it is difficult to break down a desired behavior into simpler pieces is that behavior is not only the result of the robot's control system but the result of the interaction between the robot and the environment. To better understand this point, it is useful to distinguish two ways of describing behavior. The distal description of behavior is a description from the observer's point of view, in which high-level terms such as "approach" or "discriminate" are used to describe the result of a sequence of sensorimotor loops. The proximal description of behavior is a description from the point of view of the agent's sensorimotor system, which describes how the agent reacts in different sensory situations. It should be noted that behavior from the point of view of a distal description, for example exploration behavior, is the result not only of the behavior from a proximal point of view (i.e., the sensorimotor mapping) but also of the environment. More precisely, behavior, from the point of view of a distal description, is the result of the dynamical interaction between the agent and the environment. In fact, the sensory patterns that the environment provides to the agent partially determine the agent's motor reactions. These motor reactions, in turn, by modifying the environment or the relative position of the agent in the environment, partially determine the type of sensory patterns that the agent will receive from the environment. In general, the breakdown is accomplished intuitively by the researcher on the basis of a description of the global behavior from his point of view (i.e., from the point of view of a distal description). However, because the desired behavior is the emergent result of the dynamical interaction between the agent's control system and the environment, it is difficult to predict which type of behavior will be produced by a given control system. The reverse is also true: it is difficult to predict which control system will produce a desired behavior. In other words, there is not a one-to-one mapping between sub-components of the agent's control system and sub-components of the corresponding behavior (from the point of view of the distal description). This means that it does not make much sense to project a decomposition made at the level of the distal description onto the control system, which operates at the proximal level. Evolutionary Robotics, as we will see, by relying on an evaluation of the system as a whole and of its global behavior, releases the designer from the problem of deciding how to break the desired behavior down into simple basic behaviors. [Nolfi, 1997]
13
Discrimination Task (1)
[Diagram: decomposition and integration of explore, avoid, approach, and discriminate modules between sensors and actuators; discrimination between walls and cylinders, and between small and large cylinders.] To show the differences between the use of artificial evolution and the use of the decomposition-and-integration method, let me start by considering a simple example. Imagine that we want to develop the control system for a simple mobile robot, like this Khepera robot, that is placed in a rectangular arena and that should discriminate between objects with different shapes. Imagine, for example, that the robot should find and stay close to a cylinder, thus discriminating between walls and cylinders. Or that it should find and stay close to large cylinders, thus discriminating between small and large cylinders. To accomplish this task the robot should be able to perform these behaviors: (1) moving in order to explore the environment; (2) avoiding objects (if walls); (3) remaining close to an object (if a target); (4) discriminating objects. Therefore, if we want to build the control system for such a robot by following the decomposition-and-integration approach, we may divide the desired behavior into these four basic behaviors and try to implement each of them in a different module of the control system. Now, the exploration behavior, the avoidance behavior, and the approaching behavior are easy to implement; however, it turns out that the discrimination behavior is quite difficult to realize. For example, if we train a neural network by back-propagation to produce an output close to 0 for sensory patterns corresponding to walls and close to 1 for sensory patterns corresponding to the target, the network will fail most of the time. These graphs show in blue the positions (i.e., the combinations of angle and distance) from which networks are able to correctly discriminate between the sensory patterns belonging to walls and to small cylinders, and in red the positions from which the network is able to correctly discriminate between the sensory patterns belonging to small and large cylinders. From this figure, we see that sensory patterns can be correctly discriminated in a minority of cases: about 25 and 5 percent of the cases, respectively. Also notice that there are two areas in which the objects cannot be correctly discriminated, although they correspond to positions located at about 60 degrees to the left or the right and at a short distance from the objects. This difficulty can be explained by considering that the clouds of sensory patterns corresponding to the two categories (such as walls and cylinders) strongly overlap in sensory space. There are two possibilities at this point. It may be that this task is not as simple as one might imagine. Or it may be that this way of dividing the desired behavior into sub-components is inappropriate. That the second is the case, that is, that the way in which the task has been decomposed is inappropriate, can be shown by analyzing the behaviors obtained by evolving robots. [Nolfi, 1996, 1999]
14
Discrimination Task (2)
[Diagram: a single network from sensors to actuators replaces the explore, avoid, approach, and discriminate modules.] In this case, we have an initial population of randomly generated genotypes. Each genotype encodes the weights of a neural network which has 6 sensors (the 6 infrared sensors of the robot) and 2 motors (which determine the speed of the two wheels). Individuals are tested in the environment and are scored for the number of cycles spent close to the target. The best individuals are then allowed to reproduce, and the process is repeated for a certain number of generations. Notice how, in this case, we do not decompose the behavior into simple behaviors as before. Evolving individuals are free to select their own way to solve the task. We select individuals by looking at how their overall behavior approximates the desired behavior, without introducing constraints on how the task should be solved or how the control system should be organized. If we do so, after a few generations we obtain individuals able to solve the task. In this video you see an evolved individual which is asked to discriminate between walls and small cylinders by avoiding walls and remaining close to cylinders. Interestingly, all individuals, like the two shown in the video, never stop in front of the target, but start to move back and forth, as well as slightly to the left and right, remaining at a given angle and distance with respect to the target. This form of behavior is emergent. There is no indication in the fitness function of how objects should be discriminated. This back-and-forth strategy with which these individuals discriminate between the two types of objects has been discovered by the evolutionary process. These evolved behaviors can be described as a dynamical system, and the relative position in which individuals start to move back and forth, while remaining in about the same relative position with respect to the target, can be described as an attractor. This is shown more clearly by this slide, which represents the trajectories of the movements that one individual produced for different angles and distances when it came close to a wall (top picture) or the target (bottom picture). As can be seen, when the individual reaches a distance of about 20 mm from a wall, it starts to avoid it and moves away. On the contrary, when the individual is close to a target, it continues to approach the target until it reaches the attractor area, located at a distance of about 15 mm and an angle of about 45 degrees. The trajectories of the motor responses in this area all converge to the center of the area itself, allowing the individual to keep more or less the same relative position. Notice that the dynamics arise from the interaction between the robot and the environment. Neither the robot (which is a pure sensorimotor controller) nor the environment is a dynamical system in this case. The dynamics arise from the interaction between the two. The robot modifies its relative position with respect to the environment, and the environment produces different sensory stimuli for different relative positions of the robot. The cycling between these two elements produces the dynamics of the resulting behavior. [Nolfi, 1996]
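A sketch of the setup just described: a fully connected perceptron from the 6 infrared sensors to the 2 wheel speeds, scored by the number of cycles spent near the target. The tanh activation, the robot interface, and the 40 mm "close" threshold are assumptions for illustration; the original network details may differ.

```python
import math

def controller_step(ir_sensors, weights, biases):
    """One sensorimotor loop: map 6 IR readings to 2 wheel speeds in [-1, 1]."""
    speeds = []
    for motor in range(2):
        total = biases[motor] + sum(
            w * s for w, s in zip(weights[motor], ir_sensors))
        speeds.append(math.tanh(total))
    return speeds

def evaluate(robot, weights, biases, n_steps=500):
    """Fitness = number of cycles spent close to the target cylinder.
    `robot` is a hypothetical interface (read_ir, set_wheel_speeds,
    distance_to_target), standing in for the simulator or the real Khepera."""
    fitness = 0
    for _ in range(n_steps):
        robot.set_wheel_speeds(controller_step(robot.read_ir(), weights, biases))
        if robot.distance_to_target() < 40.0:  # mm; assumed "close" threshold
            fitness += 1
    return fitness
```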
15
Discrimination Task (3)
Interestingly, as reported by Scheier, Pfeifer, and others, the task of discriminating between small and large cylinders can also be easily solved by evolving robots. As shown in this slide (see the thick line), in their experiment performance increases throughout the generations, reaching near-optimal performance in about 40 generations. In this case, evolved robots display a different behavior. They explore the environment until they perceive a cylinder (large or small) and then start to turn around the object, eventually leaving if the object is a small cylinder. Interestingly, as claimed by the authors, individuals behave so as to experience sensory states that are easy to discriminate. To understand this point, we should consider that the type of sensory patterns that a robot receives from the environment partially depends on how the robot itself reacts to each sensory state. This implies that robots that behave differently may be exposed to sensory states that are easier or harder to discriminate. Robots therefore might act so as to experience sensory states whose clouds in sensory space, corresponding to different objects, are as separated as possible. This is actually what happens. The green line here indicates the difficulty of the discrimination task (it is a measure developed by Clark and Thornton that, in short, measures how well separated in the input space the patterns corresponding to two different objects are). The blue line shows how performance increases throughout the generations. As can be seen, the increase in performance is strongly correlated with the increase in the ability to act so as to experience sensory states that are easy to discriminate. Evolved robots act so as to select sensory patterns that are easy to discriminate. [Scheier, Pfeifer, and Kuniyoshi, 1998]
16
The Importance of Self-organization
Operating a decomposition at the level of the distal description of behavior does not necessarily simplify the challenge. By allowing individuals to self-organize, artificial evolution tends to find simple solutions that exploit the interaction between the robot and the environment and between the different internal mechanisms of the control system. These examples demonstrate the first point. On the contrary, a given sub-behavior (such as discriminating) might be more complex than the whole desired behavior (in our case, find and stay close to object X while avoiding object Y). The emergent behavior of moving back and forth in front of the target, in fact, shows how the discrimination between different objects can be obtained as the emergent result of a sequence of sensorimotor loops resulting in a behavioral attractor. In order to obtain this behavior, the approaching behavior cannot be separated from the discrimination behavior. Rather, the way in which the robot acts to approach an object must be tightly correlated with the way in which it acts to avoid, or to remain close to, an object. Similarly, in the second experiment, the ability to self-select sensory patterns that are easy to discriminate results from a tight integration between how the robot explores the environment and how it approaches and avoids objects. Although, when we observe these evolved robots, we might interpret a given sequence of sensorimotor loops as a given distal behavior (we might say: now the robot is exploring the environment, now it is avoiding an object, etc.), at each time step these robots tend to react so as to maximize their chance of producing the overall behavior. For example, when they are exploring the environment, they act so as to maximize the chance of later encountering sensory patterns that they know how to discriminate. Finally, it should be noted that the solutions found by evolution are qualitatively different from those that can be obtained through decomposition and integration or, more generally, through partial or total design. The strategies that emerge often surprise us. This is because these strategies strongly rely on the dynamical interaction between the robot and the environment and on the interplay between how the robot reacts in each possible circumstance. Such strategies are nearly impossible to design by hand, given that they result from a large number of non-linear interactions. [Nolfi, 1996, 1997]
17
Modularity and Behaviors
At this point we shall introduce a slightly more complex experiment. We will use this experiment to illustrate two points. One is the relation between modularity and behavior. The second has to do with the problem of scalability (i.e., the possibility of using artificial evolution to solve complex tasks). Let us start with the relation between behaviors and modularity. From the point of view of the decomposition-and-integration approach, the use of modularity is implicit in the approach itself. The desired behavior is broken down into a set of basic behaviors (such as "approach" or "avoid") which correspond to a set of sub-components (modules) of the controller. Conversely, in the case of the evolutionary approach, the use of modularity is not mandatory. As shown in the experiment we just saw, a non-modular control system such as a fully connected perceptron is perfectly able to produce a behavior that can be decomposed into a set of basic behaviors (in our case: exploring the environment, avoiding walls, and approaching and remaining close to the targets). Should we expect that behaviors of any complexity can be produced by homogeneous, non-modular controllers, or do modules become necessary when we move to more complex tasks? Moreover, if we allow evolving individuals to self-organize how to use their modules, should we expect a correspondence between modules and basic behaviors in evolved individuals? To answer these questions we will present a new experiment in which evolving individuals are asked to clean their arena by collecting "garbage". As we will see, modularity is useful. However, this does not mean that there is a one-to-one relation between modules and sub-behaviors described from a distal-description point of view. Is modularity useful in ER? What is the relation between self-organized neural modules and behaviors? [Nolfi, 1997]
18
The Garbage Collecting Task (1)
Let us consider the case of a Khepera robot provided with a gripper module with two degrees of freedom, which should be able to clean an arena by collecting garbage (i.e., cylindrical objects) and releasing it outside the arena. This involves a rather long sequence of sub-tasks: (a) explore the environment, avoiding walls; (b) recognize a target object and move to an angle and distance from which the object can be grasped; (c) pick up the target object; (d) move toward the walls while avoiding other target objects; (e) recognize a wall and move to an angle and distance from which the object can be safely dropped outside the arena; (f) release the object. Moreover, the task presents other difficulties. For instance, the robot should produce different motor actions in circumstances that are similar from the sensory point of view. To allow evolution (and not the designer) to decompose the problem into sub-components corresponding to different neural modules, we evolved robots with the following architecture. The network has 7 sensory neurons (in black), coding for the 6 frontal infrared sensors of the Khepera and for the light-barrier sensor on the gripper, and 4 motor outputs, coding for the left and right motors and for the triggering of the object pick-up and object release procedures, respectively. When the pick-up procedure is triggered, the arm is moved down, the gripper is closed, and the arm is moved up again. When the release procedure is triggered, the arm is moved down, the gripper is opened, and the arm is moved up again. For each motor output, however, there are two competing neural modules (the light and dark blue neurons). Sometimes the robot is controlled by the light-blue output neurons and sometimes by the dark-blue ones. In other words, the light-blue and dark-blue outputs are two neural modules that compete for control of the same motor. Which of the two gains control depends on the activation state of the two corresponding selector neurons, in red. If, at a given moment, the light-red selector neuron is more activated than the dark-red selector neuron, the light-blue output neuron gains control; otherwise the dark-blue output neuron does. Given that both the blue and the red weights (i.e., the weights of the connections that determine the output produced by each neural module and the weights that determine which neural module gains control at a given moment) are subjected to the evolutionary process, evolution determines both how evolved individuals behave and how the overall behavior is divided into sub-components produced by different neural modules. We replicated this experiment both with this special architecture and with the other neural architectures shown on the right side of this slide, which were either non-modular or modular but with the decomposition of the behaviors into modules predetermined by the designer. [Nolfi, 1997]
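A sketch of this selector mechanism, assuming for concreteness sigmoid module outputs and a hard winner-take-all comparison between the two selector neurons; the exact activation functions and the handling of biases in the original architecture may differ.

```python
import math

N_SENSORS = 7   # 6 frontal IR sensors + gripper light barrier
N_MOTORS = 4    # left wheel, right wheel, pick-up trigger, release trigger

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def modular_step(sensors, out_w, sel_w):
    """out_w[motor][module] and sel_w[motor][module] are evolved weight
    vectors of length N_SENSORS (biases omitted for brevity). For each motor,
    both modules propose an output; the module whose selector neuron is more
    activated gains control at this time step."""
    motor_outputs = []
    for m in range(N_MOTORS):
        proposals = [sigmoid(sum(w * s for w, s in zip(out_w[m][k], sensors)))
                     for k in range(2)]
        votes = [sum(w * s for w, s in zip(sel_w[m][k], sensors))
                 for k in range(2)]
        winner = 0 if votes[0] >= votes[1] else 1
        motor_outputs.append(proposals[winner])
    return motor_outputs
```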
19
The Garbage Collecting Task (2)
Modular neural controllers able to self-organize outperform the other architectures. There is no one-to-one correspondence between self-organized neural modules and sub-behaviors. In this video we can see one of the successfully evolved robots. As you can see, the robot is able to perform the requested behavior perfectly well. It explores the environment avoiding the walls and is able to distinguish the target objects from the walls. After the robot has correctly disambiguated the sensory stimuli it encounters, it grasps the targets and avoids walls (when the gripper does not carry anything) and, vice versa, avoids targets and approaches walls (when the gripper carries an object). If we look at performance throughout the generations, we can see how the emergent modular architecture, after a few generations, starts to outperform all the other architectures, maintaining a significant difference until generation 500. Individuals with this architecture also appear to be more robust when tested in the real environment (the evolutionary process, in this case, was conducted in simulation). So modularity appears to be useful. If we look at how neural modules are used in evolved individuals, however, we see that there is no correspondence between neural modules and distal behaviors. This picture represents the behavior of an evolved individual that uses just two neural modules. The red and blue circles represent the trajectory of the robot, and the color of the circles represents which of the two modules is currently in charge. The robot starts from the bottom-right side of the environment, avoids the wall, then, by exploring the environment, finds a target object and picks it up, moves toward a wall, and releases the object. Then it starts exploring the environment again, finally picking up another object. If we look at the colors of the circles, we see that the blue and red modules alternate and are responsible for producing all the sub-parts of the behavior. In other words, it is not the case that one module (the blue one, for example) is responsible for one behavior (such as exploring the environment) and the other module for the other sub-behaviors. All the different sub-behaviors result from the contribution of both neural modules. What we can conclude is that it is useful to divide a complex behavior into sub-components produced by different modules. However, decomposing the problem by looking at the behavior from a distal-description point of view might be misleading. [Nolfi, 1997]
20
Evolving “complex” behaviors
Bootstrap problem: selecting individuals directly for their ability to solve a task only works for simple tasks. Incremental evolution: start with a simplified version of the task and then progressively increase its complexity. Alternatively, include in the selection criterion a reward for sub-components of the desired behavior (as sketched below). This experiment also shows how plain evolution may have problems of scalability. If we try to evolve this behavior by rewarding individuals only for their ability to release objects, we fail. What happens is that, since it is very improbable that an individual with randomly selected weights can succeed in performing the entire sequence of correct behaviors even once, if we only score individuals for how many targets they release outside the arena, all the networks of the initial generations are scored with the same value of 0 and, as a consequence, the selection process has no effect. In other words, selecting individuals directly for their ability to solve a task only works for simple tasks. If we want to scale up to more complex tasks, we need something else. One thing we can do is use our own insight, i.e., the insight of the experimenter, to channel the evolutionary process in the right direction. This can be done by starting with a simplified version of the task and then progressively increasing its complexity. It can also be done by including in the selection criterion a reward for sub-components of the desired behavior, or by modifying the fitness function during the evolutionary process so as to progressively increase the complexity of the task. The technique of rewarding sub-components was used in the garbage-collecting experiment we just showed you. To obtain this behavior through evolution, we had to score individuals also for their ability to pick up objects. In this way, individuals that occasionally pick up objects are selected even if they are not yet able to correctly release the objects outside the arena. Once the ability to pick up objects is acquired, after a certain number of generations, individuals that are also able, even if only occasionally, to release objects outside the arena are obtained and selected. This ability is then refined in successive generations.
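A minimal sketch of the sub-component reward described above; the relative weights are assumptions, chosen only to show how partial success (picking up an object) provides a selection gradient before any individual can complete the full task.

```python
def garbage_fitness(n_picked_up, n_released_outside):
    """Shaped fitness for the garbage-collecting task: reward the pick-up
    sub-behavior so early generations are not all scored 0, and reward the
    full task (releasing outside the arena) more strongly."""
    PICKUP_REWARD = 1.0    # assumed relative weights,
    RELEASE_REWARD = 10.0  # not the published values
    return PICKUP_REWARD * n_picked_up + RELEASE_REWARD * n_released_outside
```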
21
Visually-Guided Robots
The second technique was used by a group of researchers at the University of Sussex who studied the evolution of visually guided robots. In their experiments they trained a specially designed robot, called the "gantry robot", to visually discriminate between triangular and rectangular white shapes on a black background. The evolutionary process consisted of three stages of increasing complexity. In the first stage, one full wall was covered with white paper and the robot was asked to move toward the wall. In the second stage, the white target surface was restricted to a 22 cm wide band. Finally, in the third stage, the white paper was replaced by two white shapes, a rectangle and a triangle, and the robot was asked to move toward the triangle. Although these techniques can solve the bootstrap problem, they require that the designer identify the right stepping stones necessary to obtain a given behavior. For the reasons we described previously, this can be a hard task. As Dario will show in the second part of the tutorial, there are other ways to achieve scalability that do not require additional intervention from the designer, namely lifetime adaptation and the co-evolution of competing populations. [Cliff et al., 1993; Harvey et al., 1994]
22
Learning & Evolution: Interactions
Different time scales, different mechanisms, similar effects.
Learning advantages in evolution [Nolfi & Floreano, 1999]:
- Adapt to changes that occur faster than a generation
- Extract information that might channel the course of evolution
- Help and guide evolution
- Reduce genetic complexity and increase population diversity
Learning costs in evolution [Mayley, 1997]:
- Delay in the ability to achieve fit behaviors
- Increased unreliability (learning the wrong things)
- Physical damage, energy waste, tutoring
Baldwin effect [Baldwin, 1896; Morgan, 1896; Waddington, 1942]
23
Hinton & Nowlan model [1987]
Genotype: 00?11???0111?0?1?0?1 (alleles are 0, 1, or ?; the ? alleles are set by guessing during life)
- Fitness = finding the correct combination of weights
- Learning samples the space in the surroundings of the individual
- The fitness landscape is smoothed and evolution becomes faster
- Baldwin effect (assimilation of features normally "learnt")
Model constraints:
- The learning task and the evolutionary task are the same
- Learning is a random process
- The environment is static
- Genotype and phenotype space are correlated
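A compact sketch of the model, following its standard formulation (20 loci, up to 1000 random guesses of the ? alleles, fitness decreasing with the number of trials needed); the specific constants should be treated as assumptions of this sketch.

```python
import random

TARGET = [1] * 20  # the single "good" combination (any fixed string works)
TRIALS = 1000      # learning trials per lifetime

def lifetime_fitness(genotype):
    """Alleles are 0, 1, or '?'. Fixed alleles must match the target; '?'
    alleles are guessed randomly on each trial. Finding the combination
    earlier in life yields higher fitness, which smooths the landscape."""
    if any(g != '?' and g != t for g, t in zip(genotype, TARGET)):
        return 1.0  # a wrong fixed allele can never be repaired by learning
    unknown = [i for i, g in enumerate(genotype) if g == '?']
    for trial in range(TRIALS):
        if all(random.randint(0, 1) == TARGET[i] for i in unknown):
            return 1.0 + 19.0 * (TRIALS - trial) / TRIALS
    return 1.0
```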
24
Different Tasks [Nolfi, Elman, Parisi, 1994]
- Evolving for food finding
- Learning predictions
- Learning mechanism = back-propagation
- Increased speed & fitness
- Genetic assimilation
25
Perspectives on Landscape
[Diagram: weight space with points A, C, B1, B2 and intermediate points P, Q]
- A = weights evolved for food finding; C = weights trained for prediction
- B1, B2 = new positions after mutation
- Fitness is higher when closer to A
- Correlated landscapes [Parisi & Nolfi, 1996]
- Relearning effects compensate for mutations [Harvey, 1997] (this may hold only in a few cases)
26
Evolutionary Reinforcement Learning
- Evolving both action and evaluation connection strengths [Ackley & Littman, 1991]
- The action module modifies its weights during lifetime using CRBP
- ERL gives better performance than evolution alone or reinforcement learning alone
- Baldwin effect
- Method validated on mobile robots [Meeden, 1996]
27
Evolutionary Auto-teaching
- All weights are genetically encoded, but one half of the network teaches the other half using the delta rule [Nolfi & Parisi, 1991]
- Individuals can live in one of two environments, randomly determined at birth
- Learning individuals adapt their strategy to the environment and display higher fitness
[Graph: fitness of learning vs. no-learning populations]
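A sketch of the auto-teaching scheme, assuming single-layer action and teaching sub-networks over the same inputs and a fixed learning rate; the original network sizes and parameters may differ.

```python
def auto_teach_step(inputs, action_w, teach_w, lr=0.1):
    """The genotype encodes both weight sets. The teaching sub-network
    produces targets; the action sub-network's weights move toward them
    with the delta rule (action_w is modified in place). Returns the
    motor outputs."""
    n_out = len(action_w)
    outputs = [sum(w * x for w, x in zip(action_w[o], inputs))
               for o in range(n_out)]
    targets = [sum(w * x for w, x in zip(teach_w[o], inputs))
               for o in range(n_out)]
    for o in range(n_out):
        err = targets[o] - outputs[o]  # self-generated teaching signal
        for i in range(len(inputs)):
            action_w[o][i] += lr * err * inputs[i]
    return outputs
```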
28
Evolution of Learning Mechanisms (1)
[Diagram: for each synapse, the genotype encodes its sign, its strength, a genetically determined learning rule (Hebb, postsynaptic, presynaptic, covariance), and a learning rate]
- Encoding learning rules, NOT learning weights [Floreano & Mondada, 1994]
- Weights are always initialized to random values
- Different synapses can use different rules within the same network
- The adaptive method can be applied to node encoding (short genotypes)
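A hedged sketch of the four genetically selectable synaptic rules named above, in one common formulation used in this line of work; the exact equations here are an assumption of this sketch. Weight magnitudes stay in [0, 1] and the sign is a separate, genetically determined parameter; x is the presynaptic and y the postsynaptic activation, both in [0, 1], and eta is the evolved learning rate.

```python
import math

def hebb(w, x, y):
    return (1.0 - w) * x * y                        # strengthen on co-activity

def postsynaptic(w, x, y):
    return w * (-1.0 + x) * y + (1.0 - w) * x * y   # also weaken when y fires alone

def presynaptic(w, x, y):
    return w * x * (-1.0 + y) + (1.0 - w) * x * y   # also weaken when x fires alone

def covariance(w, x, y):
    f = math.tanh(4.0 * (1.0 - abs(x - y)) - 2.0)   # positive if activities agree
    return (1.0 - w) * f if f > 0 else w * f

RULES = {"hebb": hebb, "post": postsynaptic, "pre": presynaptic, "cov": covariance}

def update_weight(w, x, y, rule, eta):
    """Apply the synapse's evolved rule and clamp the magnitude to [0, 1]."""
    w = w + eta * RULES[rule](w, x, y)
    return min(max(w, 0.0), 1.0)
```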
29
Sequential task & unpredictable change
- Faster and better results [Floreano & Urzelai, 2000]
- Automatic decomposition of the sequential task
- Synapses change continuously
- Evolved robots adapt online to unpredictable change [Urzelai & Floreano, 2000]: illumination; transfer from simulations to robots; environmental layout; a different robotic platform; lesions to motor gears [Eggenberger et al., 1999]
[Graph: genetically determined vs. adaptive controllers]
30
Summary. Learning is very useful for robotic evolution:
- it accelerates and boosts evolutionary performance
- it can cope with fast-changing environments
- it can adapt to unpredictable sources of change
Lamarckian evolution (inheriting learned properties) may provide short-term gains [Lund, 1999], but it does not display all the advantages listed above [Sasaki & Tokoro, 1997, 1999].
Distinction between learning and adaptation [Floreano & Urzelai, 2000]:
- Adaptation does not necessarily develop and capitalize upon new skills and knowledge
- Learning is an incremental process whereby new skills and knowledge are gradually acquired and integrated
31
Competitive Co-evolution
The fitness of each population depends on the fitness of the opponent population. Examples: predator-prey, host-parasite.
- It may increase adaptive power by producing an evolutionary arms race [Dawkins & Krebs, 1979]
- More complex solutions may incrementally emerge as each population tries to win over the opponent
- It may be a solution to the bootstrap problem
- The fitness function plays a less important role
- A continuously changing fitness landscape may help to prevent stagnation in local minima [Hillis, 1990]
32
Co-evolutionary Pitfalls
- The same set of solutions may be discovered over and over again. This cycling behavior may end up in very simple solutions. Solution: retain the best individuals of the last few generations (Hall of Fame: of all generations), as sketched below.
- Whereas in conventional evolution the fitness landscape is static and fitness is a monotonic function of progress, in competitive co-evolution the fitness landscape can be modified by the competitor, and the fitness function is no longer an indicator of progress. Solutions: Master Fitness (after evolution, test each generation's best against all other generations' best), CIAO graphs (test each generation's best against all previous best opponents).
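A sketch of the Hall-of-Fame remedy described above (the sampling and scoring details are assumptions): each predator is evaluated not only against the current best prey but also against best prey sampled from all previous generations, which discourages cycling through previously beaten strategies.

```python
import random

hall_of_fame = []  # best prey genotype from each past generation (appended elsewhere)

def predator_fitness(predator, current_prey, compete, n_samples=5):
    """compete(predator, prey) -> score in [0, 1]; a hypothetical tournament
    function standing in for an actual pursuit trial. The predator's fitness
    is its average score against the current prey and sampled past champions."""
    pool = random.sample(hall_of_fame, min(n_samples, len(hall_of_fame)))
    opponents = [current_prey] + pool
    return sum(compete(predator, prey) for prey in opponents) / len(opponents)
```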
33
Examples of Co-evolutionary Agents
- Ball-catching agents [Sims, 1994]: distance-based fitness; good results are rare
- Simulated predator-prey [Cliff & Miller, 1997]: distance-based fitness; hundreds of generations; CIAO method; evolution of sensors
34
Co-evolutionary Robots
[Floreano, Nolfi & Mondada, 1998]
- Energetically autonomous predator-prey scenario
- Time-based fitness
- Controllers downloaded to increase reaction speed
- The last 5 best controllers are retained for testing individuals
- Predators = vision + proximity sensors; prey = proximity sensors + higher speed
- The predator genotype is longer; the prey has an initial position advantage
35
Co-evolutionary Results
[Plot: co-evolutionary fitness progress of the best individuals]
- Predators do not attempt to minimize the distance
- Prey maximize the distance
36
Increasing Environmental Complexity
[Figures: arenas of increasing complexity] …prevents premature cycling [Nolfi & Floreano, 1999]
37
Summary. Competitive co-evolution is challenging because:
- the fitness landscape is continuously changing
- it is hard to monitor progress online
- cycling, local minima
When the environment is sufficiently complex, or the Hall-of-Fame method is used, the system develops increasingly complex solutions.
It can work with, and capitalize on, very implicit, internal, and behavioral fitness functions by exploring a large range of behaviors triggered by the opponents.
When co-evolving adaptive mechanisms, prey resort to random actions, whereas predators adapt online to the prey strategy and report better performance [Floreano & Nolfi, 1997].
38
Evolvable Hardware
- Evolution of electronic circuits
- Evolution of body morphologies (including sensors)
Why evolve hardware?
- The hardware choice constrains environmental interactions and the course of evolution
- Evolved solutions can be more efficient than those designed by humans
- Development of new adaptive materials with self-configuration and self-repair abilities
39
Evolutionary Control Circuits
Thompson's unconstrained evolution:
- Xilinx 6000 family; no enforced global synchronization
- Tone reproduction
- Robot control
- Fitness landscape studies (very rugged, with neutral networks)
Evolvable hardware module for Khepera
40
Evolutionary Control Circuits
Keymeulen: evolution of vision-based controllers
- Find a ball while avoiding obstacles
- Constrained evolution, entirely on the physical robot
De Garis: CAM-Brain, composed of tens of Xilinx FPGAs (6000 family)
- Growth of neural circuits using cellular automata with evolved rules
- Aims to evolve a brain for a kitten robot
- Pitfall: speed limited by the sensory-motor loop
41
Evolutionary Morphologies
Evolution of Lego structures [Funes et al., 1997]:
- Bridges
- Cranes
- Extended to objects and robot bodies
[Image: example of an evolved crane; Funes et al., 1997]
42
Co-evolutionary Morphologies
- Karl Sims, 1994
- Komosinski & Ulatowski, 1999
- Effect of doubling the sensor range on body/wheel size [Lund et al., 1997]
43
Suggestions for Further Research
- Encoding and mapping of control systems
- Exploration of alternative building blocks
- Integration of growth, learning, and maturation
- Incremental and open-ended evolution
- Morphology and sensory co-evolution
- Application to large-scale circuits
- User-directed evolution
- Comparison with other adaptive techniques
Further reading:
- Nolfi, S. & Floreano, D. Evolutionary Robotics: The Biology, Technology, and Intelligence of Self-Organizing Machines. MIT Press, October 2000.
- Husbands, P. & Meyer, J-A. (Eds.) Evolutionary Robotics: Proceedings of the 1st European Workshop. Springer Verlag, 1998.
- Gomi, T. (Ed.) Evolutionary Robotics. Volumes I (1997), II (1998), III (2000). AAI Books.
44
Evorobot Simulator. Sources, binaries, and documentation files freely available at: [Nolfi, 2000]