Constructing Intelligent Agents via Neuroevolution By Jacob Schrum
Motivation Intelligent agents are needed –Search-and-rescue robots –Mars exploration –Training simulations –Video games Insight into nature of intelligence –Sufficient conditions for emergence of: Cooperation Communication Multimodal behavior
Talk Outline Bio-inspired learning methods –Neural networks –Evolutionary computation My research –Learning multimodal behavior –Modular networks in Ms. Pac-Man –Human-like behavior in Unreal Tournament Future work Conclusion
Artificial Neural Networks Brain = network of neurons ANN = abstraction of brain –Neurons organized into layers Inputs Outputs
What Can Neural Networks Do? In theory, anything! –Universal Approximation Theorem – Can’t program: too complicated In practice, learning/training is hard –Supervised: Backpropagation –Unsupervised: Self-Organizing Maps –Reinforcement Learning: Temporal-Difference and Evolutionary Computation
Evolutionary Computation Computational abstraction of evolution –Descent with modification (mutation) –Sexual reproduction (crossover) –Survival of the fittest (natural selection) Evolution + Neural Nets = Neuroevolution –Population of neural networks –Mutation and crossover modify networks –Net used as control policy to evaluate fitness
Neuroevolution Example Start With Parent Population
Neuroevolution Example Start With Parent Population Evaluate and Assign Fitness
Neuroevolution Example Start With Parent Population Evaluate and Assign Fitness Clone, Crossover and Mutate To Get Child Population
Neuroevolution Example Start With Parent Population Evaluate and Assign Fitness Clone, Crossover and Mutate Children Are Now the New Parents Repeat Process: Fitness Evaluations As the process continues, each successive population improves performance
Neuroevolution Applications F. Gomez and R. Miikkulainen, “2-D Pole Balancing With Recurrent Evolutionary Networks” ICANN 1998 Double Pole Balancing
Neuroevolution Applications F. Gomez and R. Miikkulainen, “Active Guidance for a Finless Rocket Using Neuroevolution” GECCO 2003 Finless Rocket Control
Neuroevolution Applications N. Kohl, K. Stanley, R. Miikkulainen, M. Samples, and R. Sherony, "Evolving a Real-World Vehicle Warning System" GECCO 2006 Vehicle Crash Warning System
Neuroevolution Applications K. O. Stanley, B. D. Bryant, I. Karpov, R. Miikkulainen, "Real-Time Evolution of Neural Networks in the NERO Video Game" AAAI 2006 Training Video Game Agents
What is Missing? NERO agents are specialists –Sniping from a distance –Aggressively rushing in Humans can do all of this, and more Multimodal behavior –Different behaviors for different situations Human-like behavior –Preferred by humans
What I do With Neuroevolution Discover complex agent behavior Discover multimodal behavior Contributions: Use multi-objective evolution –Different objectives for different modes Evolve modular networks –Networks with modules for each mode Human-like behavior –Constrain evolution
Pareto-based Multiobjective Optimization High health but did not deal much damage Dealt lot of damage, but lost lots of health Tradeoff between objectives
Non-dominated Sorting Genetic Algorithm II Population P with size N; Evaluate P Use mutation (& crossover) to get P´ size N; Evaluate P´ Calculate non-dominated fronts of P P´ size 2N New population size N from highest fronts of P P´ K. Deb, S. Agrawal, A. Pratap, T. Meyarivan, "A Fast Elitist Non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II" PPSN VI, 2000
Ms. Pac-Man Popular classic game Predator-prey scenario –Ghosts are predators –Until power pill is eaten Multimodal behavior needed –Running from threats –Chasing edible ghosts –More?
Modular Networks Different areas of brain specialize –Structural modularity → functional modularity Apply to evolved neural networks –Separate module → behavioral mode Preference neurons (grey) arbitrate between modules Use module with highest preference output ( )( )
Module Mutation Let evolution decide how many modules Networks start with one module New modules added by one of several module mutations Previous Random Duplicate
Intelligent Module Usage Evolution discovers a novel task division –Not programmed Dedicates one module to luring (cyan) Improves ghost eating when using other module
Comparison With Other Work AuthorsMethodGameAVGMAX Alhejali and Lucas [1]GPFourMaze16,01444,560 Alhejali and Lucas [2]GP+CampsFourMaze11,41331,850 My Module Mutation Duplicate ResultsFourMaze32,64744,520 Brandstetter and Ahmadi [3]GPCIG ,19833,420 Recio et al. [4]ACOCIG ,03143,467 Alhejali and Lucas [5]GP+MCTSCIG ,64169,010 My Module Mutation Duplicate ResultsCIG ,29984,980 [1] A.M. Alhejali, S.M. Lucas: Evolving diverse Ms. Pac-Man playing agents using genetic programming. UKCI [2] A.M. Alhejali, S.M. Lucas: Using a training camp with Genetic Programming to evolve Ms Pac-Man agents. CIG [3] M.F. Brandstetter, S. Ahmadi: Reactive control of Ms. Pac Man using information retrieval based on Genetic Programming. CIG [4] G. Recio, E. Martín, C. Estébanez, Y. Sáez: AntBot: Ant Colonies for Video Games. TCIAIG [5] A.M. Alhejali, S.M. Lucas: Using genetic programming to evolve heuristics for a Monte Carlo Tree Search Ms Pac-Man agent. CIG 2013.
Types of Intelligence Evolved intelligent Ms. Pac-Man behavior –Surprising module usage –Evolution discovers the unexpected –Diverse collection of solutions Still not human-like –Human-like vs. optimal –Human intelligence
Modern Game: Unreal Tournament 3D world with simulated physics Multiple human and software agents interacting Agents attack, retreat, explore, etc. Multimodal behavior required to succeed
Human-like Behavior: BotPrize International competition at CIG conference A Turing Test for video game bots –Judge as human over 50% of time to win –After 5 years, we won in 2012 Evolved combat behavior –Constrained to be human-like
Guessing Game Coleman: ???? Milford: ???? Moises: ???? Lawerence: ???? Clifford: ???? Kathe: ???? Tristan: ???? Jackie: ????
Judging Game
Player Identities Coleman: UT^2 (Our winning bot) Milford: ICE-2010 (bot) Moises: Discordia (bot) Lawerence: Native UT2004 bot Clifford: w00t (bot) Kathe: Human Tristan: Human Jackie: Native UT2004 bot
Human Subject Study Six participants played the judging game Recorded extensive post-game interviews What criteria to humans claim to judge by?
Lessons Learned Don’t be too skilled –Evolved with accuracy restrictions –Disable elaborate dodging Humans are “tenacious” –Opponent-relative actions –Encourage “focusing” on opponent Don’t repeat mistakes –Database of human traces to get unstuck
Bot Architecture
Future Work Evolving teamwork –Ghosts must cooperate to eat Ms. Pac-Man –Unreal Tournament supports team play Domination, Capture the Flag, etc. Interactive evolution –Evolve in response to human interaction Adaptive opponents/assistants Evolutionary art Content generation
Conclusion Evolution discovers unexpected behavior Modular networks learn multimodal behavior Human behavior not optimal –Evolution can be constrained to be more human-like Many directions for future research
Questions? contact Jacob Schrum