Evolving Multimodal Behavior With Modular Neural Networks in Ms. Pac-Man
By Jacob Schrum and Risto Miikkulainen

Introduction
Challenge: discover behavior automatically, in simulations, robotics, and video games (focus). Why is this challenging? Complex domains, multiple agents, multiple objectives, and multimodal behavior required (focus).

Multimodal Behavior
Animals can perform many different tasks. Imagine learning a monolithic policy as complex as a cardinal's behavior: how? The problem becomes more tractable if broken into component behaviors: flying, nesting, foraging.

Multimodal Behavior in Games
How are complex software agents designed? With finite state machines and behavior trees/lists. Modular design leads to multimodal behavior, e.g. the FSM for an NPC in Quake or the behavior list for our winning BotPrize bot.

Modular Policy
One policy consisting of several sub-policies/modules; their number is preset or learned. A means of arbitration is also needed: human-specified (Multitask, Caruana 1997) or learned via preference neurons. Separate behaviors are easily represented, and sub-policies/modules can share components.

Constructive Neuroevolution
Genetic algorithms + neural networks, good at generating control policies. Three basic mutations (perturb weight, add connection, add node) plus crossover; other structural mutations are possible (NEAT, Stanley 2004).
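
A minimal Python sketch of these three operators, assuming a toy genome representation (a list of (source, target, weight) connections plus a node count); the names and layout are illustrative, not the implementation used in the experiments.

```python
import random

# Illustrative genome: connections is a list of (source, target, weight)
# tuples, num_nodes counts the neurons. This is an assumption for the sketch.

def perturb_weight(connections, sigma=0.5):
    """Add Gaussian noise to one randomly chosen connection weight."""
    i = random.randrange(len(connections))
    src, dst, w = connections[i]
    connections[i] = (src, dst, w + random.gauss(0, sigma))

def add_connection(connections, num_nodes):
    """Link two nodes with a new random-weight connection (if not present)."""
    src, dst = random.randrange(num_nodes), random.randrange(num_nodes)
    if not any(s == src and d == dst for s, d, _ in connections):
        connections.append((src, dst, random.uniform(-1, 1)))

def add_node(connections, num_nodes):
    """Split a random connection by inserting a new node in the middle."""
    i = random.randrange(len(connections))
    src, dst, w = connections.pop(i)
    new = num_nodes                       # id of the newly added node
    connections.append((src, new, 1.0))   # incoming link passes the signal on
    connections.append((new, dst, w))     # outgoing link keeps the old weight
    return num_nodes + 1
```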

Module Mutation
A mutation that adds a module. It can be done in many different ways, e.g. MM(Random) and MM(Duplicate) (Schrum and Miikkulainen 2012; cf. Calabretta et al. 2000), and it can happen more than once to produce multiple modules.
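
The two variants can be sketched as follows, again under the simplifying assumption that a module is just a list of output-neuron weight vectors (policy neurons plus a preference neuron); this conveys the idea only, it is not the actual MM-NEAT code.

```python
import copy
import random

def mm_random(modules, inputs_per_neuron, neurons_per_module):
    """MM(R): add a module with random incoming weights (random new behavior)."""
    new_module = [[random.uniform(-1, 1) for _ in range(inputs_per_neuron)]
                  for _ in range(neurons_per_module)]
    modules.append(new_module)

def mm_duplicate(modules):
    """MM(D): add a copy of an existing module, so its behavior starts out
    identical and can diverge through later mutations."""
    modules.append(copy.deepcopy(random.choice(modules)))
```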

Ms. Pac-Man
The domain needs multimodal behavior to succeed: a predator/prey variant in which Pac-Man takes on both roles. Goal: maximize score by eating all pills in each level, avoiding threatening ghosts, and eating ghosts (after a power pill). The game is non-deterministic, so evaluations are very noisy, and there are four mazes, so behavior must generalize. (Video: human play.)

Task Overlap
Distinct behavioral modes: eating edible ghosts, clearing levels of pills, and possibly more. Are ghosts currently edible? Possibly some are and some are not, so the task division is blended.

Previous Work in Pac-Man
Custom simulators:
 Genetic Programming: Koza 1992
 Neuroevolution: Gallagher & Ledwich 2007, Burrow & Lucas 2009, Tan et al. 2011
 Reinforcement Learning: Burrow & Lucas 2009, Subramanian et al. 2011, Bom 2013
 Alpha-Beta Tree Search: Robles & Lucas 2009
Screen Capture Competition (requires image processing):
 Evolution & Fuzzy Logic: Handa & Isozaki 2008
 Influence Map: Wirth & Gallagher 2008
 Ant Colony Optimization: Emilio et al. 2010
 Monte-Carlo Tree Search: Ikehata & Ito 2011
 Decision Trees: Foderaro et al. 2012
Pac-Man vs. Ghosts Competition, Pac-Man side:
 Genetic Programming: Alhejali & Lucas 2010, 2011, 2013, Brandstetter & Ahmadi 2012
 Monte-Carlo Tree Search: Samothrakis et al. 2010, Alhejali & Lucas 2013
 Influence Map: Svensson & Johansson 2012
 Ant Colony Optimization: Recio et al. 2012
Pac-Man vs. Ghosts Competition, Ghosts side:
 Neuroevolution: Wittkamp et al. 2008
 Evolved Rule Set: Gagne & Congdon 2012
 Monte-Carlo Tree Search: Nguyen & Thawonmas 2013

Evolved Direction Evaluator
Inspired by Brandstetter and Ahmadi (CIG 2012): a network with a single output and direction-relative sensors. Each time step, run the net once for each available direction and pick the direction with the highest output (argmax).
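
A minimal sketch of this per-direction evaluation; net and sense are hypothetical placeholders for the evolved single-output network and the game's direction-relative sensor code, not a real simulator API.

```python
def choose_direction(net, sense, available_directions):
    """Run the single-output net once per direction and move toward the argmax."""
    preferences = {d: net(sense(d)) for d in available_directions}
    return max(preferences, key=preferences.get)
```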

Module Setups
Manually divide the domain with Multitask: two modules (threat/any edible) or three modules (all threat/all edible/mixed). Or discover new divisions with preference neurons: two modules, three modules, MM(R), MM(D).

Average Champion Scores Across 20 Runs

Most Used Champion Modules
Patterns of module usage correspond to different score ranges.

Most Used Champion Modules
Most low-scoring networks use only one module.

Most Used Champion Modules
Medium-scoring networks use their primary module 80% of the time (an edible/threat division).

Most Used Champion Modules
Surprisingly, the best networks use one module 95% of the time; their seldom-used second module is a luring/surrounded module rather than an edible/threat division.

Multimodal Behavior
Different colors indicate different modules. (Videos: Three-Module Multitask, learned edible/threat division, learned luring/surrounded module.)

Comparison with Other Work
Authors                      | Method            | Eval Type   | AVG    | MAX
Alhejali and Lucas 2010      | GP                | Four Maze   | 16,014 | 44,560
Alhejali and Lucas 2011      | GP+Camps          | Four Maze   | 11,413 | 31,850
Best Multimodal Result       | Two Modules/MM(D) | Four Maze   | 32,959 | 44,520
Recio et al. 2012            | ACO               | Competition | 36,031 | 43,467
Brandstetter and Ahmadi 2012 | GP Direction      | Competition | 19,198 | 33,420
Alhejali and Lucas 2013      | MCTS              | Competition | 28,117 | 62,630
Alhejali and Lucas 2013      | MCTS+GP           | Competition | 32,641 | 69,010
Best Multimodal Result       | MM(D)             | Competition | 65,447 | 100,070
Based on 100 evaluations with 3 lives. Four Maze: visit each maze once. Competition: visit each maze four times and advance after 3,000 time steps.

Discussion
The obvious division is between edible and threat ghosts, but these tasks are blended, so strict Multitask divisions do not perform well; preference neurons can learn when it is best to switch. A better division turns out to be one module for when Pac-Man is surrounded. This division is very asymmetrical, which is surprising: the highest-scoring runs use that module rarely. It activates when Pac-Man is almost surrounded, which often leads to eating a power pill (luring) and helps Pac-Man escape in other risky situations.

Future Work
Go beyond two modules (is the current limit an issue with the domain or with evolution?). Multimodal behavior of teams, such as the ghost team in Pac-Man. Physical simulation: Unreal Tournament, robotics.

Conclusion
Intelligent module divisions produce the best results: modular networks make learning separate modes easier, and the results are better than previous work. The module division is unexpected: half of the neural resources go to a module used less than 5% of the time, yet rare situations can be very important. Some modules handle multiple modes (pills, threats, edible ghosts).

Questions? E-mail: schrum2@cs.utexas.edu Movies: http://nn.cs.utexas.edu/?ol-pm Code: http://nn.cs.utexas.edu/?mm-neat

What is Multimodal Behavior?
From observing agent behavior: the agent performs distinct tasks, its behavior is very different in different tasks, and a single policy would have trouble generalizing.
Reinforcement learning perspective: an instance of hierarchical reinforcement learning, where a "mode" of behavior is like an "option": a temporally extended action, a control policy that is only used in certain states. The policy for each mode must be learned as well.
Idea from supervised learning: Multitask Learning trains on multiple known tasks.

Behavioral Modes vs. Network Modules
Behavioral modes are determined via observation of behavior (subjective), and any net can exhibit multiple behavioral modes. Network modules are determined by the connectivity of the network: groups of "policy" outputs designated as modules (sub-policies), which remain distinct even if their behavior is identical or unused. Network modules should help build behavioral modes.

Preference Neuron Arbitration
How can the network decide which module to use? Find the preference neuron (grey) with the maximum output; the corresponding policy neurons (white) define behavior. Example with output vector [0.6, 0.1, 0.5, 0.7]: 0.7 > 0.1, so use Module 2; the policy neuron for Module 2 has output 0.5, and that value defines the agent's behavior.
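
This arbitration rule can be sketched directly; the interleaved (policy, preference) output layout below is an assumption made for the example, matching the slide's output vector [0.6, 0.1, 0.5, 0.7].

```python
def arbitrate(outputs, neurons_per_module=2):
    """Split outputs into modules of (policy..., preference) neurons, pick the
    module whose preference neuron is highest, and return its policy values."""
    modules = [outputs[i:i + neurons_per_module]
               for i in range(0, len(outputs), neurons_per_module)]
    chosen = max(modules, key=lambda m: m[-1])   # highest preference neuron
    return chosen[:-1]                           # policy neuron(s) define behavior

print(arbitrate([0.6, 0.1, 0.5, 0.7]))  # [0.5]: Module 2 wins since 0.7 > 0.1
```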

Pareto-based Multiobjective Optimization (Pareto 1890; Deb et al. 2000)
There is a tradeoff between objectives: one solution may keep high health but deal little damage, while another deals a lot of damage but loses lots of health. Neither dominates the other, so both lie on the Pareto front.
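
A minimal sketch of the underlying dominance test for maximization objectives (e.g. damage dealt and health remaining); this is the textbook definition, not code from the paper.

```python
def dominates(a, b):
    """a dominates b if a is at least as good in every objective (maximized)
    and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Non-dominated subset of a list of objective vectors."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

print(pareto_front([(10, 2), (3, 9), (2, 2)]))  # [(10, 2), (3, 9)]
```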

Non-dominated Sorting Genetic Algorithm II (Deb et al. 2000)
Population P of size N; evaluate P. Use mutation (and crossover) to get P´ of size N; evaluate P´. Calculate the non-dominated fronts of P ∪ P´ (size 2N). Form the new population of size N from the highest fronts of P ∪ P´.
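
The slide's steps map onto one NSGA-II generation roughly as follows; evaluate, mutate_and_crossover, non_dominated_sort, and crowding_distance are hypothetical placeholders for the standard sub-procedures, not a specific library's API.

```python
def nsga2_generation(parents, N):
    """One NSGA-II generation following the slide's steps (sketch only)."""
    children = [mutate_and_crossover(parents) for _ in range(N)]   # P' of size N
    evaluate(children)
    combined = parents + children                     # P union P', size 2N
    fronts = non_dominated_sort(combined)             # best front first
    next_pop = []
    for front in fronts:
        if len(next_pop) + len(front) <= N:
            next_pop.extend(front)                    # take the whole front
        else:
            # Standard NSGA-II detail (not on the slide): fill remaining slots
            # with the least crowded members of the last front considered.
            front.sort(key=lambda ind: crowding_distance(ind, front), reverse=True)
            next_pop.extend(front[:N - len(next_pop)])
            break
    return next_pop
```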

Direction Evaluator + Modules
The network is evaluated in each direction. For each direction a module is chosen, either human-specified (Multitask) or via preference neurons, and the chosen module's policy neuron sets the direction preference. Example: left inputs give [0.5, 0.1, 0.7, 0.6], 0.6 > 0.1, so the left preference is 0.7; right inputs give [0.3, 0.8, 0.9, 0.1], 0.8 > 0.1, so the right preference is 0.3. Since 0.7 > 0.3, Ms. Pac-Man chooses to go left, based on Module 2.
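
Combining the direction evaluator with module arbitration gives roughly the following sketch, reusing the hypothetical arbitrate() helper from the preference-neuron example above; net and sense remain placeholders for the evolved network and the sensors.

```python
def choose_direction_modular(net, sense, available_directions, arbitrate):
    """For each direction, run the multi-module net, let arbitration pick a
    module, and use that module's policy output as the direction preference."""
    prefs = {}
    for d in available_directions:
        outputs = net(sense(d))            # e.g. [0.5, 0.1, 0.7, 0.6] for left
        prefs[d] = arbitrate(outputs)[0]   # policy value of the chosen module
    return max(prefs, key=prefs.get)       # left wins the slide's example (0.7 > 0.3)
```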

Discussion (2)
Good divisions are harder to discover: some modular champions use only one module, particularly with MM(R), whose new modules are too random. Are evaluations too harsh and noisy? It is easy to lose a life and hard to eat all pills to progress, which discourages exploration and makes useful modules hard to discover. A remedy would be to make the search more forgiving.