Lecture 3: Behavior Selection Gal A. Kaminka Introduction to Robots and Multi-Robot Systems Agents in Physical and Virtual Environments.


© Gal Kaminka. Previously, on Robots … Multiple levels of control, built from behaviors: Avoid Object, Wander, Explore, Map, Monitor Change, Identify Objects, Plan Changes.

Subsuming Layers. How do we make sure the overall output is coherent? For example, Avoid Object is in conflict with Explore. Subsumption hierarchy: higher levels modify lower ones. (Figure: layers Avoid Object, Wander, Explore, Map.)

This week, on Robots … Behavior Selection/Arbitration. Activation-based selection: winner-take-all selection; argmax selection (priority, utility, success likelihood, …). Behavior networks: goal-oriented behavior-based control, which takes direct aim at key weaknesses of the reactive approach. Behavior hierarchies.

Behavior Selection (Arbitration). One behavior takes over completely: all sensors and actions are controlled by that behavior. Behaviors compete for control. Key questions: How do we select the correct behavior? When do we terminate the selected behavior?

Maes’ Action Selection Mechanism (MASM). Some key highlights: It merges some planning with behavior-based control. It is goal-oriented, which allows predictions, and responsive, which allows reactivity. It offers a “speed vs. thought” trade-off. It requires lots of number-hacking: a later article addressed this issue with learning, but complex environments may still suffer from it.

Overall Structure. Behaviors have preconditions, delete/add lists, and an activation level. Activation links spread positive and negative activation. (Figure: network of Sensor, Goal, and Behavior nodes.)

Behaviors. A behavior is similar to a fully instantiated planning operator. No variables (i.e., pick-up-A, not pick-up(A)). Preconditions: what must be true for the behavior to be executable. Add/delete lists: what changes once the behavior executes. (Figure: a Behavior node.)

Connecting Behaviors. Activation flows from sensors to behaviors with matching preconditions. (Figure: Sensor and Behavior nodes.)

Connecting Behaviors. Activation flows from sensors to behaviors with matching preconditions, and from add lists to behaviors with matching preconditions. (Figure: Sensor and Behavior nodes.)

Connecting Behaviors (Backward). Activation flows from goals to behaviors with matching add lists, and from behaviors to behaviors with matching add lists. (Figure: Sensor, Behavior, and Goal nodes.)

Connecting Behaviors (Backward). Advantages: goal-orientedness (goals drive behaviors); reactivity (sensors drive behaviors); parameterized! (Figure: Sensor, Behavior, and Goal nodes.)

Handling Conflicts. Conflicting behaviors inhibit each other. This is a winner-take-all configuration. (Figure: network of Sensor, Goal, and Behavior nodes.)

Winner Take All. A very basic structure in neural networks, which relies on recurrence. Key idea: nodes compete by inhibiting each other, and after some cycles a winner emerges. This is useful in many neural models of behavior.

Basic Structure. Each node is excited by incoming information. Each node’s activation inhibits its competitors.

First activation. Darker means more activation (node 2 is most active, node 1 least).

After a few cycles. Nodes 3 and 2 are stronger than 1, so 1 quickly deactivates. Node 2 is slightly stronger than 3, so 3 slowly deactivates.

After a few more cycles. Once 1 is out of the picture, only 2 and 3 compete. Node 2 becomes stronger: a weaker 3 inhibits 2 less.

Until finally … only the output from node 2 remains.

Winner Take All. The output from the winning node ends up being used, typically only if it is over a threshold. A basic problem: once a node becomes active, it never lets any other node in. Standard solutions: reset after some time, decay, … This mechanism can be used to resolve competition. Activation is the key feature/requirement.
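The competition described in the winner-take-all slides can be sketched in a few lines of Python. This is only an illustrative toy, not the network from the lecture figures: the update rule (each node loses a fixed fraction of its competitors' summed activation, clipped at zero), the `inhibition` constant, and the starting values are all assumptions chosen to mirror the "node 2 strongest, node 1 weakest" example.

```python
def winner_take_all(activations, inhibition=0.1, cycles=50):
    """Iterate mutual inhibition; return the activations after `cycles` rounds.

    Assumed toy dynamics: every node keeps its own activation and is
    inhibited in proportion to the summed activation of its competitors.
    """
    act = list(activations)
    for _ in range(cycles):
        total = sum(act)
        # Synchronous update: all new values are computed from the old ones.
        act = [max(0.0, a - inhibition * (total - a)) for a in act]
    return act


if __name__ == "__main__":
    # Node 1 is weakest, node 2 strongest, node 3 slightly weaker than node 2.
    final = winner_take_all([0.2, 1.0, 0.9])
    winner = max(range(len(final)), key=final.__getitem__)
    print("final activations:", final, "winner: node", winner + 1)
```

With these numbers the weakest node drops out within two cycles, the runner-up fades more slowly, and only the strongest node keeps a non-zero activation. Two nodes with exactly equal activation would never separate under this rule, and the winner here is never displaced, which is the "never lets in any other" problem that reset or decay is meant to address.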

Running a behavior network. Let activation spread for a while and wait for a behavior to cross the threshold. Once a behavior is over the threshold, execute it. Reset its activation after it is done. (Figure: network of Sensor, Goal, and Behavior nodes.)
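The spread, threshold, execute, reset cycle can be sketched as follows. This is a heavily simplified sketch, not Maes' published algorithm: it omits inhibition between conflicting behaviors, normalization, and decay, and the gains (1.0 and 0.5), the threshold, the class name `BehaviorNode`, and the two foraging-style behaviors are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class BehaviorNode:
    name: str
    preconditions: frozenset
    add_list: frozenset
    delete_list: frozenset = frozenset()
    activation: float = 0.0


def spread(behaviors, state, goals):
    """One cycle of (simplified) spreading activation."""
    incoming = {b.name: 0.0 for b in behaviors}
    for b in behaviors:
        # Sensors excite behaviors whose preconditions are currently true.
        incoming[b.name] += len(b.preconditions & state)
        # Goals excite behaviors whose add list achieves them.
        incoming[b.name] += len(b.add_list & goals)
        # Backward links: a behavior with unmet preconditions excites
        # the behaviors whose add list could make them true.
        for other in behaviors:
            if other is not b:
                missing = other.preconditions - state
                incoming[b.name] += 0.5 * other.activation * len(b.add_list & missing)
    for b in behaviors:
        b.activation += incoming[b.name]


def run_network(behaviors, state, goals, threshold=3.0, max_cycles=100):
    """Spread until an executable behavior crosses the threshold, run it, reset it."""
    state = set(state)
    executed = []
    for _ in range(max_cycles):
        if goals <= state:
            break
        spread(behaviors, state, goals)
        ready = [b for b in behaviors
                 if b.preconditions <= state and b.activation >= threshold]
        if ready:
            best = max(ready, key=lambda b: b.activation)
            state = (state - best.delete_list) | best.add_list
            executed.append(best.name)
            best.activation = 0.0  # reset after execution
    return executed, state


def demo():
    behaviors = [
        BehaviorNode("pick-up-puck", frozenset({"near-puck"}),
                     frozenset({"have-puck"}), frozenset({"near-puck"})),
        BehaviorNode("go-home", frozenset({"have-puck"}), frozenset({"at-home"})),
    ]
    return run_network(behaviors, {"near-puck"}, frozenset({"at-home"}))
```

Calling `demo()` runs the two behaviors in the order that reaches the goal: the goal excites go-home, go-home passes activation back to pick-up-puck, and pick-up-puck is the first executable behavior to cross the threshold. Raising `threshold` makes the network spread for more cycles before committing, which is the "speed vs. thought" knob.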

Advantages. We’ve discussed planned vs. reactive behavior. The threshold value changes the “speed vs. thought” trade-off: with a larger threshold, more behaviors are involved before selection; with a small threshold, the network is less likely to find the optimal chain. This is not a hybrid architecture; it is really something new!

Criticisms. Where will this fail? Succeed? What needs improvement? What does not? What tasks is it good for? As scientists, you must always ask yourself these questions.

Protected Goals. Sussman Anomaly. Given: A on B, B on table, C on table. Do: A on B, B on C, C on table. There is no way to do this without undoing a subgoal. If one is not careful, the system might go into thrashing: take off A, put A back, take off A, … Maes added a mechanism for protected goals. It is not clear where the protection comes from.

Other problems with MASM. No variables, so the number of behaviors blows up. Thrashing: a behavior resets, then is re-selected. Bug in the activation algorithm: activation from goals is divided by the number of goals, so a behavior satisfying more goals is not preferred. Additional minor issues like this were found and corrected later (Tyrrell 1993, 1994; Dorer 1999; Blumberg 1994; …).

Reminder. We are talking about behavior selection: multiple behaviors exist, and the question is which one to choose. Behaviors compete for control of the robot. Behavior networks have activation: goal priority “meets” sensor data (preconditions, effects). Winner-take-all selection.

Activation-based selection. For each behavior, build an activation function: how useful it is (utility, value); how urgent it is (priority); how likely it is to succeed (likelihood of success); how much it matches the current state (applicability); … These can of course be combined (e.g., utility × priority). Select the behavior with the top activation, let it run, then re-evaluate all activations.
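A minimal sketch of the select, run, re-evaluate loop is given below. The two behaviors, their activation formulas (priority × a state-dependent term), and the battery dynamics are hypothetical; they are only there to show that the top-activation behavior changes as the state changes.

```python
def run_selection(behaviors, state, steps):
    """Repeatedly pick the behavior with the top activation and let it run once."""
    history = []
    for _ in range(steps):
        # Re-evaluate every activation function against the current state.
        name = max(behaviors, key=lambda n: behaviors[n]["activation"](state))
        state = behaviors[name]["run"](state)
        history.append(name)
    return history


# Hypothetical behaviors: activation = priority * state-dependent urgency/utility.
BEHAVIORS = {
    "recharge": {
        "activation": lambda s: 2.0 * (1.0 - s["battery"]),
        "run": lambda s: {**s, "battery": min(1.0, s["battery"] + 0.5)},
    },
    "explore": {
        "activation": lambda s: 1.0 * s["battery"],
        "run": lambda s: {**s, "battery": max(0.0, s["battery"] - 0.2)},
    },
}

if __name__ == "__main__":
    print(run_selection(BEHAVIORS, {"battery": 0.5}, steps=4))
```

Starting from a half-full battery, the loop recharges first, explores twice as the battery drains, and then switches back to recharging. All the numbers are hand-picked, which is exactly the "number-hacking" concern raised later in the lecture.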

Formal behavior selection. Behaviors are arranged in a DAG (directed acyclic graph): B is the set of behaviors (vertices), and E is the set of edges (a, b), where a, b are in B. The graph is structured hierarchically: a single root behavior is the most general; leaf behaviors correspond to primitive actions; there is a path from every behavior to at least one primitive behavior. children(b) = { all behaviors a such that (b, a) is in E }.

Hierarchical behaviors. The root behavior is always active. An active behavior with no active child must select one. An active behavior can decide to deactivate itself. (Figure: example hierarchy with behaviors WinGame, Play, Interrupt, Attack-Center, Zone Defense, Move, Kick, Pass, Clear, Turn, Attack Pincer.)

argmax selection. At any given time, select the behavior whose priority, value, likelihood of success, or applicability is greatest. No sequence of behaviors is known in advance. Many instances of behaviors can co-exist and compete.

Formally … Let f(b) be a function that gives behavior b’s activation. Then the arbitration result is $\arg\max_{c} f(c)$, where $c \in \mathrm{children}(b)$. For instance, to choose by value: $\arg\max_{c} \mathrm{value}(c)$. To choose by priority: $\arg\max_{c} \mathrm{priority}(c)$. For a decision-theoretic choice: $\arg\max_{c} \mathrm{probability}(c) \cdot \mathrm{value}(c)$.
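The arbitration rule translates almost directly into code. In the sketch below, the edge set, the child names (borrowed loosely from the HandleBall case study), and the probability and value numbers are invented for illustration; only `children(b)` and the argmax over it come from the definitions above.

```python
def children(b, edges):
    """children(b) = all behaviors a such that (b, a) is in E."""
    return [a for (parent, a) in edges if parent == b]


def arbitrate(b, edges, f):
    """Return argmax over c in children(b) of f(c), or None for a leaf."""
    candidates = children(b, edges)
    return max(candidates, key=f) if candidates else None


# Hypothetical fragment of a behavior DAG and made-up estimates.
EDGES = {("handle_ball", "shoot"), ("handle_ball", "pass"), ("handle_ball", "dribble")}
VALUE = {"shoot": 10.0, "pass": 4.0, "dribble": 3.0}
PROBABILITY = {"shoot": 0.2, "pass": 0.8, "dribble": 0.6}

if __name__ == "__main__":
    print(arbitrate("handle_ball", EDGES, VALUE.get))
    print(arbitrate("handle_ball", EDGES, lambda c: PROBABILITY[c] * VALUE[c]))
```

Choosing by value alone picks the shot, while the decision-theoretic product of probability and value picks the pass, so the choice of f changes the behavior even though the graph and the estimates stay the same.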

Subsumption as argmax selection. The subsumption level of behavior b is given by level(b). The applicability of behavior b is given by app(b) or 1. Subsumption arbitration: $\arg\max_{b} \mathrm{app}(b) \cdot \mathrm{level}(b)$. (Figure: layers Avoid Object, Wander, Explore, Map.)
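As a small illustration of the subsumption formula, the sketch below treats app(b) as a 0/1 indicator and numbers the four layers from 1 upward. Both choices are assumptions: the slide does not say how levels are numbered, and starting at 1 simply keeps the lowest layer from scoring zero when it is applicable.

```python
# Assumed numbering: lowest layer = 1, highest layer = 4.
LEVELS = {"avoid_object": 1, "wander": 2, "explore": 3, "map": 4}


def subsumption_select(applicable, levels=LEVELS):
    """Return argmax over b of app(b) * level(b), or None if nothing applies."""
    def score(b):
        app = 1 if b in applicable else 0
        return app * levels[b]

    best = max(levels, key=score)
    return best if score(best) > 0 else None


if __name__ == "__main__":
    print(subsumption_select({"avoid_object", "wander"}))
```

Under this reading, the highest applicable layer always wins, and a lower layer only gets control when no layer above it applies.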

Case Study: HandleBall Arbitrator (ChaMeleons’01). The HandleBall behavior is triggered when the player has the ball. It must select between multiply-instantiated children: shoot on goal, pass for shot, pass forward, dribble to goal, dribble forward, clear, pass to closer, … We defined a complex arbitrator combining priority and likelihood of success.

HandleBall Example.

“Number-hacking”: Thrashing. Repeated de-selection and re-selection of behaviors. (Figure: sensor value plotted over time, fluctuating around the threshold.)
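Thrashing around a threshold is easy to reproduce. In this toy sketch, the two behavior names, the threshold of 1.0, and the sensor readings are assumptions; the point is only that tiny fluctuations in the reading flip the selection on every step.

```python
def select(distance, threshold=1.0):
    """Threshold-based selection between two hypothetical behaviors."""
    return "avoid" if distance < threshold else "explore"


def count_switches(readings, threshold=1.0):
    """Count how many times the selected behavior changes along the readings."""
    choices = [select(d, threshold) for d in readings]
    return sum(1 for prev, cur in zip(choices, choices[1:]) if prev != cur)


if __name__ == "__main__":
    noisy = [1.02, 0.98, 1.01, 0.99, 1.03, 0.97]  # hovering around the threshold
    steady = [2.0, 1.9, 2.1, 2.0]                 # comfortably above it
    print(count_switches(noisy), count_switches(steady))
```

The noisy sequence varies by only a few percent, yet the selected behavior changes at every step, whereas the steady sequence never switches.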

“Number-hacking”: Sensitivity. Sensitivity to specific values and ranges: manually adjusting values by 0.1 to get a wanted result … Where do the numbers come from? Learning? E.g., the programmer forgot a range of values; e.g., the programmer needs to extend a range.

State-Based Selection. Look at the world and internal state to make the selection. Behaviors as operators? Almost: pre-conditions and termination conditions. Selection control rules (non-numeric preferences, priorities). Finite state machines and hierarchical machines.

State-Based Behavior Selection. Elements from reactive control, but with internal state: quick response to sensor readings; sensor-driven operation. Behaviors maintain internal state, e.g., previously executed behaviors, previous sensor readings, …

Behaviors as operators. Conditions: preconditions (when is it applicable?) and termination conditions (when is it done?). Conditions test sensors and internal state, so we must maintain a world model. It can be simple (e.g., a vector of sensor readings) or complex (e.g., internal variables, previous readings).
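One way to picture behaviors as operators is a record holding a precondition test and a termination test over the world model. The sketch below is an assumption-laden illustration: the `Behavior` class, the `step` policy (keep the current behavior until it terminates, then take the first applicable one), and the approach/grab example are not from the lecture.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

WorldModel = Dict[str, float]  # simplest case: a vector of named readings


@dataclass
class Behavior:
    name: str
    precondition: Callable[[WorldModel], bool]  # when is it applicable?
    termination: Callable[[WorldModel], bool]   # when is it done?


def step(current: Optional[Behavior], behaviors: List[Behavior],
         world: WorldModel) -> Optional[Behavior]:
    """Keep the running behavior until it terminates, then select a new one."""
    if current is not None and not current.termination(world):
        return current
    for b in behaviors:
        if b.precondition(world):
            return b
    return None


# Hypothetical behaviors over a two-entry world model.
APPROACH = Behavior("approach", lambda w: w["distance"] > 0.5,
                    lambda w: w["distance"] <= 0.5)
GRAB = Behavior("grab", lambda w: w["distance"] <= 0.5,
                lambda w: w["holding"] == 1)
BEHAVIORS = [APPROACH, GRAB]
```

For example, `step(None, BEHAVIORS, {"distance": 2.0, "holding": 0})` selects approach, and calling `step` again once the distance has dropped to 0.4 terminates it and selects grab. Taking the first applicable behavior sidesteps the question of what to do when several match, which is where preference rules come in.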

State-Based Selection: Architecture. (Figure: World Model (beliefs), Behavior, Command Scheduling.)

Conflicting Behaviors. What if more than one behavior matches? (Figure: World Model (beliefs), Behavior, Command Scheduling.)

Preference Rules. Prefer one behavior over another. Provide “local guidance”: they do not consider all possible cases, nor a global ranking. They test the world model (which also records behaviors). (Figure: World Model (beliefs), Behavior, Command Scheduling, Preference Rules.)
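Preference rules can be sketched as pairwise, condition-guarded statements of the form "prefer X over Y when the world model says so". The rule format, the behavior names, and the battery and obstacle conditions below are hypothetical; the sketch only shows how local rules prune a set of matching behaviors without ever building a global ranking.

```python
def apply_preferences(candidates, rules, world):
    """Drop every candidate that a matching rule ranks below another candidate.

    Each rule is (preferred, other, condition): when both behaviors match
    and condition(world) holds, `other` is rejected.
    """
    rejected = set()
    for preferred, other, condition in rules:
        if preferred in candidates and other in candidates and condition(world):
            rejected.add(other)
    return [c for c in candidates if c not in rejected]


# Hypothetical local rules; each one compares just two behaviors.
RULES = [
    ("recharge", "explore", lambda w: w["battery"] < 0.2),
    ("avoid", "recharge", lambda w: w["obstacle_distance"] < 0.5),
]

if __name__ == "__main__":
    matching = ["explore", "recharge", "avoid"]
    print(apply_preferences(matching, RULES, {"battery": 0.1, "obstacle_distance": 0.3}))
```

When only the battery rule fires, two candidates survive, so some further tie-break is still needed; that is the price of local guidance. Contradictory rules could also reject every candidate, which this sketch does not guard against.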

Questions?

What’s in a world model? (Figure: World Model (beliefs), Behavior, Command Scheduling, Preference Rules.)

What’s in a world model? A vector of sensor readings: Distance front = 250; Light left = Detected; Battery = Medium level. A vector of virtual sensors: Distance front < 90 AND light front; Average front distances = … (Figure: scale from Simple to Complex.)

What’s in a world model? A vector of processed data: estimated X, Y from detected landmarks; seen purple blob at pixel 2,5; communication from teammate. A vector of world models: position of opponent 2 seconds ago; my position 10 seconds ago. (Figure: scale from Simple to Complex.)
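The progression from raw readings to virtual sensors to remembered world models can be sketched with a small class. The class layout, the key names, and the history length are assumptions; the two virtual sensors are the ones named on the slide (distance front < 90 AND light front, and the average of front distances).

```python
from collections import deque


class WorldModel:
    """Raw readings, a short history of past readings, and derived virtual sensors."""

    def __init__(self, history=10):
        self.raw = {}                        # latest vector of sensor readings
        self.past = deque(maxlen=history)    # previous readings, oldest dropped first

    def update(self, readings):
        if self.raw:
            self.past.append(dict(self.raw))
        self.raw = dict(readings)

    def virtual(self):
        """Virtual sensors computed from current and remembered readings."""
        front = [m["distance_front"] for m in self.past] + [self.raw["distance_front"]]
        return {
            "close_and_lit": bool(self.raw["distance_front"] < 90
                                  and self.raw["light_front"]),
            "avg_distance_front": sum(front) / len(front),
        }


if __name__ == "__main__":
    wm = WorldModel()
    wm.update({"distance_front": 250, "light_front": False})
    wm.update({"distance_front": 80, "light_front": True})
    print(wm.virtual())
```

Behaviors and preference rules would then test `wm.raw` or `wm.virtual()` rather than reading sensors directly, so the same conditions work whether the model is simple or complex.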

Hierarchical Behaviors. Hierarchies allow the designer to build reusable behaviors. At any given moment, a path is selected, and all behaviors in the path are active: they may issue action commands and monitor sensors. This is different from a function call stack. What happens when a behavior terminates?

Case Study: ModSAF. Preference rules manage high-priority interrupts. Preconditions dictate ordering. (Figure: behavior hierarchy including Execute Mission, Fly Flight Plan, Wait-at-Point, Fly Route, Land, NOE, Low, Contour, Unmask, Shoot, Find Position, Halt, Join Scout, Engage.)

State-based selection. Preconditions and termination conditions are effective and allow flexible re-use. Very complex behavior can be generated. Thrashing is still very much a problem.

Finite State Machines: Avoid Thrashing by Sequencing. Every state represents a behavior. Transitions are triggered by sensor readings. (Figure: state machine with a Start state and behavior states A, B, C, D.)

Example: Foraging. (Figure: state machine with behaviors Acquire, Pick Up, Go Home, Drop and transitions Close to Puck, Have Puck, At Home.)

Example: Foraging. (Figure: the same state machine with an added Lost Puck transition.)
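The foraging machine can be written as a transition table. The state and event names come from the slides, but the arcs are a guess: the transcript lost the diagram, so which state each transition leaves (in particular where Lost Puck leads) is an assumption here, and no arc out of Drop is modeled.

```python
# (state, event) -> next state. The arcs are assumed, not taken from the figure.
TRANSITIONS = {
    ("acquire", "close_to_puck"): "pick_up",
    ("pick_up", "have_puck"): "go_home",
    ("go_home", "at_home"): "drop",
    ("go_home", "lost_puck"): "acquire",
}


def next_state(state, events):
    """Follow the first event that has a transition out of the current state."""
    for event in events:
        target = TRANSITIONS.get((state, event))
        if target is not None:
            return target
    return state  # events with no arc from this state are ignored


def run(events_per_step, state="acquire"):
    """Feed one list of sensed events per step; return the visited states."""
    trace = [state]
    for events in events_per_step:
        state = next_state(state, events)
        trace.append(state)
    return trace


if __name__ == "__main__":
    print(run([["close_to_puck"], ["have_puck"], ["lost_puck"],
               ["close_to_puck"], ["have_puck"], ["at_home"]]))
```

Because each state reacts only to its own outgoing events, a stray reading such as at_home while acquiring changes nothing. That restriction is what suppresses thrashing, and it is also what limits opportunism.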

Hierarchical Finite State Machines. A behavior can be decomposed into others. The decomposition is selected based on sensors and memory. (Figure: state machine with a Start state and behavior states A, B, C, D.)

BITE: Bar Ilan Teamwork Engine. Combines FSAs and state-based selection. Multiple opportunities for arbitration: temporal (what comes next) and hierarchical (which child should be selected). Prevention of cycling and thrashing, e.g., by keeping a record of which child was recently selected.

Questions?

Homework #2. 1. Propose algorithms for detecting (a) thrashing, (b) cycling. The algorithms must be appropriate for execution on robots. 2. One of the advantages of the state-based and activation-based approaches (non-FSA) is that they allow opportunism. Using FSAs limits this opportunism, since behaviors are executed in pre-determined sequences. Propose a method to allow opportunism in FSAs. 3. Propose a technique for resolving thrashing and cycling once detected.