Uncertain Multiagent Systems: Games and Learning H. Jin Kim, Songhwai Oh and Shankar Sastry University of California, Berkeley July 17, 2002 Decision-Making.


1 Uncertain Multiagent Systems: Games and Learning H. Jin Kim, Songhwai Oh and Shankar Sastry University of California, Berkeley July 17, 2002 Decision-Making under Uncertainty ONR MURI

2 Outline Hierarchical architecture for multiagent operations Partial observation Markov games (POMGame) Berkeley pursuit-evasion game (PEG) setup From PEG to unmanned dynamic battlefield – Model predictive techniques for dynamic replanning – Multi-target tracking (detect → ID → track) – Dynamic model selection for estimating adversarial intent

3 Partial-observation Probabilistic Pursuit-Evasion Game (PEG) with 4 UGVs & 1 UAV A prototype system of fully autonomous mobile teams of intelligent and networked sensing agents deployed to discover and track mobile targets in unmapped environments

4 Uncertainty pervades every layer! Hierarchy in Berkeley Platform [Block diagram. Blocks: Strategy Planner, Map Builder, Communications Network, Tactical Planner & Regulation (tactical planner, trajectory planner, regulation), vehicle-level sensor fusion, UAV dynamics, UGV dynamics, Terrain, Targets, exogenous disturbance. Sensors: INS, GPS, ultrasonic altimeter, vision, actuator encoders. Signals: control signals, actuator positions, inertial positions, linear acceleration, angular velocity, height over terrain, state of agents, obstacles detected, targets detected, positions of agents, targets, and obstacles, desired agents' actions.]

5 Pursuit-Evasion Game Experiment Setup [Diagram labels: Ground Command Post; waypoint commands; position and vehicle status; Pursuer UAV; Pursuer UGVs; Evader UGV; evader location detected by vision system.]

6 Information Flow in UC Berkeley PEG Platform [Diagram. Ground-based Strategy Planner: Map Builder, Policy Calculator, Probability Map, current coordination of agents, processed vision input. Pursuer UAV: Flight Computer, Vision Computer. Pursuer UGV: Motion Controller, Vision Computer. Evader UGV. Ground station: Map Builder, display info. Messages over the Wireless Network: waypoint requests, vision data, current position, agent position requests, current position for ground station display.]

7 Lessons Learned and UAV/UGV Objective Scalable/replicable systems that deliver missions reliably under uncertainty, and evaluation of their performance. Hierarchical architecture design and analysis –High-level decision making in a discrete space –Physical-layer control in a continuous space. Hierarchical decomposition requires tight interaction between layers to achieve cooperative behavior, to deconflict, and to support constraints. Confronting uncertainty arising from partially observable, dynamically changing environments and intelligent adversaries.

8 Representing and Managing Uncertainty Uncertainty is introduced through various channels –Sensing → unable to determine the current state of the world –Prediction → unable to infer the future state of the world –Actuation → unable to make the desired action properly affect the state of the world. Different types of uncertainty can be addressed by different approaches –Nondeterministic uncertainty: Robust Control –Probabilistic uncertainty: (Partially Observable) Markov Decision Processes –Adversarial uncertainty: Game Theory. POMGame

9 Partial Observation Markov Games (POMGame)

10 Policy for POMGames Optimal value function of a state –the expected sum of rewards that an agent will gain by executing the optimal policy starting from that state. Poorly understood: analysis exists only for very specially structured games, such as games with complete information on one side. Special case: partially observable Markov decision processes (POMDP)
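The formula on this slide did not survive in the transcript. As a hedged reminder of the standard textbook form (not necessarily the notation the authors used), the optimal value function for the fully observed single-agent case and for the POMDP special case can be written as below. The discount factor $\gamma$, the reward $r$, and the updated belief $b^{a,o}$ are assumed symbols introduced here for illustration.

```latex
% Fully observed state x, policy \pi, reward r, discount 0 < \gamma < 1 (all assumed notation)
V^*(x) = \max_{\pi} \, \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} \, r(x_t, a_t) \;\middle|\; x_0 = x, \ \pi \right]

% POMDP special case: the belief b (a distribution over states) replaces the state,
% and b^{a,o} denotes the belief after taking action a and observing o
V^*(b) = \max_{a} \left[ \sum_{x} b(x)\, r(x, a)
         + \gamma \sum_{o} \Pr(o \mid b, a)\, V^*\!\left(b^{a,o}\right) \right]
```

In a two-sided game the single maximization is replaced by an equilibrium condition between the players, which is part of why the general case is harder to analyze.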

11 Berkeley Pursuit-Evasion Game (PEG) Setup

12 Abstraction of Pursuit-Evasion Game A partial-observation stochastic pursuit-evasion game in a 2-D grid world, between (heterogeneous) teams of $n_e$ evaders and $n_p$ pursuers. At each time t, each evader and each pursuer –takes an observation over its visibility region from its current location –updates its belief state –chooses an action from its set of available actions. Goal: capture of the evader (for the pursuers), or survival (for the evaders)

13 Optimal Pursuit Policy Performance measure: capture time. The optimal policy minimizes the cost

14 Optimal Pursuit Policy – Dynamic Programming Formulation

15 Persistent Pursuit Policies Solving for the optimal policy of the partial observation Markov games of non-trivial size using dynamic programming is computationally intractable. If the pursuit policy is persistent with a period T, then the expected capture time is bounded.
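The boundedness claim can be sketched with a short geometric argument. The definition used here is an assumption, since the slide does not state one: a policy is taken to be persistent with period $T$ if there is some $p > 0$ such that, whenever the evader has not yet been captured, the probability of capture within the next $T$ steps is at least $p$. $T^*$ denotes the capture time.

```latex
% Assumed definition of persistence with period T and constant p > 0:
\Pr\left(T^* \le t + T \mid T^* > t\right) \ge p \quad \text{for all } t \ge 0 .

% Chaining over consecutive windows of length T:
\Pr\left(T^* > kT\right) \le (1-p)^{k}, \qquad k = 0, 1, 2, \dots

% Since \Pr(T^* > t) is nonincreasing in t, each block of T terms is bounded by its first term:
\mathbb{E}[T^*] = \sum_{t=0}^{\infty} \Pr(T^* > t)
  \le T \sum_{k=0}^{\infty} \Pr\left(T^* > kT\right)
  \le T \sum_{k=0}^{\infty} (1-p)^{k} = \frac{T}{p} .
```

Under this assumed definition the expected capture time is therefore at most $T/p$.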

16 Example of Persistent Pursuit Policies Greedy Policy –Pursuer moves to the neighboring cell with the highest probability of having an evader at the next instant –Strategic planner assigns more importance to local or immediate considerations Global Maximum Policy –Pursuer moves toward the global location with the highest probability, weighted by some distance metric, of having an evader at the next instant
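A minimal sketch of the two policies on a grid, assuming the evader probability map for the next instant is already available as a 2-D array. The function names, the 4-connected move set, the Manhattan distance, and the `1 / (1 + distance)` weighting are illustrative assumptions; the slide only says "weighted by some distance metric".

```python
import numpy as np

def neighbors(pos, shape):
    """Cells reachable in one step (staying put allowed) on a 4-connected grid."""
    r, c = pos
    moves = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
    return [(r + dr, c + dc) for dr, dc in moves
            if 0 <= r + dr < shape[0] and 0 <= c + dc < shape[1]]

def greedy_step(evader_map, pos):
    """Greedy policy: move to the reachable cell with the highest
    predicted probability of containing an evader."""
    return max(neighbors(pos, evader_map.shape), key=lambda cell: evader_map[cell])

def global_max_step(evader_map, pos):
    """Global-maximum policy: pick the cell anywhere on the map with the best
    distance-weighted probability, then take one step toward it."""
    rows, cols = np.indices(evader_map.shape)
    dist = np.abs(rows - pos[0]) + np.abs(cols - pos[1])  # Manhattan distance (assumed)
    score = evader_map / (1.0 + dist)                     # weighting (assumed)
    target = np.unravel_index(np.argmax(score), score.shape)
    # One-step move that ends closest to the target cell.
    return min(neighbors(pos, evader_map.shape),
               key=lambda cell: abs(cell[0] - target[0]) + abs(cell[1] - target[1]))

# Example: a small nearby peak and a large distant peak.
evader_map = np.zeros((5, 5))
evader_map[0, 1] = 0.1
evader_map[4, 4] = 0.9
print(greedy_step(evader_map, (0, 0)))      # steps onto the nearby peak
print(global_max_step(evader_map, (0, 0)))  # steps toward the distant peak
```

The example shows the difference described on the slide: the greedy pursuer reacts to the local peak, while the global-maximum pursuer heads for the larger but more distant one.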

17 Experimental Results: Pursuit Evasion Games with Four UGVs and a UAV

18 Game-theoretic Policy Search Paradigm A large number of variables affect the solution. Many interesting games, including pursuit-evasion, are large games with partial information, and finding optimal solutions is well outside the capability of current algorithms. An approximate solution is not necessarily bad: there might be simple policies with satisfactory performance. Choose a good policy from a restricted class of policies! We can find approximately optimal solutions from restricted classes, using sparse sampling and a provably convergent policy search algorithm

19 Constructing a Policy Class Given a mission with specific goals, we –decompose the problem in terms of the functions that need to be achieved for success and the means that are available –analyze how a human team would solve the problem –determine a list of important factors that complicate task performance, such as safety or physical constraints. Examples: maximize aerial coverage, stay within communications range, penalize actions that lead an agent to a danger zone, maximize the explored region, minimize fuel usage, …

20 Policy Representation Quantize the above features and define a feature vector that consists of the estimates of the above quantities for each action, given the agents’ history. Estimate the ‘goodness’ of each action by a function of this feature vector, parameterized by a weighting vector to be learned. Choose the action that maximizes this function, or choose a randomized action according to a distribution over actions
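The slide's formulas are missing from the transcript, so the following is only a sketch of one common way to realize such a representation: a linear score `features @ w` per action, with a softmax distribution for the randomized variant. Both the linear form and the softmax are assumptions made here, and all names are hypothetical.

```python
import numpy as np

def action_scores(features, w):
    """features: (num_actions, num_features) array, one feature vector per action.
    w: (num_features,) weighting vector to be learned.
    Returns one 'goodness' score per action (linear form, assumed)."""
    return features @ w

def greedy_action(features, w):
    """Deterministic choice: index of the action with the highest score."""
    return int(np.argmax(action_scores(features, w)))

def action_distribution(features, w):
    """Randomized choice: softmax over scores (assumed form of the distribution)."""
    s = action_scores(features, w)
    e = np.exp(s - s.max())  # subtract the max for numerical stability
    return e / e.sum()

def sample_action(features, w, rng):
    """Draw an action index according to the softmax distribution."""
    p = action_distribution(features, w)
    return int(rng.choice(len(p), p=p))

# Example with three candidate actions and two features per action.
features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0]])
w = np.array([1.0, 2.0])
print(greedy_action(features, w))
print(action_distribution(features, w))
print(sample_action(features, w, np.random.default_rng(0)))
```

Policy search then amounts to adjusting `w` so that the resulting policy performs well on sampled game trajectories.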

21 Example: Policy Feature Maximize collective aerial coverage -> maximize the distance between agents, measured from the location each pursuer will reach by taking the candidate action. Try to visit an unexplored region with a high possibility of detecting an evader, using the position reached by the action that maximizes the evader map value along the frontier

22 Example: Policy Feature (Continued) Prioritize actions that are more compatible with the dynamics of agents. Policy representation

23 Benchmarking Experiments Performance of two pursuit policies compared in terms of capture time. Experiment 1: two pursuers against an evader who moves greedily with respect to the pursuers’ location. Experiment 2: when the position of the evader at each step is detected by the sensor network with only 10% accuracy, two optimized pursuers took 24.1 steps, while the one-step greedy pursuers took over 146 steps on average to capture the evader in a 30 by 30 grid.

Grid size | 1-Greedy pursuers | Optimized pursuers
10 by 10 | (7.3, 4.8)* | (5.1, 2.7)
20 by 20 | (42.3, 19.2) | (12.3, 4.3)
* (mean, standard deviation)

24 Why General-sum Games? " All too often in OR dealing with military problems, war is viewed as a zero-sum two-person game with perfect information. Here I must state as forcibly as I know that war is not a zero-sum two-person game with perfect information. Anybody who sincerely believes it is a fool. Anybody who reaches conclusions based on such an assumption and then tries to peddle these conclusions without revealing the quicksand they are constructed on is a charlatan....There is, in short, an urgent need to develop positive-sum game theory and to urge the acceptance of its precepts upon our leaders throughout the world." Joseph H. Engel, Retiring Presidential Address to the Operations Research Society of America, October 1969

25 General-sum Games Depending on the cooperation between the players: –Noncooperative –Cooperative. Depending on the least expected payoff that a player is willing to accept: Nash’s special/general bargaining solution. By restricting the blue and red policy classes to be finite, we reduce the POMGame to a bimatrix game.
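Once each side is restricted to a finite set of policies, the game is described by two payoff matrices, one per player. The sketch below is an illustration, not the authors' solver: it enumerates pure-strategy Nash equilibria of such a bimatrix game. The function name and the example payoffs are hypothetical, and a bimatrix game may have no pure equilibrium at all, in which case a mixed-strategy solver would be needed.

```python
import numpy as np

def pure_nash_equilibria(A, B):
    """A[i, j]: blue payoff, B[i, j]: red payoff when blue plays policy i
    and red plays policy j. Returns all (i, j) pairs where neither player
    can gain by deviating unilaterally."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    blue_best = A == A.max(axis=0, keepdims=True)  # blue's best rows for each red column
    red_best = B == B.max(axis=1, keepdims=True)   # red's best columns for each blue row
    return [(int(i), int(j)) for i, j in zip(*np.nonzero(blue_best & red_best))]

# Example: a general-sum game with the prisoner's dilemma payoff structure.
A = [[-1, -3],
     [0, -2]]
B = [[-1, 0],
     [-3, -2]]
print(pure_nash_equilibria(A, B))
```

In practice the entries of `A` and `B` would be estimated, for instance by simulating each pair of blue and red policies.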

26 From PEG to Combat Scenarios Adversarial attack –Reds do not just evade, but also attack -> Blues cannot blindly pursue reds. Unknown number/capability of adversaries -> Dynamic selection of the relevant red model from unstructured observations. Deconfliction between layers and teams. Increase the number of features -> Diversify possible solutions when the uncertainty is high

27 From POMGame To Bimatrix Game

28 Dynamic Bayesian Model Selection Dynamic Bayesian model selection (DBMS) is a generalized model selection approach for time series data in which the number of components can vary with time. If K is the number of components at any instant and T is the length of the time series, then there are $O(2^{KT})$ possible models, which demands an efficient algorithm. The problem is formulated using Bayesian hierarchical modeling and solved using suitably adapted reversible jump MCMC methods.

29 DBMS

30 DBMS: Graphical Representation [Graphical model. Legend: two Dirichlet priors (one of them time-indexed) –A: transition matrix for $m_t$ –$w_t$: component weights –$z_t$: allocation variable –F: transition dynamics]

31 DBMS

32 DBMS: Multi-target Tracking Example

33 Estimated target position + True target trajectory Observation

34 Estimated target position + True target trajectory Observation

35 Summary Decomposition of complex multiagent operation problems requires tighter interaction between subsystems and human intervention. Partial observation Markov games provide a mathematical representation of a hierarchical multiagent system operating under adversarial and environmental uncertainty. The policy class framework provides a setup for including human experience. Policy search methods and sparse sampling produce computationally tractable algorithms to generate approximate solutions to partially observable Markov games. Model predictive (receding horizon) techniques can be used for dynamic replanning to deconflict/coordinate between vehicles, layers or subtasks

36 THE END

37 Acting under Partial Observations We need to use memory of previous actions and observations to disambiguate the current state. The state estimate, or belief state –Posterior probability distribution over states –The likelihood that the world is actually in state x at time t, given the agent’s past experience (i.e., action and observation histories).

38 Updating Belief State –Can be updated recursively using the estimated world model and Bayes’ rule. New info on the state of world New info on prediction
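The recursive update can be sketched as a discrete Bayes filter. This is a generic illustration under assumed names, not the authors' code: `transition` stands for the estimated world model under the action just taken, and `likelihood` for the probability of the new observation in each state. The prediction step is applied before the measurement step here; the slide's equation may order them differently.

```python
import numpy as np

def belief_update(belief, transition, likelihood):
    """One step of a discrete Bayes filter.
    belief:     (n,) current distribution over states
    transition: (n, n) matrix, transition[i, j] = P(next state j | state i, action)
    likelihood: (n,) vector, likelihood[j] = P(observation | next state j)
    Returns the posterior belief over the next state."""
    predicted = belief @ transition      # prediction using the world model
    posterior = likelihood * predicted   # Bayes' rule with the new observation
    total = posterior.sum()
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return posterior / total

# Example: two states, a static world, and a sensor that favors state 0.
belief = np.array([0.5, 0.5])
transition = np.eye(2)
likelihood = np.array([0.9, 0.1])
print(belief_update(belief, transition, likelihood))
```

In the pursuit-evasion setting the states would be grid cells, so the belief is the evader probability map that the pursuit policies act on.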

39 Pursuit-Evasion Game Experiment PEG with four UGVs Global-Max pursuit policy Simulated camera view (radius 7.5 m with 50-degree conic view) Pursuer = 0.3 m/s, Evader = 0.5 m/s MAX

40 Pursuit-Evasion Game Experiment PEG with four UGVs Global-Max pursuit policy Simulated camera view (radius 7.5 m with 50-degree conic view) Pursuer = 0.3 m/s, Evader = 0.5 m/s MAX

41 Experimental Results: Evaluation of Policies for Different Visibility The global-max policy performs better than greedy, since the greedy policy selects movements based only on local considerations. Both policies perform better with the trapezoidal view, since the camera rotates fast enough to compensate for the narrow field of view. [Plot: capture time of greedy and global-max for the different regions of visibility of the pursuers. Three pursuers with trapezoidal or omni-directional view. Randomly moving evader.]

42 Experimental Results: Evader’s Speed vs. Intelligence Having a more intelligent evader increases the capture time. It is harder to capture an intelligent evader at a higher speed. The capture time of a fast random evader is shorter than that of a slower random evader, when the speed of the evader is only slightly higher than that of the pursuers. [Plot: capture time for different speeds and levels of intelligence of the evader. Three pursuers with a trapezoidal view and global maximum policy. Max speed of pursuers: 0.3 m/s.]

43 Coordination under Multiple Sources of Commands When different agents or layers specify multiple, possibly conflicting goals or actions, how can the system prioritize or resolve them? –a priori assignment of the degrees of authority –Surge in coordination demand when the situation deviates from textbook cases: can the overall system adapt in real time? Intermediate, cooperative modes of interaction between layers, agents and human operator based on anticipatory reasoning are desirable

