Lookahead pathology in real-time pathfinding

Lookahead pathology in real-time pathfinding
Mitja Luštrek, Jožef Stefan Institute, Department of Intelligent Systems
Vadim Bulitko, University of Alberta, Department of Computer Science

Introduction Problem Explanation

Agent-centered search (LRTS). (Figure: lookahead area of depth d around the current state, with the goal state further away.)

Agent-centered search (LRTS). Each frontier state of the lookahead area is scored by f = g + h, where g is the true shortest distance from the current state within the area and h is the estimated shortest distance to the goal.

Agent-centered search (LRTS). The frontier state with the lowest f is identified; its f value is f_opt.

Agent-centered search (LRTS). The heuristic of the current state is updated to h = f_opt, and the agent moves towards the chosen frontier state.
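
The LRTS step from the last few slides can be summarized in code. Below is a minimal sketch in Python, not the authors' implementation: it assumes a graph given by a hypothetical `neighbors` function, unit move costs (the real maps use octile costs), and a heuristic stored in a dict `h`; all names are illustrative.

```python
# A minimal sketch of one LRTS-style planning step as described on these slides.
from collections import deque

def lrts_step(state, goal, neighbors, h, depth):
    """Search the lookahead area to `depth`, pick the frontier state with the
    lowest f = g + h, set h(state) = f_opt, and return that frontier state
    together with its distance g from the current state."""
    if state == goal:
        return state, 0
    g = {state: 0}                      # true shortest distance within the area
    queue = deque([state])
    frontier = []
    while queue:
        s = queue.popleft()
        if g[s] == depth or s == goal:
            frontier.append(s)          # rim of the lookahead area
            continue
        for n in neighbors(s):
            if n not in g:
                g[n] = g[s] + 1
                queue.append(n)
    if not frontier:                    # walled in: nowhere to go
        return state, 0
    best = min(frontier, key=lambda s: g[s] + h.get(s, 0))
    f_opt = g[best] + h.get(best, 0)    # estimated total distance via `best`
    h[state] = f_opt                    # learning step from the slides: h = f_opt
    return best, g[best]                # the agent walks towards `best`
```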

Lookahead pathology. It is generally believed that larger lookahead depths produce better solutions. Solution-length pathology: larger lookahead depths produce worse solutions. Example (lookahead depth → solution length): 1 → 11, 2 → 10, 3 → 8, …, 5 → 7, … Degree of pathology = 2.

Lookahead pathology. Pathology can also be measured on states that do not form a path. Error pathology: larger lookahead depths produce more suboptimal decisions.
Multiple states (depth → error): 1 → 0.31, 2 → 0.25, 3 → 0.21, 4 → 0.24, 5 → 0.18, 6 → 0.23, 7 → 0.12. Degree of pathology = 2.
One state (depth → decision): 1 → suboptimal, 3 → optimal, … There is pathology.
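
The slides do not spell out exactly how the degree of pathology is computed; counting the number of one-step increases in the measured quantity reproduces the value of 2 shown for the error table above, so the sketch below assumes that reading (names hypothetical).

```python
# Degree of pathology read as: the number of times the measure (solution length
# or error) gets worse when the lookahead depth increases by one.
def degree_of_pathology(by_depth):
    depths = sorted(by_depth)
    return sum(1 for a, b in zip(depths, depths[1:]) if by_depth[b] > by_depth[a])

# Error table from this slide: increases at depths 3 -> 4 and 5 -> 6.
error = {1: 0.31, 2: 0.25, 3: 0.21, 4: 0.24, 5: 0.18, 6: 0.23, 7: 0.12}
assert degree_of_pathology(error) == 2
```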

Introduction Problem Explanation

Our setting HOG – Hierarchical Open Graph [Sturtevant et al.] Maps from commercial computer games (Baldur’s Gate, Warcraft III) Initial heuristic: octile distance (true distance assuming an empty map) 1,000 problems (map, start state, goal state)
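
For reference, the octile distance mentioned as the initial heuristic is the true shortest distance on an empty 8-connected grid. A small sketch follows; the exact diagonal cost used in HOG may differ slightly, so treat it as illustrative.

```python
import math

def octile_distance(a, b):
    """True shortest distance between grid cells a = (x1, y1) and b = (x2, y2)
    on an empty map with 8-connected moves and diagonal cost sqrt(2)."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return max(dx, dy) + (math.sqrt(2) - 1.0) * min(dx, dy)
```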

On-policy experiments. The agent follows a path from the start state to the goal state, updating the heuristic along the way. Solution length and error over the whole path are computed for each lookahead depth → pathology. (Figure: example paths for d = 1, 2, 3.)
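
A sketch of the on-policy protocol, reusing the hypothetical `lrts_step` and `degree_of_pathology` helpers from the sketches above; the walk to the chosen frontier state is abstracted as a jump of the corresponding length.

```python
def on_policy_solution_length(start, goal, neighbors, h0, depth, move_budget=100_000):
    """Run LRTS at a fixed lookahead depth from start to goal, learning as it
    goes, and return the length of the path actually travelled."""
    h = dict(h0)                       # each depth learns on its own copy
    state, length = start, 0
    while state != goal and length < move_budget:
        state, executed = lrts_step(state, goal, neighbors, h, depth)
        if executed == 0:              # stuck (should not happen on solvable maps)
            break
        length += executed             # the agent walks to the chosen frontier state
    return length

# lengths = {d: on_policy_solution_length(start, goal, neighbors, h0, d)
#            for d in range(1, 11)}
# degree = degree_of_pathology(lengths)
```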

Off-policy experiments. The agent spawns in a number of states and takes one move towards the goal state; the heuristic is not updated. Error is computed from these first moves → pathology. (Figure: first moves from several spawn states for d = 1, 2, 3.)
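
A sketch of the off-policy protocol; `first_move` and `optimal_moves` are hypothetical callables (e.g. built from the `lrts_step` sketch with a frozen heuristic, and from a Dijkstra search backwards from the goal, respectively).

```python
def off_policy_error(states, depths, first_move, optimal_moves):
    """first_move(s, d): the move LRTS takes from state s at lookahead depth d,
    with the heuristic left unchanged.
    optimal_moves(s): the set of first moves that start an optimal path.
    Returns the fraction of suboptimal first moves for each depth."""
    return {
        d: sum(first_move(s, d) not in optimal_moves(s) for s in states) / len(states)
        for d in depths
    }

# errors = off_policy_error(sampled_states, range(1, 11), first_move, optimal_moves)
# degree = degree_of_pathology(errors)   # reusing the earlier sketch
```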

Basic on-policy experiment
Degree of pathology      0     1     2     3     4     ≥5
Length (% of problems)   38.1  12.8  18.2  16.1  9.5   5.3
Error (% of problems)    38.5  15.1  20.3  17.0  7.6   1.5
A lot of pathology – over 60%! First explanation: many states are intrinsically pathological (measured in off-policy mode). Not true: only 3.9% are. If the topology of the maps is not at fault, perhaps the algorithm is to blame?

Off-policy experiment on 188 states
The comparison so far is not fair: on-policy, the pathology is computed from the error over a number of states; off-policy, from whether single states are pathological. A fair comparison: off-policy error over the same number of states as on-policy – 188 (chosen randomly). Only error can be used – there is no solution length off-policy.
Degree of pathology   0     1     2    3    ≥4
Problems (%)          57.8  31.4  9.4  1.4  0.0
Not much less pathology than on-policy: 42.2% vs. 61.5%.

Tolerance. The first off-policy experiment showed little pathology, the second one quite a lot. Perhaps off-policy pathology is caused by minor differences in error – noise. Introduce a tolerance t: an increase in error counts towards the pathology only if error(d1) > t ∙ error(d2), where d1 > d2. Set t so that the pathology in the off-policy experiment on 188 states is below 5%: t = 1.09.
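
The tolerance can be folded into the degree-of-pathology sketch from earlier; as before, this assumes the one-step (adjacent-depth) reading of the comparison, which the slide does not state explicitly.

```python
def degree_of_pathology_with_tolerance(by_depth, t=1.09):
    """Count an increase from one depth to the next only if the measure at the
    larger depth exceeds t times the measure at the smaller depth
    (error(d1) > t * error(d2) on the slide)."""
    depths = sorted(by_depth)
    return sum(1 for a, b in zip(depths, depths[1:]) if by_depth[b] > t * by_depth[a])
```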

Experiments with t = 1.09
Degree of pathology          0     1     2     3     4    ≥5
On-policy (% of problems)    42.3  19.7  21.2  12.9  3.6  0.3
Off-policy (% of problems)   95.7  3.7   0.6   0.0   0.0  0.0
On-policy changes little vs. t = 1: 57.7% vs. 61.9%. Apparently on-policy pathology is more severe than off-policy; investigate why! These two experiments serve as the basic on-policy and basic off-policy experiments in what follows.

Introduction Problem Explanation

Hypothesis 1
LRTS tends to visit pathological states with an above-average frequency. Test: compute pathology from states visited on-policy instead of 188 random states.
Degree of pathology   0     1    2    3    ≥4
Problems (%)          93.6  5.3  0.9  0.2  0.0
More pathology than in random states: 6.3% vs. 4.3%. Much less pathology than basic on-policy: 6.3% vs. 57.7%. Hypothesis 1 is correct, but it is not the main reason for on-policy pathology.

Is learning the culprit?
There is learning (updating the heuristic) on-policy, but not off-policy. Learning is necessary on-policy, otherwise the agent gets caught in infinite loops. Test: traverse paths in the normal on-policy manner, but measure the error without learning.
Degree of pathology   0     1     2    3    4    ≥5
Problems (%)          79.8  14.2  4.5  1.2  0.3  0.0
Less pathology than basic on-policy: 20.2% vs. 57.7%. Still more pathology than basic off-policy: 20.2% vs. 4.3%. Learning is a reason, although not the only one.

Hypothesis 2. At smaller lookahead depths, a larger fraction of the states in the lookahead area has already been updated. (Figure: current lookahead area and updated states.)

Hypothesis 2. Smaller lookahead depths benefit more from learning. This makes their decisions better than the mere depth suggests, so they are closer in quality to larger depths. If they are closer to larger depths, cases where a larger depth happens to be worse than a smaller depth are more common. Test: equalize the depths by learning as much as possible in the whole lookahead area – uniform learning.
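
The slides do not give the exact uniform-learning update rule; one natural reading is sketched below: after each search, every state of the lookahead area is updated from the frontier, not just the current state. `area_states`, `frontier` and `dist_in_area` are hypothetical.

```python
def uniform_update(area_states, frontier, dist_in_area, h):
    """Raise the heuristic of every state in the lookahead area to the best
    estimate through the frontier (one possible reading of 'learning as much
    as possible in the whole lookahead area')."""
    for s in area_states:
        f_opt = min(dist_in_area(s, fs) + h.get(fs, 0.0) for fs in frontier)
        h[s] = max(h.get(s, 0.0), f_opt)    # never lower an existing estimate
```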

Uniform learning. (Animation: the agent alternates search and update steps, with the heuristic updated throughout the whole lookahead area after each search.)

Pathology with uniform learning
Degree of pathology   0     1     2     3     4    ≥5
Problems (%)          40.9  20.2  22.1  12.3  4.2  0.3
Even more pathology than basic on-policy: 59.1% vs. 57.7%. Is Hypothesis 2 wrong? Let us look at the volume of heuristic updates encountered per state generated during search; this seems to be the best measure of the benefit of learning.

Volume of updates encountered. (Figure: chart of the volume of heuristic updates encountered per state generated, by lookahead depth.) Hypothesis 2 is correct after all.

Hypothesis 3. On-policy: one search every d moves, so fewer searches at larger depths. Off-policy: one search every move.

Hypothesis 3. The difference between depths in the amount of search per move is smaller on-policy than off-policy. This makes the depths closer on-policy. If they are closer, cases where a larger depth happens to be worse than a smaller depth are more common. Test: search every move on-policy.
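
The scheduling difference behind Hypothesis 3, in a tiny sketch: on-policy LRTS plans once and then walks up to d moves, whereas the off-policy protocol (and the test above) plans before every move, so the number of searches over a path differs by roughly a factor of d. Names are hypothetical.

```python
import math

def searches_over_path(path_length, depth, search_every_move):
    """Number of lookahead searches needed to traverse a path of `path_length`
    moves: one per move when searching every move, one per `depth` moves otherwise."""
    moves_per_search = 1 if search_every_move else depth
    return math.ceil(path_length / moves_per_search)

# e.g. a 100-move path at depth 5: 20 searches on-policy vs. 100 when searching every move
assert searches_over_path(100, 5, search_every_move=False) == 20
assert searches_over_path(100, 5, search_every_move=True) == 100
```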

Pathology when searching every move
Degree of pathology   0     1    2    3    4    ≥5
Problems (%)          86.9  9.0  3.3  0.6  0.2  0.0
Less pathology than basic on-policy: 13.1% vs. 57.7%. Still more pathology than basic off-policy: 13.1% vs. 4.3%. Hypothesis 3 is correct; the remaining pathology is due to Hypotheses 1 and 2. Further test: the number of states generated per move.

States generated per move. (Figure: chart of the number of states generated per move, by lookahead depth.) Hypothesis 3 is confirmed again.

Summary of explanation. On-policy pathology is caused by different lookahead depths being closer to each other in terms of the quality of decisions than the mere depths would suggest: due to the volume of heuristic updates encountered per state generated, and due to the number of states generated per move. In addition, LRTS tends to visit pathological states with an above-average frequency.

Thank you. Questions?