
1 Lookahead pathology in real-time pathfinding
Mitja Luštrek, Jožef Stefan Institute, Department of Intelligent Systems
Vadim Bulitko, University of Alberta, Department of Computer Science

2 Introduction, Problem, Explanation

3 Agent-centered search (LRTS)
Figure: lookahead area, current state, goal state, lookahead depth d

4 Agent-centered search (LRTS)
f = g + h
g: true shortest distance (from the current state to a frontier state)
h: estimated shortest distance (from the frontier state to the goal state)
Figure: frontier state

5 Agent-centered search (LRTS)
Frontier state with the lowest f (fopt)

6 Agent-centered search (LRTS)

7 Agent-centered search (LRTS)
The heuristic of the current state is updated: h = fopt

8 Agent-centered search (LRTS)
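The search-and-update loop on slides 3-8 can be summarized in code. This is a minimal sketch, not the authors' LRTS implementation: neighbors, cost and the heuristic table h are assumed helpers, and the max() in the update is the usual safeguard layered on top of the h = fopt step shown above.

```python
import heapq
from itertools import count

def lrts_step(state, goal, h, d, neighbors, cost):
    """One step of the agent-centered search on slides 3-8: depth-d lookahead,
    heuristic update h = f_opt, move toward the best frontier state.
    `neighbors`, `cost` and the heuristic dict `h` are assumed helpers."""
    tie = count()                       # tie-breaker so the heap never compares states
    g = {state: 0.0}                    # true shortest distance inside the lookahead area
    depth = {state: 0}
    frontier = []                       # states on the rim of the lookahead area
    pq = [(0.0, next(tie), state)]
    closed = set()
    while pq:
        gs, _, s = heapq.heappop(pq)
        if s in closed:
            continue
        closed.add(s)
        if depth[s] == d or s == goal:  # rim of the area (or the goal itself)
            frontier.append(s)
            continue
        for n in neighbors(s):
            ng = gs + cost(s, n)
            if ng < g.get(n, float("inf")):
                g[n] = ng
                depth[n] = depth[s] + 1
                heapq.heappush(pq, (ng, next(tie), n))
    # f = g + h over the frontier; the frontier state with the lowest f defines f_opt.
    best = min(frontier, key=lambda s: g[s] + h[s])
    f_opt = g[best] + h[best]
    # Learning: raise the current state's heuristic to f_opt (never lower it).
    h[state] = max(h[state], f_opt)
    return best                         # the agent then moves toward this state
```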

9 Lookahead pathology
Generally believed that larger lookahead depths produce better solutions
Solution-length pathology: larger lookahead depths produce worse solutions
Lookahead depth   Solution length
1                 11
2                 10
3                 8
4
5                 7
6
Degree of pathology = 2

10 Lookahead pathology Pathology on states that do not form a path
Error pathology: larger lookahead depths produce more suboptimal decisions
Multiple states:
Depth   Error
1       0.31
2       0.25
3       0.21
4       0.24
5       0.18
6       0.23
7       0.12
Degree of pathology = 2
One state:
Depth   Decision
1       suboptimal
2
3       optimal
4
5
6
7
There is pathology
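Read off the tables above, the degree of pathology can be computed by counting how many times the measure (solution length or error) gets worse as the lookahead depth grows by one; this reading reproduces the degree of 2 for the error column. A minimal sketch under that assumption:

```python
def degree_of_pathology(values_by_depth):
    """Number of times the measure (solution length or error) increases as the
    lookahead depth grows by one.  Degree 0 means no pathology.
    This reading is inferred from the example tables, not quoted from the paper."""
    return sum(deep > shallow
               for shallow, deep in zip(values_by_depth, values_by_depth[1:]))

# Error per depth 1..7 from the slide above: two increases, so degree 2.
errors = [0.31, 0.25, 0.21, 0.24, 0.18, 0.23, 0.12]
assert degree_of_pathology(errors) == 2
```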

11 Introduction, Problem, Explanation

12 Our setting HOG – Hierarchical Open Graph [Sturtevant et al.]
Maps from commercial computer games (Baldur’s Gate, Warcraft III)
Initial heuristic: octile distance (true distance assuming an empty map)
1,000 problems (map, start state, goal state)
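The octile distance used as the initial heuristic is the length of the shortest path on an empty eight-connected grid, where straight moves cost 1 and diagonal moves cost √2. A minimal sketch (the coordinate-pair interface is an assumption):

```python
import math

def octile_distance(ax, ay, bx, by):
    """Shortest-path length on an empty 8-connected grid: take min(dx, dy)
    diagonal steps, then the remaining |dx - dy| straight steps."""
    dx, dy = abs(ax - bx), abs(ay - by)
    return math.sqrt(2) * min(dx, dy) + abs(dx - dy)
```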

13 On-policy experiments
The agent follows a path from the start state to the goal state, updating the heuristic along the way
Solution length and error over the whole path are computed for each lookahead depth -> pathology
Figure: paths followed at d = 1, d = 2, d = 3

14 Off-policy experiments
The agent spawns in a number of states
It takes one move towards the goal state
The heuristic is not updated
Error is computed from these first moves -> pathology
Figure: first moves chosen at d = 1, d = 2, d = 3 from several spawn states
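The two regimes on slides 13 and 14 differ only in what is measured and whether the heuristic is updated. A sketch of both measurement loops: run_path (runs the agent along the whole path with learning and returns the solution length or accumulated error) and first_move_error (error of a single move, no learning) are assumed helpers, and degree_of_pathology is the sketch given earlier.

```python
def on_policy_pathology(problem, depths, run_path, degree_of_pathology):
    """On-policy: for each lookahead depth the agent walks the whole path from
    start to goal, updating the heuristic along the way; the per-depth results
    are then checked for pathology."""
    lengths = [run_path(problem, d) for d in depths]   # run_path: assumed helper
    return degree_of_pathology(lengths)

def off_policy_pathology(states, depths, first_move_error, degree_of_pathology):
    """Off-policy: every sampled state makes a single move toward the goal at
    each depth, with no heuristic update; the error is averaged over the states
    and the per-depth averages are checked for pathology."""
    mean_errors = [sum(first_move_error(s, d) for s in states) / len(states)
                   for d in depths]
    return degree_of_pathology(mean_errors)
```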

15 Basic on-policy experiment
Degree of pathology    0     1     2     3     4     ≥ 5
Length (problems %)    38.1  12.8  18.2  16.1  9.5   5.3
Error (problems %)     38.5  15.1  20.3  17.0  7.6   1.5
A lot of pathology – over 60%!
First explanation: a lot of states are intrinsically pathological (off-policy mode)
Not true: only 3.9% are
If the topology of the maps is not at fault, perhaps the algorithm is to blame?

16 Off-policy experiment on 188 states
Comparison not fair:
On-policy: pathology computed from the error over a number of states
Off-policy: whether single states are pathological
Fair: off-policy error over the same number of states as on-policy – 188 (chosen randomly)
Only error can be used – there is no solution length off-policy
Degree of pathology    0     1     2     3     ≥ 4
Problems %             57.8  31.4  9.4   1.4   0.0
Not much less pathology than on-policy: 42.2% vs. 61.5%

17 Tolerance
The first off-policy experiment showed little pathology, the second one quite a lot
Perhaps off-policy pathology is caused by minor differences in error – noise
Introduce tolerance t: an increase in error counts towards the pathology only if error(d1) > t ∙ error(d2), where d1 is the larger of the two depths
Set t so that the pathology in the off-policy experiment on 188 states is < 5%: t = 1.09
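With the tolerance t, an increase in error only counts when the error at the deeper lookahead exceeds t times the error at the shallower one. A sketch extending the earlier degree_of_pathology reading:

```python
def degree_of_pathology_tolerant(errors_by_depth, t=1.09):
    """An increase counts toward the pathology only if
    error(deeper depth) > t * error(shallower depth).
    t = 1.09 is the value the slides report for < 5% off-policy pathology."""
    return sum(deep > t * shallow
               for shallow, deep in zip(errors_by_depth, errors_by_depth[1:]))
```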

18 Experiments with t = 1.09
Degree of pathology    0     1     2     3     4     ≥ 5
On-policy (prob. %)    42.3  19.7  21.2  12.9  3.6   0.3
Off-policy (prob. %)   95.7  3.7   0.6   0.0   0.0   0.0
On-policy changes little vs. t = 1: 57.7% vs. 61.9%
Apparently on-policy pathology is more severe than off-policy
Investigate why!
These two experiments are referred to below as the basic on-policy experiment and the basic off-policy experiment

19 Introduction, Problem, Explanation

20 Hypothesis 1
LRTS tends to visit pathological states with an above-average frequency
Test: compute pathology from states visited on-policy instead of 188 random states
Degree of pathology    0     1     2     3     ≥ 4
Problems %             93.6  5.3   0.9   0.2   0.0
More pathology than in random states: 6.3% vs. 4.3%
Much less pathology than basic on-policy: 6.3% vs. 57.7%
Hypothesis 1 is correct, but it is not the main reason for on-policy pathology

21 Is learning the culprit?
There is learning (updating the heuristic) on-policy, but not off-policy
Learning is necessary on-policy, otherwise the agent gets caught in infinite loops
Test: traverse paths in the normal on-policy manner, but measure error without learning
Degree of pathology    0     1     2     3     4     ≥ 5
Problems %             79.8  14.2  4.5   1.2   0.3   0.0
Less pathology than basic on-policy: 20.2% vs. 57.7%
Still more pathology than basic off-policy: 20.2% vs. 4.3%
Learning is a reason, although not the only one

22 Hypothesis 2 Larger fraction of updated states at smaller depths
Figure: current lookahead area, updated state

23 Hypothesis 2 Smaller lookahead depths benefit more from learning
This makes their decisions better than the mere depth suggests
Thus they are closer to larger depths
If they are closer to larger depths, cases where a larger depth happens to be worse than a smaller depth are more common
Test: equalize the depths by learning as much as possible in the whole lookahead area – uniform learning
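One possible reading of "learning as much as possible in the whole lookahead area": every interior state of the area, not just the current one, has its heuristic raised to the best f-value it can reach through the frontier. This is a guess at the update rule, not the presentation's definition; dist is an assumed helper giving true shortest distances inside the area.

```python
def uniform_learning_update(interior_states, frontier_states, h, dist):
    """Hedged sketch of uniform learning: raise h(s) for every state s inside
    the lookahead area to min over frontier states fs of dist(s, fs) + h(fs),
    never lowering it.  The exact update used in the presentation may differ."""
    for s in interior_states:
        best_f = min(dist(s, fs) + h[fs] for fs in frontier_states)
        h[s] = max(h[s], best_f)
```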

24 Uniform learning

25 Uniform learning Search

26 Uniform learning Update

27 Uniform learning Search

28 Uniform learning Update

29 Uniform learning

30 Uniform learning

31 Uniform learning

32 Uniform learning

33 Pathology with uniform learning
Degree of pathology    0     1     2     3     4     ≥ 5
Problems %             40.9  20.2  22.1  12.3  4.2   0.3
Even more pathology than basic on-policy: 59.1% vs. 57.7%
Is Hypothesis 2 wrong?
Let us look at the volume of heuristic updates encountered per state generated during search
This seems to be the best measure of the benefit of learning

34 Volume of updates encountered
Hypothesis 2 is correct after all

35 Hypothesis 3
On-policy: one search every d moves, so fewer searches at larger depths
Off-policy: one search every move

36 Hypothesis 3 The difference between depths in the amount of search is smaller on-policy than off-policy This makes the depths closer on-policy If they are closer, cases where a larger depth happens to be worse than a smaller depth are more common Test: search every move on-policy

37 Pathology when searching every move
Degree of pathology    0     1     2     3     4     ≥ 5
Problems %             86.9  9.0   3.3   0.6   0.2   0.0
Less pathology than basic on-policy: 13.1% vs. 57.7%
Still more pathology than basic off-policy: 13.1% vs. 4.3%
Hypothesis 3 is correct; the remaining pathology is due to Hypotheses 1 and 2
Further test: number of states generated per move

38 States generated / move
Hypothesis 3 confirmed again

39 Summary of explanation
On-policy pathology is caused by different lookahead depths being closer to each other in terms of the quality of decisions than the mere depths would suggest:
due to the volume of heuristic updates encountered per state generated
due to the number of states generated per move
LRTS also tends to visit pathological states with an above-average frequency

40 Thank you. Questions?

