1 Artificial Intelligence in Game Design Lecture 20: Hill Climbing and N-Grams

2 Hill Climbing
Simple technique for learning optimal parameter values
Character AI described in terms of a configuration of parameter values V = (v1, v2, …, vn)
–Example: action probabilities for Oswald
–V = (Pleft, Pright, Pdefend)
Oswald's current configuration:
Attack Left: 45%
Attack Right: 30%
Defend: 25%
Oswald's current V = (0.45, 0.30, 0.25)

3 Hill Climbing
Each configuration of parameter values V = (v1, v2, …, vn) has an error measure E(V)
–Often an estimate based on the success of the last action(s)
–Example: (total damage taken by Oswald) - (total damage caused by Oswald) over his last 3 actions
–Good enough for hill climbing
Goal of learning: find V such that E(V) is minimized
–Or at least "good enough"
Configuration with a low error measure:
Attack Left: 35%
Attack Right: 25%
Defend: 40%

4 Hill Climbing
Hill climbing works best for:
–A single parameter
–A correctness measure which is easy to compute
Example: "cannon game"
–Only parameter: angle Ө of the cannon
–Error measure: distance between the target and the actual landing point

5 Error Space
Graphical representation of the relationship between parameter value and correctness: error plotted as a function of Ө
Hill climbing = finding the "lowest point" in this space
At the optimal Ө, error = 0 (maximum correctness)

6 Hill Climbing Algorithm
Assumption: if a small change in one direction increases correctness, then we will eventually reach the optimal value if we keep changing in that direction
Successive values Ө1, Ө2, Ө3 move in the direction of decreasing error

7 Hill Climbing Algorithm
Estimate the direction of the slope in the local area of the error space
–Must sample values near Ө: E(Ө + ε) and E(Ө - ε)
Move in the direction of decreasing error
–Increase/decrease Ө by some given step size δ
–If E(Ө + ε) < E(Ө - ε) then Ө = Ө + δ
–Else Ө = Ө - δ
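
A minimal sketch of this update step, assuming the game supplies an error(theta) function; the stand-in error function and the constant names epsilon and delta are illustrative, not from the lecture:

```python
def hill_climb_step(theta, error, epsilon=0.01, delta=0.5):
    """One hill-climbing step: sample the error on either side of theta
    and move by delta in the direction of decreasing error."""
    if error(theta + epsilon) < error(theta - epsilon):
        return theta + delta
    return theta - delta

# Stand-in error measure: distance between landing point and target,
# pretending the landing point depends directly on the angle.
def error(theta, target=45.0):
    return abs(theta - target)

theta = 20.0
for _ in range(100):
    theta = hill_climb_step(theta, error)
# theta ends up oscillating within one step size of the target angle
```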

8 Multidimensional Error Space
Exploring multiple parameters simultaneously
–Probabilities for Attack Left, Attack Right, Defend
–Ability to control the "powder charge" C of the cannon as well as the angle Ө
Vary the parameters slightly in all dimensions:
–E(Ө + ε, C + ε)
–E(Ө + ε, C - ε)
–E(Ө - ε, C + ε)
–E(Ө - ε, C - ε)
Choose the combination with the lowest error
("I need to increase both the angle and the charge")

9 Multidimensional Error Space
Can have too many parameters
–n parameters = n-dimensional error space
–Will usually "wander" the space, never finding good values
If using learning, keep the problem simple
–Few parameters (one or two is best)
–Make sure the parameters have independent effects on the error
–Example: increased charge and increased angle both increase distance
("I could also move up a hill, or check the wind direction…")

10 Hill Climbing Step Size
Choosing a good step size δ
–Too small: learning takes too long
–Too large: learning will "jump over" the optimal value ("This guy is an idiot!")

11 Hill Climbing Step Size
Adaptive resolution
–Keep track of the previous error E(Ө_{T-1})
–If E(Ө_T) < E(Ө_{T-1}), assume we are moving in the correct direction
–Increase the step size to get there faster: δ = δ + κ

12 Hill Climbing Step Size
–If E(Ө_T) > E(Ө_{T-1}), assume we overshot the optimal value
–Decrease the step size to avoid overshooting on the way back: δ = δ × ρ, where ρ < 1
–Idea: decrease the step size quickly
Main goal: make the character's actions plausible to the player
–Should make large changes if it misses badly
–Should make small changes if near the target
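
A minimal sketch of adaptive resolution (additive increase κ, multiplicative decrease ρ), reusing the hypothetical error() helper from the earlier sketch; all constants are illustrative:

```python
def adaptive_hill_climb(theta, error, steps=100,
                        epsilon=0.01, delta=0.5, kappa=0.1, rho=0.5):
    """Hill climbing with an adaptive step size: grow delta while the
    error keeps falling, shrink it quickly when we overshoot."""
    prev_error = error(theta)
    for _ in range(steps):
        # Move in the direction of decreasing error.
        if error(theta + epsilon) < error(theta - epsilon):
            theta += delta
        else:
            theta -= delta
        curr_error = error(theta)
        if curr_error < prev_error:
            delta += kappa      # still improving: take bigger steps
        else:
            delta *= rho        # overshot: shrink the step quickly
        prev_error = curr_error
    return theta
```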

13 Local Minima in Error Space
Major assumption: the error decreases monotonically as we move towards the goal
Other factors may cause the error to increase in local areas
(The second shot appears to be worse than the first!)

14 Local Minima in Error Space
Local minima in error space:
–Places where the apparent error increases as we get closer to the optimal value
–Simple hill climbing can get stuck in a local minimum and will not escape

15 Local Minima in Error Space
Solutions:
Momentum term
–Current change is based on previous changes as well as the current error
–Define a momentum term α (the proportion of the previous change carried into the current change), with α < 1
–Previous change: ΔӨ_{T-1}
–Current change suggested by the error: C (either δ or -δ)
–Applied change: ΔӨ_T = α ΔӨ_{T-1} + (1 - α) C
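
A minimal sketch of the momentum update, again reusing the hypothetical error() helper; alpha is the momentum term from the slide:

```python
def momentum_hill_climb(theta, error, steps=100,
                        epsilon=0.01, delta=0.5, alpha=0.7):
    """Hill climbing with momentum: the applied change blends the previous
    change with the direction suggested by the current error."""
    prev_change = 0.0
    for _ in range(steps):
        # Raw change suggested by the local error slope: +delta or -delta.
        c = delta if error(theta + epsilon) < error(theta - epsilon) else -delta
        change = alpha * prev_change + (1 - alpha) * c
        theta += change
        prev_change = change
    return theta
```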

16 Local Minima in Error Space
Momentum "speeds up" with multiple changes in the same direction
Will continue to go in the same direction for several steps even if the error indicates a change in the other direction
Idea: momentum will "run through" local minima
–Momentum builds going downhill
–Momentum decreases in the local minimum, but the search still escapes it

17 Local Minima in Error Space
May need to restart with a different initial value
–Use randomness
–Something very different from the last starting point
–Plausible behavior: if the current actions are not working, try something new
(Multiple shots with the same result, then a very different result)

18 Memory and Learning
What if the player moves?
–Should not have to restart the learning process
–Should keep the appearance that the character is slowly improving its aim
Should be able to quickly adapt to changes in player strategy

19 Memory and Learning
Remember previous actions and their effects
–Store each angle Ө tried and the resulting landing distance D(Ө)
–If the player moves to location L, restart from the Ө whose D(Ө) is closest to L
(In the example, the closest to the new player location is Ө2)
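
A minimal sketch of that memory, with illustrative names: shot_memory maps each tried angle to its landing distance, and closest_angle picks the best restart point for a new target location:

```python
# Hypothetical memory of past shots: angle -> landing distance D(theta).
shot_memory = {20.0: 31.5, 35.0: 52.0, 50.0: 68.4}

def closest_angle(target_location, memory):
    """Return the previously tried angle whose landing distance
    was closest to the new target location."""
    return min(memory, key=lambda theta: abs(memory[theta] - target_location))

start_theta = closest_angle(55.0, shot_memory)  # 35.0: resume hill climbing here
```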

20 Memory and Learning
Best solution may be to cheat
–Use the physics formula: Ө = f(location)
–Incorporate an "error" term E: Ө = f(location ± E)
–Decrease the error each turn: E = E - ΔE
Error will still decrease even as the player moves
–May want to increase the error if the player moves: E = E + movePenalty
–Looks more realistic
–Encourages the player to keep moving
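
A minimal sketch of this "cheating" aimer. The perfect_angle() physics helper, the class name, and all numeric constants are placeholders, not from the lecture:

```python
import random

def perfect_angle(location):
    """Hypothetical physics solution: the angle that hits the location exactly."""
    return 0.8 * location   # placeholder formula

class CheatingGunner:
    def __init__(self, initial_error=15.0, error_decay=2.0, move_penalty=5.0):
        self.error = initial_error
        self.error_decay = error_decay
        self.move_penalty = move_penalty

    def aim(self, location, player_moved=False):
        if player_moved:
            self.error += self.move_penalty   # look "thrown off" by the move
        # Perturb the perfect answer by the current error, then shrink the error.
        theta = perfect_angle(location) + random.uniform(-self.error, self.error)
        self.error = max(0.0, self.error - self.error_decay)
        return theta
```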

21 Appearance of Learning
Hill climbing requires accuracy in the error measure
–Otherwise, how do we know in which direction to change the parameters?
Such a measure is not always plausible to have
–How do we know that changing Oswald's probabilities will improve performance?
Goal is sometimes not optimal performance, just the appearance of adaptation
Example: after a successful attack from the right, shift the probabilities
–Before: Attack Left 40%, Attack Right 60%
–After: Attack Left 35%, Attack Right 65%
No guarantee this is best in the long run, but at least Oswald looks like he is learning!

22 Predicting Player Actions
Type of games where this works best: the player has a choice between a few possible actions
–Attack left
–Attack right
The character can take a simultaneous counteraction
–Each player action has a "correct" counteraction:
–Attack left → defend left
–Attack right → defend right
Goal: the character should learn to "anticipate" the current player action based on past actions

23 Probabilistic Approach
Keep track of the last n actions taken by the player
–The size of n is the "window" of character memory
Compute the probability of the next player action based on these
–Base the decision about the counteraction on that probability
Example: last 10 player actions: L L R L R L L L R L
Estimated probabilities of the next action:
Left attack: 70%
Right attack: 30%
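
A minimal sketch of this bookkeeping in Python, using collections.deque for the sliding window and collections.Counter for the frequencies (the names are illustrative):

```python
from collections import Counter, deque

WINDOW_SIZE = 10
history = deque("LLRLRLLLRL", maxlen=WINDOW_SIZE)   # last 10 player actions

def action_probabilities(history):
    """Estimate P(next action) from the frequencies in the window."""
    counts = Counter(history)
    total = len(history)
    return {action: counts[action] / total for action in ("L", "R")}

print(action_probabilities(history))   # {'L': 0.7, 'R': 0.3}
```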

24 Probabilistic Actions
Simple majority approach
–Since left attack has the highest probability, always defend left
–Problem: the character will take the same action for long periods, which is too predictable!
–Example: the player's next 4 attacks are from the right, but the character will still choose defend left, since left attacks are still the majority player action in the last 10 moves
–Player: L L R L R L L L R L R R R R; character over the last 4 turns: L L L L
–The character looks very stupid!

25 Probabilistic Actions
Probabilistic approach: choose each counteraction with the same probability as the corresponding player action
–Biased towards recent player actions
–Far less predictable
–The player may notice that the character is changing tactics without being aware of how this is done
Left attack 70% → Left defend 70%
Right attack 30% → Right defend 30%
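
A minimal sketch of that choice using random.choices, building on the action_probabilities helper sketched above:

```python
import random

def choose_defense(history):
    """Defend left/right with the same probabilities as the player's attacks."""
    probs = action_probabilities(history)
    side = random.choices(["L", "R"], weights=[probs["L"], probs["R"]])[0]
    return "defend left" if side == "L" else "defend right"
```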

26 Window Size
Key question: what is a good value for n, the size of the "window" used to determine the probabilities?
Can be too small (n = 2, for example)
–Character has no real "memory" of past player actions
–Example: after L L R L R L L L R L R R L, a window of the last 2 actions (R L) gives Left defend 50%, Right defend 50%
Can be too large (n = 20, for example)
–Too slow to react to changes in player tactics
–Example: after L L L L L L L L L L L L R R R R R R R R, the window still gives Left defend 60%, Right defend 40% even though the player has switched to attacking from the right
No best solution
–Will need to experiment

27 N-Grams
Conditional probabilities based on sequences of user actions
Example:
–"The last two player attacks were left and then right"
–"What has the player done next after previous left-then-right attacks?"
After the last 4 left-then-right attacks, the player:
–Attacked right 3 times
–Attacked left 1 time
Conclusion: the player has a 75% chance of attacking right next

28 N-Grams Example
Example: window of memory = last 12 actions
Base the decision on the last two actions taken by the player (past sequence length = 2)
Goal: determine what action the player is likely to take next given the last two actions L R
–Previous actions: L L R L R R L L R L L R ?
–Previous cases of L R in the window: followed by L twice, followed by R once
Left attack: 67%
Right attack: 33%
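
A minimal sketch of this lookup: scan the history for earlier occurrences of the most recent length-2 sequence and tally what followed (the function name is illustrative):

```python
from collections import Counter

def ngram_prediction(history, context_length=2):
    """Predict the next action from what followed past occurrences
    of the most recent context_length actions."""
    context = tuple(history[-context_length:])
    followers = Counter(
        history[i + context_length]
        for i in range(len(history) - context_length)
        if tuple(history[i:i + context_length]) == context
    )
    total = sum(followers.values())
    return {action: count / total for action, count in followers.items()}

history = list("LLRLRRLLRLLR")
print(ngram_prediction(history))   # L with probability 2/3, R with probability 1/3
```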

29 N-Grams and Sequence Size
The number of statistics to keep grows exponentially with the length of the past sequences
–Number of possible actions = a
–Past sequence length = L
–Number of possible past sequences = a^L
L must be small (no more than 2 or 3)
–Otherwise there are too many statistics to keep track of
–And very few instances of each case (how many cases of L L R L L R R will there be?)
–All statistics will be unreliable

30 Storing N-Grams
Algorithm:
–Keep statistics for all possible action strings of length L, based on a "window" of past actions
Example window: L L L L R R L L R L L R

Previous action string | Instances of next action | P(next is L) | P(next is R)
L L                    | L L R R R                | 40%          | 60%
L R                    | R L                      | 50%          | 50%
R L                    | L L                      | 100%         | 0%
R R                    | L                        | 100%         | 0%

31 N-Grams and Updating
When the player takes a new action:
–Add an instance for that action and update the statistics
–Remove the oldest action from the list and update the statistics (move the "window")
Example: L L L L R R L L R L L R L (new action L added; the oldest L is now outside the window)

Previous action string | Instances of next action | P(next is L) | P(next is R)
L L                    | L R R R                  | 25%          | 75%
L R                    | R L L                    | 67%          | 33%
R L                    | L L                      | 100%         | 0%
R R                    | L                        | 100%         | 0%
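
A minimal sketch tying the pieces together: a sliding window plus per-string follower counts. The class and method names are illustrative, and for simplicity it recounts the statistics from the window instead of incrementally adding and removing single instances as the slide describes:

```python
from collections import Counter, defaultdict, deque

class NGramPredictor:
    """Sliding-window N-gram statistics over player actions."""

    def __init__(self, window_size=12, context_length=2):
        self.window = deque(maxlen=window_size)
        self.context_length = context_length

    def record(self, action):
        """Add the newest action; the deque silently drops the oldest one."""
        self.window.append(action)

    def statistics(self):
        """Count the followers of every length-L string in the current window."""
        stats = defaultdict(Counter)
        w, L = list(self.window), self.context_length
        for i in range(len(w) - L):
            stats[tuple(w[i:i + L])][w[i + L]] += 1
        return stats

    def predict(self):
        """Probability distribution of the next action given the latest context."""
        context = tuple(list(self.window)[-self.context_length:])
        followers = self.statistics().get(context, Counter())
        total = sum(followers.values())
        return {a: c / total for a, c in followers.items()} if total else {}

predictor = NGramPredictor()
for a in "LLLLRRLLRLLRL":       # the example window plus the new action L
    predictor.record(a)
print(predictor.predict())      # context ('R', 'L') -> {'L': 1.0}
```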

