
Chapter 6: Game Search (Adversarial Search)


1 Chapter 6: Game Search (Adversarial Search)

2 Characteristics of Game Search
- Exhaustive search is almost impossible ==> the branching factor and the search depth are usually far too large (e.g., Go: on the order of (19 × 19)! move sequences; chess: Deep Blue).
- Static evaluation score ==> a measure of board quality.
- Maximizing player ==> hoping to win (me); minimizing player ==> hoping I lose (the opponent).
- Game tree ==> a semantic tree whose nodes are board configurations and whose branches are moves.
[Diagram: an original board state branching into new board states.]
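
To make these notions concrete, here is a minimal sketch (not from the slides) of the state interface that the Python sketches later in this chapter assume: a state exposes a terminal test, a static evaluation score, and its successor states.

    # Minimal game-state interface assumed by the search sketches below (illustrative only).
    class State:
        def is_terminal(self):
            # True when the game is over at this board configuration.
            raise NotImplementedError
        def utility(self):
            # Static evaluation score: positive favors MAX, negative favors MIN.
            raise NotImplementedError
        def successors(self):
            # Yields (move, next_state) pairs, one per legal move.
            raise NotImplementedError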

3 Minimax Game Search Idea: take the maximum score at a maximizing level (my turn); take the minimum score at a minimizing level (the opponent's turn). [Diagram: alternating maximizing / minimizing levels over leaf scores 2, 7, 1, 8. What does the opponent do? What do I do? The minimizing level backs up 2 and 1; the root takes the maximum, 2: "this move guarantees best".]
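
For the small tree above, the backed-up value can be checked directly (a minimal sketch; the scores 2, 7, 1, 8 are the leaf values from the diagram):

    # Leaf scores under the two minimizing nodes in the diagram above.
    left, right = [2, 7], [1, 8]
    # The opponent (minimizer) picks the worst score for us in each subtree,
    # and we (maximizer) pick the better of those guaranteed values.
    best = max(min(left), min(right))   # max(2, 1) == 2
    print(best)                         # 2: the move that "guarantees best"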

4 Minimax Search Example
The maximizer A searches to maximize the evaluation-function value; A then chooses while taking the minimizer B's intent into account.
[Diagram: root A over a minimizer (B) level with [c1] f=0.8, [c2] f=0.3, [c3] f=-0.2; below them the leaves [c11] f=0.9, [c12] f=0.1, [c13] f=-0.6 under c1, [c21] f=0.1, [c22] f=-0.7 under c2, and [c31] f=-0.1, [c32] f=-0.3 under c3.]

5 Minimax Algorithm
Function MINIMAX( state ) returns an action
  inputs: state, current state in game
  v = MAX-VALUE( state )
  return the action corresponding with value v

Function MAX-VALUE( state ) returns a utility value
  if TERMINAL-TEST( state ) then return UTILITY( state )
  v = -∞
  for a, s in SUCCESSORS( state ) do
    v = MAX( v, MIN-VALUE( s ) )
  return v

Function MIN-VALUE( state ) returns a utility value
  if TERMINAL-TEST( state ) then return UTILITY( state )
  v = +∞
  for a, s in SUCCESSORS( state ) do
    v = MIN( v, MAX-VALUE( s ) )
  return v
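
A runnable Python rendering of this pseudocode might look like the following sketch; it assumes the minimal state interface sketched earlier (is_terminal, utility, successors):

    import math

    def minimax_decision(state):
        # Pick the action whose MIN-VALUE is largest.
        return max(state.successors(), key=lambda ms: min_value(ms[1]))[0]

    def max_value(state):
        if state.is_terminal():
            return state.utility()
        v = -math.inf
        for action, s in state.successors():
            v = max(v, min_value(s))
        return v

    def min_value(state):
        if state.is_terminal():
            return state.utility()
        v = math.inf
        for action, s in state.successors():
            v = min(v, max_value(s))
        return v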

6 Minimax Example [Tree: a MAX root over three MIN nodes; leaf values 3, 12, 8 | 2, 4, 6 | 14, 5, 2. Legend: one node shape marks MAX nodes, the other MIN nodes.]

7 Minimax Example [Same tree; an intermediate step of the evaluation, shown graphically on the slide.]

8 Minimax Example [The MIN nodes back up the values 3, 2, and 2 from their leaves 3, 12, 8 | 2, 4, 6 | 14, 5, 2.]

9 Minimax Example [The MAX root backs up 3 = max(3, 2, 2); the best move leads to the first MIN node.]
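
The walkthrough can be reproduced in a couple of lines (a sketch; the nested lists stand for the three MIN nodes and their leaves):

    leaves = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
    # Each inner list is a MIN node: the minimizer backs up its smallest leaf.
    backed_up = [min(group) for group in leaves]   # [3, 2, 2]
    root = max(backed_up)                          # 3, as on the slide
    print(backed_up, root)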

10 Tic-Tac-Toe Tic-tac-toe, also called noughts and crosses (in the British Commonwealth countries) and X's and O's (in the Republic of Ireland), is a pencil-and-paper game for two players, X and O, who take turns marking the spaces in a 3×3 grid. The X player usually goes first. The player who first places three of their marks in a horizontal, vertical, or diagonal row wins the game. The example game on the next slides is won by the first player, X.
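
The winning condition translates directly into code. A minimal sketch, assuming the board is encoded as a 9-element list holding 'X', 'O', or None (that encoding is an assumption of this example):

    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

    def winner(board):
        # Return 'X' or 'O' if three identical marks line up, else None.
        for a, b, c in LINES:
            if board[a] is not None and board[a] == board[b] == board[c]:
                return board[a]
        return None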

11

12

13 Save time

14 Game tree (2-player) How do we search this tree to find the optimal move?

15 Applying MiniMax to tic-tac-toe
The static heuristic evaluation function
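
A classic static heuristic for tic-tac-toe (a common textbook choice, assumed here rather than taken from the slide graphics) scores a non-terminal board as the number of complete lines still open to X minus the number still open to O:

    def open_lines(board, player):
        # Count rows/columns/diagonals containing no opponent mark.
        lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
                 (0, 3, 6), (1, 4, 7), (2, 5, 8),
                 (0, 4, 8), (2, 4, 6)]
        other = 'O' if player == 'X' else 'X'
        return sum(1 for line in lines
                   if all(board[i] != other for i in line))

    def evaluate(board):
        # Static heuristic: lines still open for X minus lines still open for O.
        return open_lines(board, 'X') - open_lines(board, 'O')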

16

17 α-β Pruning Idea: shrink the search space! (plain minimax blows up exponentially)
α-β principle: "if you have an idea that is surely bad, do not take the time to see how truly bad it is."
[Diagram: the Max root's first Min child evaluates to 2 from leaves 2 and 7, so the root is >= 2; the second Min child sees leaf 1 and is therefore <= 1, so an α-cut prunes its remaining leaves and the root's value is 2.]

18 Alpha-Beta Pruning
A game-search method that uses the least value assured at a maximizing node (alpha, α) and the greatest value assured at a minimizing node (beta, β). The search itself proceeds depth-first (DFS).
[Diagram: [c0] α=0.2 over [c1] f=0.2 and [c2] f=-0.1, with leaves [c11] f=0.2, [c12] f=0.7 under c1 and [c21] f=-0.1, [c22], [c23] under c2. Once c21's value -0.1 is backed up to c2, the remaining nodes (c22, c23) need not be searched: an α-cut.]

19 Tic-Tac-Toe Example with Alpha-Beta Pruning
Backup Values

20 α-β Procedure
- α never decreases (initially −∞).
- β never increases (initially +∞).
- Search rules:
  1. α-cutoff ==> prune below any minimizing node whose β <= the α of some ancestor.
  2. β-cutoff ==> prune below any maximizing node whose α >= the β of some ancestor.

21 Example [Diagram: alternating max / min / max / min levels over leaf values 90, 89, 100, 99, 60, 59, 75, 74, with an α-cut and a β-cut marked.]

22 Alpha-Beta Pruning Algorithm
Function ALPHA-BETA( state ) returns an action
  inputs: state, current state in game
  v = MAX-VALUE( state, −∞, +∞ )
  return the action corresponding with value v

Function MAX-VALUE( state, α, β ) returns a utility value
  inputs: state, current state in game
    α, the value of the best alternative for MAX along the path to state
    β, the value of the best alternative for MIN along the path to state
  if TERMINAL-TEST( state ) then return UTILITY( state )
  v = −∞
  for a, s in SUCCESSORS( state ) do
    v = MAX( v, MIN-VALUE( s, α, β ) )
    if v >= β then return v
    α = MAX( α, v )
  return v

23 Alpha-Beta Pruning Algorithm
Function MIN-VALUE( state, α, β ) returns a utility value
  inputs: state, current state in game
    α, the value of the best alternative for MAX along the path to state
    β, the value of the best alternative for MIN along the path to state
  if TERMINAL-TEST( state ) then return UTILITY( state )
  v = +∞
  for a, s in SUCCESSORS( state ) do
    v = MIN( v, MAX-VALUE( s, α, β ) )
    if v <= α then return v
    β = MIN( β, v )
  return v
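
Like the minimax pseudocode, this translates directly into Python; a sketch under the same assumed state interface:

    import math

    def alpha_beta_decision(state):
        return max(state.successors(),
                   key=lambda ms: ab_min_value(ms[1], -math.inf, math.inf))[0]

    def ab_max_value(state, alpha, beta):
        if state.is_terminal():
            return state.utility()
        v = -math.inf
        for action, s in state.successors():
            v = max(v, ab_min_value(s, alpha, beta))
            if v >= beta:            # beta-cutoff: MIN above will never allow this
                return v
            alpha = max(alpha, v)
        return v

    def ab_min_value(state, alpha, beta):
        if state.is_terminal():
            return state.utility()
        v = math.inf
        for action, s in state.successors():
            v = min(v, ab_max_value(s, alpha, beta))
            if v <= alpha:           # alpha-cutoff: MAX above already has better
                return v
            beta = min(beta, v)
        return v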

24 Alpha-Beta Pruning Example
α, β initial values: α = −∞, β = +∞. α, β passed to the kids: α = −∞, β = +∞.

25 Alpha-Beta Pruning Example
=−  =+ =−  =3 MIN updates , based on kids 3 The nodes are “MAX nodes” The nodes are “MIN nodes” 25 25

26 Alpha-Beta Pruning Example
[-, + ] [-, 3] 3 The nodes are “MAX nodes” The nodes are “MIN nodes”

27 Alpha-Beta Pruning Example
[-, + ] [-, 3] =−  =3 MIN updates , based on kids. No change. 3 12 The nodes are “MAX nodes” The nodes are “MIN nodes”

28 Alpha-Beta Pruning Example
[3, +] MAX updates , based on kids. [-, 3] 3 is returned as node value. 3 3 12 8 The nodes are “MAX nodes” The nodes are “MIN nodes”

29 Alpha-Beta Pruning Example
[3, +] [-, 3] 3 12 8 The nodes are “MAX nodes” The nodes are “MIN nodes”

30 Alpha-Beta Pruning Example
[3, +] , , passed to kids [-, 3] [3,+] =3  =+ 3 12 8 The nodes are “MAX nodes” The nodes are “MIN nodes”

31 Alpha-Beta Pruning Example
[3, +] MIN updates , based on kids. [-, 3] [3, 2] =3  =2 X X  ≥ , so prune. 3 12 8 2 The nodes are “MAX nodes” The nodes are “MIN nodes” 31 31

32 Alpha-Beta Pruning Example
MAX updates , based on kids. No change. [3, +] 2 is returned as node value. [-, 3] [3, 2] X X 3 12 8 2 The nodes are “MAX nodes” The nodes are “MIN nodes” 32 32

33 Alpha-Beta Pruning Example
[3,+] , , passed to kids [-, 3] [3, 2] =3  =+ X X 3 12 8 2 The nodes are “MAX nodes” The nodes are “MIN nodes”

34 Alpha-Beta Pruning Example
[3, +] MIN updates , based on kids. [-, 3] [3, 2] [3, 14] =3  =14 X X 3 12 8 2 14 The nodes are “MAX nodes” The nodes are “MIN nodes” 34 34

35 Alpha-Beta Pruning Example
[3, +] MIN updates , based on kids. [-, 3] [3, 2] [3, 5] =3  =5 X X 3 12 8 2 14 5 The nodes are “MAX nodes” The nodes are “MIN nodes”

36 Alpha-Beta Pruning Example
[3, +] MIN updates , based on kids. [-, 3] [3, 2] [3, 2] =3  =2 X X 3 12 8 2 14 5 2 The nodes are “MAX nodes” The nodes are “MIN nodes”

37 Alpha-Beta Pruning Example
[3, +] 2 is returned as node value. [-, 3] [3, 2] [3, 2] X X 3 12 8 2 14 5 2 The nodes are “MAX nodes” The nodes are “MIN nodes”

38 Alpha-Beta Pruning Example
MAX updates , based on kids. No change. [3, +] [-, 3] [3, 2] [3, 2] X X 3 12 8 2 14 5 2 The nodes are “MAX nodes” The nodes are “MIN nodes” 38 38

39

40 Example: which nodes can be pruned? [Tree: Max and Min levels over leaf values 3, 4, 1, 2, 7, 8, 5, 6.]

41 Answer to Example: which nodes can be pruned? [Same tree: Max and Min levels over leaf values 3, 4, 1, 2, 7, 8, 5, 6.]
Answer: NONE! Because the most favorable nodes for both players are explored last (i.e., in the diagram, they are on the right-hand side).

42 Second Example (the exact mirror image of the first example)
Which nodes can be pruned? [Tree: the mirror image of the first example; leaf values 4, 3, 6, 5, 8, 7, 2, 1.]

43 Answer to Second Example (the exact mirror image of the first example)
Which nodes can be pruned? [Tree: leaf values 6, 5, 8, 7, 2, 1, 3, 4, with Min and Max levels.]
Answer: LOTS! Because the most favorable nodes for both players are explored first (i.e., in the diagram, they are on the left-hand side).
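
The contrast can be checked with the alphabeta counter sketched earlier. The exact tree shape in the diagrams is not fully recoverable from the transcript, so the depth-3 shape below (MAX / MIN / MAX over pairs of leaves) is an assumption, and the counts are illustrative of the ordering effect rather than of these exact slides:

    # Reuses the `alphabeta` sketch from the earlier example.
    import math

    worst_first = [[[3, 4], [1, 2]], [[7, 8], [5, 6]]]   # favorable moves last
    best_first  = [[[6, 5], [8, 7]], [[2, 1], [4, 3]]]   # the mirror image

    for tree in (worst_first, best_first):
        visited = []
        value = alphabeta(tree, -math.inf, math.inf, True, visited)
        print(value, len(visited), "leaves evaluated")
    # prints: 6 8 leaves evaluated   (no pruning at all)
    #         6 5 leaves evaluated   (three leaves pruned)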

44

45

46 Iterative Deepening Time limits → unlikely to search to the goal; we must approximate.

47 Iterative (Progressive) Deepening
In real games there is usually a time limit T on making a move. How do we take this into account?
- With alpha-beta we cannot use "partial" results with any confidence unless the full breadth of the tree has been searched.
- We could be conservative and set a depth limit that guarantees we find a move in time < T; the disadvantage is that we may finish early and could have searched deeper.
- In practice, iterative deepening search (IDS) is used: IDS runs depth-first search with an increasing depth limit, and when the clock runs out we use the solution found at the previous depth limit. (See the sketch below.)
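
A minimal sketch of the idea under a wall-clock budget, assuming the same state interface as before (a real engine would also check the clock inside the search rather than only between depths):

    import math
    import time

    def depth_limited_value(state, depth, maximizing):
        # Minimax cut off at a fixed depth; falls back on the static evaluation.
        if state.is_terminal() or depth == 0:
            return state.utility()
        values = (depth_limited_value(s, depth - 1, not maximizing)
                  for _, s in state.successors())
        return max(values) if maximizing else min(values)

    def ids_decision(state, time_limit_s):
        # Deepen one ply at a time; when the clock runs out, keep the move
        # found at the last depth limit that finished.
        deadline = time.monotonic() + time_limit_s
        best_move, depth = None, 0
        while time.monotonic() < deadline:
            depth += 1
            best_move = max(state.successors(),
                            key=lambda ms: depth_limited_value(ms[1], depth - 1, False))[0]
        return best_move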

48 Iterative Deepening

49 Iterative deepening search, l = 0

50 Iterative deepening search, l = 1

51 Iterative deepening search, l = 2

52 Iterative deepening search, l = 3

53

54 Heuristic Continuation: fighting the horizon effect A fixed depth cutoff can push an unavoidable bad event just beyond the search horizon, making a doomed position look fine; heuristic continuation extends the search along unstable lines before trusting the static evaluation.
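
One common realization of heuristic continuation is quiescence search: "noisy" positions (e.g., with a capture pending) are searched past the nominal depth limit so a disaster cannot hide just over the horizon. A minimal sketch; is_quiescent and noisy_successors are hypothetical method names assumed for illustration:

    def quiescence_value(state, maximizing):
        # Stand-pat score: the static evaluation of the current position.
        stand_pat = state.utility()
        if state.is_quiescent():          # hypothetical: no captures/threats pending
            return stand_pat
        # Simplification: both sides may "stand pat" or follow a noisy line.
        values = [stand_pat] + [quiescence_value(s, not maximizing)
                                for _, s in state.noisy_successors()]  # hypothetical
        return max(values) if maximizing else min(values)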

55

