1
Chapter 6: Game Search (Adversarial Search)
2
Characteristics of game search - Exhaustive search is almost impossible.
==> The branching factor and the search depth are usually far too large (e.g., Go: on the order of (19 × 19)! move sequences; chess: Deep Blue?).
- Static evaluation score ==> a measure of board quality.
- Maximizing player ==> hoping to win (me); minimizing player ==> hoping that I lose (the opponent).
- Game tree ==> a semantic tree whose nodes are board configurations and whose branches are moves (the original board state branches into new board states).
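A minimal Python sketch of these terms, under assumed names (Board, legal_moves, static_eval are illustrative, not from the course material):

    from typing import List, Tuple

    Board = Tuple[str, ...]        # one board configuration = one node of the game tree

    def legal_moves(board: Board, player: str) -> List[Board]:
        # Branches of the game tree: every new board state reachable in one move.
        raise NotImplementedError  # game-specific

    def static_eval(board: Board) -> float:
        # Static evaluation score (board quality): > 0 favors the maximizing
        # player (me), < 0 favors the minimizing player (the opponent).
        raise NotImplementedError  # game-specific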
3
Minimax Game Search
Idea: take the maximum score at a maximizing level (my turn); take the minimum score at a minimizing level (the opponent's turn).
(Figure: a two-ply tree with leaves 2, 7 and 1, 8. At the minimizing level the opponent picks min(2, 7) = 2 and min(1, 8) = 1; at the maximizing level I pick max(2, 1) = 2: "this move guarantees best".)
4
Minimax Search Example
Maximizer A searches so as to maximize the evaluation-function value; A's choice takes minimizer B's intention into account.
(Figure: A's moves [c1] f=0.8, [c2] f=0.3, [c3] f=-0.2; at the minimizer (B) level their children are [c11] f=0.9, [c12] f=0.1, [c13] f=-0.6, [c21] f=0.1, [c22] f=-0.7, [c31] f=-0.1, [c32] f=-0.3.)
5
Minimax Algorithm

Function MINIMAX(state) returns an action
    inputs: state, the current state in the game
    v = MAX-VALUE(state)
    return the action in SUCCESSORS(state) whose value is v

Function MAX-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v = −∞
    for a, s in SUCCESSORS(state) do
        v = MAX(v, MIN-VALUE(s))
    return v

Function MIN-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v = +∞
    for a, s in SUCCESSORS(state) do
        v = MIN(v, MAX-VALUE(s))
    return v
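A runnable Python sketch of the same algorithm, assuming the game is supplied as three callbacks (successors, is_terminal, utility); these names are illustrative, not part of the slides:

    import math

    def minimax_decision(state, successors, is_terminal, utility):
        # MINIMAX: pick the action whose MIN-VALUE is largest (MAX moves at the root).
        best_action, best_value = None, -math.inf
        for action, s in successors(state):
            v = min_value(s, successors, is_terminal, utility)
            if v > best_value:
                best_action, best_value = action, v
        return best_action

    def max_value(state, successors, is_terminal, utility):
        # MAX-VALUE: maximizing level (my turn).
        if is_terminal(state):
            return utility(state)
        v = -math.inf
        for _, s in successors(state):
            v = max(v, min_value(s, successors, is_terminal, utility))
        return v

    def min_value(state, successors, is_terminal, utility):
        # MIN-VALUE: minimizing level (opponent's turn).
        if is_terminal(state):
            return utility(state)
        v = math.inf
        for _, s in successors(state):
            v = min(v, max_value(s, successors, is_terminal, utility))
        return v

For the two-ply tree on the earlier slide (leaves 2, 7 under one MIN node and 1, 8 under the other), the value computed at the root is max(min(2, 7), min(1, 8)) = 2.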
6
Minimax Example
(Figure: a two-ply game tree with a MAX node at the root, three MIN nodes below it, and leaf values 3 12 8, 2 4 6, and 14 5 2.)
8
Minimax Example
(Figure: the MIN nodes back up the minimum of their children: min(3, 12, 8) = 3, min(2, 4, 6) = 2, min(14, 5, 2) = 2.)
9
Minimax Example
(Figure: the MAX root backs up the maximum of the MIN values: max(3, 2, 2) = 3.)
10
Tic-Tac-Toe
Tic-tac-toe, also called noughts and crosses (in British Commonwealth countries) and X's and O's (in the Republic of Ireland), is a pencil-and-paper game for two players, X and O, who take turns marking the spaces in a 3×3 grid. The X player usually goes first. The player who succeeds in placing three of their marks in a horizontal, vertical, or diagonal row wins the game. The example game in the figure is won by the first player, X.
13
Save time
14
Game tree (2-player): how do we search this tree to find the optimal move?
15
Applying MiniMax to tic-tac-toe
The static heuristic evaluation function
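The figure for this slide is not in the transcript. A common static evaluation for tic-tac-toe, which may be what the slide shows, is E(n) = (lines still open for X) − (lines still open for O); a Python sketch under that assumption:

    # Classic open-lines heuristic for tic-tac-toe (an assumption; the slide's
    # exact function is not shown in this transcript).
    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),    # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),    # columns
             (0, 4, 8), (2, 4, 6)]               # diagonals

    def static_eval(board):
        # board: list of 9 cells, each 'X', 'O' or ' '; positive favors X (MAX).
        open_x = sum(1 for line in LINES if all(board[i] != 'O' for i in line))
        open_o = sum(1 for line in LINES if all(board[i] != 'X' for i in line))
        return open_x - open_o

    # Example: X alone in the centre keeps all 8 lines open for X while closing
    # O's 4 lines through the centre, so the score is 8 - 4 = 4.
    print(static_eval([' '] * 9))                        # 0
    print(static_eval([' '] * 4 + ['X'] + [' '] * 4))    # 4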
17
α-β Pruning. Idea: reduce the search space! (Plain minimax blows up exponentially.)
α-β principle: "If you have an idea that is surely bad, do not take time to see how truly bad it is."
(Figure: a MAX node whose first MIN child has value min(2, 7) = 2, so the MAX node is ≥ 2; at the second MIN child the first leaf seen is 1, giving that child a value ≤ 1 < 2, so its remaining leaves need not be examined: an α-cut.)
18
Alpha-Beta Pruning
A game-search method that uses alpha, the lowest value already guaranteed at a maximizing node, and beta, the highest value already guaranteed at a minimizing node. The search itself proceeds depth-first (DFS).
(Figure: root [c0] with value 0.2; children [c1] f=0.2 and [c2] f=-0.1; [c1]'s children [c11] f=0.2 and [c12] f=0.7; [c2]'s children [c21] f=-0.1, [c22], [c23]. Once C21's value -0.1 is backed up to C2, the remaining nodes C22 and C23 need not be searched: an α-cut.)
19
Tic-Tac-Toe Example with Alpha-Beta Pruning
Backup Values
20
α-β Procedure
- α never decreases (initially −∞).
- β never increases (initially +∞).
- Search rules:
  1. α-cutoff ==> cut below any minimizing node whose β ≤ α of some (maximizing) ancestor.
  2. β-cutoff ==> cut below any maximizing node whose α ≥ β of some (minimizing) ancestor.
21
Example
(Figure: a four-level max/min/max/min tree with leaf values 90, 89, 100, 99, 60, 59, 75, 74, showing one α-cut and one β-cut.)
22
Alpha-Beta Pruning Algorithm
Function ALPHA-BETA(state) returns an action
    inputs: state, current state in game
    v = MAX-VALUE(state, −∞, +∞)
    return the action in SUCCESSORS(state) whose value is v

Function MAX-VALUE(state, α, β) returns a utility value
    inputs: state, current state in game
            α, the value of the best alternative for MAX along the path to state
            β, the value of the best alternative for MIN along the path to state
    if TERMINAL-TEST(state) then return UTILITY(state)
    v = −∞
    for a, s in SUCCESSORS(state) do
        v = MAX(v, MIN-VALUE(s, α, β))
        if v ≥ β then return v
        α = MAX(α, v)
    return v
23
Alpha-Beta Pruning Algorithm
Function MIN-VALUE(state, α, β) returns a utility value
    inputs: state, current state in game
            α, the value of the best alternative for MAX along the path to state
            β, the value of the best alternative for MIN along the path to state
    if TERMINAL-TEST(state) then return UTILITY(state)
    v = +∞
    for a, s in SUCCESSORS(state) do
        v = MIN(v, MAX-VALUE(s, α, β))
        if v ≤ α then return v
        β = MIN(β, v)
    return v
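A runnable Python sketch mirroring the pseudocode above, with the same assumed callbacks (successors, is_terminal, utility) as in the earlier minimax sketch:

    import math

    def alpha_beta_decision(state, successors, is_terminal, utility):
        # The root acts like MAX-VALUE but remembers which action produced the best value.
        best_action, best_value = None, -math.inf
        alpha, beta = -math.inf, math.inf
        for action, s in successors(state):
            v = min_value(s, alpha, beta, successors, is_terminal, utility)
            if v > best_value:
                best_action, best_value = action, v
            alpha = max(alpha, best_value)
        return best_action

    def max_value(state, alpha, beta, successors, is_terminal, utility):
        if is_terminal(state):
            return utility(state)
        v = -math.inf
        for _, s in successors(state):
            v = max(v, min_value(s, alpha, beta, successors, is_terminal, utility))
            if v >= beta:              # beta-cutoff
                return v
            alpha = max(alpha, v)
        return v

    def min_value(state, alpha, beta, successors, is_terminal, utility):
        if is_terminal(state):
            return utility(state)
        v = math.inf
        for _, s in successors(state):
            v = min(v, max_value(s, alpha, beta, successors, is_terminal, utility))
            if v <= alpha:             # alpha-cutoff
                return v
            beta = min(beta, v)
        return v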
24
Alpha-Beta Pruning Example
α, β initial values: α = −∞, β = +∞. α, β are passed to the kids: α = −∞, β = +∞.
25
Alpha-Beta Pruning Example
Root: α = −∞, β = +∞. MIN updates β based on kids: after the leaf 3, the first MIN node has α = −∞, β = 3.
26
Alpha-Beta Pruning Example
Root: [−∞, +∞]. First MIN node: [−∞, 3].
27
Alpha-Beta Pruning Example
Root: [−∞, +∞]. First MIN node: [−∞, 3] (α = −∞, β = 3). MIN updates β based on kids: the leaf 12 causes no change.
28
Alpha-Beta Pruning Example
First MIN node: [−∞, 3]; after its leaves 3, 12, 8, the value 3 is returned as the node value. MAX updates α based on kids: the root becomes [3, +∞].
29
Alpha-Beta Pruning Example
Root: [3, +∞]. First MIN node: value 3 (leaves 3, 12, 8).
30
Alpha-Beta Pruning Example
Root: [3, +∞]. α, β are passed to the second MIN node's kids: α = 3, β = +∞, i.e. [3, +∞].
31
Alpha-Beta Pruning Example
MIN updates β based on kids: at the second MIN node the leaf 2 gives α = 3, β = 2. α ≥ β, so the remaining children (marked X) are pruned.
32
Alpha-Beta Pruning Example
2 is returned as the second MIN node's value. MAX updates α based on kids: no change, the root stays at [3, +∞].
33
Alpha-Beta Pruning Example
α, β are passed to the third MIN node's kids: α = 3, β = +∞.
34
Alpha-Beta Pruning Example
MIN updates β based on kids: the leaf 14 gives the third MIN node α = 3, β = 14.
35
Alpha-Beta Pruning Example
MIN updates β based on kids: the leaf 5 tightens the third MIN node to α = 3, β = 5.
36
Alpha-Beta Pruning Example
MIN updates β based on kids: the leaf 2 tightens the third MIN node to α = 3, β = 2.
37
Alpha-Beta Pruning Example
2 is returned as the third MIN node's value.
38
Alpha-Beta Pruning Example
MAX updates α based on kids: no change. The root's final value is 3; the two leaves under the second MIN node marked X (4 and 6 in the earlier minimax example) were never examined.
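A small self-contained check of this walk-through (the tree is written as nested lists of the leaf values 3 12 8 | 2 4 6 | 14 5 2 from the minimax example; the function name alphabeta is illustrative):

    import math

    def alphabeta(node, alpha, beta, maximizing, visited):
        if isinstance(node, (int, float)):     # leaf: record that it was examined
            visited.append(node)
            return node
        if maximizing:
            v = -math.inf
            for child in node:
                v = max(v, alphabeta(child, alpha, beta, False, visited))
                if v >= beta:                  # beta-cutoff
                    return v
                alpha = max(alpha, v)
            return v
        else:
            v = math.inf
            for child in node:
                v = min(v, alphabeta(child, alpha, beta, True, visited))
                if v <= alpha:                 # alpha-cutoff
                    return v
                beta = min(beta, v)
            return v

    tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
    seen = []
    print(alphabeta(tree, -math.inf, math.inf, True, seen))   # 3
    print(seen)   # [3, 12, 8, 2, 14, 5, 2]: the leaves 4 and 6 were pruned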
40
Example: which nodes can be pruned?
(Figure: a MAX/MIN tree with leaf values 3, 4, 1, 2, 7, 8, 5, 6.)
41
Answer to Example: which nodes can be pruned?
(Figure: the same tree, leaf values 3, 4, 1, 2, 7, 8, 5, 6.)
Answer: NONE! The most favorable nodes for both players are explored last (in the diagram, they are on the right-hand side).
42
Second Example (the exact mirror image of the first example)
Which nodes can be pruned?
(Figure: the mirror-image tree, with leaf values 4, 3, 6, 5, 8, 7, 2, 1.)
43
Answer to Second Example (the exact mirror image of the first example)
Which nodes can be pruned?
Answer: LOTS! The most favorable nodes for both players are explored first (in the diagram, they are on the left-hand side).
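These two examples show that how much α-β prunes depends entirely on move ordering: with the best moves examined first, roughly O(b^(d/2)) nodes are visited instead of O(b^d). A small sketch of best-first ordering (static_eval is an assumed heuristic, as before, not part of the slides):

    def ordered_successors(state, successors, static_eval, maximizing):
        # Sort the (action, state) pairs so the player to move tries the most
        # promising successor first, which maximizes alpha-beta cutoffs.
        kids = list(successors(state))
        kids.sort(key=lambda pair: static_eval(pair[1]), reverse=maximizing)
        return kids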
46
Iterative Deepening: with a time limit we are unlikely to be able to search to the goal, so we must approximate.
47
Iterative (Progressive) Deepening
In real games there is usually a time limit T on making a move. How do we take this into account?
- With alpha-beta we cannot use "partial" results with any confidence unless the full breadth of the tree has been searched.
- We could be conservative and set a depth limit that guarantees we find a move in time < T; the disadvantage is that we may finish early and could have done more search.
- In practice, iterative deepening search (IDS) is used: IDS runs depth-first search with an increasing depth limit, and when the clock runs out we use the solution found at the previous (completed) depth limit. A sketch follows below.
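A minimal sketch of this scheme, assuming a depth-limited helper search_to_depth(state, depth, deadline) that returns the best move for that depth, or None if it could not finish before the deadline (both the helper and its behavior are assumptions for illustration):

    import time

    def iterative_deepening_decision(state, search_to_depth, time_limit):
        # Run deeper and deeper depth-limited searches; when the clock runs out,
        # return the move found at the last depth limit that completed.
        deadline = time.monotonic() + time_limit
        best_move, depth = None, 1
        while time.monotonic() < deadline:
            move = search_to_depth(state, depth, deadline)
            if move is None:        # this depth did not finish before the deadline
                break
            best_move = move
            depth += 1
        return best_move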
48
Iterative Deepening
49
Iterative deepening search l =0
50
Iterative deepening search l =1
51
Iterative deepening search l =2
52
Iterative deepening search l =3
54
Heuristic Continuation: fight horizon effect