3 double AlphaBeta(state, depth, alpha, beta)
begin
   if depth <= 0 then
      return evaluation(state)   // evaluation from the opponent's point of view
   for each action "a" possible from state
      nextstate = performAction(a, state)
      rval = -AlphaBeta(nextstate, depth-1, -beta, -alpha);
      if (rval >= beta) return rval;
      if (rval > alpha) alpha = rval;
   endfor
   return alpha;
end
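The slide's pseudocode is the negamax formulation of alpha-beta: each call scores the position from the point of view of the side to move, so the recursion negates the child's value and swaps (and negates) the search window. A minimal runnable Python sketch follows; evaluate, legal_actions, and perform_action are hypothetical stand-ins for a concrete game implementation, and for the negations to be consistent, evaluate must score the position for the side to move:

    def alpha_beta(state, depth, alpha, beta,
                   evaluate, legal_actions, perform_action):
        # Negamax alpha-beta with fail-soft cutoffs.
        if depth <= 0:
            return evaluate(state)   # static evaluation for the side to move
        for a in legal_actions(state):
            next_state = perform_action(a, state)
            # The opponent's best reply is our worst case, hence the negations.
            rval = -alpha_beta(next_state, depth - 1, -beta, -alpha,
                               evaluate, legal_actions, perform_action)
            if rval >= beta:
                return rval          # cutoff: the opponent will avoid this line
            if rval > alpha:
                alpha = rval         # new best value found so far
        return alpha

A four-ply search from some start position would then be invoked as alpha_beta(start, 4, -float('inf'), float('inf'), evaluate, legal_actions, perform_action).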

6 Meta-Reasoning for Search
problem: even when there is only one legal move (or one clear favourite), alpha-beta search will still generate a (possibly large) search tree; the same happens in similar, symmetrical situations
idea: compute the utility of expanding a node before expanding it
meta-reasoning (reasoning about reasoning): reason about how to spend computing time

8 Where We Are: Chess Technology
search depth and level of play achieved:
–minimax, evaluation function, cut-off test with quiescence search, large transposition table, speed: 1 million nodes/sec: 5 ply, novice
–as above, but with alpha-beta search: 10 ply, expert
–as above, plus additional pruning, databases of openings and end games, supercomputer: 14 ply, grand master

9 Deep Blue
algorithm:
–iterative-deepening alpha-beta search, transposition table, databases incl. openings, grandmaster games (700,000), endgames (all positions with 5 pieces, many with 6)
hardware:
–30 IBM RS/6000 processors for the high-level software search
–480 custom chess processors for the hardware search: searching deep in the tree, move generation and ordering, position evaluation (8,000 features)
average performance:
–126 million nodes/sec, 30 billion positions generated per move, search depth 14 plies (but up to 40)

10 Samuel's Checkers Program (1952)
learned an evaluation function by self-play (see: machine learning)
beat its creator after several days of self-play
hardware: IBM 704
–10 kHz processor
–10,000 words of memory
–magnetic tape for long-term storage

11 Chinook: Checkers World Champion
simple alpha-beta search (running on PCs)
database of 444 billion positions with eight or fewer pieces
problem: Marion Tinsley
–world checkers champion for over 40 years
–lost only three games in all that time
1990: Tinsley vs. Chinook: 20.5-18.5
–Chinook won two games!
1994: Tinsley retires (for health reasons)

12 Backgammon: TD-GAMMON
–searches only to depth 2 or 3
–evaluation function learned by machine-learning techniques (see Samuel's Checkers Program): a neural network
–performance: ranked amongst the top three players in the world
–the program's opinions have altered received wisdom

13 Go
the most popular board game in Asia
19x19 board: initial branching factor 361
–too much for search methods
best programs: Goemate / Go4++
–pattern-recognition techniques (rules)
–limited, local search
performance: 10 kyu (weak amateur)

14 A Dose of Reality: Chance
unpredictability:
–in real life: normal; often caused by external events that are not predictable
–in games: added as a random element, e.g. throwing dice or shuffling cards
games with an element of chance are less of a "toy problem"

15 Example: Backgammon
move: roll a pair of dice, then move pieces according to the result

16 Search Trees with Chance Nodes
problem:
–MAX knows its own legal moves
–MAX does not know MIN's possible responses, because they depend on the outcome of the random element
solution: introduce chance nodes
–between all MIN and MAX nodes
–with n children if there are n possible outcomes of the random element, each child labelled with the result of the random element and the probability of this outcome

17 Example: Search Tree for Backgammon
[Figure: a MAX layer, a CHANCE layer, and a MIN layer; the chance node's children are labelled with the dice roll and its probability, from 1-1 (1/36) and 1-2 (1/18) through 5-6 (1/18) and 6-6 (1/36); each move is followed by such a probability-labelled outcome.]

18 Optimal Decisions for Games with Chance Elements
aim: pick the move that leads to the best position
idea: calculate the expected value over all possible outcomes of the random element ⇒ the expectiminimax value

19 Example: Simple Tree
a MAX node chooses between two chance nodes, each with outcomes of probability 0.9 and 0.1; the MIN values beneath them are 2 and 3 (left) and 1 and 4 (right):
–left chance node: 0.9 × 2 + 0.1 × 3 = 2.1
–right chance node: 0.9 × 1 + 0.1 × 4 = 1.3
–MAX picks the left move, so the root value is 2.1
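The expectiminimax computation on this slide can be sketched in a few lines of Python; the tuple encoding of the tree ('max'/'min'/'chance' plus children) is invented for this example, and the slide's MIN layer is collapsed to its values 2, 3, 1, 4:

    def expectiminimax(node):
        # Leaves are plain numbers.
        if isinstance(node, (int, float)):
            return node
        kind, children = node
        if kind == 'max':
            return max(expectiminimax(c) for c in children)
        if kind == 'min':
            return min(expectiminimax(c) for c in children)
        # Chance node: children are (probability, subtree) pairs, and the
        # value is the probability-weighted average of the outcomes.
        return sum(p * expectiminimax(c) for p, c in children)

    tree = ('max', [
        ('chance', [(0.9, 2), (0.1, 3)]),   # 0.9 * 2 + 0.1 * 3 = 2.1
        ('chance', [(0.9, 1), (0.1, 4)]),   # 0.9 * 1 + 0.1 * 4 = 1.3
    ])
    print(expectiminimax(tree))             # 2.1, up to float rounding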

20 Complexity of Expectiminimax
time complexity: O(b^m × n^m)
–b: maximal number of possible moves
–n: number of possible outcomes of the random element
–m: maximal search depth
example: backgammon
–average b is around 20 (but can be up to 4,000 for doubles)
–n = 21 (the number of distinct rolls of two dice)
–about three ply of search depth is feasible
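A quick sanity check on these numbers (using the slide's average values b = 20, n = 21, m = 3): the tree has roughly b^m × n^m = 20^3 × 21^3 = 8,000 × 9,261 ≈ 7.4 × 10^7 nodes, which a fast program can search; every additional ply multiplies this by b × n = 420, which is why about three ply is the practical limit.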
