
CS 416 Artificial Intelligence, Lecture 8: Adversarial Search (Chapter 6)



Presentation transcript:


2 CS 416 Artificial Intelligence, Lecture 8: Adversarial Search (Chapter 6)

3 TA Office Hours

Chris White's office hours:
– Olsson 238
– Tuesday after class
– Friday 11:30 – 1:00

4 Buddha on the Brain (Wired Issue 14.02)

The neuroscience of meditation:
– Tibetan monk with more than 10,000 hours of meditation time
– 30x greater gamma waves (focused thought)
– Larger active area of prefrontal cortex

[Figure: EEG recorded while thinking "kindness and compassion"]

5 “The world’s fastest runners stopped racing cars years ago”

2003: Kasparov v. Deep Junior ends in a 3-3 draw.

6 Chess Article

Garry Kasparov reflects on computerized chess:
– IBM should have released the contents of Deep Blue to the chess community to advance research on computation as it relates to chess
– Kudos to Deep Junior for putting information in the public domain so the state of the art can advance
– Deep Blue made one good move that surprised Kasparov (though he thinks a person was in the loop)
– Deep Junior made a fantastic sacrifice that reflects a new accomplishment for computerized chess

http://www.opinionjournal.com/extra/?id=110003081

7 Where we’ve been

Search:
– Find optimal sequence of actions
  – Tree searching
– Find optimal input to a function
  – Simulated annealing
  – Genetic algorithms
  – Gradient descent

8 Adversarial Search

Problems involving:
– Multiple agents
– Competitive environments
– Agents with conflicting goals

Also called games.

9 Since the dawn of time?

Oldest known written fair-division problem: the Talmud, Jewish Oral Law dating to the first century.

A bankruptcy case:
– A man married three wives, and in each marriage contract he promised a different amount of money upon his death:
  – one of them gets $100
  – another gets $200
  – the third gets $300
– When he died, he had less than $600.

What do you do?

10 Bankruptcy law

Modern bankruptcy law provides shares of the estate proportional to the individual claims, no matter the size of the estate:
– A receives 100/600 * estate_holdings
– B receives 200/600 * estate_holdings
– C receives 300/600 * estate_holdings
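The proportional rule is easy to sketch in code. The claim amounts are the ones from the slide; the function name and dictionary layout are my own:

```python
def proportional_shares(claims, estate):
    """Give each claimant a share of the estate proportional to their claim."""
    total = sum(claims.values())
    return {name: estate * claim / total for name, claim in claims.items()}

# The three wives' claims from the Talmud example, with a $300 estate:
shares = proportional_shares({"A": 100, "B": 200, "C": 300}, 300)
# A receives 50, B receives 100, C receives 150
```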

11 Bankruptcy law

Rabbi Nathan in the Mishnah section of the Talmud prescribed a different allocation, not understood until recently:

Claims ↓ / Estate →    100    200    300
100                   33.3     50     50
200                   33.3     75    100
300                   33.3     75    150

12 Unexplained until 1984

Aumann and Maschler (Israeli mathematicians):
– Realistically, when you die, people could come out of the woodwork saying you owe them money. Some could coalesce into deceptive groups. How can we reduce the incentives (rewards) of forming such groups?
– Minimize the largest dissatisfaction among all possible coalitions.
– A common fair-division problem:
  http://www.math.gatech.edu/~hill/publications/cv.dir/madevice.pdf

13 Garment Principle

Two people claim a garment worth $100:
– One claims the entire garment belongs to him.
– The other claims half the garment is his.

The one claiming the full garment gets $75; the one claiming half gets $25. Why?

14 Minimizing maximum dissatisfaction

– The one who wants the entire garment cedes nothing to the other and thus wants $100.
– The one who wants half the garment would be perfectly happy to cede $50 to the other.
  – But a 50/50 split would make one person unhappy and the other perfectly happy.
  – How to make them equally unhappy?

15 A $100 Garment

                          Person 1    Person 2
Requested amount               100          50
Ceded from competitor           50           0
Split what remains              25          25
Sum of ceded and split          75          25

16 Game Theory

Studied in mathematics, economics, and finance.

In this part of AI we limit games to:
– deterministic
– turn-taking
– two-player
– zero-sum
– perfect information

17 Games

“Shall we play a game?” Let’s play tic-tac-toe.

18 Tic-Tac-Toe game tree

[Figure: MAX’s first move, then MIN’s first move. Each layer is a ply.]

19 What data do we need to play?

– Initial State: how does the game start?
– Successor Function: a list of legal (move, state) pairs for each state
– Terminal Test: determines when the game is over
– Utility Function: provides a numeric value for all terminal states

20 Minimax strategy

Optimal strategy:
– Leads to outcomes at least as good as any other strategy when playing an infallible opponent.
– Pick the option that minimizes the maximum damage your opponent can do:
  – minimize the worst-case outcome
  – because your skillful opponent will certainly find the most damaging move

21 Minimax

Algorithm:

MinimaxValue(n) =
  Utility(n)                                    if n is a terminal state
  max of MinimaxValue(s) over successors s      if n is a MAX node
  min of MinimaxValue(s) over successors s      if n is a MIN node

This is the optimal strategy, assuming both players play optimally from there until the end of the game.

22 A two-ply example: MIN considers minimizing how much it loses…

23 A two-ply example: MAX considers maximizing how much it wins…

24 Minimax Algorithm

We wish to identify the minimax decision at the root:
– Recursive evaluation of all nodes in the game tree
– Time complexity = O(b^m)

25 Feasibility of minimax?

How about a nice game of chess?
– Average branching factor = 35, and average number of moves = 50 for each player
  – O(35^100) time complexity = 10^154 nodes (about 10^40 distinct nodes)

Minimax is impractical if directly applied to chess. (There are only about 10^81 atoms in the universe!)

26 Pruning the minimax tree

Are there times when you know you need not explore a particular move?
– When the move is poor?
– Poor compared to what?
– Poor compared to what you have explored so far.

27 Alpha-beta pruning

– α: the value of the best (highest) choice found so far in the search for MAX
– β: the value of the best (lowest) choice found so far in the search for MIN
– The order of considering successors matters:
  – If possible, consider the best successors first.

28 Notation on tree

MIN wants to minimize the node value; MAX wants to maximize it.

29 Alpha-beta pruning (walkthrough)

– MIN knows it will lose at most 3; MAX worries that -inf is still possible.
– MAX knows that 3 is the worst case for this node.
– MAX knows that it can accomplish a score of at least 3; further discovery could find a higher value.
– MIN knows player MAX has the option of going to node B with a minimum payoff of 3, so MAX will never take action C, and pruning is possible.

30 Alpha-beta pruning

– Without pruning: O(b^d) nodes to explore
– With a good heuristic pruner (consider part (f) of the figure): O(b^(d/2))
  – Chess can drop from O(35^100) to O(6^100)
– With a random heuristic (you don’t try the best thing first): O(b^(3d/4))
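Alpha-beta can be sketched over the same explicit-tree encoding used for plain minimax above (the encoding and the leaf counter are my own additions, there to make the pruning visible):

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, stats=None):
    """Minimax value of `node`, skipping branches that cannot affect
    the decision at the root. `stats` optionally counts leaf visits."""
    if isinstance(node, (int, float)):
        if stats is not None:
            stats["leaves"] += 1
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, stats))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # MIN already has a better option elsewhere
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, stats))
        beta = min(beta, value)
        if alpha >= beta:
            break  # MAX already has a better option elsewhere
    return value

stats = {"leaves": 0}
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
alphabeta(tree, True, stats=stats)  # -> 3, same answer as plain minimax
# stats["leaves"] is 7: two of the nine leaves were pruned
```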

31 Real-time decisions

What if you don’t have enough time to explore the entire search tree?
– We cannot search all the way down to a terminal state for all decision sequences.
– Use a heuristic to approximate (guess) the eventual terminal state.
– Replace non-terminal states with the output of the heuristic and treat them as if they were terminal.

32 Evaluation Function (Estimator)

The heuristic that estimates expected utility:
– Cannot take too long (otherwise continue without it)
– Should preserve the ordering among terminal states
  – otherwise it can cause bad decision making
– Define features of the game state that assist in evaluation
  – what are features of chess?

33 Truncating minimax search

When do you recurse, and when do you use the evaluation function?
– Cutoff-Test(state, depth) returns 1 or 0
  – When 1 is returned, use the evaluation function.
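The truncated search can be sketched as follows: recurse until Cutoff-Test fires, then substitute the evaluation function for the true utility. The tree encoding matches the earlier minimax sketch, and the toy heuristic (averaging reachable leaf utilities) is purely illustrative, not a real chess evaluator:

```python
def leaves(node):
    """All terminal utilities reachable below `node`."""
    if isinstance(node, (int, float)):
        return [node]
    return [value for child in node for value in leaves(child)]

def estimate(node):
    """Toy evaluation function: average of reachable leaf utilities."""
    values = leaves(node)
    return sum(values) / len(values)

def h_minimax(node, maximizing, depth, cutoff_depth, evaluate):
    if isinstance(node, (int, float)):
        return node                    # true terminal state: exact utility
    if depth >= cutoff_depth:          # Cutoff-Test(state, depth) returns 1
        return evaluate(node)          # heuristic stands in for utility
    values = [h_minimax(child, not maximizing, depth + 1,
                        cutoff_depth, evaluate) for child in node]
    return max(values) if maximizing else min(values)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
# With the cutoff at depth 1, MAX chooses on the heuristic estimates alone.
h_minimax(tree, True, 0, 1, estimate)
```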

34 When do you cut off?

– Cut off if the state is stable or quiescent (more predictable)

35 When do you cut off?

– When exploring beyond a certain depth
  – The horizon effect

36 When do you cut off?

Use of a good heuristic as a cutoff will expedite, but not invalidate, the search (same result as without the heuristic).

Also:
– Cut off moves you know are bad (forward pruning)
– But you risk losing good states down the road

37 Benefits of truncation

Comparing chess: number of plies that can be considered per unit time
– Using minimax: 5 ply
– Average human: 6-8 ply
– Using alpha-beta: 10 ply
– Intelligent pruning: 14 ply

38 Games with chance

How to include chance in the game tree? Add chance nodes.

39 Expectiminimax

Expectiminimax(n) =
  utility(n)                                           if n is a terminal state
  max of Expectiminimax(s) over successors s           if n is a MAX node
  min of Expectiminimax(s) over successors s           if n is a MIN node
  sum of P(s) * Expectiminimax(s) over successors s    if n is a chance node
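The four cases above can be coded directly. The tagged-tuple tree encoding is my own; chance nodes carry (probability, successor) pairs, and the example values are illustrative:

```python
def expectiminimax(node):
    """node is a number (terminal utility) or a tagged tuple:
    ("max", children), ("min", children), or
    ("chance", [(probability, child), ...])."""
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(child) for child in children)
    if kind == "min":
        return min(expectiminimax(child) for child in children)
    # Chance node: probability-weighted average of successor values.
    return sum(p * expectiminimax(child) for p, child in children)

# MAX chooses between two gambles, e.g. two possible dice outcomes:
tree = ("max", [
    ("chance", [(0.5, 2), (0.5, 8)]),   # expected value 5.0
    ("chance", [(0.9, 1), (0.1, 30)]),  # expected value 3.9
])
expectiminimax(tree)  # -> 5.0
```

Note that MAX prefers the first gamble even though the second contains the single largest payoff; the chance nodes average it away.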

40 Pruning

Can we prune search in games of chance? Think about alpha-beta pruning:
– With alpha-beta, we don’t explore nodes that we know are worse than what we know we can accomplish.
– With randomness, we never really know what we will accomplish: chance node values are the average of their successors.

Thus it is hard to use alpha-beta. The best we can do is bound the max/min outcomes.

41 History of Games

Chess: Deep Blue
– IBM: 30 RS/6000 computers with 480 custom VLSI chess chips
– Deep Thought design came from Campbell and Hsu at CMU
– 126 million nodes/sec
– 30 billion positions per move
– routinely reaching a depth of 14
– iterative-deepening alpha-beta search

42 Deep Blue

– Evaluation function had 8000 features
– 4000 opening moves in memory
– 700,000 grandmaster games from which recommendations were extracted
– Many endgames solved for all five-piece combinations

Deep Junior (Israeli company): 8-processor 1.6 GHz Intel with 8 GB RAM (2003)

43 Checkers

Arthur Samuel of IBM, 1952:
– Program learned by playing against itself
– Beat a champion in 1962 (but the human clearly made an error)
– 19 KB of memory
– 0.000001 GHz processor

44 Checkers

Chinook, Jonathan Schaeffer, 1990:
– Alpha-beta search on regular PCs
– Database of all 444 billion endgame positions with 8 pieces
– Played against Marion Tinsley
  – world champion for over 40 years
  – lost only 3 games in 40 years
  – Chinook won two games, but lost the match
– A rematch with Tinsley was left incomplete for health reasons
  – Chinook became world champion

45 Othello

– Smaller search space (5 to 15 legal moves)
– Humans are no match for computers

46 Backgammon

Gerald Tesauro, TD-Gammon, 1992:
– Reliably ranked in the top three players of the world
– Learned to play through playing against itself
  – Reinforcement Learning

47 Go

Most popular board game in Asia:
– Branching factor of 361
– Few competent computer players
  – weak amateur at best

48 Discussion

How reasonable is minimax?
– assumes a perfectly performing opponent
– assumes perfect knowledge of leaf node evaluations
– these are strong assumptions

49 Metareasoning

Reasoning about reasoning; alpha-beta is one example:
– think before you think
– think about the utility of thinking about something before you think about it
– don’t think about choices you don’t have to think about


