CS 416 Artificial Intelligence Lecture 23 Making Complex Decisions Chapter 17

Final Exam Reminder
Final Exam is Tuesday, May 6th at 7 p.m.
Let me know if you have a legitimate conflict

Zero-sum games
Payoffs in each cell sum to zero
Morra
– Two players (Odd and Even)
– Action: each player simultaneously displays one or two fingers
– Evaluation: f = total number of fingers
  • if f is odd, Even gives f dollars to Odd
  • if f is even, Odd gives f dollars to Even
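The Morra rules above can be written out as a payoff table. This is a minimal sketch; the function name `morra_payoff_to_even` and the `PAYOFF` dictionary are illustrative names, not from the slides. Only Even's payoff is stored, since Odd's is its negative.

```python
def morra_payoff_to_even(even_fingers: int, odd_fingers: int) -> int:
    """Even's payoff in dollars; Odd's payoff is the negative (zero-sum)."""
    f = even_fingers + odd_fingers
    # Even total: Odd pays f to Even. Odd total: Even pays f to Odd.
    return f if f % 2 == 0 else -f

# Keys are (Even's fingers, Odd's fingers); values are Even's payoff.
PAYOFF = {(e, o): morra_payoff_to_even(e, o) for e in (1, 2) for o in (1, 2)}

for (e, o), u in sorted(PAYOFF.items()):
    print(f"Even shows {e}, Odd shows {o}: Even {u:+d}, Odd {-u:+d}")
```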

Optimal strategy
von Neumann (1928) developed an optimal mixed strategy for two-player, zero-sum games
Because what one player wins, the other loses
– just keep track of one player’s payoff in each cell (Even)
– assume this player wishes to maximize
Maximin technique
– make the game a turn-taking game and analyze

Maximin
Change the rules of Morra for analysis
Force Even to reveal strategy first
– apply minimax algorithm
  • whichever number Even shows, Odd shows the other, so f = 3 and Even loses $3
– Odd has an advantage, so the outcome of this game is Even’s worst case; Even might do better in the real game
  • Even’s utility in the real game is therefore >= -$3

Maximin
Change the rules of Morra for analysis
Force Odd to reveal strategy first
– apply minimax algorithm
  • Odd would always select one to minimize Odd’s loss
  • Even would always select one to maximize Even’s gain
– This game favors Even
  • Even’s utility in the real game is therefore <= +$2

Combining two games
Even’s combined utility
EvenFirst_Utility <= Even’s_Utility <= OddFirst_Utility
– -3 <= Even’s_Utility <= 2
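The two bounds can be checked with a minimax over pure strategies. This is a minimal sketch; `payoff`, `even_first`, and `odd_first` are illustrative names, and the payoff function simply restates the Morra rules from Even's point of view.

```python
ACTIONS = (1, 2)  # number of fingers a player may show

def payoff(even_fingers, odd_fingers):
    """Even's payoff under the Morra rules."""
    f = even_fingers + odd_fingers
    return f if f % 2 == 0 else -f

# Even commits first: Odd replies with the action that minimizes Even's payoff,
# so Even picks the action with the best worst case.
even_first = max(min(payoff(e, o) for o in ACTIONS) for e in ACTIONS)

# Odd commits first: Even replies with the action that maximizes Even's payoff,
# so Odd picks the action that makes that maximum as small as possible.
odd_first = min(max(payoff(e, o) for e in ACTIONS) for o in ACTIONS)

print(f"{even_first} <= Even's utility <= {odd_first}")
```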

Considering mixed strategies
Mixed strategy
– select one finger with prob: p
– select two fingers with prob: 1 - p
If one player reveals strategy first, second player will always use a pure strategy
– expected utility of a mixed strategy
  • U1 = p * u_one + (1 - p) * u_two
– expected utility of the best pure strategy
  • U2 = max(u_one, u_two)
– U2 is always greater than or equal to U1, because U1 is a weighted average of u_one and u_two
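A quick numeric check of why the second player never needs to mix. This is a sketch only; the function names are illustrative, and the example utilities (2 and -3) are Even's payoffs for showing one or two fingers when Odd is known to show one.

```python
from fractions import Fraction

def mixed_utility(p, u_one, u_two):
    """Expected utility of playing 'one' with probability p, 'two' otherwise."""
    return p * u_one + (1 - p) * u_two

def best_pure_utility(u_one, u_two):
    """Utility of the better of the two pure strategies."""
    return max(u_one, u_two)

u_one, u_two = 2, -3
grid = [Fraction(k, 10) for k in range(11)]  # p = 0, 0.1, ..., 1
gaps = [best_pure_utility(u_one, u_two) - mixed_utility(p, u_one, u_two) for p in grid]
print(min(gaps))  # the gap never goes below zero on this grid
```

The gap is zero only when all probability mass sits on the better action (p = 1 here), which is itself a pure strategy.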

Modeling as a game tree
Because the second player will always use a fixed strategy…
Still pretending Even goes first

What is the outcome of this game?
Player Odd has a choice
Always pick the option that minimizes utility to Even
Represent the two choices as functions of p
Odd picks the line that is lowest (dark part on figure)
Even maximizes utility by choosing p to be where the lines cross
– 5p - 3 = 4 - 7p
  • p = 7/12 => Even’s utility = -1/12
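The two lines and their crossing point can be recovered from the payoff table with exact arithmetic. This is a minimal sketch; `PAYOFF`, `even_utility`, and `line` are illustrative names, and p is taken to be the probability that Even shows one finger.

```python
from fractions import Fraction

# (Even's fingers, Odd's fingers) -> Even's payoff, from the Morra rules
PAYOFF = {(1, 1): 2, (1, 2): -3, (2, 1): -3, (2, 2): 4}

def even_utility(p, odd_action):
    """Even's expected payoff when Even shows one finger with probability p."""
    return p * PAYOFF[(1, odd_action)] + (1 - p) * PAYOFF[(2, odd_action)]

def line(odd_action):
    """Return (slope, intercept) of Even's utility as a function of p."""
    intercept = even_utility(Fraction(0), odd_action)
    slope = even_utility(Fraction(1), odd_action) - intercept
    return slope, intercept

(s1, c1), (s2, c2) = line(1), line(2)   # 5p - 3 and 4 - 7p
p_star = (c2 - c1) / (s1 - s2)          # solve s1*p + c1 = s2*p + c2
value = even_utility(p_star, 1)
print(p_star, value)
```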

Pretend Odd must go first
Even’s outcome decided by pure strategy (dependent on q)
Even will always pick the maximum of two choices
Odd will minimize the maximum of two choices
– Odd chooses the intersection point
– 5q - 3 = 4 - 7q
  • q = 7/12 => Even’s utility = -1/12
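For reference, a worked derivation of the two lines in q, assuming q is the probability that Odd shows one finger and using the Morra payoffs to Even (+2, -3, -3, +4):

```latex
\begin{align*}
U_{\text{Even shows one}}(q) &= 2q + (-3)(1-q) = 5q - 3 \\
U_{\text{Even shows two}}(q) &= -3q + 4(1-q) = 4 - 7q \\
5q - 3 = 4 - 7q \;&\Longrightarrow\; 12q = 7 \;\Longrightarrow\; q = \tfrac{7}{12} \\
U\!\left(\tfrac{7}{12}\right) &= 5 \cdot \tfrac{7}{12} - 3 = -\tfrac{1}{12}
\end{align*}
```

To the left of the crossing the line 4 - 7q is the larger one, and to the right 5q - 3 is, so the maximum of the two is smallest exactly at q = 7/12.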

Final results
Both players use the same mixed strategy
– p_one = 7/12
– p_two = 5/12
– Outcome of the game is -1/12 to Even

Generalization
Two players with n action choices
Mixed strategy is not as simple as p, 1 - p
– it is (p_1, p_2, …, p_{n-1}, 1 - (p_1 + p_2 + … + p_{n-1}))
Solving for the optimal p vector requires finding the optimal point in (n-1)-dimensional space
– lines become hyperplanes
– some hyperplanes will be clearly worse for all p
– find intersection among remaining hyperplanes
– linear programming can solve this problem
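One standard way to pose this as a linear program is sketched below. The notation is an assumption, not from the slides: A_{ij} is the payoff to the maximizing player when it plays action i and the opponent plays action j, and v is the guaranteed value being maximized.

```latex
\begin{align*}
\max_{p,\,v} \quad & v \\
\text{s.t.} \quad & \sum_{i=1}^{n} p_i A_{ij} \ge v \quad \text{for every opponent action } j, \\
& \sum_{i=1}^{n} p_i = 1, \qquad p_i \ge 0 .
\end{align*}
```

Each constraint is one of the hyperplanes: it says that the mixed strategy p earns at least v against that particular pure reply. For two-finger Morra this reduces to the two lines 5p - 3 and 4 - 7p, with solution p = (7/12, 5/12) and v = -1/12.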

Repeated games
Imagine the same game played multiple times
Payoffs accumulate for each player
Optimal strategy is a function of game history
– must select optimal action for each possible game history
Strategies
– perpetual punishment
  • cross me once and I’ll take us both down forever
– tit for tat
  • cross me once and I’ll cross you the subsequent move
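The two strategies can be sketched as functions of the game history. Everything specific here is an assumption for illustration: the actions "C" (cooperate) and "D" (cross the other player), the prisoner's-dilemma style payoffs, and the helper `defect_once` used as an opponent.

```python
# Hypothetical payoffs (not from the lecture): (my action, their action) -> my payoff
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(my_history, their_history):
    """Cooperate first, then copy the opponent's previous move."""
    return their_history[-1] if their_history else "C"

def perpetual_punishment(my_history, their_history):
    """Cooperate until the opponent crosses once, then defect forever."""
    return "D" if "D" in their_history else "C"

def defect_once(round_to_defect):
    """Hypothetical opponent: crosses in exactly one round, cooperates otherwise."""
    def strategy(my_history, their_history):
        return "D" if len(my_history) == round_to_defect else "C"
    return strategy

def play(strategy_a, strategy_b, rounds):
    """Play the repeated game; return both move histories and accumulated scores."""
    hist_a, hist_b = [], []
    for _ in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        hist_a.append(a)
        hist_b.append(b)
    score_a = sum(PAYOFF[(a, b)] for a, b in zip(hist_a, hist_b))
    score_b = sum(PAYOFF[(b, a)] for a, b in zip(hist_a, hist_b))
    return "".join(hist_a), "".join(hist_b), score_a, score_b

print(play(tit_for_tat, defect_once(1), 5))
print(play(perpetual_punishment, defect_once(1), 5))
```

Tit for tat retaliates for a single round and then returns to cooperation, while perpetual punishment never does.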

The design of games
Let’s invert the strategy selection process to design fair/effective games
Tragedy of the commons
– individual farmers bring their livestock to the town commons to graze
– commons is destroyed and all experience negative utility
– all behaved rationally: refraining would not have saved the commons, as someone else’s livestock would have eaten it
  • Externalities are a way to place a value on changes in global utility
  • Power utilities pay for the utility of which they deprive neighboring communities (yet another Nobel prize in Econ for this: Coase)

Auctions
English Auction
– auctioneer incrementally raises bid price until one bidder remains
  • bidder gets the item at the highest price of another bidder plus the increment (perhaps the highest bidder would have spent more?)
  • strategy is simple… keep bidding until price is higher than utility
  • strategy of other bidders is irrelevant
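A minimal simulation of the ascending auction, assuming each bidder follows the simple strategy of staying in while the price does not exceed their private value. The function name, the `values` list, and the default increment are illustrative; the sketch also assumes at least one bidder values the item at or above the starting price.

```python
def english_auction(values, increment=1, start=0):
    """Raise the price until at most one bidder is still willing to pay it.

    values[i] is bidder i's private value. Returns (winner index, final price).
    """
    price = start
    active = [i for i, v in enumerate(values) if v >= price]
    while len(active) > 1:
        next_price = price + increment
        still_in = [i for i in active if values[i] >= next_price]
        if not still_in:
            break  # tie: nobody accepts the next price, so stop at the current one
        price, active = next_price, still_in
    return active[0], price

print(english_auction([10, 7, 4]))
```

With values 10, 7, and 4, the winner pays 8: the runner-up's limit of 7 plus the increment, not the winner's own value of 10.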

Auctions
Sealed bid auction
– place your bid in an envelope and highest bid is selected
  • say the most you are willing to bid is v
  • say you believe the highest competing bid is b
  • bid min(v, b + ε)
  • the player who values the good most may not win it, and players must contemplate other players’ values
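The bidding rule and its weakness in a small sketch. The names and numbers are hypothetical: each bidder has a value v and a belief b about the best rival bid, and epsilon is set to 1 to keep the arithmetic in whole dollars.

```python
def sealed_bid(value, believed_rival_bid, epsilon=1):
    """Bid just above the believed best rival bid, but never above your own value."""
    return min(value, believed_rival_bid + epsilon)

def winner(bids):
    """Index of the highest bid (first one wins a tie)."""
    return max(range(len(bids)), key=lambda i: bids[i])

values = [100, 80]   # bidder 0 values the good most
beliefs = [50, 70]   # but underestimates the competition
bids = [sealed_bid(v, b) for v, b in zip(values, beliefs)]
print(bids, "winner:", winner(bids))
```

Bidder 0 loses despite having the higher value, because the outcome depends on how well each player guesses the others' bids.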

Auctions
Vickrey Auction
– Winner pays the price of the next highest bid
– Dominant strategy is to bid what the item is worth to you
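A sketch of the second-price rule, with a brute-force check that truthful bidding is never worse than any other bid. The function names, the two-bidder setup, and the integer grid of bids are assumptions made for illustration.

```python
def vickrey_auction(bids):
    """Highest bid wins; the winner pays the second-highest bid."""
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    winner, runner_up = order[0], order[1]
    return winner, bids[runner_up]

def utility(value, my_bid, rival_bid):
    """Utility of bidder 0 with the given value, against one rival bid."""
    winner, price = vickrey_auction([my_bid, rival_bid])
    return value - price if winner == 0 else 0

value = 10
truthful_is_never_worse = all(
    utility(value, value, rival) >= utility(value, bid, rival)
    for bid in range(21)
    for rival in range(21)
)
print(vickrey_auction([10, 7, 4]), truthful_is_never_worse)
```

Overbidding can only add wins at a price above your value, and underbidding can only lose auctions you would have won at a profit, which is why the check holds for every rival bid.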

Auctions
These auction algorithms can find their way into computer-controlled systems
– Networking
  • Routers
  • Ethernet
– Thermostat control in offices (Xerox PARC)

Next Topic: Statistical Learning
Chapter 20
Urns and Balls / Candy Bags
Data and Hypotheses
Maximum Likelihood
Bayes Learning
Expectation Maximization
Hidden Markov Models (HMMs)

Running example: Candy
Surprise Candy
Comes in two flavors
– cherry (yum)
– lime (yuk)
All candy is wrapped in the same opaque wrapper
Candy is packaged in large bags containing five different allocations of cherry and lime

Statistics
Given a bag of candy, what distribution of flavors will it have?
Let H be the random variable corresponding to your hypothesis
As you open pieces of candy, let each observation of data: D_1, D_2, D_3, … be either cherry or lime
Predict the flavor of the next piece of candy

Bayesian Learning
Use available data to calculate the probability of each hypothesis and make a prediction
Because each hypothesis has an independent likelihood, we use all their relative likelihoods when making a prediction
Probabilistic inference using Bayes’ rule:
– P(h_i | d) = α P(d | h_i) P(h_i)
Prediction of an unknown quantity X:
– P(X | d) = Σ_i P(X | h_i) P(h_i | d)
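A minimal sketch of Bayesian updating and prediction for the candy example. The slides do not list the five bag types or their prior, so the numbers below are assumptions for illustration (100%, 75%, 50%, 25%, and 0% cherry, with prior 0.1, 0.2, 0.4, 0.2, 0.1). The sketch also assumes candies are drawn independently given the bag type.

```python
from fractions import Fraction

# Assumed for illustration (not given in the slides): cherry fraction of each
# bag type h_1..h_5 and the prior probability P(h_i) of each type.
CHERRY_FRACTION = [Fraction(1), Fraction(3, 4), Fraction(1, 2), Fraction(1, 4), Fraction(0)]
PRIOR = [Fraction(1, 10), Fraction(2, 10), Fraction(4, 10), Fraction(2, 10), Fraction(1, 10)]

def likelihood(flavor, cherry_fraction):
    """P(one candy has this flavor | bag type)."""
    return cherry_fraction if flavor == "cherry" else 1 - cherry_fraction

def posterior(observations, prior=PRIOR):
    """P(h_i | d) = alpha * P(d | h_i) * P(h_i), assuming independent draws."""
    weights = list(prior)
    for flavor in observations:
        weights = [w * likelihood(flavor, c) for w, c in zip(weights, CHERRY_FRACTION)]
    total = sum(weights)  # this is 1 / alpha
    return [w / total for w in weights]

def predict(flavor, observations):
    """P(next candy = flavor | d) = sum_i P(flavor | h_i) * P(h_i | d)."""
    post = posterior(observations)
    return sum(likelihood(flavor, c) * p for c, p in zip(CHERRY_FRACTION, post))

print(predict("lime", []))
print(predict("lime", ["lime"]))
```

Under these assumed numbers, a single lime observation rules out the all-cherry bag and shifts the prediction toward lime.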
