Falcon on a Cloudy Day A Ro Sham Bo Algorithm by Andrew Post.


1 Falcon on a Cloudy Day A Ro Sham Bo Algorithm by Andrew Post

2 Let's Review
If you missed my previous presentation:
Ro Sham Bo = Rock Paper Scissors
Can be more complicated though
Ro Sham Bo has important applications
Algorithms compete at Ro Sham Bo in tournaments
Iocaine Powder is the world champ of Ro Sham Bo
Because it uses ‘Sicilian Reasoning’
I will beat Iocaine Powder
Eventually…

3 What is Ro Sham Bo?
Also known as Rock Paper Scissors

4 What is Ro Sham Bo?
Generalized case of Rock Paper Scissors actually
Not always three choices
Ties can be resolved differently
The game is not necessarily zero-sum

5

6 Why does it matter?
Many competitive scenarios involve a Ro Sham Bo game
Example:
CBS and NBC choosing Primetime TV Shows
They can choose to show a Drama, Comedy, or Sports show
Viewers prefer Comedy to Drama, Sports to Comedy, and Drama to Sports, given the choice.
Neither station knows ahead of time what the other will choose
Billions of dollars every day rely on decisions like these.

7 How it works
Simplest Non-Cooperative Game
Players cannot play to ensure they both win
Governed by the Nash Equilibrium
There are strategies which cannot be dominated
http://www.youtube.com/watch?v=pdrBDfRvpBA 1:31 -- 2:20
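A short worked equation may help show why uniform random play cannot be dominated. It assumes the standard three-move, zero-sum game with payoff +1 for a win, 0 for a tie, and -1 for a loss. The symbols q_j (the opponent's probability of playing move j) and a(i,j) (our payoff when we play i against j) are introduced here for illustration and do not appear in the slides.

```latex
\[
\mathbb{E}[\text{payoff}]
  = \sum_{j \in \{R,P,S\}} q_j \sum_{i \in \{R,P,S\}} \tfrac{1}{3}\, a(i,j)
  = \sum_{j \in \{R,P,S\}} q_j \cdot \tfrac{1}{3}\bigl(1 + 0 - 1\bigr)
  = 0 .
\]
```

Whatever distribution q the opponent uses, each of their moves is beaten by one of ours, tied by one, and beats one, so the uniform player's expected payoff is zero: it cannot be exploited, but it also cannot exploit anyone.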

8 How to Win
As you just heard, playing randomly can ensure you don’t lose, but how do you win?
How to predict your opponent
Sub-Optimal Frequency Distributions
Pattern Matching
History Analysis

9 Iocaine Powder
International Ro Sham Bo Programming Tournament Champion
Named for this famous scene: http://youtube.com/watch?v=TUee1WvtQZU 0:57 -- 2:20

10 The Tournament
Tournament programs play thousands of rounds
Win by beating the most opponents by a large margin
Most programs play sub-optimally, so exploiting your opponent is more important than playing randomly to avoid losing.

11

12 Iocaine Powder
IP is the algorithm which does this best.
IP uses the same heuristics to predict what an opponent is most likely to do.
Using the same tools, how can you be better?
Sicilian Reasoning!

13 Sicilian Reasoning
Levels of second guessing:
1. Opponent will play rock, so play paper
2. Opponent knows you will counter rock with paper, and will play scissors, so play rock
3. Opponent knows all this, and will now play paper to beat your rock, so play scissors
4. Opponent will play rock again, which is the same as 1
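The rotation through the levels above can be sketched in a few lines. This is a minimal illustration, not Iocaine Powder's actual code: the integer encoding (rock=0, paper=1, scissors=2) and the function names `beats` and `sicilian_level` are assumptions made for this example.

```python
MOVES = ("rock", "paper", "scissors")  # hypothetical encoding: index = move id


def beats(move):
    """Return the move that beats `move` (paper beats rock, and so on)."""
    return (move + 1) % 3


def sicilian_level(predicted, level):
    """Move suggested by second-guessing level 1, 2, or 3.

    `predicted` is the move some predictor expects the opponent to play.
    Level 1 simply counters it; each further level assumes the opponent
    counters our previous choice, and then counters that.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2, or 3")
    move = beats(predicted)
    for _ in range(level - 1):
        move = beats(beats(move))  # opponent counters us, we counter them
    return move


# Reproduce the slide's example, where the prediction is rock.
for level in (1, 2, 3):
    print(level, MOVES[sicilian_level(0, level)])
```

Because there are only three moves, a fourth level would suggest the same move as the first, which is why the list on the slide wraps around.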

14 Sicilian Reasoning
Use your predictive strategies to evaluate what is going to happen next.
Run SR on yourself and your opponent, and keep a table of what each of the six levels of reasoning says you should do.
Pick the level of reasoning which would have won most often against what your opponent actually did.

15 Wait, six? Don’t you mean three?
You can use the same predictive tools that your opponent uses to ‘predict’ what you are going to do.
Now you have three more levels of SR:
4. I will play rock. So he plays paper. So play scissors.
5. He knows I will counter with scissors, and plays rock. So play paper.
6. He expects me to counter-counter with paper, and will play scissors. So play rock.
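The table of six levels, and the rule of picking the level that would have won most often, might look roughly like the sketch below. It is an illustration under assumptions: the move encoding (rock=0, paper=1, scissors=2), the names `six_candidates` and `LevelSelector`, and the choice to count only wins (ignoring losses and ties) are mine, not taken from the slides or from Iocaine Powder.

```python
def beats(move):
    """Return the move that beats `move` (rock=0, paper=1, scissors=2)."""
    return (move + 1) % 3


def six_candidates(opp_pred, own_pred):
    """Moves suggested by levels 1-6.

    Levels 1-3 start from a prediction of the opponent's move;
    levels 4-6 start from the opponent's likely prediction of our move.
    """
    return [
        beats(opp_pred),         # 1: counter the predicted move
        opp_pred,                # 2: opponent counters level 1
        beats(beats(opp_pred)),  # 3: opponent counters level 2
        beats(beats(own_pred)),  # 4: opponent counters our predicted move
        beats(own_pred),         # 5: opponent counters level 4
        own_pred,                # 6: opponent counters level 5
    ]


class LevelSelector:
    """Tracks how often each level would have won and plays the best one."""

    def __init__(self):
        self.wins = [0] * 6
        self.last = None

    def choose(self, opp_pred, own_pred):
        self.last = six_candidates(opp_pred, own_pred)
        best = max(range(6), key=lambda i: self.wins[i])
        return self.last[best]

    def update(self, opp_actual):
        # Credit every level whose suggestion would have beaten the real move.
        for i, move in enumerate(self.last):
            if move == beats(opp_actual):
                self.wins[i] += 1
```

A typical round calls `choose(...)` with the two predictions, plays the returned move, and then calls `update(...)` with the opponent's actual move.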

16 More Sicilian Reasoning
Just because one level of SR is winning now, doesn’t mean it always will be.
Opponents will change how they play if they are losing, so you must change too!
How do you switch your level of SR?

17 Switching Reasoning
SR-2 has just won the first 100 rounds
Opponent changes strategy
You lose 50 rounds before SR-4 has more than 100 theoretical wins.
You just wasted 50 rounds!

18 Switching Reasoning
Use several different methodologies for switches
Most wins in last 10, 25, 50, 100, 1000 rounds
Has won the most in similar situations
Causes the opponent to switch to a worse strategy

19 Switching Reasoning
Here is the real genius – now use the switching methodology which has helped you win the most rounds!
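One way to picture the windowed switching rules, and the meta-rule that picks among them, is the sketch below. It covers only the "most wins in the last N rounds" methodology; the class names `WindowSwitcher` and `MetaSwitcher`, the 0/1 win-flag representation, and the tie-breaking (lowest index wins ties) are assumptions for illustration, not the presenter's or Iocaine Powder's implementation.

```python
from collections import deque


class WindowSwitcher:
    """Picks the level with the most hypothetical wins in the last `window` rounds."""

    def __init__(self, n_levels, window):
        self.n_levels = n_levels
        self.history = deque(maxlen=window)  # one tuple of 0/1 win flags per round
        self.current = 0                     # level picked for the current round
        self.wins = 0                        # rounds this switcher's pick won

    def pick(self):
        totals = [sum(flags[i] for flags in self.history)
                  for i in range(self.n_levels)]
        self.current = max(range(self.n_levels), key=totals.__getitem__)
        return self.current

    def update(self, win_flags):
        self.wins += win_flags[self.current]
        self.history.append(tuple(win_flags))


class MetaSwitcher:
    """Follows whichever windowed switcher has won the most rounds so far."""

    def __init__(self, n_levels, windows=(10, 25, 50, 100, 1000)):
        self.switchers = [WindowSwitcher(n_levels, w) for w in windows]

    def pick(self):
        picks = [s.pick() for s in self.switchers]
        best = max(range(len(picks)), key=lambda i: self.switchers[i].wins)
        return picks[best]

    def update(self, win_flags):
        for s in self.switchers:
            s.update(win_flags)
```

In the scenario from the earlier slide (one level wins 100 rounds, then the opponent changes so a different level starts winning), the short windows react within a handful of rounds, while a 1000-round window would keep recommending the old level for about 100 more.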

20 Falcon on a Cloudy Day
So you ask, how do you beat Iocaine Powder?
Improve the basic predictive heuristics
Extend Sicilian Reasoning

21 Improving Prediction
What I have implemented:
Improved Variable History Analysis
Look at just your history, your opponent’s, or both
Improved Frequency Analysis
EV[x] = Pr[x+2] - Pr[x+1]
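The frequency formula above can be read as follows, assuming moves are numbered rock=0, paper=1, scissors=2 and the indices wrap modulo 3: move x beats x+2 and loses to x+1, so EV[x] is the chance of winning minus the chance of losing when playing x. The sketch below estimates Pr from the opponent's observed moves; the function names and the use of raw empirical frequencies are assumptions for illustration.

```python
from collections import Counter


def expected_values(opponent_history):
    """EV[x] = Pr[x+2] - Pr[x+1], with indices taken modulo 3.

    Pr[m] is estimated as the fraction of past rounds in which the
    opponent played move m (rock=0, paper=1, scissors=2).
    """
    total = len(opponent_history)
    if total == 0:
        return [0.0, 0.0, 0.0]  # no information yet: every move looks equal
    counts = Counter(opponent_history)
    pr = [counts[m] / total for m in range(3)]
    return [pr[(x + 2) % 3] - pr[(x + 1) % 3] for x in range(3)]


def best_move(opponent_history):
    """Return the move with the highest estimated expected value."""
    ev = expected_values(opponent_history)
    return max(range(3), key=ev.__getitem__)


# An opponent who has played rock three times and paper once.
print(expected_values([0, 0, 0, 1]))
print(best_move([0, 0, 0, 1]))
```

For that history the estimate favours paper, since the opponent has mostly played rock.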

22 Demonstration
Here is how my project does with what is implemented so far.

23 Improving Prediction
What I have not implemented yet:
Improved Pattern Matching
Markov Models with MegaHAL
Extended Sicilian Reasoning

24 More on MegaHAL
MegaHAL is a very simple "infinite-order" Markov model.
Stores frequency information about the moves the opponent has made in the past for all possible contexts
Using the ‘context’ of the last few moves, the “appropriate” response is then selected.
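The idea of storing move frequencies per context can be sketched as below. This is not MegaHAL's actual code: the class name `ContextPredictor`, the cap on context length (`max_order`, used here as a practical stand-in for "infinite order"), and the rule of trusting the longest context seen before are all assumptions for illustration.

```python
from collections import Counter, defaultdict


class ContextPredictor:
    """Counts which move followed each recent-move context, up to `max_order` long."""

    def __init__(self, max_order=5):
        self.max_order = max_order
        self.table = defaultdict(Counter)  # context tuple -> counts of next move
        self.history = []

    def _context(self, order):
        # The last `order` moves; an empty tuple when order is 0.
        return tuple(self.history[len(self.history) - order:])

    def update(self, move):
        """Record the opponent's move under every context length available."""
        for order in range(min(self.max_order, len(self.history)) + 1):
            self.table[self._context(order)][move] += 1
        self.history.append(move)

    def predict(self):
        """Most frequent follower of the longest context seen before, else None."""
        for order in range(min(self.max_order, len(self.history)), -1, -1):
            counts = self.table.get(self._context(order))
            if counts:
                return counts.most_common(1)[0][0]
        return None


predictor = ContextPredictor()
for move in [0, 1, 2, 0, 1, 2, 0, 1]:  # opponent cycles rock, paper, scissors
    predictor.update(move)
print(predictor.predict())
```

To turn the prediction into a move, play whatever beats the predicted move, or feed the prediction into the levels of Sicilian Reasoning described earlier.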

25 Extended Sicilian Reasoning
Q: Isn’t Sicilian Reasoning complete at 6?
A: Yes, but there is information we are ignoring.
By compressing your strategy decisions into the idea of which of six strategies is best right now, you have no way to keep track of how changing your strategies has paid off best in the past.

26 Now for some Math
Hilbert Space
Game Trajectory and Game State
Projection Operators
Annotated History Analysis
Project Enigma

