
0 Application of reinforcement learning to the game of Othello
Nees Jan van Eck, Mechiel van Wezel
Computers & Operations Research, vol. 35, 2008
Soft Computing Laboratory, 장 수 형

1 Introduction
Many real-life decision-making problems do not depend on an isolated decision but rather on a sequence of decisions; a one-shot view of decision making does not capture this sequential structure.
Markov decision processes (MDPs)
A well-known class of sequential decision-making problems
Goal: find the optimal decision in each state
Widespread applications
Some algorithms are guaranteed to find optimal policies
Dynamic programming methods
Weak points: many possible states; require exact knowledge of the environment
Reinforcement learning algorithms
Studied in machine learning, operations research, control theory, psychology, and neuroscience
Applications range from robotics and industrial manufacturing control to combinatorial search

2 Introduction
Othello
A well-defined sequential decision-making problem
Huge state space (approximately 10^28 states)
Performance is easy to measure
Experiments can be run without using any knowledge provided by humans

3 Reinforcement learning and sequential decision making problems
At each moment:
The environment is in a certain state
The agent observes this state
The agent takes an action
The environment responds with a reward and a new state
The agent's task is to learn to take optimal actions
Maximize the sum of immediate and future rewards
This can mean sacrificing immediate rewards to obtain a greater cumulative reward

4 RL and sequential decision making problems
r : reward
γ : discount factor for future rewards
If γ = 0, only the immediate reward is considered
As γ is set closer to 1, future rewards are given greater emphasis
State-value function: $V^{\pi}(s_t) = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots = \sum_{i=0}^{\infty} \gamma^{i} r_{t+i}$

5 Q-learning
A reinforcement learning algorithm that learns the values of a function Q(s, a) in order to find an optimal policy
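The slide names the algorithm without showing the update rule. For reference, the standard tabular Q-learning update (textbook form, not copied from the slides) is

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$

where $\alpha$ is the learning rate and $\gamma$ the discount factor from slide 4.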

6 Neural Network
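The original slide contains only a network diagram. As a minimal sketch of the kind of multilayer perceptron used as a Q-function approximator (the layer sizes, tanh activation, and plain gradient update here are illustrative assumptions, not the paper's exact configuration):

```python
import numpy as np

class MLP:
    """Tiny two-layer perceptron: 64 board features in, one Q-value out.

    Illustrative sketch only; sizes and activation are assumptions.
    """
    def __init__(self, n_in=64, n_hidden=50, n_out=1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        self.x = np.asarray(x, dtype=float)
        self.h = np.tanh(self.x @ self.W1 + self.b1)  # hidden layer
        return self.h @ self.W2 + self.b2             # linear output

    def backward(self, grad_out, lr=0.1):
        """One gradient-descent step; grad_out = dLoss/dOutput."""
        dW2 = np.outer(self.h, grad_out)
        dh = (grad_out @ self.W2.T) * (1.0 - self.h ** 2)  # tanh derivative
        dW1 = np.outer(self.x, dh)
        self.W2 -= lr * dW2
        self.b2 -= lr * grad_out
        self.W1 -= lr * dW1
        self.b1 -= lr * dh
```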

7 Q-learning with neural network
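This slide is again diagram-only. A hedged sketch of how the Q-learning update from slide 5 combines with a network like the MLP above: the TD target r + γ·max Q(s', ·) becomes the training target for one gradient step (the paper's exact training details may differ).

```python
def q_update(net, state_features, reward, next_q_values, terminal,
             gamma=1.0, lr=0.1):
    """One Q-learning step with a neural network approximator.

    Sketch: trains `net` one squared-error gradient step toward the
    TD target r + gamma * max_a' Q(s', a') (just r on terminal moves).
    """
    target = reward if terminal else reward + gamma * max(next_q_values)
    prediction = net.forward(state_features)   # current Q(s, a) estimate
    net.backward(prediction - target, lr=lr)   # d/dQ of 0.5*(Q - target)^2
    return float(prediction)
```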

8 Networks: single and distinct

9 Action selection
Trade-off between exploration and exploitation
Exploiting: select the action with the highest estimated Q-value, to obtain a high reward
Exploring: improve knowledge of the Q-function, to make better action selections in the future
Softmax action selection: probabilities computed from a Boltzmann distribution (see the sketch below)
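A minimal sketch of the softmax (Boltzmann) action selection described above; the temperature parameter tau and its value are assumptions, since the transcript does not give the paper's settings.

```python
import numpy as np

def boltzmann_action(q_values, tau=1.0, rng=None):
    """Choose an action index with probability proportional to exp(Q/tau).

    High tau -> near-uniform exploration; low tau -> greedy exploitation.
    """
    rng = rng or np.random.default_rng()
    q = np.asarray(q_values, dtype=float)
    prefs = np.exp((q - q.max()) / tau)   # subtract max for stability
    probs = prefs / prefs.sum()
    return int(rng.choice(len(q), p=probs))
```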

10 Othello
A two-player, zero-sum board game (competitive)
Fixed total reward
Perfect information (unlike imperfect-information games such as poker or RTS games)
The state-space size is approximately 10^28
A game lasts at most 60 moves
Played on an 8-by-8 board with 64 two-sided discs
Initially the board is empty except for the central four squares

11 Othello

12 Strategies
Three phases: opening game, middle game, end game
Opening and middle game: the goal is to strategically position discs on the board so that they cannot be flipped (corners and edges)
End game: maximizing one's own discs while minimizing the opponent's discs

13 Positional player
Does not learn
Plays according to the positional strategy
Board encoding: player's disc = +1, opponent's disc = -1, unoccupied square = 0
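With this encoding, the positional strategy amounts to a weighted sum over the 64 squares. A plausible reconstruction (the individual square weights are not given in the transcript):

$$V(s) = \sum_{i=1}^{64} w_i \, d_i, \qquad d_i \in \{-1, 0, +1\},$$

where $d_i$ is the disc value of square $i$ under the encoding above and $w_i$ its positional weight, with corners and edges weighted highest.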

14 Mobility player
Does not learn
Plays according to the mobility strategy
Mobility concept: the number of legal moves available
Corner positions are of great importance
The evaluation combines:
The number of corner squares occupied by the player
The number of corner squares occupied by the opponent
The player's mobility
The opponent's mobility
A weight parameter
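From the quantities listed above, the mobility evaluation plausibly combines corner occupation and mobility differences; the following linear form (and the single weight $w$) is an assumption, as the transcript does not give the exact expression:

$$V(s) = w\,(c_p - c_o) + (m_p - m_o),$$

with $c_p, c_o$ the numbers of corner squares occupied by the player and the opponent, and $m_p, m_o$ their respective mobilities (numbers of legal moves).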

15 Q-learning player
Uses the Q-learning algorithm
State: the current state of the board
Reward: 0 until the end of the game; upon completing the game, +1 for a win, -1 for a loss, and 0 for a draw
Aims to choose optimal actions leading to maximal reward
Learning rate is set to 0.1; discount factor is set to 1 (does not change during learning)
A discount factor of 1 gives equal weight to immediate and future rewards: the player only cares about winning, not about winning as fast as possible
Uses the softmax action selection method
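A compact sketch of the training loop these settings imply, reusing the MLP, q_update, and boltzmann_action sketches above. The environment interface (env.reset, env.legal_moves, env.board_features, env.step, env.outcome, env.game_over) is hypothetical scaffolding, not the paper's code.

```python
def play_training_game(net, env, gamma=1.0, lr=0.1, tau=1.0):
    """One training episode: reward stays 0 until the game ends,
    then +1 for a win, -1 for a loss, 0 for a draw."""
    state = env.reset()
    while not env.game_over():
        moves = env.legal_moves(state)
        qs = [float(net.forward(env.board_features(state, a))) for a in moves]
        action = moves[boltzmann_action(qs, tau)]   # softmax exploration
        next_state = env.step(action)
        done = env.game_over()
        reward = env.outcome() if done else 0.0     # +1 / -1 / 0 at the end
        next_qs = ([0.0] if done else
                   [float(net.forward(env.board_features(next_state, a)))
                    for a in env.legal_moves(next_state)])
        q_update(net, env.board_features(state, action), reward,
                 next_qs, done, gamma, lr)
        state = next_state
```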

16 Implementation of the Othello playing agents

17 Experiment & Result
15,000,000 games were played for training
Evaluated by playing 100 games against two benchmark players: the positional player and the mobility player
It is more difficult for the Q-learning player to play against the mobility player than against the positional player

18 Experiment & Result
15,000,000 games were played for training
Evaluated by playing 100 games against two benchmark players: the positional player and the mobility player

19 Summary, conclusion and outlook
Reinforcement learning: described Q-learning with a neural network
Othello has a huge state space
Applied Q-learning with a neural network to the game of Othello
Future research:
Use an adapted version of Q-learning, e.g. the minimax Q-learning described by Littman
Study the effects of presenting special board features, in order to simplify learning
Study potential applications of reinforcement learning in operations research and management science, e.g. the general MDP applications surveyed by White

20 E.N.D

