Application of reinforcement learning to the game of Othello
Nees Jan van Eck, Michiel van Wezel
Computers & Operations Research, vol. 35, 2008
Presented by 장수형, Soft Computing Laboratory

Introduction
Many decision-making problems in real life do not depend on an isolated decision but rather on a sequence of decisions.
Markov decision processes (MDPs)
- A well-known class of sequential decision-making problems
- Goal: take the optimal decision in each state
- Widespread applications
Some algorithms are guaranteed to find optimal policies
- Dynamic programming methods
- Weak points: many possible states, and exact knowledge of the environment is required
Reinforcement learning algorithms
- Roots in machine learning, operations research, control theory, psychology, and neuroscience
- Applications from robotics and industrial manufacturing control to combinatorial search

Introduction
Othello
- A well-defined sequential decision-making problem
- Huge state space (approximately 10^28 states)
- Performance can easily be measured
- Experiments can be run without using any knowledge provided by humans

Reinforcement learning and sequential decision-making problems
At each moment:
- The environment is in a certain state
- The agent observes this state
- The agent takes an action
- The environment responds with a reward
The agent's task is to learn to take optimal actions
- Maximize the sum of immediate and future rewards
- This may mean sacrificing immediate rewards to obtain a greater cumulative reward

RL and sequential decision-making problems
- r : reward
- γ : discount factor
  - If γ = 0, only immediate rewards are considered
  - As γ is set closer to 1, future rewards are given greater emphasis
State-value function (the expected discounted return when starting in state s and following policy π):
V^π(s) = E[ r_{t+1} + γ·r_{t+2} + γ²·r_{t+3} + … | s_t = s ]

Q-learning
A reinforcement learning algorithm that learns the values of a function Q(s, a) in order to find an optimal policy; Q(s, a) estimates the return obtained by taking action a in state s and acting optimally afterwards.
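As a rough illustration of the update rule behind this (the paper's agent uses a neural network rather than a lookup table; the states and actions below are hypothetical placeholders):

```python
# Tabular Q-learning update (illustrative sketch only -- the paper itself
# replaces the table with a neural network).
from collections import defaultdict

def q_update(Q, s, a, r, s_next, next_actions, alpha=0.1, gamma=1.0):
    """One Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max((Q[(s_next, a2)] for a2 in next_actions), default=0.0)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)
q_update(Q, "s0", "a0", 0.0, "s1", ["a0", "a1"])  # mid-game move, reward 0
q_update(Q, "s1", "a1", 1.0, "end", [])           # winning terminal move
```

With alpha = 0.1 the terminal win nudges Q("s1","a1") toward +1; repeated play propagates that value back to earlier states.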

Neural Network

Q-learning with a neural network
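A minimal sketch of the combination, using a linear approximator as a stand-in for the paper's multilayer perceptron (all class and variable names are illustrative assumptions):

```python
import numpy as np

# Q(s, a) approximated by one linear output per action, trained toward
# the Q-learning target r + gamma * max_a' Q(s', a').
class LinearQ:
    def __init__(self, n_features, n_actions, alpha=0.01, gamma=1.0):
        self.w = np.zeros((n_actions, n_features))
        self.alpha, self.gamma = alpha, gamma

    def value(self, s, a):
        return float(self.w[a] @ s)

    def update(self, s, a, r, s_next, done):
        # Terminal states contribute no future value.
        target = r if done else r + self.gamma * float(np.max(self.w @ s_next))
        td_error = target - self.value(s, a)
        self.w[a] += self.alpha * td_error * s  # gradient step on the squared TD error
        return td_error

q = LinearQ(n_features=3, n_actions=2)
s = np.array([1.0, 0.0, 1.0])
td = q.update(s, a=0, r=1.0, s_next=s, done=True)
```

The same TD-error signal drives backpropagation when the linear layer is replaced by a multilayer network.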

Networks: single and distinct

Action selection
Trade-off between exploration and exploitation
- Exploiting: select the action with the highest estimated Q-value, to obtain high reward
- Exploring: improve knowledge of the Q-function, to make better action selections in the future
Softmax action selection: actions are chosen probabilistically using a Boltzmann distribution over the Q-values
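The Boltzmann (softmax) rule can be sketched as follows; the temperature T controls the exploration/exploitation trade-off (high T is near-uniform exploration, low T is near-greedy exploitation), and the names here are illustrative:

```python
import math
import random

def boltzmann_probs(q_values, temperature=1.0):
    """Softmax over Q-values: P(a) = exp(Q(a)/T) / sum_b exp(Q(b)/T)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]

def select_action(q_values, temperature=1.0, rng=random):
    """Sample an action index according to its Boltzmann probability."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]

probs = boltzmann_probs([2.0, 0.0], temperature=0.1)
```

At T = 0.1 the higher-valued action receives almost all of the probability mass, while equal Q-values always yield equal probabilities.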

Othello
- A two-player, zero-sum board game (competitive, fixed total reward)
- Perfect information (unlike imperfect-information games such as poker or RTS games)
- The state space size is approximately 10^28
- A game lasts at most 60 moves
- Played on an 8 by 8 board using 64 two-sided discs
- Initially the board is empty except for the central four squares

Othello

Strategies
Three phases: opening game, middle game, and end game
- Opening and middle game: the goal is to strategically position discs on the board so that they cannot be flipped (corners and edges)
- End game: maximize one's own discs while minimizing the opponent's discs

Positional player
- Does not learn; plays according to the positional strategy
- Board encoding: player = 1, opponent = -1, unoccupied = 0
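A sketch of such a positional evaluation over the encoded board; the 8x8 weight table below is a commonly used Othello table (corners high, corner-adjacent squares penalized), not necessarily the paper's exact values:

```python
# Positional strategy: each square has a fixed weight; the board value is
# the weighted sum of discs.  Weight values are illustrative.
WEIGHTS = [
    [100, -20, 10,  5,  5, 10, -20, 100],
    [-20, -50, -2, -2, -2, -2, -50, -20],
    [ 10,  -2, -1, -1, -1, -1,  -2,  10],
    [  5,  -2, -1, -1, -1, -1,  -2,   5],
    [  5,  -2, -1, -1, -1, -1,  -2,   5],
    [ 10,  -2, -1, -1, -1, -1,  -2,  10],
    [-20, -50, -2, -2, -2, -2, -50, -20],
    [100, -20, 10,  5,  5, 10, -20, 100],
]

def positional_value(board):
    """board[r][c] is 1 (player), -1 (opponent), or 0 (unoccupied)."""
    return sum(WEIGHTS[r][c] * board[r][c] for r in range(8) for c in range(8))

empty = [[0] * 8 for _ in range(8)]
corner = [row[:] for row in empty]
corner[0][0] = 1  # player occupies a corner
```

The positional player simply picks the legal move that maximizes this value for the resulting board.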

Mobility player
- Does not learn; plays according to the mobility strategy
- Mobility concept: the number of legal moves available
- Corner positions are of great importance
- The evaluation combines:
  - the number of corner squares occupied by the player
  - the number of corner squares occupied by the opponent
  - the player's mobility
  - the opponent's mobility
  - a weight parameter
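One way these components might be combined; the weight w = 10 and the exact linear form are assumptions for illustration, not the paper's parameters:

```python
def mobility_value(player_moves, opp_moves, player_corners, opp_corners, w=10):
    """Mobility strategy evaluation: corner occupancy weighted against the
    difference in legal-move counts.  w and the linear combination are
    illustrative assumptions."""
    return w * (player_corners - opp_corners) + (player_moves - opp_moves)

score = mobility_value(player_moves=5, opp_moves=3, player_corners=1, opp_corners=0)
```

A larger w makes the player sacrifice mobility to grab or defend corners.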

Q-learning player
- Uses the Q-learning algorithm
- State: the current state of the board
- Reward: 0 until the end of the game; upon completion, +1 for a win, -1 for a loss, and 0 for a draw
- Aims to choose optimal actions leading to maximal reward
- The learning rate is set to 0.1; the discount factor is set to 1 and does not change during learning
  - Equal weight is given to immediate and future rewards: the agent only cares about winning, not about winning as fast as possible
- Uses the softmax action selection method
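The reward scheme described above can be written directly (a sketch; the function name and disc-count interface are illustrative):

```python
def terminal_reward(player_discs, opp_discs):
    """Reward signal of the Q-learning player: 0 during the game; at the
    end, +1 for a win, -1 for a loss, 0 for a draw (disc majority decides)."""
    if player_discs > opp_discs:
        return 1.0
    if player_discs < opp_discs:
        return -1.0
    return 0.0
```

Because the discount factor is 1, this single terminal signal is propagated undiminished back through the whole game.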

Implementation of the Othello playing agents

Experiments and results
- 15,000,000 games were played for training
- The trained player was evaluated by playing 100 games against each of two benchmark players: the positional player and the mobility player
- It is more difficult for the Q-learning player to play against the mobility player than against the positional player


Summary, conclusions, and outlook
- Reinforcement learning: described Q-learning with a neural network
- Othello has a huge state space; Q-learning with a neural network was applied to the game of Othello
Future research:
- Use an adapted version of Q-learning, such as the minimax Q-learning described by Littman
- Study the effects of presenting special board features, in order to simplify learning
- Study potential applications of reinforcement learning in operations research and management science (general MDP applications, as surveyed by White)

End