Reinforcement Learning and Tetris
Jared Christen

Tetris
- Markov decision processes
- Large state space (see the quick count below)
- Long-term strategy without long-term knowledge
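
The "large state space" bullet is easy to make concrete. A back-of-the-envelope count, assuming the standard 10 x 20 board (the slides do not state the board size used):

```python
# Each cell of a 10 x 20 board is filled or empty, so the raw board
# alone gives 2^200 configurations, before even counting the active,
# next, and held pieces. Tabular RL is hopeless at this scale, which
# motivates function approximation over a compact set of features.
cells = 10 * 20
raw_states = 2 ** cells
print(f"2^{cells} = {raw_states:.3e} board states")  # ~1.607e+60
```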

Background
- Hand-coded algorithms can clear > 1,000,000 lines
- Genetic algorithm by Roger Llima averages 42,000 lines
- Reinforcement learning algorithm by Kurt Driessens averages … lines

Goals
- Develop a Tetris agent that improves on previous reinforcement learning implementations
- Secondary goals:
  - Use as few handpicked features as possible
  - Encourage risk-taking
  - Include rarely-studied features of Tetris

Approach

Neural Net Control
- Inputs:
  - Raw state – filled & empty blocks
  - Handpicked features
- Outputs:
  - Movements
  - Placements
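
A minimal sketch of what such a placement-scoring network could look like. The layer sizes, feature choices, and names below are illustrative assumptions, not the presenter's actual configuration:

```python
# Sketch: a small feedforward network assigns a value to each candidate
# placement's feature vector, and greedy control takes the best one.
import numpy as np

rng = np.random.default_rng(0)

def init_net(n_in, n_hidden=20):
    """Random small network: n_in features -> n_hidden tanh units -> 1 value."""
    return {
        "W1": rng.normal(0.0, 0.1, (n_hidden, n_in)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, n_hidden),
        "b2": 0.0,
    }

def value(net, features):
    """Score one candidate placement from its feature vector."""
    h = np.tanh(net["W1"] @ features + net["b1"])
    return float(net["W2"] @ h + net["b2"])

def choose(net, placements):
    """Greedy control: evaluate every candidate and take the best."""
    return max(placements, key=lambda f: value(net, np.asarray(f, float)))

# Example: rank three hypothetical placements, each described by two
# features (heuristic score, contour match length).
net = init_net(n_in=2)
print(choose(net, [(0.5, 3), (1.2, 1), (0.9, 4)]))
```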

Contour Matching
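
The slides do not spell out the matching rule, so the sketch below is one plausible reading: describe the stack's top surface as adjacent-column height differences, describe a piece's underside the same way, and count how far the two profiles coincide. Treat every detail as an assumption:

```python
# A hedged sketch of one possible contour-matching rule.

def column_heights(board):
    """board[r][c] is truthy if filled; row 0 is the top row."""
    rows = len(board)
    heights = []
    for c in range(len(board[0])):
        h = 0
        for r in range(rows):
            if board[r][c]:
                h = rows - r
                break
        heights.append(h)
    return heights

def contour(heights):
    """Top surface as height differences between adjacent columns."""
    return [b - a for a, b in zip(heights, heights[1:])]

def match_length(surface, underside):
    """Consecutive positions where a piece's underside fits the surface."""
    n = 0
    for s, u in zip(surface, underside):
        if s != u:
            break
        n += 1
    return n

# Example: a board whose surface steps up by one on the right.
board = [[0, 0, 0],
         [0, 0, 1],
         [1, 1, 1]]
print(contour(column_heights(board)))   # [0, 1]
print(match_length([0, 1], [0, 1]))     # 2 -> a perfect two-column fit
```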

Structure
- Active tetromino
- Next tetromino
- Held tetromino
- Placement 1 … n: score, match length, value
- Hold value
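
Read as a decision structure, this suggests each candidate placement carries a score and a match length, receives a value, and competes against a separate hold value. A hedged sketch; the field names and the tie between inputs and outputs are assumptions:

```python
# Sketch: choose among valued placements, or hold the active piece.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Candidate:
    rotation: int
    column: int
    score: float        # heuristic placement score (input feature)
    match_length: int   # contour match length (input feature)
    value: float        # value assigned to this placement

def decide(candidates: List[Candidate], hold_value: float) -> Optional[Candidate]:
    """Return the best placement, or None to hold the active piece."""
    best = max(candidates, key=lambda c: c.value)
    return None if hold_value > best.value else best

# Example: holding wins when its value exceeds every placement's.
cands = [Candidate(0, 3, 1.0, 2, 0.4), Candidate(1, 7, 0.8, 3, 0.6)]
print(decide(cands, hold_value=0.9))  # None -> hold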

Experiments
- 200 learning games
- Averaged over 30 runs
- Two-piece and six-piece configurations
- Compare to benchmark contour matching agent
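
The protocol is simple enough to frame as a harness. In the sketch below, `play_learning_games` is a hypothetical stand-in for the actual training loop, stubbed out so the harness runs:

```python
# 200 learning games per run, averaged over 30 independent runs,
# for each agent configuration (as stated on the slide).
import random
import statistics

N_GAMES, N_RUNS = 200, 30

def play_learning_games(config, n_games, seed):
    """Stub: train for n_games and report final average lines cleared."""
    return random.Random(f"{config}-{seed}").uniform(0, 100)

def evaluate(configs):
    return {
        config: statistics.mean(
            play_learning_games(config, N_GAMES, seed) for seed in range(N_RUNS)
        )
        for config in configs
    }

print(evaluate(["two-piece", "six-piece", "contour-matching benchmark"]))
```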

Results
[Plots: Two-piece and Six-piece]

Results
[Table: Score, Lines cleared, and Best match for the Two-piece, Six-piece, Six-piece with height differences, and Six-piece with placement heights configurations]

Conclusions
- Accidentally developed a heuristic that beats previous reinforcement learning techniques
- Six-piece's outperformance of two-piece suggests there is some pseudo-planning going on
- A better way to generalize the board state may be necessary