Class Project
- Due at end of finals week
- Essentially anything you want, so long as it's AI related and I approve
- Any programming language you want
- In pairs or individually
- E-mail me by Wednesday, November 3

Projects
- Implementing kNN to classify bedform stability fields
- Blackjack using genetic algorithms
- Computer game players: Go, Checkers, Connect Four, Chess, Poker
- Computer puzzle solvers: Minesweeper, mazes
- Pac-Man with intelligent monsters
- Genetic algorithms: blackjack strategy
- Automated 20-questions player
- Paper on planning
- Neural network spam filter
- Learning neural networks via GAs

Projects
- Training neural networks via backpropagation
- Code decryptor using GAs
- Box-pushing agent (competing against an opponent)

What didn't work as well
- Games that were too complicated: Risk, Yahtzee, Chess, Scrabble, battle simulation
  - Got too focused on making the game work
  - I sometimes had trouble running the game
  - The game was often incomplete
  - Didn't leave enough time to do the AI
- Problems that were too vague
  - Simulated ant colonies / genetic algorithms
  - Bugs swarming toward heat (the emergent intelligence never happened)
  - Finding paths through snow
- AdaBoost on protein-folding data
  - Couldn't get boosting working right; needed more time on small datasets (spent lots of time parsing the protein data)

Reinforcement Learning
- Game playing: so far, we have told the agent the value of a given board position. How can the agent learn which positions are important?
- Play a whole bunch of games, and receive a reward at the end (+ or -)
- How do we determine the utility of states that aren't ending states?

The setup:
- Possible game states
- Terminal states have a reward
- Mission: estimate the utility of all possible game states

What is a state?
- For chess: the state is the position on the board, i.e., the locations of your pieces and your opponent's
- Half of your transitions are controlled by you (your moves)
- The other half of your transitions are probabilistic (they depend on the opponent)
- For now, we assume all moves are probabilistic, with unknown probabilities

Passive Learning
- Agent learns by "watching"
- Fixed probability of moving from one state to another

Sample Results

Technique #1: Naive Updating
- Also known as the Least Mean Squares (LMS) approach
- Starting at home, obtain a sequence of states ending in a terminal state
- Utility of the terminal state = its reward
- Loop back over all other states in the sequence:
  utility of state i = running average of all rewards-to-go seen for state i
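Here is a minimal sketch of naive (LMS) updating, assuming each episode is recorded as a list of (state, reward) pairs; the names and episode format are my choices for illustration, not from the slides.

```python
from collections import defaultdict

def lms_utilities(episodes):
    """Naive (LMS) updating: a state's utility estimate is the running
    average of the reward-to-go observed each time the state was visited."""
    totals = defaultdict(float)   # sum of observed rewards-to-go per state
    counts = defaultdict(int)     # number of visits per state

    for episode in episodes:      # each episode: [(state, reward), ...]
        reward_to_go = 0.0
        for state, reward in reversed(episode):
            reward_to_go += reward      # reward from this state to the end
            totals[state] += reward_to_go
            counts[state] += 1

    return {s: totals[s] / counts[s] for s in totals}
```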

Naive Updating Analysis
- Works, but converges slowly
- Must play lots of games
- Ignores the fact that the utility of a state should depend on its successors

Technique #2: Adaptive Dynamic Programming
- The utility of a state depends entirely on its successor states
- If a state has one successor, its utility should be the same as the successor's
- If a state has multiple successors, its utility should be the expected value of the successors' utilities

Finding the utilities
- To find all utilities, just solve the equations
- It is a set of linear equations, and therefore solvable
- The system changes each iteration as you learn the transition probabilities
- Completely intractable for large problems: for a real game, it means finding the actual utilities of all states
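A sketch of the passive ADP solve, assuming you have already estimated a transition matrix P (with all-zero rows for terminal states, so the system is nonsingular) and a reward vector R; numpy and the variable names are assumptions, not from the slides.

```python
import numpy as np

def adp_utilities(P, R):
    """Passive ADP: solve U = R + P U, i.e. (I - P) U = R.
    P[i, j] = estimated probability of moving from state i to state j;
    terminal states get all-zero rows, so U(terminal) = R(terminal)."""
    n = len(R)
    return np.linalg.solve(np.eye(n) - P, R)
```

As the slide says, this is exact but intractable for real games: the system has one equation per state.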

Technique #3: Temporal Difference Learning
- Want the utility to depend on successors, but want to solve iteratively
- Whenever you observe a transition from state i to state j, update:
  U(i) ← U(i) + α (R(i) + U(j) − U(i))
  where α = the learning rate, and the difference between successive state estimates = the temporal difference
- Converges faster than naive updating
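The update itself is one line of code; here is a sketch, with U kept as a dictionary and a gamma parameter added for generality (gamma = 1 matches the undiscounted formula above).

```python
def td_update(U, i, j, reward, alpha=0.1, gamma=1.0):
    """One temporal-difference update after observing a transition i -> j."""
    U.setdefault(i, 0.0)
    U.setdefault(j, 0.0)
    td_error = reward + gamma * U[j] - U[i]   # the "temporal difference"
    U[i] += alpha * td_error
```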

Active Learning
- The probability of going from one state to another now depends on the action taken
- The ADP equations are now:
  U(i) = R(i) + max_a Σ_j P(j | i, a) U(j)
  where P(j | i, a) is the probability of moving from state i to state j under action a
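The max makes this system nonlinear, so it is typically solved iteratively rather than with a linear solver. Below is a value-iteration-style sketch; the discount factor gamma < 1 is my addition (for guaranteed convergence), and P is assumed to be a dict mapping each action to an estimated transition matrix.

```python
import numpy as np

def active_adp_utilities(P, R, gamma=0.95, tol=1e-8):
    """Solve U(i) = R(i) + gamma * max_a sum_j P[a][i, j] * U(j)
    by repeated sweeps until the utilities stop changing."""
    U = np.zeros(len(R))
    while True:
        U_new = R + gamma * np.max([P[a] @ U for a in P], axis=0)
        if np.max(np.abs(U_new - U)) < tol:
            return U_new
        U = U_new
```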

Active Learning
- Active learning with temporal difference learning works the same way (assuming you know where you're going)
- You also need to learn the transition probabilities to eventually decide where to go

Exploration: where should the agent go to learn utilities?
- Suppose you're trying to learn optimal game-playing strategies
- Do you follow the best utility, in order to win?
- Do you move around at random, hoping to learn more (and losing a lot in the process)?
- Following the best utility all the time can get you stuck at an imperfect solution
- Following random moves can lose a lot

Where should the agent go to learn utilities?
- f(u, n) = an exploration function; it depends on the utility of a move (u) and the number of times the agent has tried it (n)
- One possibility: instead of using the utility u to decide where to go, use
  f(u, n) = R+ if n < N_e, and u otherwise
  where R+ is an optimistic estimate of the best possible reward and N_e is a fixed trial count
- Try a move a bunch of times, then eventually settle
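A concrete version of that exploration function, with R_PLUS and N_E as assumed constants (their values are not specified on the slides):

```python
R_PLUS = 10.0   # optimistic estimate of the best reachable reward (assumed)
N_E = 5         # try each move at least this many times (assumed)

def f(u, n):
    """Exploration function: pretend under-tried moves are great, then
    fall back on the learned utility estimate once n reaches N_E."""
    return R_PLUS if n < N_E else u
```

The agent then picks the move maximizing f(u, n) rather than u, so every move looks attractive until it has been tried N_E times.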

Q-learning
- An alternative approach to temporal difference learning
- No need to learn transition probabilities: sometimes considered more desirable for that reason
- Instead, learn the "quality" Q(s, a) of each (state, action) pair
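The standard Q-learning update, as a sketch (alpha, gamma, and the dictionary representation are my choices):

```python
def q_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.95):
    """One Q-learning update for an observed transition (s, a) -> s_next.
    No transition model is needed: the max over next actions replaces
    the expectation that ADP would compute from learned probabilities."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    td_error = reward + gamma * best_next - Q.get((s, a), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error
```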

Generalization in Reinforcement Learning
- Maintaining utilities for all seen states in a real game is intractable
- Instead, treat it as a supervised learning problem:
  - The training set consists of (state, utility) pairs
  - Or, alternatively, (state, action, Q-value) triples
  - Learn to predict the utility from the state
- This is a regression problem, not a classification problem:
  - Radial basis function neural networks (hidden nodes are Gaussians instead of sigmoids)
  - Support vector machines for regression
  - Etc.
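One common concrete choice (not the only one listed above) is a linear function of hand-built state features, trained with a gradient form of the TD update; the `features` function is assumed to be supplied by you.

```python
import numpy as np

def td_update_linear(w, features, s, s_next, reward, alpha=0.01, gamma=0.95):
    """TD update on a linear approximator U(s) ~ w . features(s).
    Rather than storing one utility per state, we nudge the shared
    weight vector in proportion to the TD error and the features."""
    x, x_next = features(s), features(s_next)
    td_error = reward + gamma * np.dot(w, x_next) - np.dot(w, x)
    return w + alpha * td_error * x
```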

Other applications
- Applies to any situation where something is to be learned from reinforcement
- Possible examples:
  - Toy robot dogs
  - Petz
  - That darn paperclip
  - "The only winning move is not to play"