1
Building Imitation and Self-Evolving AI in Python
─ A Demo with LaserCat
Marshall Wang, Maria Jahja, Eric Laber. North Carolina State University. 9/16/2016
Welcome everybody to a special presentation filled with kittens, lasers, and mouse-UFOs, along with machine learning and Python programming. Mention the roles of team members. 1.5 min
2
You have been warned… 10 sec.
3
Road Map
- Motivation
- Introducing LaserCat
- Imitation AI ─ Supervised Learning
- Self-Evolving AI ─ Reinforcement Learning
- Epic Finale and Recap
1 min
4
Why Do We Care? Because it’s cool.
Test bed for machine learning algorithms. Numerous real-world problems can be formulated as sequential decision-making problems: self-driving vehicles, personalized medicine, counter-terrorism… Why should we spend any time on video games as researchers? Mention Google's Atari work and AlphaGo in the 2nd point. Describe why the examples are sequential decision-making problems. 2 min.
5
LaserCat Demo: Link. 40 sec.
6
Making Games with Pygame
pygame.org Tutorials: Top 10 Pygame Tutorials
Key functions for building AI:
1. Extract information from the current state
2. Map information to an action
3. Pass action to the agent
1 min
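The three hooks above can be sketched in plain Python. This is a hypothetical layout (the slides don't show the actual LaserCat code): sprites are represented as simple dicts with an `x` coordinate, and the "agent" is the cat sprite being moved.

```python
# Minimal sketch of the three AI hooks, assuming a hypothetical
# representation where each sprite is a dict with an "x" position.

def extract_state(cat, ufos):
    """1. Extract information from the current state."""
    return {
        "cat_x": cat["x"],
        # horizontal offset of each UFO relative to the cat
        "ufo_dxs": [u["x"] - cat["x"] for u in ufos],
    }

def choose_action(state):
    """2. Map information to an action (toy rule: chase the nearest UFO)."""
    if not state["ufo_dxs"]:
        return "stay"
    nearest = min(state["ufo_dxs"], key=abs)
    return "right" if nearest > 0 else "left"

def apply_action(cat, action, step=5):
    """3. Pass the action to the agent (move the cat sprite)."""
    if action == "right":
        cat["x"] += step
    elif action == "left":
        cat["x"] -= step
    return cat
```

In a real pygame loop these three calls would run once per frame, between event handling and rendering.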
7
Imitation Learning Data generated by human player.
A supervised learning problem. Predictors: coordinates of each object. Response: the human player's action. 0.5 min
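As a toy illustration of this framing, the following sketch predicts the human's action from object coordinates. A 1-nearest-neighbor rule stands in for the random forests used in the actual project, and the training rows are made-up examples:

```python
# Predictors: object coordinates; response: the action the human took.
# A 1-nearest-neighbor rule stands in for a random forest classifier.

def predict_action(train_X, train_y, x):
    """Predict the human's action for state x via nearest neighbor."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    best = min(range(len(train_X)), key=lambda i: sq_dist(train_X[i], x))
    return train_y[best]

# Each row: (cat_x, ufo_x); label: the recorded human action (made up).
X = [(100, 160), (100, 80), (50, 50)]
y = ["right", "left", "fire"]
```

With real data, `X` would hold one row per frame of recorded play and any off-the-shelf classifier could replace the nearest-neighbor rule.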
8
Data Collection and Missing Values
1.5 min
9
Hierarchical Random Forest
Diagram: Data → Pattern Identification → {RF 1, RF 2, RF 3, …} → Action
Within each category, there are no missing values anymore, as all states in a category have the same number of inputs. 2 min.
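The hierarchical structure can be sketched as follows: a pattern-identification step routes each state to a category-specific model, so every model sees a fixed-length input with no missing values. The category rule and the lookup "models" here are hypothetical stand-ins for the per-category random forests:

```python
# Route each state to a category-specific model so no model ever
# sees missing values (hypothetical categories; simple rules stand
# in for the per-category random forests RF 1, RF 2, RF 3, ...).

def identify_pattern(state):
    """Categorize by how many UFOs are on screen: 0, 1, or 2+."""
    return min(len(state["ufos"]), 2)

models = {
    0: lambda s: "stay",   # no UFOs: nothing to do
    1: lambda s: "fire" if s["ufos"][0] == s["cat"] else "chase",
    2: lambda s: "chase",  # multiple UFOs: keep moving
}

def hierarchical_predict(state):
    return models[identify_pattern(state)](state)
```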
10
Remember to emphasize that it's test data. The AI is trying to predict what I would do. 40 sec. Hit the 2:11 mark.
11
From Terminator to Skynet
Pictures taken from the Terminator movie series. AIs like the Terminator have their performance internally capped. A true AI keeps getting smarter and eventually surpasses humans; hopefully it can still be our friend at that point. Can we build a Skynet, and can it outperform human players? 1 min.
12
Reinforcement Learning
Diagram: Agent ↔ Environment (the agent sends an action; the environment returns a state and a reward).
- Agent: the learner/decision maker.
- Environment: what the agent interacts with.
- Reward: a numeric representation of desirability.
- Policy: a map from the observed state to the agent's action.
Analogy with animals: dog training; eating ice cream vs. eating chili pepper. Rewards can be negative or positive. Policies can be as simplistic as a random walk. The goal of RL is for the agent to start from a random-walk policy and learn the optimal policy. Order: key concepts, psychology, then explain the difference from supervised learning. 3 min.
13
LaserCat as a Reinforcement Learning Problem
Agent = the cat (the AI that controls it); environment = everything else on the screen; positive reward = killing a UFO; negative reward = colliding with a UFO or letting one escape. Explain the difficulties: sparse reward, delayed reward, and exploration vs. exploitation (trying restaurants). 3 min.
14
Value Function. Value: ice cream example, checkmate example. Remember to mention that this is the Bellman equation. Mention the insufficiency of only knowing V in many situations. 3 min.
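The Bellman equation referred to here is not written out on the slide; in the standard notation of Sutton & Barto, the state-value function of a policy $\pi$ satisfies:

```latex
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\, r_{t+1} + \gamma \, V^{\pi}(s_{t+1}) \mid s_t = s \,\right]
```

where $\gamma \in [0, 1)$ is the discount factor. The "insufficiency" point is that knowing $V$ alone does not tell the agent which action to take unless it also has a model of the environment's transitions, which motivates the Q-function on the next slide.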
15
Q-function and Q-learning
Mention that coding the update algorithm is trivial. Mention the exploration-exploitation tradeoff again. Explain that a random walk could be unstable. Explain epsilon-greedy. Briefly mention Bayesian Q-learning, starting with: "The implementation for LaserCat was a variation of Q-learning, called Bayesian Q-learning. But we won't get into that today. My brain usually shuts down upon hearing the word 'Bayesian', and I don't want that to happen to you in this short talk. But for those curious, on a high level, …" (Depending on the time, maybe state the pros of Bayesian Q-learning's exploration strategy.) 4 min.
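The "trivial to code" update and the epsilon-greedy policy can be sketched in a few lines. This is generic tabular Q-learning over toy state/action names, not the Bayesian Q-learning variant actually used for LaserCat:

```python
import random

# Tabular Q-learning with an epsilon-greedy policy. Q maps
# (state, action) pairs to estimated values; unseen pairs default to 0.

ACTIONS = ["left", "right", "fire"]

def epsilon_greedy(Q, state, epsilon=0.1):
    """Explore with probability epsilon, otherwise exploit the best action."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))

def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in ACTIONS)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
```

Setting `epsilon=0` gives pure exploitation, which is exactly the unstable-random-walk-avoidance tradeoff the slide describes: some exploration is needed to discover better actions, but too much never cashes in on what was learned.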
16
How to handle continuous states
Parametric function approximation: deep learning, splines, radial basis functions…
Segmentation: tile coding, nearest neighbors…
Mention that coding gets heavier with function approximation. Explain why it's difficult to do tile coding with LaserCat. 2 min.
17
Mobile Tile Coding. Mention the size of the grid. Mention the implementation trick: integer division and a shifted origin. 1 min.
18
Link. 2 min.
19
Recap
- Motivation: life is a game.
- Imitation Learning AI: a supervised learning problem; use a hierarchical model structure to handle missing data.
- Self-Evolving AI: a reinforcement learning problem; value functions; Q-learning; state segmentation with modified tile coding.
2 min.
20
Questions? Main References:
- Making Games with Python & Pygame, Albert Sweigart.
- Reinforcement Learning: An Introduction, R. Sutton & A. Barto.
- Bayesian Q-learning, R. Dearden, N. Friedman & S. Russell.
Contact Information: Website of my research group: LinkedIn: Search for "Longshaokan Wang".
P.S. I'm interested in 2017 summer internship opportunities. If you need a machine learning professional, better call Marshall! 1 min. Picture taken from Breaking Bad.