1
A Comparison of Learning Algorithms on the ALE
Deep Learning, Behzad Ghazanfari
2
Useful references:
- The Arcade Learning Environment: An Evaluation Platform for General Agents
- Playing Atari with Deep Reinforcement Learning
- Reinforcement Learning: An Introduction
3
RL
- Trial and error: agents learn, interact, and adapt in complex environments
- Generalization: learning a reusable, high-level understanding of the world
- General competency across tasks and domains
- The curse of dimensionality (high-dimensional state spaces)
- Hand-crafted features
- On-policy (online) and off-policy (offline) methods
4
RL methods
- Dynamic programming (model-based)
  - Value iteration
  - Policy iteration
- Monte Carlo (learning from experience)
- TD
  - Q-learning
  - SARSA
  - Actor-critic
  - R-learning
- TD(λ): eligibility traces
  - Accumulating traces
  - Replacing traces
5
Q-learning and SARSA
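The slide's update rules are not in the transcript; for reference, the standard one-step forms (from Sutton and Barto, cited above) are:

    Q-learning (off-policy; bootstraps from the greedy action in s'):
        Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]

    SARSA (on-policy; bootstraps from the action a' actually taken in s'):
        Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma\, Q(s',a') - Q(s,a) \right]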
6
R-learning and Actor-Critic
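The slide's equations are likewise missing from the transcript; the standard updates (again following Sutton and Barto) are:

    R-learning (off-policy, average reward), with step sizes \alpha and \beta:
        \delta = r - \rho + \max_{a'} Q(s',a') - Q(s,a)
        Q(s,a) \leftarrow Q(s,a) + \alpha \delta
        \rho \leftarrow \rho + \beta \delta   (updated only after greedy actions)

    Actor-critic, with TD error \delta = r + \gamma V(s') - V(s):
        V(s) \leftarrow V(s) + \alpha \delta       (critic)
        p(s,a) \leftarrow p(s,a) + \beta \delta    (actor preferences, \pi(a \mid s) \propto e^{p(s,a)})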
7
TD(λ)
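For reference, the tabular TD(λ) update with both trace variants named on the methods slide:

    \delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)
    e(s) \leftarrow \gamma \lambda e(s) + \mathbb{1}[s = s_t]                         (accumulating traces)
    e(s) \leftarrow \mathbb{1}[s = s_t] + \gamma \lambda e(s)\,\mathbb{1}[s \neq s_t]  (replacing traces)
    V(s) \leftarrow V(s) + \alpha \delta_t e(s)   for all s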
16
ALE
ALE is a wrapper around the Stella emulator for the Atari 2600; 61 games are considered.
- Learning from raw video data
- ALE games are varied; ALE provides different challenges than classical testbeds
- ALE has 18 actions, 5 of them basic (4 movements and a no-op)
- The screen is 210×160 pixels, each pixel a 7-bit color value (128 colors)
- ALE supports a reduced color space, SECAM; the paper maps the 128 colors to 8
- The screen is encoded with a coarser grid, at a resolution of 14×16 (see the sketch below)
- Background subtraction
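A minimal sketch of that coarse-grid encoding in Python/NumPy. The 14×16 tile grid, the 8-color reduction, and the binary per-tile features follow the slide; the function name and the COLOR_MAP table are hypothetical stand-ins (the paper reduces colors via the SECAM palette), and background subtraction is omitted:

    import numpy as np

    # Stand-in 128 -> 8 color reduction; the real mapping goes through
    # the SECAM palette rather than fixed buckets of 16.
    COLOR_MAP = np.arange(128) // 16

    def encode_screen(screen):
        """Encode a 210x160 screen of 7-bit colors as binary tile features.

        The screen is split into a 14x16 grid of 15x10-pixel tiles; for
        each tile and each of the 8 reduced colors, one feature is set
        if that color appears anywhere in the tile.
        """
        assert screen.shape == (210, 160)
        reduced = COLOR_MAP[screen]                  # values in 0..7
        features = np.zeros((14, 16, 8), dtype=bool)
        for i in range(14):
            for j in range(16):
                tile = reduced[i * 15:(i + 1) * 15, j * 10:(j + 1) * 10]
                features[i, j, np.unique(tile)] = True
        return features.ravel()                      # 14*16*8 = 1792 features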
18
Exploration
- Epsilon-greedy: the right epsilon depends on the game and situation
  - Online (on-policy) methods are more sensitive, so epsilon is reduced
  - Problem-dependent; generally found by testing
- Softmax policy (simulated annealing)
  - A scalar temperature must be set
  - Too sensitive (requires fine-tuning), so results are not comparable
- Optimistic initialization
  - Encourages exploration, but with nonlinear function approximation the values decrease even for states that have never been seen
Both selection rules are sketched below.
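A minimal sketch of the two action-selection rules; the epsilon and temperature values here are arbitrary illustrations:

    import numpy as np

    rng = np.random.default_rng()

    def epsilon_greedy(q_values, epsilon=0.05):
        """With probability epsilon act uniformly at random, else greedily."""
        if rng.random() < epsilon:
            return int(rng.integers(len(q_values)))
        return int(np.argmax(q_values))

    def softmax_action(q_values, temperature=1.0):
        """Sample an action with probability proportional to exp(Q / T);
        higher temperature means more exploration."""
        prefs = np.asarray(q_values, dtype=float) / temperature
        prefs -= prefs.max()                        # numerical stability
        probs = np.exp(prefs) / np.exp(prefs).sum()
        return int(rng.choice(len(q_values), p=probs))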
20
Learning algorithms
- Eligibility traces
- SARSA(λ) (see the sketch below)
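A minimal sketch of SARSA(λ) with linear function approximation and replacing traces. The env interface (reset/step returning observation, reward, done), the encode function, and all hyperparameter values are assumptions for illustration, not the paper's implementation:

    import numpy as np

    def sarsa_lambda(env, encode, n_features, n_actions,
                     alpha=0.1, gamma=0.99, lam=0.9, epsilon=0.05,
                     episodes=100):
        """SARSA(lambda): linear value functions (one weight vector per
        action), replacing eligibility traces, epsilon-greedy exploration."""
        rng = np.random.default_rng()
        w = np.zeros((n_actions, n_features))

        def q(phi):                        # action values for features phi
            return w @ phi

        def policy(phi):                   # epsilon-greedy
            if rng.random() < epsilon:
                return int(rng.integers(n_actions))
            return int(np.argmax(q(phi)))

        for _ in range(episodes):
            e = np.zeros_like(w)           # traces reset each episode
            phi = encode(env.reset())
            a = policy(phi)
            done = False
            while not done:
                obs, r, done = env.step(a)
                e *= gamma * lam                 # decay all traces
                e[a] = np.maximum(e[a], phi)     # replacing traces (binary phi)
                delta = r - q(phi)[a]
                if not done:
                    phi2 = encode(obs)
                    a2 = policy(phi2)
                    delta += gamma * q(phi2)[a2]
                w += alpha * delta * e
                if not done:
                    phi, a = phi2, a2
        return w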
21
Learning algorithms
- Q(λ): deaths can result from random exploratory actions; it learns a better (greedy) policy, but it can diverge with function approximation
- ETTR(λ) (expected time to reward): it has the advantage of potentially being easier to learn, as it gets a non-noisy signal whenever it actually reaches a positive reward. The disadvantage is a lack of long-term planning and poorer risk aversion
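Q(λ) here is presumably Watkins's variant, in which traces decay by γλ as long as the agent acts greedily and are reset to zero after an exploratory action; a hedged summary of that update:

    e(s_t, a_t) \leftarrow e(s_t, a_t) + 1
    \delta_t = r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)
    Q(s,a) \leftarrow Q(s,a) + \alpha \delta_t e(s,a)   for all (s,a)
    e(s,a) \leftarrow \gamma \lambda e(s,a) if the action taken was greedy, else e(s,a) \leftarrow 0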
22
R(λ)
Another class of reinforcement learning agents seeks to optimize the expected reward per time step instead. R-learning is the primary example of such a method in the off-policy case.
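The average-reward criterion being optimized can be written as (standard definition, not from the transcript):

    \rho^{\pi} = \lim_{n \to \infty} \frac{1}{n} \sum_{t=1}^{n} \mathbb{E}[\, r_t \mid \pi \,]

with action values measured relative to this average, i.e. Q(s,a) accumulates \mathbb{E}[r_t - \rho^{\pi}].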
23
Learning algorithms
- Actor-critic
- GQ(λ): gradient temporal-difference learning
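GQ(λ) belongs to the gradient-TD family, which performs stochastic gradient descent on the mean-squared projected Bellman error; the standard form of that objective (not shown in the transcript) is

    \mathrm{MSPBE}(\theta) = \lVert V_\theta - \Pi T^{\lambda} V_\theta \rVert_D^2

where \Pi projects onto the representable linear value functions and D weights states by visitation frequency. Unlike Q(λ), this construction gives convergence guarantees under off-policy training with linear function approximation.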
30
Convergence
The percentage of converged trials out of those that finished for each method was: SARSA: 85%; AC: 80%; ETTR: 84%; GQ: 80%; Q: 82%; R: 85%.
31
DQN in ALE
- General competency in a variety of tasks and domains without the need for domain-specific tailoring
- Learning a reusable, high-level understanding of the world from raw sensory data
- Achieved performance comparable to a human
- Problematic aspects of DQN's evaluation make it difficult to fully interpret the results
- The DQN experiments exploit non-standard game-specific prior information and also report only one independent trial per game
- What properties were most important to its success?
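For reference, the loss DQN minimizes (from "Playing Atari with Deep Reinforcement Learning", cited on the references slide) is the squared TD error over minibatches drawn from an experience-replay buffer D, with the bootstrap target computed from a held-fixed copy of the parameters \theta^-:

    L(\theta) = \mathbb{E}_{(s,a,r,s') \sim U(D)} \Big[ \big( r + \gamma \max_{a'} Q(s',a';\theta^-) - Q(s,a;\theta) \big)^2 \Big]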
32
Questions?