
1 Staffan Järn

2
• Intelligent learning algorithm
• Doesn’t require the presence of a teacher
• The algorithm is given a reward (a reinforcement) for good actions
• The algorithm tries to figure out the best action to take in a given state, without knowing the final optimal solution
• The actions are based on rewards and penalties (a minimal sketch of this reward-driven loop follows below)
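A minimal, self-contained sketch of that reward-driven loop. The two-action toy "environment" (REWARDS), the exploration probability 0.1, and the step size 0.1 are illustrative assumptions, not part of the presentation; the point is only that the learner homes in on the better action from rewards alone, without a teacher supplying the answer.

    import random

    REWARDS = {"left": 0.0, "right": 1.0}      # hidden from the learner
    value = {"left": 0.0, "right": 0.0}        # the learner's own estimates

    for _ in range(1000):
        if random.random() < 0.1:
            action = random.choice(list(value))        # explore
        else:
            action = max(value, key=value.get)         # take the best-looking action
        reward = REWARDS[action]                       # reinforcement from the environment
        value[action] += 0.1 * (reward - value[action])  # learn from the reward alone

    print(max(value, key=value.get))           # converges to "right"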

3
• Robot control
• Elevator scheduling (search for patterns)
• Telecommunications (finding networks)
• Games (Chess, Backgammon)
• Financial trading

4
• Gridworld (4 × 12)
• The walker (agent) is supposed to find the shortest or safest way to the finish, without falling into the cliff (blue area)
• Falling into the cliff gives 100 penalty points, and the walker has to start over again (a sketch of this grid and its costs follows below)
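A small sketch of that gridworld under the costs stated on the slide: every ordinary step costs 1, stepping into the cliff costs 100 and sends the walker back to the start. The concrete layout (bottom row as row 3, start at (3, 0), goal at (3, 11), cliff in between) is my assumption about how the grid is laid out.

    ROWS, COLS = 4, 12
    START, GOAL = (3, 0), (3, 11)
    CLIFF = {(3, c) for c in range(1, 11)}       # the blue area between start and goal

    ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def step(state, action):
        """Apply one action; return (next_state, cost). Walls block movement."""
        dr, dc = ACTIONS[action]
        r = min(max(state[0] + dr, 0), ROWS - 1)   # cannot walk outside the grid
        c = min(max(state[1] + dc, 0), COLS - 1)
        if (r, c) in CLIFF:                        # fell into the cliff: 100 penalty, start over
            return START, 100.0
        return (r, c), 1.0                         # an ordinary step costs 1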

5 Q-learning algorithm
• A matrix, called the Q-matrix: 48 × 4, i.e. 48 states (the 4 × 12 gridworld) times 4 actions (the four directions)
• The Q-matrix contains a ”price” for taking a certain action
• Initialized randomly in the beginning
• The walker has two options:
  - take the optimal action, i.e. the action with the smallest Q-value
  - explore the gridworld by taking a random step (it cannot walk into the wall)
• The Q-value is updated according to the update equation every time the walker takes an action (a code sketch of the Q-matrix and the action choice follows below)
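A sketch of the Q-matrix and the two options in code. The state index (row * 12 + column) and the exploration probability of 0.1 are assumptions made for illustration; the slide does not fix them.

    import random

    N_STATES, N_ACTIONS = 4 * 12, 4                     # 48 states, 4 directions
    Q = [[random.random() for _ in range(N_ACTIONS)]    # initialized randomly in the beginning
         for _ in range(N_STATES)]

    def state_index(row, col):
        return row * 12 + col                           # assumed mapping from grid cell to row of Q

    def choose_action(state, explore_prob=0.1):
        """Either the optimal action (smallest Q-value, i.e. lowest price) or a random step."""
        if random.random() < explore_prob:
            return random.randrange(N_ACTIONS)          # explore the gridworld
        return min(range(N_ACTIONS), key=lambda a: Q[state][a])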

6
The new value in the Q-matrix for the previous state and the previously taken action is: what it was before, multiplied by (1 − α), plus α multiplied by the sum of the cost of the step (usually 1, cliff 100) and γ multiplied by the value of the best action the walker can take from the new state (the optimal action):

Q(s, a) ← (1 − α) · Q(s, a) + α · [ c + γ · min_a' Q(s', a') ]

α (alfa) = learning factor, γ (gamma) = reward factor, c = cost of the step; the minimum picks the optimal next action, i.e. the one with the smallest Q-value.
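The same update written as code, under the cost formulation used here (the optimal next action is the one with the smallest Q-value). The default values α = 0.1 and γ = 0.9 are illustrative assumptions; the presentation does not state them.

    def q_learning_update(Q, state, action, cost, next_state, alpha=0.1, gamma=0.9):
        """Q-learning: update towards the step cost plus the value of the *optimal* next action."""
        best_next = min(Q[next_state])                  # smallest Q-value = optimal next action
        Q[state][action] = (1 - alpha) * Q[state][action] + alpha * (cost + gamma * best_next)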

7 SARSA algorithm
• Another way of updating the Q-matrix
• Not based on the next optimal move, but on the next actual move
• This means it takes into account the risk of falling into the cliff, and will eventually arrive at a safer path
• Longer, but safer, path (a sketch of the SARSA update follows below)
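A sketch of that difference in code: SARSA uses the Q-value of the move the walker actually takes next, instead of the optimal next action used by the Q-learning update above. Parameter defaults are again illustrative assumptions.

    def sarsa_update(Q, state, action, cost, next_state, next_action, alpha=0.1, gamma=0.9):
        """SARSA: update towards the step cost plus the value of the next *actual* move."""
        actual_next = Q[next_state][next_action]        # the move the walker really takes next
        Q[state][action] = (1 - alpha) * Q[state][action] + alpha * (cost + gamma * actual_next)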

8

9
Fig 1) Q-learning, the 100th walk
Fig 2) Q-learning, optimal solution
Fig 3) SARSA, the 100th walk
Fig 4) SARSA, optimal solution

10 Random steps over the cliff

11

12
• Reinforcement Learning (pdf), Jonas Waller [2005]
• Cliffwalker program, Jonas Waller [2005]
• Reinforcement Learning: An Introduction, Sutton and Barto

