1 Hybrid Agent-Based Modeling: Architectures, Analyses and Applications (Stage One). Li, Hailin

2 Outline
Introduction
Least-Squares Method for Reinforcement Learning
Evolutionary Algorithms for the RL Problem (in progress)
Technical Analysis based upon a hybrid agent-based architecture (in progress)
Conclusion (Stage One)

3 Introduction
Learning from interaction:
Interact with the environment
Consequences of actions are used to achieve goals
No explicit teacher, only experience
Examples:
A chess player in a game
Someone preparing food
The actions of a gazelle calf after it is born

4 Introduction
Characteristics:
Decision making in an uncertain environment
Actions affect the future situation
Effects cannot be fully predicted
Goals are explicit
Experience is used to improve performance

5 Introduction
What is to be learned:
A mapping from situations to actions that maximizes a scalar reward or reinforcement signal
Learning:
The agent does not need to be told which actions to take
It must discover which actions yield the most reward by trying them

6 Introduction
Challenge:
An action may affect not only the immediate reward but also the next situation, and consequently all subsequent rewards
Trial-and-error search
Delayed reward

7 Introduction
Exploration and exploitation:
Exploit what the agent already knows in order to obtain reward
Explore in order to make better action selections in the future
Neither can be pursued exclusively without failing at the task: a trade-off

8 Introduction
Components of an agent:
Policy: the decision-making function
Reward (total reward, average reward, or discounted reward): defines good and bad events for the agent
Value: rewards in the long run
Model of the environment: mimics the behavior of the environment

9 Introduction
Markov property & Markov decision processes:
"Independence of path": all that matters is contained in the current state signal
A reinforcement learning task that satisfies the Markov property is called a Markov decision process (MDP)
With finite state and action sets, it is a finite MDP

10 Introduction
Three categories of methods for solving the reinforcement learning problem:
Dynamic programming: requires a complete and accurate model of the environment and performs a full backup operation on each state
Monte Carlo methods: a backup for each state based on the entire sequence of observed rewards from that state until the end of the episode
Temporal-difference learning: approximates the optimal value function and treats the approximation as an adequate guide
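
As a concrete illustration of the third category, below is a minimal tabular TD(0) value-estimation sketch in Python. The environment interface (`env.reset`, `env.step` returning a `(next_state, reward, done)` triple) and the step-size and discount values are illustrative assumptions, not part of the original slides.

```python
from collections import defaultdict

def td0_value_estimation(env, policy, episodes=100, alpha=0.1, gamma=0.95):
    """Tabular TD(0): after each step, move V(s) toward the one-step target r + gamma * V(s')."""
    V = defaultdict(float)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            a = policy(s)
            s_next, r, done = env.step(a)
            target = r + (0.0 if done else gamma * V[s_next])
            V[s] += alpha * (target - V[s])   # temporal-difference update
            s = s_next
    return V
```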

11 LS Method for Reinforcement Learning
For a stochastic dynamic system $x_{t+1} = f(x_t, a_t, w_t)$:
$x_t$: current state
$a_t$: control decision generated by the policy
$w_t$: disturbance independently sampled from some fixed distribution
$\{x_t\}$ is a Markov chain
An MDP can be denoted by a quadruple $(S, A, P, R)$:
$S$: state set
$A$: action set
$P(s' \mid s, a)$: state transition probability
$R(s, a)$: reward function
The policy is a mapping $\pi: S \to A$
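
The quadruple above can be captured in a small container; this is a minimal sketch and the field names are illustrative, not from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class MDP:
    """A finite MDP as the quadruple (S, A, P, R)."""
    states: List[int]                                # S: state set
    actions: List[int]                               # A: action set
    P: Dict[Tuple[int, int], Dict[int, float]]       # P[(s, a)][s'] = transition probability
    R: Dict[Tuple[int, int], float]                  # R[(s, a)] = expected reward

# A deterministic policy is simply a mapping pi: S -> A.
Policy = Callable[[int], int]
```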

12 LS Method for Reinforcement Learning
For each policy $\pi$, the value function is defined by
$V^\pi(s) = E\left[\sum_{t=0}^{\infty} \gamma^t R(s_t, \pi(s_t)) \,\middle|\, s_0 = s\right]$
The optimal value function is defined by
$V^*(s) = \max_\pi V^\pi(s)$

13 LS Method for Reinforcement Learning
The optimal action can be generated through
$a^*(s) = \arg\max_{a \in A}\left[R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s')\right]$
Introducing the Q value function
$Q^\pi(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^\pi(s')$
the optimal action can now be generated through
$a^*(s) = \arg\max_{a \in A} Q^*(s, a)$

14 LS Method for Reinforcement Learning
The exact Q-values for all state-action pairs can be obtained by solving the Bellman equations (full backups):
$Q^\pi(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, Q^\pi(s', \pi(s'))$
or, in matrix format:
$Q^\pi = R + \gamma P^\pi Q^\pi$
where $P^\pi\big((s, a), (s', \pi(s'))\big)$ denotes the transition probability from $(s, a)$ to $(s', \pi(s'))$
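
For a small finite MDP the matrix form can be solved directly as a linear system. Below is a minimal numpy sketch assuming the state-action space is small enough to enumerate; the array layout is an illustrative assumption, not the slides' code.

```python
import numpy as np

def exact_q_values(P, R, policy, gamma=0.95):
    """Solve Q^pi = R + gamma * P_pi Q^pi, i.e. (I - gamma * P_pi) Q^pi = R.

    P:      array (n_sa, n_states), transition probabilities for each (s, a) row
    R:      array (n_sa,), expected reward for each (s, a) row
    policy: maps a state index to the (s, a) row index chosen by pi in that state
    """
    n_sa, n_states = P.shape
    # Build P_pi: probability of moving from row (s, a) to the row (s', pi(s')).
    P_pi = np.zeros((n_sa, n_sa))
    for row in range(n_sa):
        for s_next in range(n_states):
            P_pi[row, policy(s_next)] += P[row, s_next]
    return np.linalg.solve(np.eye(n_sa) - gamma * P_pi, R)
```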

15 LS Method for Reinforcement Learning
Traditional Q-learning:
A popular variant of temporal-difference learning used to approximate Q value functions
In the absence of a model of the MDP, it uses sample data $(s_t, a_t, r_t, s_{t+1})$
The temporal difference is defined as
$d_t = r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)$
For one-step Q-learning, the update equation is
$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha_t\, d_t$
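
A minimal tabular version of this update is sketched below, assuming a small discrete environment with `env.reset`/`env.step` and an epsilon-greedy behavior policy; all names and parameter values are illustrative.

```python
import random
from collections import defaultdict

def q_learning(env, n_actions, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """One-step Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection (exploration vs. exploitation).
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda act: Q[(s, act)])
            s_next, r, done = env.step(a)
            best_next = 0.0 if done else max(Q[(s_next, act)] for act in range(n_actions))
            td = r + gamma * best_next - Q[(s, a)]     # temporal difference d_t
            Q[(s, a)] += alpha * td                    # one-step update
            s = s_next
    return Q
```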

16 LS Method for Reinforcement Learning
The final decision based upon Q-learning:
$a^*(s) = \arg\max_{a \in A} Q(s, a)$
The reasons for the development of approximation methods:
The size of the state-action space
The overwhelming requirement for computation
The categories of approximation methods for machine learning:
Model approximation
Policy approximation
Value function approximation

17 LS Method for Reinforcement Learning
Model-Free Least-Squares Q-learning
Linear function approximator:
$\hat{Q}(s, a; w) = \sum_{j=1}^{k} \phi_j(s, a)\, w_j = \phi(s, a)^\top w$
$\phi_j(s, a)$: basis functions
$w$: a vector of scalar weights
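
A minimal sketch of such an approximator with Gaussian radial basis functions over the state, replicated per action; the centers, width, and feature encoding are illustrative assumptions.

```python
import numpy as np

def rbf_features(state, action, centers, width, n_actions):
    """phi(s, a): Gaussian RBFs over the state, placed in the block belonging to the action."""
    rbf = np.exp(-np.sum((centers - state) ** 2, axis=1) / (2.0 * width ** 2))
    phi = np.zeros(len(centers) * n_actions)
    phi[action * len(centers):(action + 1) * len(centers)] = rbf
    return phi

def q_hat(state, action, w, centers, width, n_actions):
    """Linear approximation: Q_hat(s, a) = phi(s, a)^T w."""
    return rbf_features(state, action, centers, width, n_actions) @ w
```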

18 LS Method for Reinforcement Learning
For a fixed policy $\pi$, the weight vector satisfies the linear system
$A w^\pi = b$
where $A$ is the $k \times k$ matrix
$A = \Phi^\top (\Phi - \gamma P^\pi \Phi)$
and
$b = \Phi^\top R$
($\Phi$ is the matrix whose rows are $\phi(s, a)^\top$ for all state-action pairs)
If the model of the MDP is available, $A$ and $b$ can be computed directly

19 LS Method for Reinforcement Learning
If the model of the MDP is not available (model-free), given samples
$D = \{(s_i, a_i, r_i, s'_i)\}_{i=1}^{L}$
the matrices can be estimated from the data:
$\tilde{A} = \sum_{i=1}^{L} \phi(s_i, a_i)\big(\phi(s_i, a_i) - \gamma\, \phi(s'_i, \pi(s'_i))\big)^\top$
$\tilde{b} = \sum_{i=1}^{L} \phi(s_i, a_i)\, r_i$
and the weights for the policy $\pi$ are $\tilde{w}^\pi = \tilde{A}^{-1} \tilde{b}$
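
A minimal LSTDQ-style sketch of this sample-based estimate, reusing the feature-function idea from the earlier sketch; the sample format and the small ridge term are illustrative assumptions.

```python
import numpy as np

def lstdq_weights(samples, phi, policy, k, gamma=0.95):
    """Estimate the weight vector for a fixed policy from samples (s, a, r, s') without a model.

    samples: iterable of (s, a, r, s_next) tuples
    phi:     feature function phi(s, a) -> np.ndarray of length k
    policy:  current policy pi(s) -> action
    """
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        phi_sa = phi(s, a)
        phi_next = phi(s_next, policy(s_next))
        A += np.outer(phi_sa, phi_sa - gamma * phi_next)
        b += phi_sa * r
    # A small ridge term guards against a singular A on limited data.
    return np.linalg.solve(A + 1e-6 * np.eye(k), b)
```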

20 LS Method for Reinforcement Learning
The (approximately) optimal policy can be found through
$\pi(s; w) = \arg\max_{a \in A} \phi(s, a)^\top w$
The greedy policy is represented by the parameter vector $w$ and can be determined on demand for any given state

21 LS Method for Reinforcement Learning
Simulation:
The system is hard to model but easy to simulate
Simulated trajectories implicitly indicate the features of the system in terms of the state-visiting frequency
Orthogonal least-squares algorithm for training an RBF network:
A systematic learning approach for solving the center-selection problem
Each newly added center maximizes the amount of energy of the desired network output
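
Below is a simplified greedy forward-selection sketch of that center-selection idea: at each step the candidate center that most increases the explained output energy is added. It is a stand-in for the orthogonal least-squares criterion, not the slides' algorithm, and the data layout and width parameter are assumptions.

```python
import numpy as np

def greedy_center_selection(X, y, candidate_centers, width, n_centers):
    """Greedy forward selection of RBF centers by explained output energy."""
    def design(centers):
        # N x M matrix of Gaussian responses of the chosen centers on the data X.
        d2 = ((X[:, None, :] - np.asarray(centers)[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * width ** 2))

    chosen, remaining = [], list(range(len(candidate_centers)))
    for _ in range(n_centers):
        best_idx, best_energy = None, -np.inf
        for idx in remaining:
            Phi = design([candidate_centers[c] for c in chosen + [idx]])
            w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
            energy = np.sum((Phi @ w) ** 2)          # energy of the fitted network output
            if energy > best_energy:
                best_idx, best_energy = idx, energy
        chosen.append(best_idx)
        remaining.remove(best_idx)
    return [candidate_centers[c] for c in chosen]
```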

22 LS Method for Reinforcement Learning
Hybrid Least-Squares Method:
Least-Squares Policy Iteration (LSPI) algorithm
Simulation & orthogonal least-squares regression
(Diagram: environment, state, action, feature configuration, reward, optimal policy)
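
A minimal sketch of the LSPI loop that the hybrid method builds on, reusing the `lstdq_weights` and feature sketches above; sample collection, iteration limit, and convergence tolerance are illustrative assumptions.

```python
import numpy as np

def lspi(samples, phi, k, actions, gamma=0.95, max_iters=20, tol=1e-4):
    """Least-Squares Policy Iteration: alternate LSTDQ policy evaluation with
    greedy policy improvement until the weight vector stops changing."""
    w = np.zeros(k)
    for _ in range(max_iters):
        # Greedy policy induced by the current weights.
        policy = lambda s, w=w: max(actions, key=lambda a: phi(s, a) @ w)
        w_new = lstdq_weights(samples, phi, policy, k, gamma)   # policy evaluation
        if np.linalg.norm(w_new - w) < tol:                     # converged
            return w_new
        w = w_new
    return w
```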

23 LS Method for Reinforcement Learning

24 Simulation: Cart-Pole System

25 Simulation

26 Conclusion (Stage One)
From the reinforcement learning perspective, the intractability of exact solutions to sequential decision problems requires value function approximation methods.
At present, linear function approximators are the best alternative as an approximation architecture, mainly due to their transparent structure.
The model-free least-squares policy iteration (LSPI) method is a promising algorithm that uses a linear approximation architecture to achieve policy optimization in the spirit of Q-learning. It may converge in surprisingly few steps.
Inspired by the orthogonal least-squares regression method for selecting the centers of an RBF neural network, a new hybrid learning method for LSPI can produce more robust and human-independent solutions.

