
Advanced Science and Technology Letters Vol.82 (SoftTech 2015), pp.32-36

Hierarchical Reinforcement Learning with Classification Algorithms

Shanhong Zhu 1,2, Weipeng Dong 1, Wei Liu 2
1 School of Computer and Information Engineering, Xinxiang University, Henan, China
2 International School of Software, Wuhan University, Wuhan, China

Abstract. In an autonomous system, Agents carry out their tasks through interaction with the environment; hierarchical reinforcement learning technology can help an Agent improve its learning efficiency in large, complex environments. The effectiveness of the proposed algorithm is demonstrated through experimental results. The goal of this paper is to provide a basic overview, for both specialists and non-specialists, of how to choose a good reinforcement learning algorithm for classification.

Keywords: Reinforcement learning; KNN classification algorithm; Classifying Option

1 Introduction

Reinforcement learning was first put forward by Minsky in the 1950s. The problem it solves is: given an agent that can sense its surrounding environment, how should it learn to choose the actions that optimally reach its goal? Each time the agent acts in the environment, the environment gives an appropriate reward or punishment to indicate whether the resulting state is correct or not. For example, when an agent is trained to play chess, winning the game gives a positive return, losing gives a negative return, and everything else gives a zero return [1]. The agent's task is to learn from this indirect and delayed return, so that its subsequent actions produce the largest cumulative return. Reinforcement learning thus studies the mapping from environment states to the agent's actions, with the aim of obtaining the largest accumulated reward from the environment. This method differs from supervised learning techniques, which specify the desired behavior through positive and negative examples; instead, the optimal behavior strategy is found through trial and error.

2 KNN classification analysis

The k-nearest-neighbors (KNN) classifier is a classical supervised method in the field of machine learning, based on statistical data. In 1967, Cover and Hart proved that the deviation of KNN classification approximates that of the optimal Bayesian classifier [2].

The KNN classifier is widely used in areas such as text classification, pattern recognition, image processing and spatio-temporal classification. The decision rule of the KNN algorithm is to find the k nearest (most similar) training samples to a test sample in the feature space, and then to assign the test sample to the class that wins the majority vote among those k nearest neighbors. Essentially, it is still a statistical method. In the real world, most practical data does not obey a typical model (such as a Gaussian mixture or a linearly separable distribution), and non-parametric algorithms like KNN are well suited to this situation. If the classes of the test sample set cross or overlap heavily, KNN is the more appropriate method.

The advantages of the KNN algorithm are as follows:
(1) There is no explicit training phase, or it is very minimal: once the data is loaded into memory, classification can begin, so the training phase is very fast.
(2) The data set is labeled, that is to say, the selected neighbors of a test sample have already been classified correctly.
(3) It is independent of any assumed data distribution; only a single integer parameter k needs to be chosen.

The disadvantages of the KNN algorithm are as follows:
(1) It has a costly testing phase, in terms of both time and memory, since all training data must be stored [3]-[4].
(2) It is highly dependent on the number of samples: as the number of training samples increases, classifier performance improves.
(3) Classification accuracy is affected because every attribute is given the same weight.
(4) It is difficult to select an appropriate value of k; small and large values of k have different characteristics.
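To illustrate this decision rule concretely, the following is a minimal sketch of a brute-force KNN classifier in Python; the function name, the toy data and the choice of Euclidean distance are illustrative assumptions rather than details taken from this paper.

    import numpy as np
    from collections import Counter

    def knn_predict(train_X, train_y, query, k=5):
        """Classify one query point by majority vote among its k nearest training samples."""
        # Euclidean distance from the query to every training sample
        dists = np.linalg.norm(train_X - query, axis=1)
        # Indices of the k closest training samples
        nearest = np.argsort(dists)[:k]
        # Majority vote over the labels of those neighbors
        votes = Counter(train_y[i] for i in nearest)
        return votes.most_common(1)[0][0]

    # Tiny usage example with two well-separated classes
    train_X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
    train_y = np.array([0, 0, 1, 1])
    print(knn_predict(train_X, train_y, np.array([0.1, 0.0]), k=3))  # -> 0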

2.1 Q-learning

Reinforcement learning (RL) is a class of methods in which the Agent obtains the optimal strategy through trial and error. The standard setting is a discrete-time MDP model with a finite state set S and action set A: at any moment t the Agent, in the current state s_t ∈ S, chooses the next executable action from A(s_t), obtains an immediate reward r_t, and takes the state-action value function as the objective function. The purpose of reinforcement learning is to search for the optimal strategy π*: S → A that maximizes this value function. Q-learning, proposed by Watkins, is a learning method that does not depend on a model of the environment; its appeal is that the value of the current state-action pair sums up all the information needed to determine the discounted total reward of the optimal sequence that follows:

    Q(s_t, a_t) = r_t + γ V(s_{t+1})

where γ is the discount factor. In Q-learning, each received reward signal updates one Q value at a time:

    Q_{k+1}(s, a) = (1 - η_k) Q_k(s, a) + η_k [ r_t + γ V_k(s_{t+1}) ]    if (s, a) = (s_t, a_t)
    Q_{k+1}(s, a) = Q_k(s, a)                                             otherwise

where k is the number of iterations and η is the learning rate, which controls the learning speed.
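The update above can be read directly as a small tabular procedure. The following Python sketch assumes a hypothetical environment interface (reset, actions, step) and illustrative hyper-parameter values; it is a sketch of tabular Q-learning in general, not the paper's implementation.

    import random
    from collections import defaultdict

    def q_learning(env, episodes=500, eta=0.1, gamma=0.95, epsilon=0.1):
        # Q maps (state, action) pairs to values; unseen pairs default to 0.0
        Q = defaultdict(float)
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                actions = env.actions(s)
                # epsilon-greedy: explore with probability epsilon, otherwise act greedily
                if random.random() < epsilon:
                    a = random.choice(actions)
                else:
                    a = max(actions, key=lambda x: Q[(s, x)])
                s_next, r, done = env.step(s, a)
                # V_k(s_{t+1}): greedy value of the successor state (0 at terminal states)
                v_next = 0.0 if done else max(Q[(s_next, x)] for x in env.actions(s_next))
                # Q_{k+1}(s,a) = (1 - eta) Q_k(s,a) + eta [ r + gamma V_k(s') ]
                Q[(s, a)] = (1 - eta) * Q[(s, a)] + eta * (r + gamma * v_next)
                s = s_next
        return Q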

2.2 Option

In order to build a hierarchical learning system, we introduce the semi-Markov decision process (SMDP), in which a time interval τ elapses between entering state s and completing the execution of action a. The Option, a macro form of Q-learning proposed by Sutton in 1999, abstracts the learning task into several options and adds these options to the original action set as a special kind of "action". We use a triple (Ii, πi, βi) to denote an Option, where Ii ⊆ S is the set of entry states, πi is the internal strategy, and βi is the termination condition. The internal strategy includes the strategy πia: Ii → Ai defined over the primitive actions and the strategy πio: Ii → Oi defined over Options, so (Ii, πio, βi) forms a layered Option. According to the optimal strategy πio*, a state-action value function and a state-Option value function are defined in the usual SMDP form; in Option learning, the value of an Option is updated only when that Option terminates, and the corresponding update formula is the SMDP Q-learning rule sketched below.
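As a sketch of the standard Options-framework forms on which this construction rests (the notation follows Sutton's 1999 formulation and is an assumption, not reproduced from this paper): the value of executing Option o from state s, and the update applied only when o terminates in state s' after τ steps, can be written as

    Q(s, o) = E[ r_{t+1} + γ r_{t+2} + ... + γ^{τ-1} r_{t+τ} + γ^τ V(s_{t+τ}) ]

    Q_{k+1}(s, o) = (1 - η_k) Q_k(s, o) + η_k [ R + γ^τ max_{o'} Q_k(s', o') ]

where R = r_{t+1} + γ r_{t+2} + ... + γ^{τ-1} r_{t+τ} is the discounted reward accumulated while o executes.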

3 The experimental simulation and analysis

In order to assess the performance of the proposed method, it is tested in the taxi environment [5]. As shown in Figure 1, the 5 × 5 grid environment contains a taxi and a passenger. Four special locations B, G, R and Y are specified; the passenger can board at any one of the four and then alight at any of them. The mission of the taxi is to drive to the passenger's initial position, take the passenger to the destination, and put the passenger down. In each grid position the taxi has six primitive actions: North, South, East, West, Pickup and Putdown, and each movement succeeds with probability 0.8. The Pickup action is effective only when the taxi has reached the passenger's initial position, and performing it there gains a bonus of +10. Likewise, the Putdown action is effective only when the taxi carries the passenger and is at the target position, and performing it there gains a bonus of +20; the other actions carry no such bonus.

Fig. 1. The taxi problem (5 × 5 grid with the special locations R, G, B and Y)

To reflect whether the proposed new algorithm performs well, the experimental results are compared with Q-learning and SCC. The three lines in Figure 2 are, respectively, the taxi experiment results obtained by the Q-learning algorithm, the SCC algorithm and the Ant-O algorithm. The ant colony optimization parameters are set to nt = 200, nk = 10, α = 0.98, ρ = 0.98. The experiment has two stages: in the initial learning stage, the Agent traverses the environment and records the transition process, and then obtains the sub-goals from the resulting transition diagram. Next, the Agent uses the Options learned in the previous stage to continue interacting with the environment and learns the optimal strategy.

Fig. 2. Algorithm performance comparison (x-axis: learning cycle; y-axis: achieving the goal; curves: Q-learning, SCC and Ant-O)

However, the Ant-O algorithm proposed in this paper makes the search converge to the optimum solution in a relatively short period of learning and shortens the learning time. Compared with the Q-learning algorithm, Ant-O introduces an ordered, directed and weighted graph in which pheromone is concentrated along the shortest path.
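For concreteness, a minimal Python sketch of the taxi dynamics described above follows; only the 5 × 5 grid, the six primitive actions, the 0.8 movement success probability and the +10/+20 bonuses come from the text, while the coordinates chosen for R, G, B and Y, the state layout and all names are illustrative assumptions.

    import random

    ACTIONS = ["North", "South", "East", "West", "Pickup", "Putdown"]
    # Illustrative coordinates for the four special locations (not specified in the paper)
    LOCS = {"R": (0, 0), "G": (0, 4), "Y": (4, 0), "B": (4, 3)}
    MOVES = {"North": (-1, 0), "South": (1, 0), "East": (0, 1), "West": (0, -1)}

    def step(state, action):
        """One transition of the 5 x 5 taxi task; state = (taxi, passenger_loc, dest, in_taxi)."""
        taxi, passenger, dest, in_taxi = state
        reward, done = 0, False
        if action in MOVES:
            # each movement succeeds with probability 0.8, otherwise the taxi stays put
            if random.random() < 0.8:
                r, c = taxi[0] + MOVES[action][0], taxi[1] + MOVES[action][1]
                if 0 <= r < 5 and 0 <= c < 5:
                    taxi = (r, c)
        elif action == "Pickup":
            # effective only at the passenger's initial position: +10 bonus
            if not in_taxi and taxi == LOCS[passenger]:
                in_taxi, reward = True, 10
        elif action == "Putdown":
            # effective only with the passenger on board at the destination: +20 bonus
            if in_taxi and taxi == LOCS[dest]:
                in_taxi, reward, done = False, 20, True
        return (taxi, passenger, dest, in_taxi), reward, done

    # Usage example: passenger waiting at R, destination B, taxi starting at the bottom-left corner
    state = ((4, 0), "R", "B", False)
    state, reward, done = step(state, "North")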

4 Conclusion

In essence, KNN classification is a supervised learning method based on statistical data. Firstly, we presented the general procedure of the classification algorithm. Three factors affect classification performance: the number of neighbors, the distance measure and the decision rule. We analyzed different parameters of KNN, such as different distance functions, and then summarized several important issues and recent developments in classification, including building index structures to find nearest neighbors, reducing the dimension of high-dimensional data sets, and integration with SVMs or neural networks. We do not discuss problems such as feature selection, dimensionality and the number of classes, which are other important and difficult problems of classification. Specifically, the Option construction based on the KNN algorithm, combined with the ACO learning method, uses edge roughness to find bottlenecks and makes the state space hierarchical. Tasks are divided into multiple sub-tasks and the space is partitioned, which decreases the learning complexity and significantly speeds up the solution without affecting the solution quality of the algorithm.

Acknowledgments. This project is supported by the National Natural Science Foundation of China (Grant No. , ).

References

1. Şimşek, Ö., Barto, A. G.: Using relative novelty to identify useful temporal abstractions in reinforcement learning. Proc. of the 21st International Conference on Machine Learning. New York: ACM Press (2004)
2. Digney, B.: Learning hierarchical control structures for multiple tasks and changing environments. The 5th International Conference on Simulation of Adaptive Behavior, pp. 321-330 (1998)
3. Mannor, S., Menache, I.: Dynamic abstraction in reinforcement learning via classifying. The 21st International Conference on Machine Learning, pp. 71-78 (2004)
4. Kazemitabar, S. J., Beigy, H.: Automatic discovery of subgoals in reinforcement learning using strongly connected components. ICONIP 1(5506), 829-834 (2008)
5. Taghizadeh, N.: Autonomous skill acquisition in reinforcement learning based on graph classifying. Islamic Republic of Iran: Sharif University of Technology (2011)
6. Dries, E., Peterson, G. L.: Scaling Ant Colony Optimization with Hierarchical Reinforcement Learning Partitioning. The 10th Annual Conference on Genetic and Evolutionary Computation (2008)