
Encoding Robotic Sensor States for Q-Learning using the Self-Organizing Map
Gabriel J. Ferrer
Department of Computer Science, Hendrix College

Outline
- Statement of Problem
- Q-Learning
- Self-Organizing Maps
- Experiments
- Discussion

Statement of Problem
Goal
- Make robots do what we want
- Minimize or eliminate explicit programming
Proposed Solution: Reinforcement Learning
- Specify the desired behavior using rewards
- Express rewards in terms of sensor states
- Use machine learning to induce the desired actions
Target Platform
- Lego Mindstorms NXT

Robotic Platform

Experimental Task
- Drive forward
- Avoid hitting things

Q-Learning
Table of expected rewards ("Q-values")
- Indexed by state and action
Algorithm steps
- Calculate the state index from sensor values
- Calculate the reward
- Update the previous Q-value
- Select and perform an action
Update rule: Q(s,a) = (1 - α) Q(s,a) + α (r + γ max_{a'} Q(s',a'))
(α: learning rate, γ: discount factor, r: reward, s': next state)
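As a concrete illustration, here is a minimal Java sketch of the tabular update above. The talk's own implementation is not shown; the class and method names here are mine.

```java
// Hypothetical sketch of the tabular Q-update; not the author's leJOS code.
public class QTable {
    private final double[][] q;  // q[state][action]
    private final double gamma;  // discount factor

    public QTable(int numStates, int numActions, double gamma) {
        this.q = new double[numStates][numActions];
        this.gamma = gamma;
    }

    // Q(s,a) = (1 - alpha) Q(s,a) + alpha (r + gamma * max_a' Q(s',a'))
    public void update(int s, int a, double reward, int sNext, double alpha) {
        double best = q[sNext][0];
        for (int i = 1; i < q[sNext].length; i++) {
            best = Math.max(best, q[sNext][i]);
        }
        q[s][a] = (1 - alpha) * q[s][a] + alpha * (reward + gamma * best);
    }

    public double value(int s, int a) { return q[s][a]; }
}
```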

Q-Learning and Robots
Certain sensors provide continuous values
- Sonar
- Motor encoders
Q-Learning requires discrete inputs
- Group continuous values into discrete "buckets" [Mahadevan and Connell, 1992]
Q-Learning produces discrete actions
- Forward
- Back-left / back-right

Creating Discrete Inputs
Basic approach
- Discretize each continuous value into a fixed set of buckets
- Combine each discretized tuple into a single state index
Another approach
- Self-Organizing Map
- Induces a discretization of continuous values [Touzet 1997] [Smith 2002]
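The "combine into a single index" step of the basic approach is ordinary mixed-radix packing. A minimal sketch follows; the bucket counts in the usage example are hypothetical, not taken from the talk.

```java
// Illustrative only: pack a discretized sensor tuple into one table index.
public final class StateIndex {
    // buckets[i] = discretized value of sensor i; radix[i] = bucket count.
    public static int combine(int[] buckets, int[] radix) {
        int index = 0;
        for (int i = 0; i < buckets.length; i++) {
            index = index * radix[i] + buckets[i];
        }
        return index;
    }

    public static void main(String[] args) {
        // e.g. sonar bucket 2 of 3, left bumper 1 of 2, right bumper 0 of 2
        int[] buckets = {2, 1, 0};
        int[] radix = {3, 2, 2};
        System.out.println(combine(buckets, radix)); // prints 10
    }
}
```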

Self-Organizing Map (SOM)
2D grid of output nodes
- Each output node corresponds to an "ideal input" value
- Inputs can be anything with a distance function
Activating an output
- Present an input to the network
- The output whose ideal input is closest is the "winner"

Applying the SOM
Each input is a vector of sensor values
- Sonar
- Left/right bump sensors
- Left/right motor speeds
Distance function is the sum of squared differences
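Under these assumptions, winner selection is a straightforward nearest-codebook search. This is an illustrative sketch, not the author's code; the class name Som is mine.

```java
// Sketch: pick the output node whose ideal input ("codebook vector")
// is closest to the current sensor vector.
class Som {
    double[][] weights; // weights[n] = ideal input for output node n

    int findWinner(double[] input) {
        int winner = 0;
        double best = Double.MAX_VALUE;
        for (int n = 0; n < weights.length; n++) {
            double dist = 0;
            for (int j = 0; j < input.length; j++) {
                double diff = input[j] - weights[n][j];
                dist += diff * diff; // sum of squared differences
            }
            if (dist < best) { best = dist; winner = n; }
        }
        return winner;
    }
}
```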

SOM Unsupervised Learning
- Present an input to the network
- Find the winning output node
- Update the ideal input of the winner and its neighbors:
  weight_ij = weight_ij + α (input_j - weight_ij)
- A neighborhood function scales the update for nodes near the winner
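Training the hypothetical Som class above could look like the following. The Neighborhood interface is my abstraction for the neighborhood function the slide mentions; none of these names come from the talk.

```java
// Sketch of the SOM weight update; each node's ideal input moves toward
// the observed input, scaled by the learning rate and a neighborhood factor.
interface Neighborhood {
    double factor(int node, int winner); // 1 for the winner, less for neighbors
}

class SomTrainer {
    static void train(double[][] weights, double[] input,
                      int winner, double alpha, Neighborhood h) {
        for (int n = 0; n < weights.length; n++) {
            double rate = alpha * h.factor(n, winner);
            if (rate == 0.0) continue;           // outside the neighborhood
            for (int j = 0; j < input.length; j++) {
                // weight_nj += rate * (input_j - weight_nj)
                weights[n][j] += rate * (input[j] - weights[n][j]);
            }
        }
    }
}
```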

Experiments
- Implemented in Java (LeJOS 0.85)
Each experiment
- 240 seconds (800 Q-Learning iterations)
- 36 states
- Three actions:
  - Both motors forward
  - Left motor backward, right motor stopped
  - Left motor stopped, right motor backward

Rewards
- Either bump sensor pressed: 0.0
- Base reward:
  - 1.0 if both motors are going forward
  - 0.5 otherwise
- Multiplier:
  - 1 if the sonar value is greater than 20 cm
  - (sonar value) / 20 otherwise
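These rules translate almost directly into code. The following is an illustrative transcription; the method and parameter names are mine, not from the talk.

```java
// Sketch of the reward rules above.
class RewardFunction {
    static double reward(boolean leftBump, boolean rightBump,
                         boolean bothForward, double sonarCm) {
        if (leftBump || rightBump) return 0.0;       // bumped: no reward
        double base = bothForward ? 1.0 : 0.5;       // prefer driving forward
        double mult = (sonarCm > 20.0) ? 1.0 : sonarCm / 20.0; // penalize closeness
        return base * mult;
    }
}
```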

Parameters
- Discount (γ): 0.5
- Learning rate (α): 1 / (1 + t/100), where t is the current iteration (time step)
  - Used for both the SOM and Q-Learning [Smith 2002]
- Exploration/exploitation: ε = α/4
  - Probability of taking a random action
  - Random action selected using a weighted distribution
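A sketch of these schedules together with an ε-greedy selection step. Note the talk draws the random action from a weighted distribution; a uniform draw is shown here for brevity, and all names are mine.

```java
import java.util.Random;

// Sketch: decaying learning rate, epsilon = alpha/4, epsilon-greedy choice.
class Schedules {
    static double alpha(int t) { return 1.0 / (1.0 + t / 100.0); }
    static double epsilon(int t) { return alpha(t) / 4.0; }

    // Explore with probability epsilon, otherwise take the greedy action.
    static int selectAction(double[] qRow, int t, Random rng) {
        if (rng.nextDouble() < epsilon(t)) {
            return rng.nextInt(qRow.length); // uniform stand-in for the
        }                                    // weighted distribution
        int best = 0;
        for (int a = 1; a < qRow.length; a++) {
            if (qRow[a] > qRow[best]) best = a;
        }
        return best;
    }
}
```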

Experimental Controls
Q-Learning without a SOM
Qa states
- Current action (1-3)
- Current bumper states
- Quantized sonar values (three buckets: 0-19 cm; 20-39 cm; 40+ cm)
Qb states
- Current bumper states
- Quantized sonar values (nine buckets: 0-11 cm; …; 84-95 cm; 96+ cm)

SOM Formulations
- 36 output nodes
Category "a"
- Length-5 input vectors
- Motor speeds, bumper values, sonar value
Category "b"
- Length-3 input vectors
- Bumper values, sonar value
All sensor values normalized to [0, 100]
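A hypothetical construction of a category-"a" input vector. The scaling constants (motor-speed range, sonar ceiling) are assumptions of mine, not values stated in the talk.

```java
// Illustrative only: build a length-5 input vector scaled to [0, 100].
class InputVectors {
    static final double MAX_SPEED = 100.0; // assumed motor-speed range
    static final double MAX_SONAR = 254.0; // assumed NXT sonar ceiling (cm)

    static double[] categoryA(double leftSpeed, double rightSpeed,
                              boolean leftBump, boolean rightBump,
                              double sonarCm) {
        return new double[] {
            100.0 * leftSpeed / MAX_SPEED,
            100.0 * rightSpeed / MAX_SPEED,
            leftBump ? 100.0 : 0.0,
            rightBump ? 100.0 : 0.0,
            100.0 * Math.min(sonarCm, MAX_SONAR) / MAX_SONAR
        };
    }
}
```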

SOM Formulations
QSOM
- Based on [Smith 2002]
- Gaussian neighborhood
- Neighborhood size is one-half the SOM width
QT
- Based on [Touzet 1997]
- Learning rate fixed at 0.9
- Neighborhood is the immediate Manhattan neighbors
- Neighbor learning rate is 0.4
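Illustrative versions of the two neighborhood schemes, assuming the 36 nodes are laid out on a width × height grid with node n mapped to cell (n % width, n / width); the grid mapping and signatures are my assumptions.

```java
// Sketches of the two neighborhood schemes; not the authors' code.
class Neighborhoods {
    // QSOM-style Gaussian falloff; sigma = width/2 matches
    // "neighborhood size is one-half the SOM width".
    static double gaussian(int node, int winner, int width, double sigma) {
        int dx = node % width - winner % width;
        int dy = node / width - winner / width;
        return Math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma));
    }

    // QT-style: the returned value serves directly as the learning rate
    // (0.9 for the winner, 0.4 for immediate Manhattan neighbors).
    static double touzet(int node, int winner, int width) {
        int dist = Math.abs(node % width - winner % width)
                 + Math.abs(node / width - winner / width);
        if (dist == 0) return 0.9;
        if (dist == 1) return 0.4;
        return 0.0;
    }
}
```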

Quantitative Results
[Table: total-reward statistics (Mean, StDev, Median, Min, Max, Mean/Iteration, StDev/Iteration) for each condition Qa, Qb, QSOMa, QSOMb, QTa, QTb; the numeric values did not survive in the transcript.]

Qualitative Results
QSOMa
- Motor speeds ranged from 2% to 50%
- Sonar values stuck between 90% and 94%
QSOMb
- Sonar values ranged from 40% to 95%
- Best two runs arguably the best of the bunch
Very smooth SOM values in both cases

Qualitative Results
QTa
- Sonar values ranged from 10% to 100%
- Still a weak performer on average
- Best performer similar to QTb
QTb
- Developed a bump-sensor-oriented behavior
- Made little use of sonar
Highly uneven SOM values in both cases

Experimental Area

First Movie
- QSOMb
- Strong performer (Reward: )
- Minimum sonar value: 43.35% (110 cm)

Second Movie
- Also QSOMb
- Typical bad performer (Reward: 451.6)
  - Learns to avoid obstacles by always driving backwards
  - Baseline "not-forward" reward:
- Minimum sonar value: 57.51% (146 cm)
  - Hindered by the small filming area

Discussion
- Use of a SOM on the NXT can be effective
  - More research needed to address shortcomings
- Heterogeneity of sensors is a problem
  - Need to try NXT experiments with multiple sonars
  - Previous work involved homogeneous sensors
- Approachable by undergraduate students
  - Technique taught in a junior/senior AI course