
From Motor Babbling to Planning
Cornelius Weber
Frankfurt Institute for Advanced Studies, Goethe University Frankfurt, Germany
ICN Young Investigators' Colloquium, 26th June 2008, Frankfurt am Main

Reinforcement Learning
(Figure: trained weights connecting to a value unit and actor units.)
A fixed reactive system that always strives for the same goal.
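As an aside on what "fixed reactive system" means here, a tiny standalone sketch may help (hand-filled weights on a 1-D chain of states; everything below is illustrative and not the network from the talk): once the value and actor weights are trained for one goal, the readout can only ever drive the agent toward that goal.

```python
import numpy as np

# Toy reactive agent on a chain of N discrete states. W_actor and w_value stand
# in for the trained weights on the slide; they are hand-filled here so that the
# readout always drives the agent toward one fixed goal state.
N = 10
GOAL = 7
MOVES = (-1, +1)                             # step left / step right

W_actor = np.zeros((N, len(MOVES)))          # actor weights: state -> action preference
for s in range(N):
    W_actor[s, 0 if s > GOAL else 1] = 1.0   # prefer the move that approaches GOAL

w_value = -np.abs(np.arange(N) - GOAL).astype(float)  # value peaks at the goal

def act(state):
    """Reactive policy: the action with the largest actor activation wins."""
    return MOVES[int(np.argmax(W_actor[state]))]

state = 1
while state != GOAL:                         # the agent always heads for the same goal
    state += act(state)
print("reached state", state, "with value", w_value[state])
```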

Reinforcement learning does not use the exploration phase to learn a general model of the environment that would allow the agent to plan a route to any goal. So let's do this.

Learning
(Figure: actor and state space.)
Randomly move around the state space and learn world models:
● associative model
● inverse model
● forward model
Variables:
► action
► current state
► next state
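A minimal sketch of this babbling phase on a small grid world (all names and numbers below are illustrative, not from the talk): the agent takes random actions and records (state, action, next state) triples, which are the training data for the three models on the following slides.

```python
import numpy as np

# Motor babbling on a SIZE x SIZE grid of discrete states.
SIZE = 5                                      # 5 x 5 grid
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """Deterministic world: move within the grid, clipped at the borders."""
    row = min(max(state[0] + action[0], 0), SIZE - 1)
    col = min(max(state[1] + action[1], 0), SIZE - 1)
    return (row, col)

rng = np.random.default_rng(0)
state = (0, 0)
transitions = []                              # (current state, action index, next state)
for _ in range(2000):                         # random exploration: "babbling"
    a = int(rng.integers(len(ACTIONS)))
    nxt = step(state, ACTIONS[a])
    transitions.append((state, a, nxt))
    state = nxt
print(len(transitions), "transitions collected, first one:", transitions[0])
```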

Learning: Associative Model
Weights associate neighbouring states.
Use these to find any possible route between agent and goal.
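One way to picture the associative weights is as a state-to-state adjacency learned from the babbling data, with activation spread from the goal producing the gradient ("goal hill") mentioned later. A rough sketch under the grid-world assumptions above (reuses `transitions` and `SIZE` from the previous block; the decay factor and update rule are my own choices, not necessarily those in the talk):

```python
import numpy as np

# Associative model: W_assoc[s_next, s] is switched on whenever state s was
# followed by state s_next during babbling. Spreading activation backwards from
# a goal marks every state from which the goal can be reached, with activity
# decaying with distance: a "goal hill" whose slope points toward the goal.
n_states = SIZE * SIZE

def idx(s):
    """Flatten a (row, col) state to a single unit index."""
    return s[0] * SIZE + s[1]

W_assoc = np.zeros((n_states, n_states))
for s, a, nxt in transitions:
    W_assoc[idx(nxt), idx(s)] = 1.0            # simple Hebbian-style association

def spread_from(goal, gamma=0.9):
    """Activation gradient: gamma ** (number of steps needed to reach the goal)."""
    act = np.zeros(n_states)
    act[idx(goal)] = 1.0
    for _ in range(2 * SIZE):                  # enough sweeps to cover the grid
        # each state takes gamma times the activity of its most active successor
        act = np.maximum(act, gamma * (W_assoc.T * act[None, :]).max(axis=1))
    return act

activation = spread_from((4, 4))
print("activation at the far corner:", activation[idx((0, 0))])   # about 0.9 ** 8
```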

Learning: Inverse Model
Weights "postdict" the action given a pair of states (Sigma-Pi neuron model).
Use these to identify the action that leads to a desired state.
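A possible reading of the inverse model as a Sigma-Pi readout: a third-order weight for each (action, next state, current state) triple, so that each action's score is a sum over products of the two state activities. A sketch continuing the grid-world example (the counting rule and the `einsum` readout are illustrative assumptions; reuses `transitions`, `ACTIONS`, `n_states`, `idx` from above):

```python
import numpy as np

# Inverse model with Sigma-Pi weights W_inv[a, s_next, s]: the weight grows
# whenever action a carried the agent from s to s_next. To "postdict" the action
# for a desired state pair, the two state activities are multiplied (Pi) and the
# products are summed per action unit (Sigma).
W_inv = np.zeros((len(ACTIONS), n_states, n_states))
for s, a, nxt in transitions:
    W_inv[a, idx(nxt), idx(s)] += 1.0

def postdict_action(s, s_next):
    """Return the index of the action most strongly linked to the pair (s, s_next)."""
    x = np.zeros(n_states)
    x[idx(s)] = 1.0                                  # current-state activity
    y = np.zeros(n_states)
    y[idx(s_next)] = 1.0                             # desired-next-state activity
    scores = np.einsum('aij,i,j->a', W_inv, y, x)    # Sigma-Pi: sum of products
    return int(np.argmax(scores))

print("action from (2, 2) to (2, 3):", ACTIONS[postdict_action((2, 2), (2, 3))])
```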

Learning: Forward Model
Weights predict the state given a state-action pair.
Use these to predict the next state given the chosen action.
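Correspondingly, the forward model can be sketched as per-action weights from the current state to the predicted next state. Again this is illustrative (a normalised count table rather than the network in the talk), continuing the grid-world example above:

```python
import numpy as np

# Forward model: per-action weight matrices W_fwd[a][s_next, s] that map the
# current-state activity to a predicted next-state activity.
W_fwd = np.zeros((len(ACTIONS), n_states, n_states))
for s, a, nxt in transitions:
    W_fwd[a, idx(nxt), idx(s)] += 1.0
W_fwd /= np.maximum(W_fwd.sum(axis=1, keepdims=True), 1e-9)   # normalise per (a, s)

def predict_next(s, a):
    """Return the most likely next-state index for taking action a in state s."""
    x = np.zeros(n_states)
    x[idx(s)] = 1.0
    return int(np.argmax(W_fwd[a] @ x))

nxt_idx = predict_next((2, 2), 3)                             # action 3 = right
print("predicted next state:", (nxt_idx // SIZE, nxt_idx % SIZE))
```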

Planning

(Figure: state space with goal, actor units and agent.)

Planning
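To illustrate how the three learned models could work together in a planning phase, here is a sketch that spreads activation from an arbitrary goal, climbs the resulting gradient one neighbour at a time, asks the inverse model for the action at each step, and lets the forward model confirm it. The loop structure is my own reconstruction under the grid-world assumptions above, not the network dynamics from the talk (reuses `W_assoc`, `spread_from`, `postdict_action`, `predict_next`, `ACTIONS`, `SIZE`, `idx`):

```python
import numpy as np

# Planning: build the "goal hill" with the associative model, climb it, and name
# each step with the inverse model; the forward model verifies every prediction.
def plan(start, goal, max_steps=20):
    activation = spread_from(goal)                     # gradient toward the goal
    state, path, actions = start, [start], []
    for _ in range(max_steps):
        if state == goal:
            break
        # candidate next states: all states the associative model links to `state`
        successors = np.where(W_assoc[:, idx(state)] > 0)[0]
        best = int(successors[np.argmax(activation[successors])])
        nxt = (best // SIZE, best % SIZE)
        a = postdict_action(state, nxt)                # inverse model names the action
        assert predict_next(state, a) == best          # forward model confirms the step
        actions.append(ACTIONS[a])
        state = nxt
        path.append(state)
    return path, actions

path, actions = plan((0, 0), (4, 4))
print("planned route of", len(actions), "steps:", path)
```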

Discussion
- AI context ... assumed links explained by learning
- reinforcement learning ... if no access to full state space
- noise ... wide "goal hills" will have flat slopes
- shortest path ... not taken; how to define?
- biological plausibility ... Sigma-Pi neurons; winner-take-all
- to do: embedding ... learn state space from sensor input
- to do: embedding ... let the goal be assigned naturally
- to do: embedding ... hand-designed planning phases

Acknowledgments
Collaborators:
Jochen Triesch, FIAS, J-W-Goethe University Frankfurt
Stefan Wermter, University of Sunderland
Mark Elshaw, University of Sheffield