Outline: 1) Goal-Directed Feature Learning (Weber & Triesch, IJCNN 2009), Task 4.1 Visual processing based on feature abstraction; 2) Emergence of Disparity Tuning (Franz & Triesch, ICDL 2007), Task 4.3 Learning of attention and vergence control; 3) From Exploration to Planning (Weber & Triesch, ICANN 2008), Task 6.4 Learning hierarchical world models for planning

Presentation transcript:

Outline: 1) Goal-Directed Feature Learning (Weber & Triesch, IJCNN 2009), Task 4.1 Visual processing based on feature abstraction; 2) Emergence of Disparity Tuning (Franz & Triesch, ICDL 2007), Task 4.3 Learning of attention and vergence control; 3) From Exploration to Planning (Weber & Triesch, ICANN 2008), Task 6.4 Learning hierarchical world models for planning

Reinforcement learning setup: input (state) s, action a, connected by learned weights.

The actor must choose an action ('go left?' or 'go right?'). With a simple input the correct action ('go right!') can be read out directly; with a complex input, the state space must first be processed before reinforcement learning can operate on it.

Scenario with complex sensory input: bars are controlled by the actions 'up', 'down', 'left', 'right'; reward is given if the horizontal bar reaches a specific position.
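As an illustration, here is a minimal sketch of such a bars environment (class name, grid size, target row and the mapping of actions to bars are assumptions for illustration, not taken from the paper):

```python
import numpy as np

class BarsEnv:
    """Toy 'bars' world: 'up'/'down' move a horizontal bar (task relevant),
    'left'/'right' move a distractor vertical bar; reward is given when the
    horizontal bar reaches a target row.  Sizes and reward position are illustrative."""

    def __init__(self, size=12, target_row=0, rng=None):
        self.size = size
        self.target_row = target_row
        self.rng = rng or np.random.default_rng()
        self.reset()

    def reset(self):
        self.h_row = self.rng.integers(self.size)   # horizontal bar position
        self.v_col = self.rng.integers(self.size)   # distractor vertical bar
        return self.observe()

    def observe(self):
        I = np.zeros((self.size, self.size))
        I[self.h_row, :] = 1.0      # horizontal bar (relevant for reward)
        I[:, self.v_col] = 1.0      # vertical bar (irrelevant)
        return I.ravel()            # flattened input vector

    def step(self, action):
        # actions: 0 = up, 1 = down, 2 = left, 3 = right
        if action == 0:
            self.h_row = max(self.h_row - 1, 0)
        elif action == 1:
            self.h_row = min(self.h_row + 1, self.size - 1)
        elif action == 2:
            self.v_col = max(self.v_col - 1, 0)
        elif action == 3:
            self.v_col = min(self.v_col + 1, self.size - 1)
        reward = 1.0 if self.h_row == self.target_row else 0.0
        return self.observe(), reward
```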

Need another layer (or layers) to pre-process the complex data.

Notation: I input, s state (feature layer), a action, W input-to-feature weight matrix, Q feature-to-action weight matrix.

Feature detector: s = softmax(W I)
Action selection: P(a=1) = softmax(Q s)
The position of the relevant bar encodes the value: v = a Q s

Minimize the error: E = (0.9 v(s',a') - v(s,a))^2 = δ^2

Learning rules:
dQ ≈ dE/dQ = δ a s
dW ≈ dE/dW = δ Q s I + ε
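A compact sketch of how these updates might be implemented (variable names, the inclusion of the reward in the TD error, the learning rates and the exact form of the W update are assumptions; the IJCNN 2009 implementation may differ):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def step_update(I, I_next, a_idx, r, W, Q, lr_w=0.01, lr_q=0.01, gamma=0.9):
    """One goal-directed feature-learning update: a TD-like error delta
    drives both the action weights Q and the feature weights W."""
    n_actions = Q.shape[0]

    s = softmax(W @ I)                       # feature layer: s = softmax(W I)
    a = np.zeros(n_actions); a[a_idx] = 1.0  # chosen action as one-hot vector
    v = a @ Q @ s                            # value of the taken (state, action)

    s_next = softmax(W @ I_next)             # feature layer for the next input
    v_next = (Q @ s_next).max()              # value of the next state (greedy)

    delta = r + gamma * v_next - v           # TD error (reward term assumed)

    Q += lr_q * delta * np.outer(a, s)               # dQ ≈ δ a s
    # one plausible reading of dW ≈ δ Q s I; the ε noise term of the slide is omitted
    W += lr_w * delta * np.outer((a @ Q) * s, I)
    return W, Q, delta
```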

SARSA with WTA input layer
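For the SARSA variant with a winner-take-all (WTA) input layer, the softmax feature layer would simply be replaced by a hard winner (again a sketch, not the paper's exact code):

```python
import numpy as np

def wta(x):
    s = np.zeros_like(x)
    s[np.argmax(x)] = 1.0   # only the strongest feature unit is active
    return s

# in step_update above, use:  s = wta(W @ I)
```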

Memory extension: the model uses the previous state and action to estimate the current state.
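A rough sketch of this memory extension (the additive mixing of bottom-up evidence and the prediction from the previous state and action, and the per-action memory matrices, are assumptions about the mechanism, not the paper's exact equations):

```python
import numpy as np

def estimate_state(W, M, I, s_prev, a_prev_idx):
    """Combine bottom-up evidence W·I with a prediction from the previous
    state and action via memory weights M (one matrix per action)."""
    bottom_up = W @ I
    prediction = M[a_prev_idx] @ s_prev      # what state should follow (s_prev, a_prev)?
    x = bottom_up + prediction               # additive combination (assumed)
    e = np.exp(x - x.max())
    return e / e.sum()
```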

Learning the 'short bars' data: figure shows the data, the reward and action signals, the learned feature weights, and the RL action weights.

Short bars in a 12x12 input; average number of steps to the goal: 11.

Learning the 'long bars' data: figure shows the data/input, the reward, the learned feature weights, and the RL action weights (2 further actions not shown).

Learned features under three conditions: WTA with non-negative weights; SoftMax with non-negative weights; SoftMax with no weight constraints.

Models' background:
- gradient descent methods generalize RL to several layers: Sutton & Barto, RL book (1998); Tesauro (1992; 1995)
- reward-modulated Hebbian learning: Triesch, Neural Comput 19 (2007); Roelfsema & van Ooyen, Neural Comput 17 (2005); Franz & Triesch, ICDL (2007)
- reward-modulated activity leads to input selection: Nakahara, Neural Comput 14 (2002)
- reward-modulated STDP: Izhikevich, Cereb Cortex 17 (2007); Florian, Neural Comput 19/6 (2007); Farries & Fairhall, J Neurophysiol 98 (2007); ...
- RL models learn partitioning of input space: e.g. McCallum, PhD thesis, Rochester, NY, USA (1996)

Unsupervised learning in cortex (state space), reinforcement learning in basal ganglia (actor) (Doya, 1999).

Discussion - may help reinforcement learning work with real-world data... real visual processing!

Outline: 1) Goal-Directed Feature Learning (Weber & Triesch, IJCNN 2009), Task 4.1 Visual processing based on feature abstraction; 2) Emergence of Disparity Tuning (Franz & Triesch, ICDL 2007), Task 4.3 Learning of attention and vergence control; 3) From Exploration to Planning (Weber & Triesch, ICANN 2008), Task 6.4 Learning hierarchical world models for planning

Representation of depth: how to learn disparity-tuned neurons in V1?

Reinforcement learning in a neural network: after a vergence movement, the input arrives at a new disparity; if the disparity is zero, a reward is given.

Attention-Gated Reinforcement Learning: Hebbian-like weight learning (Roelfsema & van Ooyen, 2005).
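A minimal sketch of an AGREL-style, reward-modulated Hebbian update as it might be applied to the vergence task (layer sizes, the feedback gating and the reward definition at zero disparity are assumptions; Roelfsema & van Ooyen's and Franz & Triesch's exact formulations may differ):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def agrel_step(I, disparity, W_hid, W_out, lr=0.01):
    """One AGREL-like update: a global reward-prediction error, gated by
    feedback from the selected output unit, modulates Hebbian weight changes."""
    h = sigmoid(W_hid @ I)                    # hidden (disparity-tuned) layer
    p = np.exp(W_out @ h); p /= p.sum()       # probabilities over vergence commands
    a = np.random.choice(len(p), p=p)         # sample a vergence action

    reward = 1.0 if disparity == 0 else 0.0   # reward only at zero disparity (assumed)
    delta = reward - p[a]                     # reward-prediction error (simplified)

    fb = W_out[a]                             # attentional feedback from the chosen output
    W_out[a] += lr * delta * h                                  # Hebbian update, output layer
    W_hid += lr * delta * np.outer(fb * h * (1 - h), I)         # gated Hebbian update, hidden layer
    return a, delta
```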

Measured disparity tuning curves: six types of tuning curves (Poggio, Gonzalez, Krause, 1988).

Development of disparity tuning: all six types of tuning curves emerge in the hidden layer!

Discussion - requires application... use 2D images from 3D space... open question as to the implementation of the reward... learning of attention?

Outline: 1) Goal-Directed Feature Learning (Weber & Triesch, IJCNN 2009), Task 4.1 Visual processing based on feature abstraction; 2) Emergence of Disparity Tuning (Franz & Triesch, ICDL 2007), Task 4.3 Learning of attention and vergence control; 3) From Exploration to Planning (Weber & Triesch, ICANN 2008), Task 6.4 Learning hierarchical world models for planning

Reinforcement learning leads to a fixed reactive system (value and actor units) that always strives for the same goal. Task: in an exploration phase, learn a general model that allows the agent to plan a route to any goal.

Learning (actor, state space): randomly move around the state space and learn world models: ● associative model ● inverse model ● forward model

Learning: Associative Model. Weights associate neighbouring states; these are used to find possible routes between agent and goal.

Learning: Inverse Model. Weights "postdict" the action given a pair of states; these are used to identify the action that leads to a desired state (Sigma-Pi neuron model).

Learning: Forward Model. Weights predict the next state given a state-action pair; these are used to predict the next state for the chosen action.
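The three world models could be learned from the same random-exploration transitions, roughly as in the following sketch (Hebbian outer-product-style counting and one-hot state/action coding are assumptions; the ICANN 2008 paper's exact learning rules may differ):

```python
import numpy as np

def learn_world_models(transitions, n_states, n_actions, lr=0.1):
    """Learn associative, inverse and forward models from (s, a, s') index tuples
    collected during random exploration."""
    A = np.zeros((n_states, n_states))              # associative: which s' follows s?
    Inv = np.zeros((n_actions, n_states, n_states)) # inverse: which a led from s to s'? (Sigma-Pi-like)
    F = np.zeros((n_states, n_actions, n_states))   # forward: which s' follows (s, a)? (Sigma-Pi-like)

    for s, a, s_next in transitions:
        A[s_next, s] += lr        # neighbouring states become associated
        Inv[a, s, s_next] += lr   # "postdict" the action for the state pair
        F[s_next, a, s] += lr     # predict the next state from state and action
    return A, Inv, F
```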

Planning

Planning in the state space: goal, actor units, agent (figure).

Planning
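One way the learned models could be combined at planning time: spread activation from the goal backwards through the associative weights, then use the inverse model to pick the action towards the most activated neighbouring state. The sketch below illustrates this reading; the paper's actual planning mechanism may differ in detail.

```python
import numpy as np

def plan_step(A, Inv, s_agent, s_goal, n_spread=20):
    """Choose the next action by spreading activation from the goal through the
    associative weights A and querying the inverse model Inv for the best move."""
    n_states = A.shape[0]
    act = np.zeros(n_states)
    act[s_goal] = 1.0
    for _ in range(n_spread):                       # activation spreading from the goal
        act = np.maximum(act, 0.9 * (A.T @ act))    # decaying with distance (assumed)

    neighbours = np.nonzero(A[:, s_agent])[0]       # states reachable from the agent
    if len(neighbours) == 0:
        return None
    s_next = neighbours[np.argmax(act[neighbours])] # most goal-activated neighbour
    return int(np.argmax(Inv[:, s_agent, s_next]))  # action leading from s_agent to s_next
```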

Discussion - requires embedding... learn state space from sensor input... only random exploration implemented... hand-designed planning phases... hierarchical models?