
1 Outline 1) Goal-Directed Feature Learning (Weber & Triesch, IJCNN 2009) Task 4.1 Visual processing based on feature abstraction 2) Emergence of Disparity Tuning (Franz & Triesch, ICDL 2007) Task 4.3 Learning of attention and vergence control 3) From Exploration to Planning (Weber & Triesch, ICANN 2008) Task 6.4 Learning hierarchical world models for planning

2 Outline 1) Goal-Directed Feature Learning (Weber & Triesch, IJCNN 2009) Task 4.1 Visual processing based on feature abstraction 2) Emergence of Disparity Tuning (Franz & Triesch, ICDL 2007) Task 4.3 Learning of attention and vergence control 3) From Exploration to Planning (Weber & Triesch, ICANN 2008) Task 6.4 Learning hierarchical world models for planning

3 Reinforcement learning (figure): input s, weights, action a

4 Reinforcement learning (figure): input (state space) and actor. With a simple input the actor can decide directly: go right? go left? → go right! With a complex input, the mapping is not as direct.

5 Complex input scenario (figure: sensory input, action, reward): bars are controlled by the actions 'up', 'down', 'left', 'right'; reward is given if the horizontal bar is at a specific position.

6 Need another layer(s) to pre-process complex data. Notation: I input, s state (feature layer), a action, W feature weight matrix, Q action weight matrix. Feature detector: s = softmax(W I), which encodes the position of the relevant bar. Action selection: P(a=1) = softmax(Q s). Value: v = a Q s. Minimize the error E = (0.9 v(s',a') - v(s,a))² = δ². Learning rules: ΔQ ≈ dE/dQ = δ a s; ΔW ≈ dE/dW = δ Q s I + ε.
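
A minimal NumPy sketch (not the authors' code) of this two-layer update: the layer sizes, learning rate, and exploration noise magnitude are illustrative assumptions, and the reward term is written explicitly in the TD error, whereas on the slide it enters via the goal state.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# assumed sizes: flattened 12x12 input, small feature layer, 4 actions (up/down/left/right)
n_input, n_features, n_actions = 144, 16, 4
W = 0.01 * np.random.randn(n_features, n_input)    # feature weights (input I -> state s)
Q = 0.01 * np.random.randn(n_actions, n_features)  # action weights (state s -> action a)
eta, gamma = 0.1, 0.9                              # assumed learning rate; 0.9 from the slide

def forward(I):
    s = softmax(W @ I)                     # s = softmax(W I): encodes position of the relevant bar
    p = softmax(Q @ s)                     # P(a=1) = softmax(Q s)
    a = np.random.choice(n_actions, p=p)   # sample an action
    return s, a

def sarsa_update(I, a, r, I_next, a_next):
    """One SARSA-like step minimizing E = (0.9 v(s',a') - v(s,a))^2 = delta^2."""
    global W, Q
    s, s_next = softmax(W @ I), softmax(W @ I_next)
    v, v_next = Q[a] @ s, Q[a_next] @ s_next       # v = a Q s for the chosen action
    delta = r + gamma * v_next - v                 # TD error (reward made explicit here)
    Q[a] += eta * delta * s                        # dQ ~ delta a s
    W += eta * delta * np.outer(Q[a] * s, I)       # dW ~ delta Q s I (one reading of the slide)
    W += 1e-4 * np.random.randn(*W.shape)          # epsilon: small noise term from the slide
```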

7 SARSA with WTA input layer

8 Memory extension: the model uses the previous state and action to estimate the current state
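
The slide does not give the wiring of this memory; the sketch below assumes extra weight matrices Ms and Ma feeding the previous state and a one-hot previous action into the feature layer, purely as an illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

n_input, n_features, n_actions = 144, 16, 4         # assumed sizes, as above
W  = 0.01 * np.random.randn(n_features, n_input)    # input -> state
Ms = 0.01 * np.random.randn(n_features, n_features) # previous state -> state (memory)
Ma = 0.01 * np.random.randn(n_features, n_actions)  # previous action -> state (memory)

def estimate_state(I, s_prev, a_prev_onehot):
    # the current-state estimate combines the new input with the previous state and action
    return softmax(W @ I + Ms @ s_prev + Ma @ a_prev_onehot)
```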

9 Learning the 'short bars' data (figure: data, feature weights, RL action weights, action, reward)

10 Short bars in a 12x12 grid; average number of steps to goal: 11

11 Learning the 'long bars' data (figure: input data, feature weights, RL action weights, reward; 2 actions not shown)

12 Model variants (figure): WTA with non-negative weights; SoftMax with non-negative weights; SoftMax with no weight constraints

13 Models' background:
- gradient descent methods generalize RL to several layers: Sutton & Barto, RL book (1998); Tesauro (1992; 1995)
- reward-modulated Hebbian learning: Triesch, Neur Comp 19, 885-909 (2007); Roelfsema & van Ooyen, Neur Comp 17, 2176-2214 (2005); Franz & Triesch, ICDL (2007)
- reward-modulated activity leads to input selection: Nakahara, Neur Comp 14, 819-44 (2002)
- reward-modulated STDP: Izhikevich, Cereb Cortex 17, 2443-52 (2007); Florian, Neur Comp 19(6), 1468-502 (2007); Farries & Fairhall, J Neurophysiol 98, 3648-65 (2007); ...
- RL models learn partitioning of input space: e.g. McCallum, PhD Thesis, Rochester, NY, USA (1996)

14 Unsupervised learning in cortex, reinforcement learning in basal ganglia (figure: state space, actor; Doya, 1999)

15-16 (figure-only slides)

17 Discussion - may help reinforcement learning work with real-world data... real visual processing!

18 Outline 1) Goal-Directed Feature Learning (Weber & Triesch, IJCNN 2009) Task 4.1 Visual processing based on feature abstraction 2) Emergence of Disparity Tuning (Franz & Triesch, ICDL 2007) Task 4.3 Learning of attention and vergence control 3) From Exploration to Planning (Weber & Triesch, ICANN 2008) Task 6.4 Learning hierarchical world models for planning

19 Representation of depth: how to learn disparity-tuned neurons in V1?

20 Reinforcement learning in a neural network. After vergence: input at a new disparity; if disparity is zero → reward

21 Attention-Gated Reinforcement Learning: Hebbian-like weight learning (Roelfsema & van Ooyen, 2005)
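
The slide only names the rule; below is a minimal sketch of a reward-modulated, Hebbian-like update in the spirit of Roelfsema & van Ooyen (2005), where a global reward signal (e.g. +1 when disparity reaches zero after vergence) and attentional feedback from the winning output unit gate an otherwise Hebbian term. The learning rate and the exact gating are assumptions.

```python
import numpy as np

def hebbian_like_update(W, pre, post, reward_signal, feedback, eta=0.05):
    """Hebbian term (post * pre), gated multiplicatively by the global reward signal
    and by attentional feedback from the selected output unit to the hidden layer."""
    return W + eta * reward_signal * np.outer(post * feedback, pre)
```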

22 Measured disparity tuning curves: six types of tuning curves (Poggio, Gonzalez, Krause, 1988)

23 Development of disparity tuning: all six types of tuning curves emerge in the hidden layer!

24 Discussion - requires application... use 2D images from 3D space... open question as to the implementation of the reward... learning of attention?

25 Outline 1) Goal-Directed Feature Learning (Weber & Triesch, IJCNN 2009) Task 4.1 Visual processing based on feature abstraction 2) Emergence of Disparity Tuning (Franz & Triesch, ICDL 2007) Task 4.3 Learning of attention and vergence control 3) From Exploration to Planning (Weber & Triesch, ICANN 2008) Task 6.4 Learning hierarchical world models for planning

26 Reinforcement learning leads to a fixed reactive system that always strives for the same goal (figure: value, actor units). Task: in an exploration phase, learn a general model that allows the agent to plan a route to any goal.

27 Learning (figure: actor, state space). Randomly move around the state space and learn world models: ● associative model ● inverse model ● forward model

28 Learning: Associative Model. Weights associate neighbouring states; use these to find all possible routes between agent and goal.
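
A minimal sketch of one way such association weights could be learned during random exploration, assuming (approximately one-hot) population codes for states; simple Hebbian co-activation of successive states is an assumption, not necessarily the authors' exact rule.

```python
import numpy as np

n_states = 100                       # assumed size of the discretised state space
A = np.zeros((n_states, n_states))   # A[j, i]: association from state i to successor state j

def associative_update(s, s_next, eta=0.1):
    """Strengthen the association between states visited in succession."""
    global A
    A += eta * np.outer(s_next, s)

# propagating activity from the goal backwards through A (i.e. via A.T) marks the states
# that lie on possible routes between the agent and the goal
```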

29 Learning: Inverse Model. Weights "postdict" the action given a state pair; use these to identify the action that leads to a desired state (Sigma-Pi neuron model).
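
A sketch of a Sigma-Pi style inverse model, assuming a weight tensor V[a, j, i] that multiplies the desired next state with the current state; the simple Hebbian-style update is an assumption.

```python
import numpy as np

n_states, n_actions = 100, 4
V = np.zeros((n_actions, n_states, n_states))   # Sigma-Pi weights: (action, next state, state)

def postdict_action(s, s_next):
    """Score each action a by the Sigma-Pi sum over V[a, j, i] * s_next[j] * s[i]."""
    scores = np.einsum('aji,j,i->a', V, s_next, s)
    return int(np.argmax(scores))

def inverse_update(s, a, s_next, eta=0.1):
    """After taking action a in state s and observing s_next, credit a for that state pair."""
    global V
    V[a] += eta * np.outer(s_next, s)
```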

30 Learning: Forward Model. Weights predict the state given a state-action pair; use these to predict the next state given the chosen action.
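
A sketch of the forward model under the same assumptions: one weight matrix per action, trained on the error between the observed and the predicted next state.

```python
import numpy as np

n_states, n_actions = 100, 4
F = np.zeros((n_actions, n_states, n_states))   # forward weights: (action, next state, state)

def predict_next_state(s, a):
    return F[a] @ s                              # predicted next state for the chosen action

def forward_update(s, a, s_next, eta=0.1):
    """Move the prediction for (s, a) towards the observed next state."""
    global F
    F[a] += eta * np.outer(s_next - F[a] @ s, s)
```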

31 Planning

32-43 (figure-only slides: planning animation)

44 (figure: goal, agent, actor units)

45 Planning
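
The slides show the planning phase only as an animation; the sketch below is one plausible reading, assuming the models above: goal activity is spread backwards through the associative weights, and the Sigma-Pi inverse model then proposes the action that moves the agent towards the most strongly activated reachable state. The spreading and selection details are assumptions.

```python
import numpy as np

def plan_step(s_agent, s_goal, A, V, n_sweeps=20):
    """Propose one action that moves the agent towards the goal."""
    activity = s_goal.astype(float)
    for _ in range(n_sweeps):                        # spread goal activity backwards through A
        activity = np.maximum(activity, A.T @ activity)
        activity /= activity.max() + 1e-12
    successors = A @ s_agent                         # states reachable from the agent's state
    target = np.zeros_like(activity)                 # desired next state: reachable state with
    target[np.argmax(successors * activity)] = 1.0   # the highest backward goal activity
    scores = np.einsum('aji,j,i->a', V, target, s_agent)
    return int(np.argmax(scores))                    # action proposed by the inverse model
```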

46 Discussion - requires embedding... learn state space from sensor input... only random exploration implemented... hand-designed planning phases... hierarchical models?

