Purposive Behavior Acquisition for a Real Robot by Vision-Based Reinforcement Learning. Minoru Asada, Shoichi Noda, Sukoya Tawaratsumida, Koh Hosoda.

1 Purposive Behavior Acquisition for a Real Robot by Vision-Based Reinforcement Learning. Minoru Asada, Shoichi Noda, Sukoya Tawaratsumida, Koh Hosoda. Presented by: Subarna Sadhukhan

2 Reinforcement learning. Vision-based reinforcement learning by which a robot learns to shoot a ball into a goal. The aim is a method that acquires strategies for this task automatically. The robot and its environment are modeled as two synchronized finite-state automata interacting in a discrete-time cyclical process. Robot: senses the current state and selects an action. Environment: transitions to a new state and generates a reward back to the robot. Through this interaction the robot learns purposive behavior to achieve a given goal.
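The sense-act-transition-reward cycle described on this slide can be sketched as a simple loop. This is only an illustrative toy, not the authors' system: the three-state world, the class names `Environment` and `Robot`, and the fixed "forward" policy are all assumptions made for the example.

```python
class Environment:
    """Toy environment automaton (hypothetical): states 0..2, state 2 is the goal."""

    def __init__(self):
        self.state = 0

    def transition(self, action):
        # "forward" advances one state (capped at the goal); anything else stays put.
        if action == "forward":
            self.state = min(self.state + 1, 2)
        reward = 1 if self.state == 2 else 0
        return self.state, reward


class Robot:
    """Toy robot automaton (hypothetical): always drives forward, learns nothing."""

    def select_action(self, state):
        return "forward"

    def learn(self, state, action, reward, next_state):
        pass  # placeholder for a learning rule such as Q-learning


env, robot = Environment(), Robot()
state = env.state
history = []
for t in range(3):  # discrete time steps
    action = robot.select_action(state)          # robot senses state, picks action
    next_state, reward = env.transition(action)  # environment moves, emits reward
    robot.learn(state, action, reward, next_state)
    history.append((t, state, action, next_state, reward))
    state = next_state
```

Each tuple in `history` records one cycle of the two automata: the state the robot sensed, the action it chose, and the new state and reward the environment returned.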

3 Environment: ball and goal. Robot: mobile, equipped with a camera. Nothing about the system is known in advance. The robot is assumed to be able to discriminate the set S of states and to take the set A of actions on the world.

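4 Q-learning. Let Q*(s,a) be the expected return for taking action a in state s and following the optimal policy thereafter:

$$Q^*(s,a) = r(s,a) + \gamma \sum_{s'} T(s,a,s') \max_{a'} Q^*(s',a')$$

where T(s,a,s') is the probability of a transition from s to s', r(s,a) is the reward for the state-action pair (s,a), and γ is the discount factor. Since T and r are not known, the estimate is updated online instead:

$$Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha \left( r + \gamma \max_{a'} Q(s',a') \right)$$

where r is the actual reward received for taking a, s' is the next state, and α is the learning rate.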
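The online update on this slide can be sketched as a standard tabular Q-learning step, Q(s,a) <- (1-α)Q(s,a) + α(r + γ max Q(s',a')). The function name `q_update`, the state labels, and the values of α and γ below are assumptions chosen for illustration, not values from the presentation.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha, gamma):
    """One tabular Q-learning step:
    Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (r + gamma * max_a' Q(s', a'))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)
    return Q[(s, a)]


# Hypothetical labels; unseen (state, action) pairs default to 0.0.
Q = defaultdict(float)
ACTIONS = ("forward", "stop", "back")

# Reward 1 is received on reaching the goal state.
q_update(Q, "near_goal", "forward", 1.0, "goal", ACTIONS, alpha=0.5, gamma=0.9)
# No immediate reward, but the value of "near_goal" propagates one step back.
q_update(Q, "far", "forward", 0.0, "near_goal", ACTIONS, alpha=0.5, gamma=0.9)
```

The second call shows how the discounted value of a rewarded state flows back to its predecessor even though that step itself earns no reward.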

5 State set. 9 × 27 + 27 + 9 = 279 states: 3 × 3 ball substates combined with 3 × 3 × 3 goal substates, plus the states in which the goal or the ball is not visible.
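The count on this slide can be checked by enumerating the combinations. The slide gives only the factor counts (3 × 3 for the ball, 3 × 3 × 3 for the goal); the attribute names and labels below (position, size, orientation) are assumptions used to make the enumeration concrete.

```python
from itertools import product

# Hypothetical labels for each 3-valued attribute.
POSITIONS = ("left", "center", "right")
SIZES = ("small", "medium", "large")
ORIENTATIONS = ("left-oriented", "front", "right-oriented")

ball_substates = list(product(POSITIONS, SIZES))                # 3 * 3 = 9
goal_substates = list(product(POSITIONS, SIZES, ORIENTATIONS))  # 3 * 3 * 3 = 27

# Both objects visible, then only the goal visible, then only the ball visible.
states = [(b, g) for b in ball_substates for g in goal_substates]
states += [(None, g) for g in goal_substates]
states += [(b, None) for b in ball_substates]

num_states = len(states)  # 9 * 27 + 27 + 9
```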

6 Action set. Two motors; each motor can be commanded forward, stop, or back, giving 9 actions in all. State-action deviation problem: a small change near the observer produces a large change in the image, while a large change far from the observer produces only a small change in the image.
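A short sketch can make both points on this slide concrete: the 9 actions are the pairs of per-motor commands, and the state-action deviation problem follows from perspective projection. The pinhole model (image size proportional to 1/distance), the function name `apparent_size`, and the focal length, ball diameter, and distances are assumptions for illustration only.

```python
from itertools import product

MOTOR_COMMANDS = ("forward", "stop", "back")
# One command per motor: (motor 1, motor 2) gives 3 * 3 = 9 actions.
ACTIONS = list(product(MOTOR_COMMANDS, repeat=2))


def apparent_size(distance_m, diameter_m=0.2, focal_px=500.0):
    """Image size in pixels of an object under an assumed pinhole camera model."""
    return focal_px * diameter_m / distance_m


STEP = 0.1  # the same 10 cm approach, made near the object and far from it
near_change = apparent_size(0.5 - STEP) - apparent_size(0.5)
far_change = apparent_size(5.0 - STEP) - apparent_size(5.0)
```

Under these assumed numbers the same physical motion changes the image by roughly 50 pixels when the object is near and by less than half a pixel when it is far, so one action may or may not change the discretized state depending on distance.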

7 Learning from Easy Missions. Delayed reinforcement problem: there is no explicit teacher signal, since a reward is received only after the ball is kicked into the goal; r(s,a) = 1 only in the goal state. Construct the learning schedule so that the robot learns in easy situations at early stages and in more difficult situations later on: Learning from Easy Missions (LEM).

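8 Complexity analysis. k states, m possible actions. Q-learning: (expression not preserved) for the first, (expression not preserved) for the second, hence (expression not preserved). LEM: m × k, since a reward is obtained at each step.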
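The m × k figure for LEM can be illustrated on a toy corridor. Everything about this setup is an assumption made for the sketch, not the presentation's analysis: a one-dimensional chain of k states with the goal at the end, exactly one "good" action per state (wrong actions leave the robot in place), a learning rate of 1, and a schedule that tries each action once from each start state, nearest to the goal first.

```python
def lem_chain(k=5, m=4, gamma=0.9):
    """LEM on a hypothetical k-state corridor; state k is the goal.

    Returns (Q, trials, good), where good[s] is the hidden correct action in s.
    """
    good = [(3 * s + 1) % m for s in range(k)]  # arbitrary hidden correct actions

    def step(s, a):
        s_next = s + 1 if a == good[s] else s   # wrong actions leave the robot in place
        reward = 1.0 if s_next == k else 0.0    # reward only on reaching the goal
        return s_next, reward

    Q = [[0.0] * m for _ in range(k + 1)]       # row k (the goal) stays at zero
    trials = 0
    for s in range(k - 1, -1, -1):              # easy (next to goal) -> hard (far away)
        for a in range(m):                      # one-step trial of every action
            s_next, r = step(s, a)
            Q[s][a] = r + gamma * max(Q[s_next])  # Q-learning update with alpha = 1
            trials += 1
    return Q, trials, good


Q, trials, good = lem_chain(k=5, m=4)
greedy = [max(range(4), key=lambda a: Q[s][a]) for s in range(5)]
```

Because each stage starts next to a state whose value is already known, every trial yields a usable learning signal, and the total number of one-step trials is exactly m × k (here 20).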

9 Implementing LEM. Rough ordering of easy situations: small -> medium -> large (the size of the ball roughly indicates how close the robot is to reaching the goal). The state space is categorized into sub-states such as ball size, position and so on. n = size of the state space, m = number of ordered sets. Applying LEM with m ordered sets takes (expression not preserved), as opposed to (expression not preserved).

10 When to shift. S1 is the state set nearest to the goal, S2 the next, and so on. Shifting occurs when (condition not preserved), where Δt is a time interval, in number of steps, over which the change is measured. We suppose that the current state set S(k-1) can transit only to its neighbors.

11 From the previous Q-learning equation, if Q converges: (expression not preserved). Thus: (expression not preserved).

12 LEM

13 Experiments


