
Purposive Behavior Acquisition for a Real Robot by Vision-Based Reinforcement Learning
Minoru Asada, Shoichi Noda, Sukoya Tawaratsumida, Koh Hosoda
Presented by: Subarna Sadhukhan

Reinforcement Learning
The paper uses vision-based reinforcement learning so that a robot learns to shoot a ball into a goal, and develops a method that acquires strategies for this task automatically. The robot and its environment are modeled as two synchronized finite-state automata interacting in a discrete-time cyclical process:
Robot: senses the current state and selects an action.
Environment: makes the transition to a new state and generates a reward back to the robot.
Through this interaction the robot learns purposive behavior that achieves a given goal.
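The sense/act/transition/reward cycle described above can be sketched as a discrete-time loop. This is a minimal sketch under stated assumptions: `ChainEnvironment` (a one-dimensional corridor whose last cell stands in for the goal), its `step` method, and the random policy are hypothetical stand-ins for the real robot, camera, and field.

```python
import random


class ChainEnvironment:
    """Hypothetical stand-in for the robot's world: cells 0..n_states-1,
    with the last cell playing the role of the goal state."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Environment's turn: decide the next state from the chosen action.
        if action == 1:  # move one cell toward the goal
            self.state = min(self.state + 1, self.n_states - 1)
        else:            # move one cell away from the goal
            self.state = max(self.state - 1, 0)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0  # reward is sent back only at the goal
        return self.state, reward, done


def run_episode(env, policy, max_steps=1000):
    """Run the discrete-time sense/act/transition/reward cycle once.
    Returns (number of steps taken, total reward)."""
    state = env.reset()
    total_reward = 0.0
    for t in range(max_steps):
        action = policy(state)                  # robot: sense state, select action
        state, reward, done = env.step(action)  # environment: transition + reward
        total_reward += reward
        if done:
            return t + 1, total_reward
    return max_steps, total_reward


random.seed(0)
print(run_episode(ChainEnvironment(5), lambda s: random.choice([0, 1])))
```

The policy is passed in as a plain function of the state, so the same loop can later drive a learned (for example greedy Q-table) policy instead of the random one.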

Environment: a ball and a goal. Robot: mobile, equipped with a camera. Nothing about the system is known in advance. The robot is assumed to be able to discriminate a set S of states and to take a set A of actions on the world.

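Q-learning
Let Q*(s, a) be the expected return for taking action a in situation s and following the optimal policy thereafter:
$$Q^*(s,a) = r(s,a) + \gamma \sum_{s' \in S} T(s,a,s') \max_{a' \in A} Q^*(s',a')$$
where T(s, a, s') is the probability of a transition from s to s', r(s, a) is the reward for the state-action pair (s, a), and γ is the discount factor. Since T and r are not known, Q is instead estimated from experience with the update
$$Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\left(r + \gamma \max_{a' \in A} Q(s',a')\right)$$
where r is the actual reward received for taking a, s' is the next state, and α is the learning rate.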
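The update rule on this slide can be written as a small tabular sketch. Assumptions for illustration: Q is a Python dict keyed by (state, action) and zero-initialized, and the helper name `q_update` and its `terminal` flag (treating the value after reaching the goal as 0) are illustrative choices, not taken from the slides.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha, gamma, terminal=False):
    """Apply Q(s,a) <- (1 - alpha) Q(s,a) + alpha (r + gamma max_a' Q(s',a')).
    Q maps (state, action) pairs to values; unseen pairs default to 0.
    When s_next is terminal (e.g. the goal), its future value is taken as 0."""
    future = 0.0 if terminal else max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * future)
    return Q[(s, a)]


Q = defaultdict(float)  # zero-initialized Q-table
# Example: taking action 1 in state 3 reaches the goal and yields r = 1.
print(q_update(Q, 3, 1, 1.0, 4, actions=[0, 1], alpha=0.5, gamma=0.9, terminal=True))  # prints 0.5
```

Because only observed transitions (s, a, r, s') are used, neither T nor r(s, a) has to be known, which is exactly why the sampled update replaces the equation for Q*.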

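State Set
9 × 27 + 27 + 9 = 279 states: 3 × 3 ball sub-states combined with 3 × 3 × 3 goal sub-states (243 states in which both are visible), plus 27 states in which only the goal is visible (no ball) and 9 states in which only the ball is visible (no goal).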
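The count above can be checked by enumerating the state set. This sketch assumes hypothetical sub-state labels (position and size for the ball; position, size, and orientation for the goal); the slide gives only the counts 3 × 3 and 3 × 3 × 3.

```python
from itertools import product

# Hypothetical sub-state labels for illustration; the slide gives only the counts.
BALL = list(product(["left", "center", "right"],     # assumed: ball position
                    ["small", "medium", "large"]))   # assumed: ball size
GOAL = list(product(["left", "center", "right"],     # assumed: goal position
                    ["small", "medium", "large"],    # assumed: goal size
                    ["left-oriented", "front", "right-oriented"]))  # assumed: goal orientation


def build_state_set():
    states = [("both", b, g) for b, g in product(BALL, GOAL)]  # 9 * 27 = 243
    states += [("goal_only", None, g) for g in GOAL]            # no ball: 27
    states += [("ball_only", b, None) for b in BALL]            # no goal: 9
    return states


STATES = build_state_set()
STATE_INDEX = {s: i for i, s in enumerate(STATES)}  # state -> row of a Q-table
print(len(STATES))
```

Mapping each discrete state to an index like this is what lets the Q-table be stored as a 279 × |A| array.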

Action Set
The robot has two motors, and each motor can be commanded forward, stop, or back, giving 3 × 3 = 9 actions in all.
State-action deviation problem: a small change near the observer results in a large change in the image, while a large change far from the observer results in only a small change in the image.
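The nine actions follow from pairing the three commands of each motor. A minimal sketch, assuming an action is simply a (motor 1, motor 2) command pair; the tuple encoding is illustrative.

```python
from itertools import product

MOTOR_COMMANDS = ("forward", "stop", "back")

# Each action is assumed to be a (motor 1, motor 2) command pair: 3 x 3 = 9 actions.
ACTIONS = list(product(MOTOR_COMMANDS, repeat=2))

for i, (m1, m2) in enumerate(ACTIONS):
    print(i, m1, m2)
```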

Learning from Easy Missions
There is a delayed reinforcement problem because there is no explicit teacher signal: a reward is received only after the ball has been kicked into the goal, i.e., r(s, a) = 1 only in the goal state and 0 otherwise. The learning schedule is therefore constructed so that the robot learns in easy situations at early stages and in more difficult situations later on. This is called Learning from Easy Missions (LEM).
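The delayed reinforcement problem can be made concrete with a toy experiment: with zero-initialized Q-values and a reward only at the goal, each episode propagates value back by just one more state. This is a simplified sketch on a hypothetical corridor with a fixed "always forward" policy and illustrative α = 0.5, γ = 0.9, not the robot task itself.

```python
def forward_only_q_learning(k, episodes, alpha=0.5, gamma=0.9):
    """Zero-initialized Q-learning on a hypothetical corridor of cells 0..k,
    where cell k is the goal and r = 1 only for the step that reaches it.
    The robot always takes the 'forward' action, so only Q(s, forward) is ever
    updated; since all values stay >= 0, max_a' Q(s', a') equals q[s'].
    Returns q, where q[s] = Q(s, forward) for s = 0..k-1."""
    q = [0.0] * k
    for _ in range(episodes):
        for s in range(k):  # one episode: walk from cell 0 to the goal
            s_next = s + 1
            at_goal = s_next == k
            r = 1.0 if at_goal else 0.0
            future = 0.0 if at_goal else q[s_next]  # the goal is terminal
            q[s] = (1 - alpha) * q[s] + alpha * (r + gamma * future)
    return q


def count_valued_states(q):
    """Number of states from which some reward has propagated back so far."""
    return sum(1 for v in q if v > 0.0)


for episodes in (1, 2, 5, 10):
    print(episodes, count_valued_states(forward_only_q_learning(10, episodes)))
```

States far from the goal stay at zero for many episodes even with a perfect forward policy; with undirected exploration the wait is much longer, which is what starting from easy (near-goal) situations avoids.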

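Complexity Analysis
Let the goal be k states away from the start, with m possible actions in each state. With plain Q-learning the robot must first find the goal by essentially undirected search, and this search time can grow exponentially with k (on the order of m^k in the worst case). With LEM the cost is on the order of m·k, because the robot gets a reward at each step of the easy-to-hard schedule.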
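To give a feel for the gap, the two orders of growth can be tabulated for the robot's m = 9 actions. This is only arithmetic on the assumed orders m^k (undirected first search, worst case) and m·k (LEM); the numbers are not step counts measured on the robot.

```python
def naive_search_cost(m, k):
    """Assumed rough worst-case order m**k for finding a goal k steps away by undirected search."""
    return m ** k


def lem_search_cost(m, k):
    """Assumed rough order m * k when each of k easy-to-hard stages needs about m trials."""
    return m * k


M = 9  # the nine motor-command actions of the robot
for k in range(1, 6):
    print(k, naive_search_cost(M, k), lem_search_cost(M, k))
```

Already at k = 5 the first estimate is 9^5 = 59049 against 9 × 5 = 45 for the second.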

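Implementing LEM
Easy situations are ordered roughly: small → medium → large (the apparent size of the ball roughly indicates how close the robot is to reaching the goal). The state space is categorized into sub-states such as ball size, ball position, and so on. Let n be the size of the state space and m the number of ordered sets (here m is not the number of actions). Applying LEM with m ordered sets reduces the learning time compared with applying Q-learning to the whole state space at once.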
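Ordering the state space into sets can be sketched as a partition by one sub-state. The encoding below is a hypothetical simplification (a state is just ball position and ball size), and `ordered_sets` is an illustrative helper; the order passed in simply follows the slide's small → medium → large listing.

```python
from itertools import product

# Hypothetical simplified encoding: a state is (ball_position, ball_size).
POSITIONS = ("left", "center", "right")
SIZES = ("small", "medium", "large")
STATES = list(product(POSITIONS, SIZES))


def ordered_sets(states, key, order):
    """Partition states into ordered sets according to the position of
    key(state) in `order`. Every key value must appear in order."""
    rank = {value: i for i, value in enumerate(order)}
    sets = [[] for _ in order]
    for s in states:
        sets[rank[key(s)]].append(s)
    return sets


# Group by ball size, following the slide's small -> medium -> large listing.
stages = ordered_sets(STATES, key=lambda s: s[1], order=SIZES)
for i, stage in enumerate(stages, start=1):
    print("stage", i, stage)
```

Keeping the ordering as a parameter makes it easy to try a different sub-state (or a combination of sub-states) as the measure of how easy a situation is.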

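When to Shift
S1 is the state set nearest to the goal, S2 the next nearest, and so on. Learning shifts from the current set to the next one when a shifting condition on the Q-values is satisfied, where Δt is the time interval (number of steps) over which the change is measured. We suppose that the current state set S_{k-1} can transit only to its neighboring sets.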
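The slide's exact shifting formula is not available here, so this sketch assumes one plausible convergence-style test: shift once the summed max-Q over the current state set has changed by less than a threshold ε during the last Δt steps. The function names, the snapshot mechanism, and ε are all hypothetical.

```python
def max_q_total(Q, state_set, actions):
    """Sum over the states in state_set of max_a Q(s, a); missing entries count as 0."""
    return sum(max(Q.get((s, a), 0.0) for a in actions) for s in state_set)


def should_shift(Q_now, Q_before, state_set, actions, epsilon):
    """Hypothetical shifting test: Q_before is a snapshot of the Q-table taken
    delta_t steps ago. Shift to the next state set once the summed max-Q over
    the current set has changed by less than epsilon during that interval."""
    change = max_q_total(Q_now, state_set, actions) - max_q_total(Q_before, state_set, actions)
    return abs(change) < epsilon


Q_before = {("s1", 0): 0.50, ("s1", 1): 0.20}
Q_now = {("s1", 0): 0.5004, ("s1", 1): 0.20}
print(should_shift(Q_now, Q_before, ["s1"], [0, 1], epsilon=1e-3))  # True
```

In a training loop one would store a copy of the Q-table (for example `dict(Q)`) every Δt steps and pass it as `Q_before`.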

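From the previous Q-learning equation, if Q converges, the update no longer changes Q on average, so
$$Q(s,a) = r(s,a) + \gamma \sum_{s' \in S} T(s,a,s') \max_{a' \in A} Q(s',a').$$
Thus the converged Q coincides with the optimal action-value function Q*.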
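As a worked illustration of what the converged values look like, assume deterministic transitions, a terminal goal state, and r = 1 only for the action that reaches the goal. Writing s_d for a state d steps from the goal and a* for the action that moves one step closer, the fixed-point condition gives values that decay geometrically with distance:

$$
\begin{aligned}
Q(s_1, a^*) &= 1 + \gamma \cdot 0 = 1,\\
Q(s_2, a^*) &= 0 + \gamma \max_{a'} Q(s_1, a') = \gamma,\\
&\;\;\vdots\\
Q(s_d, a^*) &= \gamma^{\,d-1}.
\end{aligned}
$$

Under these assumptions, state sets farther from the goal converge to smaller values, and they only become nonzero once the sets nearer the goal have been learned.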

Experiments
