https://www.youtube.com/watch?v=Qlqe1DXnJKQ#t=206

Survey: What were the best things about Matt making you lead one class discussion?
- Good for presentation skills: 4
- Good for the presenter to learn a topic in depth: 5
- Nice to see a selection of different topics: 1

Survey: What were the worst things?
- Some presenters were not prepared enough: 2
- Some topics were too advanced/technical for this point in the class, or were hard to follow: 4

Survey: Do you think Matt should do this next year?
- Yes: 7
- Maybe: 1
- Sure, why not?: 1
- No: 0

Survey: If Matt does it next year, how could he change it?
- Require a handout with takeaways and resources: 1
- Require more discussion/engagement: 3
- Better scheduling: avoid jumping around and avoid duplication (or group related topics together): 2
- Be clear about using examples from the exercises: 1
- Have each presenter give a short presentation to Matt a week beforehand; this would improve the presentations and surface more suggested readings: 1

POMDP notation:
- states s ∈ S
- actions a ∈ A
- transition model T(s, a, s') = Pr(s' | s, a)
- reward function R(s, a)
- observations o ∈ Ω
- observation model O(s', a, o) = Pr(o | s', a)
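
A minimal sketch of these objects represented as tables, together with the standard belief-state update b'(s') ∝ O(s', a, o) · Σ_s T(s, a, s') · b(s); the state/action/observation counts and random tables are placeholders, not from the lecture:

```python
import numpy as np

# Placeholder tabular POMDP: 2 states, 2 actions, 2 observations.
# T[s, a, s'] = Pr(s' | s, a); O[s', a, o] = Pr(o | s', a); R[s, a] = reward.
n_states, n_actions, n_obs = 2, 2, 2
rng = np.random.default_rng(0)

T = rng.random((n_states, n_actions, n_states))
T /= T.sum(axis=2, keepdims=True)   # each T[s, a, :] is a distribution over s'
O = rng.random((n_states, n_actions, n_obs))
O /= O.sum(axis=2, keepdims=True)   # each O[s', a, :] is a distribution over o
R = rng.random((n_states, n_actions))

def belief_update(b, a, o):
    """Bayes filter: b'(s') is proportional to O(s', a, o) * sum_s T(s, a, s') * b(s)."""
    b_next = O[:, a, o] * (T[:, a, :].T @ b)
    return b_next / b_next.sum()

b = np.array([0.5, 0.5])            # uniform initial belief
print(belief_update(b, a=0, o=1))   # posterior belief after acting and observing
```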

MMS/tutorialAAMAS11.pdf

Behaviorist Psychology: R+/R- (positive/negative reinforcement), P+/P- (positive/negative punishment). B. F. Skinner's operant conditioning / behaviorist shaping.
Reward schedule:
- Frequency
- Randomized
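
One way to read "frequency vs. randomized" is as fixed-ratio vs. variable-ratio reinforcement schedules. A toy sketch of the two (the function names and parameters are mine, not Skinner's terminology):

```python
import random

def fixed_ratio(n, presses):
    """Reward every n-th response (fixed-ratio schedule)."""
    return [(i + 1) % n == 0 for i in range(presses)]

def variable_ratio(n, presses, rng=random.Random(0)):
    """Reward each response with probability 1/n (variable-ratio schedule),
    so a reward arrives once every n responses on average."""
    return [rng.random() < 1.0 / n for _ in range(presses)]

# Both deliver ~20 rewards over 100 responses, but the variable-ratio
# schedule makes the reward timing unpredictable.
print(sum(fixed_ratio(5, 100)), sum(variable_ratio(5, 100)))
```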

In topology, two continuous functions from one topological space to another are called homotopic if one can be "continuously deformed" into the other; such a deformation is called a homotopy between the two functions.

"What does shaping mean for computational reinforcement learning?" T. Erez and W. D. Smart, 2008.

According to Erez and Smart, there are (at least) six ways of thinking about shaping from a homotopic standpoint. What can you come up with?

Things to modify:
- Reward: successive approximations, subgoaling
- Dynamics: physical properties of the environment or of the agent, e.g., changing the length of the pole or the amount of noise
- Internal parameters: simulated annealing, a schedule for the learning rate, complexification in NEAT
- Initial state: learning from easy missions
- Action space: infants lock their arms; an abstracted (simpler) action space
- Time horizon: POMDPs, decreasing the discount factor, extending the pole-balancing time horizon
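
These are all curricula that continuously deform an easy task into the target task. A hypothetical sketch of such a schedule, here interpolating a cart-pole task's pole length and episode horizon from easy to hard (the parameter names and linear schedule are illustrative, not from Erez and Smart):

```python
def shaping_schedule(step, total_steps,
                     easy=dict(pole_length=0.25, horizon=50),
                     target=dict(pole_length=1.0, horizon=500)):
    """Linearly interpolate task parameters from the easy task (t = 0)
    to the target task (t = 1) over the course of training."""
    t = min(step / total_steps, 1.0)
    return {k: (1 - t) * easy[k] + t * target[k] for k in easy}

# Early training sees a short pole and short episodes; by the end the
# agent faces the full task.
for step in (0, 5_000, 10_000):
    print(step, shaping_schedule(step, total_steps=10_000))
```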

"The infant development timeline and its application to robot shaping." J. Law et al., 2011
- Bio-inspired vs. bio-mimicking vs. computational understanding
- Robot clicker training

Reward Shaping
- Potential-based shaping: A. Ng, 1999
- Shaping for transfer learning: G. Konidaris, 2006
- Multi-agent shaping: S. Devlin, 2011
- Plan-based reward shaping: M. Grześ, 2008
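
Potential-based shaping adds F(s, a, s') = γΦ(s') − Φ(s) to the environment reward; Ng et al. (1999) show this form leaves the optimal policy unchanged. A minimal sketch, where the gridworld potential (negative Manhattan distance to a goal) is a placeholder of mine, not from the paper:

```python
gamma = 0.99
goal = (4, 4)

def phi(state):
    """Potential function: higher (less negative) nearer the goal."""
    return -(abs(state[0] - goal[0]) + abs(state[1] - goal[1]))

def shaped_reward(r, state, next_state):
    """r + gamma * phi(s') - phi(s): a policy-invariant shaping bonus."""
    return r + gamma * phi(next_state) - phi(state)

# A Q-learning agent would use shaped_reward(...) in place of the raw
# reward; steps toward the goal get a bonus, steps away get a penalty.
print(shaped_reward(0.0, (0, 0), (0, 1)))  # positive: moved closer to the goal
```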