Reinforcement Learning. The study of thinking. 1) Problem-Solving 2) Reasoning.

Presentation transcript:

Reinforcement Learning

The study of thinking. 1) Problem-Solving 2) Reasoning

[Diagram: Perception -> Memory -> Thinking/Cognition, with the labels Sensation, Encoding and Retrieval, running from Low Level to Higher Level]
Thinking is a higher-level cognitive process that requires all sorts of cognitive operations (e.g. attention, perception, memory, language) and is often a conscious, controlled process.
Should we wait until we understand the lower-level processes first? Research in higher-level cognition might inform research at lower-level cognition and vice versa.

The study of thinking
Modern view: thinking is an internal cognitive process. The exact nature of these processes cannot be observed directly from behavior. However, most cognitive theories lead to testable predictions, and behavioral experiments can test these predictions. Cognitive processes are inferred indirectly from behavior.

Well-defined & Ill-defined Problems
Well-defined problems have completely specified initial conditions, goals, and operators -> works well with computer simulation.
Ill-defined problems have some aspects that are not completely specified -> sometimes require insight to see the problem in a new way.
1. writing a good paper = ?
2. solving an algebra problem = ?
3. conducting a statistical significance test = ?
4. designing a good experiment = ?
5. choosing a president = ?
6. reducing drunk driving = ?
7. being a nice person = ?

Well-defined problem solving
[Diagram: INITIAL STATE -> ? -> GOAL STATE]
Play the game:
- given state
- goal state
- obstacles
- operators

Problem-solving strategies
How to solve the maze?
- trial and error
- forward
- backward
- means-end analysis
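As an illustration of the "forward" strategy on a well-defined problem, here is a minimal Python sketch of a forward (breadth-first) search from the initial state to the goal state. The 3x3 maze layout, the `S`/`G`/`#` symbols and the function name `solve` are hypothetical choices for this example and do not come from the slides.

```python
from collections import deque

# Hypothetical maze: S = initial state, G = goal state, # = obstacle.
MAZE = [
    "S.#",
    ".##",
    "..G",
]

def solve(maze):
    """Search forward from S; return the list of cells on a shortest path to G."""
    rows, cols = len(maze), len(maze[0])
    start = next((r, c) for r in range(rows) for c in range(cols)
                 if maze[r][c] == "S")
    frontier = deque([start])
    came_from = {start: None}  # remembers how each cell was reached
    while frontier:
        r, c = frontier.popleft()
        if maze[r][c] == "G":
            # Walk back through came_from to rebuild the path.
            path, node = [], (r, c)
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        # Operators: move one cell down, up, right or left.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and maze[nr][nc] != "#" and (nr, nc) not in came_from):
                came_from[(nr, nc)] = (r, c)
                frontier.append((nr, nc))
    return None  # no route from S to G

path = solve(MAZE)
```

The cell to the right of `S` is a dead end, so the search has to discard it before finding the route down the left column.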

Most problem-solving situations involve a combination of planning (means-end analysis), trial and error, reinforcement learning and, perhaps, insight.
Reinforcement learning -> grew out of behaviorism
Insight -> Gestaltist view
Planning -> grew out of AI and cognitive psychology

Learning by Reinforcement
Associationist theories of thinking -> thinking as response learning
Three elements of associationist theory:
1) stimulus: a problem-solving situation
2) response: a particular problem-solving behavior
3) associations: the strength of the link between stimulus and response
[Diagram: a stimulus S linked to the responses R1, R2 and R3]

Thorndike’s work on cats in a puzzle box
Cats initially solved the puzzle box problem by trial and error, trying various responses until one accidentally worked. After being placed in the box many times, a cat learned the successful response and pulled the string almost immediately.

Habit Family Hierarchy
Try the most dominant response first, then the second strongest, etc.

1) Law of exercise: practice tends to increase the S-R link.
2) Law of effect: responses that solve a problem increase in strength. Responses that do not help solve the problem lose strength.
[Diagram: a stimulus S linked to the responses R1, R2 and R3]
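The law of effect and the habit family hierarchy can be sketched in a few lines of Python. This is only an illustration: the starting strengths, the `step` size and the particular update rule (move a strength a fixed fraction toward 1 after success, or toward 0 after failure) are assumptions made for this example, not values or formulas from the slides.

```python
def law_of_effect(strengths, response, solved, step=0.2):
    """Return updated S-R strengths after one attempted response."""
    updated = dict(strengths)
    if solved:
        # A response that solves the problem gains strength.
        updated[response] += step * (1.0 - updated[response])
    else:
        # A response that does not help loses strength.
        updated[response] -= step * updated[response]
    return updated

def hierarchy(strengths):
    """Habit family hierarchy: responses from most to least dominant."""
    return sorted(strengths, key=strengths.get, reverse=True)

# Hypothetical starting strengths; only R3 (say, pulling the string) works.
strengths = {"R1": 0.6, "R2": 0.3, "R3": 0.1}

for _ in range(10):                   # ten placements in the puzzle box
    for r in hierarchy(strengths):    # try the strongest response first
        solved = (r == "R3")
        strengths = law_of_effect(strengths, r, solved)
        if solved:
            break
```

Under these assumptions R3 starts at the bottom of the hierarchy, overtakes R1 and R2 within three trials, and is tried first from then on, which mirrors the cat that eventually pulls the string almost immediately.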

What about response chains?
E.g.: How can the path from initial state to goal state be strengthened? How to avoid dead ends? How can we reward a successful action that only much later in time leads to success? -> the problem of delayed reinforcement
Modern reinforcement learning involves passing strengths of successful responses back through a chain.
[Diagram: a chain of responses leading from start to goal]

Maze example Reinforcement learning example for mazes

Reinforcement Learning
Behavior follows simple associations in response chains. No planning, no mental maps, no “insight”.
Learning from very simple feedback: failure or success.
Associative strengths between responses in a chain are learned.
[Diagram: passing strength back in time along the chain from goal to start]
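One way to make "passing strength back in time" concrete is a Q-learning-style update, sketched below in Python. The slides do not name a specific algorithm, so this choice is an assumption, as are the tiny maze (three choice points, each with one dead end), the learning rate `alpha`, the discount `gamma`, the number of episodes and the random seed.

```python
import random

# Hypothetical maze: each choice point offers a dead end and a way forward.
MAZE = {
    "start": ["dead1", "a"],
    "a": ["dead2", "b"],
    "b": ["dead3", "goal"],
}
GOAL = "goal"

def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn response strengths from success/failure feedback only."""
    rng = random.Random(seed)
    strength = {s: {nxt: 0.0 for nxt in opts} for s, opts in MAZE.items()}
    for _ in range(episodes):
        state = "start"
        while state in MAZE:                 # stop at the goal or a dead end
            nxt = rng.choice(MAZE[state])    # pure trial and error
            reward = 1.0 if nxt == GOAL else 0.0
            # Strength already learned further down the chain.
            future = max(strength[nxt].values()) if nxt in MAZE else 0.0
            target = reward + gamma * future
            strength[state][nxt] += alpha * (target - strength[state][nxt])
            state = nxt
    return strength

def greedy_path(strength):
    """Follow the strongest response at every choice point."""
    path, state = ["start"], "start"
    while state in MAZE:
        state = max(strength[state], key=strength[state].get)
        path.append(state)
    return path

strength = train()
path = greedy_path(strength)
```

Only the final step is ever rewarded directly. The earlier steps gain strength because each update borrows a discounted share of the strength of the step that follows it, so success at the goal gradually propagates back toward the start, while moves into dead ends stay at zero.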

Demos
Reinforcement learning in mazes:
Reinforcement learning in robot-arm control:
Robot learning task of pole-balancing and devilsticking:

Some Amazing Anagrams
Original -> Becomes...
Dormitory -> Dirty Room
Desperation -> A Rope Ends It
The Morse Code -> Here Come Dots
Slot Machines -> Cash Lost in 'em
Animosity -> Is No Amity
Snooze Alarms -> Alas! No More Z's
Alec Guinness -> Genuine Class
Semolina -> Is No Meal
The Public Art Galleries -> Large Picture Halls, I Bet
A Decimal Point -> I'm a Dot in Place
The Earthquakes -> That Queer Shake
Eleven plus two -> Twelve plus one
Contradiction -> Accord not in it
To be or not to be: that is the question, whether tis nobler in the mind to suffer the slings and arrows of outrageous fortune. -> In one of the Bard's best-thought-of tragedies, our insistent hero, Hamlet, queries on two fronts about how life turns rotten.
"That's one small step for a man, one giant leap for mankind." -- Neil A. Armstrong -> A thin man ran; makes a large stride; left planet, pins flag on moon! On to Mars!

[Diagram: a stimulus S linked to the responses R1, R2, R3 and R4]
Stimulus: g o r w n
Responses (each a new letter combination): g r o w n, w r o n g, w r g n o, …
Anagram solving time depends on:
- familiarity of goal word
- letter transition probability of goal word
- letter transition probability of presented word
- number of moves
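Treating each new letter combination as a response, anagram solving by pure trial and error can be sketched as generate-and-test. This Python example is only an illustration: the four-word vocabulary and the function name `solve_anagram` are hypothetical, and the exhaustive ordering used here ignores the familiarity and letter-transition effects listed above.

```python
from itertools import permutations

def solve_anagram(letters, vocabulary):
    """Try every letter combination and keep the ones that are known words."""
    candidates = {"".join(p) for p in permutations(letters)}
    return sorted(candidates & set(vocabulary))

# Hypothetical vocabulary standing in for the solver's known words.
VOCAB = {"grown", "wrong", "drink", "water"}

solutions = solve_anagram("gorwn", VOCAB)
```

A five-letter stimulus has only 120 orderings, so exhaustive search is trivial for a computer. The associationist account instead predicts that people try stronger (more familiar, more probable) responses first.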

Class Experiment Replicate effect of familiarity

Ready...?
nrdki » (drink 7.0)
aewtr » (water 3.0)
cahtb » (batch 16.0)
milbc » (climb 7.5)
kcler » (clerk 17.5)
rtypa » (party 14.0)
huocg » (cough 23.5)
rmcap » (cramp 12.0)

nrdki » (drink 7.0)
aewtr » (water 3.0)
cahtb » (batch 16.0)
milbc » (climb 7.5)
kcler » (clerk 17.5)
rtypa » (party 14.0)
huocg » (cough 23.5)
rmcap » (cramp 12.0)
Mean solution times:
High familiarity = 7.9 sec
Low familiarity = 17.3 sec
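The two means can be checked with a short Python calculation. The slide does not say which words count as high or low familiarity. The grouping below is an assumption, chosen because it reproduces the reported means after rounding to one decimal place (7.875 and 17.25 sec).

```python
from statistics import mean

# Solution times in seconds, as listed on the slide.
times = {
    "drink": 7.0, "water": 3.0, "batch": 16.0, "climb": 7.5,
    "clerk": 17.5, "party": 14.0, "cough": 23.5, "cramp": 12.0,
}

# Assumed grouping (not stated on the slide).
high_familiarity = ["drink", "water", "climb", "party"]
low_familiarity = ["batch", "clerk", "cough", "cramp"]

high_mean = mean(times[w] for w in high_familiarity)  # (7.0 + 3.0 + 7.5 + 14.0) / 4
low_mean = mean(times[w] for w in low_familiarity)    # (16.0 + 17.5 + 23.5 + 12.0) / 4
```

Under this grouping the low-familiarity words take a little more than twice as long on average, which is the familiarity effect the class experiment sets out to replicate.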

Can all thinking be described by trial and error / stimulus-response?
What about insight? -> Gestaltist view
What about planning? -> AI view

The Handcuffs Puzzle
The Set-Up: For this puzzle you need two people, some rope and some empty space to do the puzzle in. Each person will need a piece of rope with a loop tied in both ends, so it can be worn as handcuffs. The rope should be reasonably long, so that the person wearing it can easily step over it if they want. Before putting the handcuffs on, the two people loop their ropes around each other so that they are tied together. Each person should wear a complete set of handcuffs. They then have to get themselves apart while following these rules:
- The handcuffs cannot be removed.
- Do not break, cut, saw through, bite through or in any other way damage the rope.
- Damaging each other is probably a bad idea too.