Operant Conditioning. Unit 4 - AoS 2 - Learning. Presentation transcript:
Trial and Error Learning An organism’s attempts to learn or solve a problem by trying alternative possibilities until a correct solution or desired outcome is achieved. Often involves many attempts (trials) and incorrect choices (errors). Formerly called instrumental learning, now operant conditioning - the learner ‘operates’ on the environment.
Thorndike’s Puzzle Boxes Thorndike put hungry cats into a ‘puzzle box’, with food placed outside the box and out of reach. The cat had to get out of the box to get the food. The more times a cat was put in the box, the faster it got out (fewer trials). After 7 trials it would go straight for the lever and get out immediately. Lever pushing was now learnt, not random.
Thorndike’s Law of Effect A behaviour that is followed by ‘satisfying’ consequences is strengthened (more likely to occur) and a behaviour that is followed by ‘annoying’ consequences is weakened (less likely to occur). Called instrumental learning because the cat is instrumental in obtaining its release.
Operant Conditioning Term first used by Burrhus Frederic (B. F.) Skinner. An operant is a response (or set of responses) that occurs and acts (operates) on the environment to produce some kind of effect, i.e. behaviour that has consequences. Skinner argued that ALL behaviour can be explained this way.
Operant vs Respondent Respondents are behaviours that are elicited by known or recognised stimuli. Pavlov’s dogs responded by salivating to meat powder, then to a bell. Thorndike’s cats made responses not prompted by a specific stimulus. In classical conditioning (CC), behaviour has no effect on consequences.
Skinner Boxes A small chamber where an animal learns to make a response for which the consequences can be controlled by the experimenter. Typically has a lever that delivers food / water into a dish. Some have lights / buzzers. Some have a floor that can deliver an electric shock.
Reinforcement Reinforcement - applying a positive stimulus OR removing a negative stimulus to strengthen or increase the likelihood of the response that it follows. Reinforcer - any object or event that increases the probability that an operant behaviour will occur again. Often used interchangeably with ‘reward’, but the two are not identical.
Reinforcement Initially, learning is most successful if behaviour is continuously reinforced. Continuous Reinforcement - reinforcing every correct response after it occurs. Partial Reinforcement - reinforcing some correct responses but not all of them. Partial reinforcement may be delivered on different schedules.
Fixed-Ratio Schedule When the reinforcer is given after a set (fixed) and unvarying number (ratio) of desired responses have been made, eg every third response, or one reinforcer for every 10 correct responses (1:10). During the acquisition phase reinforcement must be frequent. Eg workers who are paid ‘piecework’: commission, amount per basket picked.
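A fixed-ratio rule can be sketched in a few lines of Python. This is only an illustration of the definition on this slide, not part of the original presentation; the function name `fixed_ratio_schedule` and its parameters are hypothetical.

```python
def fixed_ratio_schedule(num_responses, ratio):
    """Return the 1-based positions of the correct responses that earn a reinforcer.

    Hypothetical helper: every `ratio`-th correct response is reinforced.
    """
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return [n for n in range(1, num_responses + 1) if n % ratio == 0]


# Reinforce every third correct response out of 10.
print(fixed_ratio_schedule(10, 3))  # [3, 6, 9]
```

With `ratio=1` the same function describes continuous reinforcement, since every correct response is reinforced.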
Variable-Ratio Schedule When the reinforcer is given after an unpredictable number of correct responses. The schedule is set around a mean number of correct responses per reinforcer. Very effective: fast acquisition, and the response doesn’t cease easily. Eg poker machines - there is an expected payout, but you don’t know when it will come.
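A variable-ratio rule can be sketched as follows. This is an illustrative assumption, not something the slides specify: the number of responses required for each reinforcer is drawn uniformly from 1 to `2 * mean_ratio - 1`, which averages out to `mean_ratio`. The function name, parameters and the choice of a uniform distribution are all hypothetical.

```python
import random


def variable_ratio_schedule(num_responses, mean_ratio, seed=0):
    """Return the 1-based positions of the correct responses that earn a reinforcer.

    Assumption: each requirement is uniform on 1..(2 * mean_ratio - 1),
    so the mean requirement equals `mean_ratio`.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    reinforced = []
    required = rng.randint(1, 2 * mean_ratio - 1)
    count = 0
    for n in range(1, num_responses + 1):
        count += 1
        if count == required:
            reinforced.append(n)
            count = 0
            # Draw a fresh, unpredictable requirement for the next reinforcer.
            required = rng.randint(1, 2 * mean_ratio - 1)
    return reinforced


print(variable_ratio_schedule(30, 5))
```

The learner cannot tell which response will pay off, only that responding more brings the next reinforcer sooner.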
Fixed-Interval Schedule When the reinforcer is delivered after a specific period of time has elapsed since the previous reinforcer, provided the correct response has been made. One correct response is all that is needed, like pressing the crossing button. Responding is often erratic: since we realise that time, not the number of responses, is the factor, we wait until the time is nearly up.
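A fixed-interval rule can be sketched in Python as below. This is only an illustration of the definition on this slide; the function name and parameters are hypothetical, and it assumes timing starts at t = 0.

```python
def fixed_interval_schedule(response_times, interval):
    """Return the times of the correct responses that earn a reinforcer.

    Assumption: the clock starts at t = 0, and the first correct response
    made once `interval` has elapsed since the last reinforcer is reinforced.
    """
    reinforced = []
    last_reinforcer = 0.0
    for t in sorted(response_times):
        if t - last_reinforcer >= interval:
            reinforced.append(t)
            last_reinforcer = t
    return reinforced


# Responses at these times (seconds), reinforcer available every 10 seconds.
print(fixed_interval_schedule([2, 5, 11, 12, 19, 23, 31], 10))  # [11, 23]
```

Responses at 2, 5, 12, 19 and 31 seconds earn nothing, which is why responding early in the interval tends to drop off.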
Variable-Interval Schedule When the reinforcer is delivered after an irregular period of time has elapsed, provided the correct response has been made. Based on a mean period of time, but unpredictable. Responses before the delivery time are not reinforced, even if correct. Eg fishing, speed cameras, booze buses.
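A variable-interval rule can be sketched in the same way. As an assumption for illustration only, each waiting period is drawn uniformly between 0 and `2 * mean_interval`, so it averages `mean_interval`; the function name, parameters and choice of distribution are hypothetical.

```python
import random


def variable_interval_schedule(response_times, mean_interval, seed=0):
    """Return the times of the correct responses that earn a reinforcer.

    Assumption: each waiting period is uniform on [0, 2 * mean_interval],
    so the mean waiting period equals `mean_interval`.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    reinforced = []
    available_at = rng.uniform(0, 2 * mean_interval)
    for t in sorted(response_times):
        # Responses made before the reinforcer becomes available earn nothing.
        if t >= available_at:
            reinforced.append(t)
            available_at = t + rng.uniform(0, 2 * mean_interval)
    return reinforced


# One correct response per second for 60 seconds, mean interval of 10 seconds.
print(variable_interval_schedule(list(range(1, 61)), 10))
```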
Positive Reinforcement Giving or applying a positive reinforcer after the desired response has been made. Positive reinforcer - provides a satisfying consequence (reward), so strengthens the likelihood of a response.
Negative Reinforcement Removal or avoidance of an unpleasant stimulus. Negative reinforcer - any unpleasant stimulus that, when removed, strengthens the likelihood of a desired response occurring. In negative reinforcement the reinforcer is removed or avoided, not given (as in positive reinforcement).
Examples Getting an A on your exam (positive reinforcer) can be achieved by studying, so studying will be repeated (increased behaviour). Failing your exam (negative reinforcer) is avoided by studying, so studying will be repeated (increased behaviour). Both lead to a desirable / positive consequence.
Punishment Delivery of an unpleasant stimulus following a response, or removal of a pleasant stimulus. The consequence of punishment is a weakening of the response, or a decrease in the probability of the response occurring again.
Activity partial or continuous? positive or negative or punishment?
Order of presentation For reinforcement and punishment, the consequence must be presented after the response, not before. Eg the rat needs to press the lever before getting the positive reinforcer.
Timing Reinforcement and punishment are most effective when given immediately after the response, so that the two are associated directly. Delay will cause learning to be slow or unsuccessful. Easier in the lab than in real life. Eg student reports are a delayed response.
Appropriateness Reinforcers must provide pleasing consequences; punishments must provide unpleasant consequences. But how do you know what will please each person? Not all reinforcers will work in all situations. Inappropriate punishers can become reinforcers - eg for attention seekers.
Key processes - Acquisition In OC, acquisition is the establishment of a response through reinforcement. Speed depends on whether reinforcement is continuous or partial. For complex behaviours, successive approximations can be reinforced, building up to the target behaviour.
Acquisition Shaping - a procedure in which reinforcement is given for any response that successively approximates a final target response. Also known as the method of successive approximations. Eg Skinner’s pigeon will have to turn more and more to get the same reward.
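The pigeon example can be sketched as a simple loop in which the criterion for reward is raised after each reinforced response. This is a toy illustration only, not Skinner's actual procedure; the function name, the step size and the angles used are hypothetical.

```python
def shape(responses, target, start_criterion, step):
    """Return a log of (response, criterion at the time, reinforced?) tuples.

    Toy model: a response is reinforced if it meets the current criterion,
    and the criterion is then raised by `step`, up to `target`.
    """
    criterion = start_criterion
    log = []
    for response in responses:
        reinforced = response >= criterion
        log.append((response, criterion, reinforced))
        if reinforced and criterion < target:
            # Demand a closer approximation of the target next time.
            criterion = min(target, criterion + step)
    return log


# Turn angles in degrees; the target behaviour is a full 360 degree turn.
for entry in shape([30, 50, 40, 100, 200, 360], target=360,
                   start_criterion=45, step=90):
    print(entry)
```

A 50 degree turn is rewarded early on, but once the criterion has moved to 135 degrees, a 100 degree turn no longer earns the reward.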
Extinction The gradual decrease in the strength or rate of a conditioned response following consistent non-reinforcement of the response. Eg when does the pigeon stop turning once it is no longer being fed? Responding may actually increase at first, as the learner tries to get the reinforcement and does not want to stop.
Spontaneous Recovery can also occur with operant conditioning, when the response occurs in absence of reinforcement after extinction has occurred. likely weaker and temporary
Stimulus Generalisation When the correct response is made to another stimulus that is similar to the stimulus that was present when the CR was reinforced. Usually at a reduced level (weaker or less often).
Stimulus Discrimination When an organism makes the correct response to a stimulus and is reinforced, but doesn’t respond to other stimuli, even when similar. Eg if reinforced for red lights but not green lights, it will only respond to red.
CC and OC Role of Learner Timing of Stimulus and Response Nature of Response - Reflex or Voluntary? LA 12