Chapter 6: Learning.


Classical Conditioning
Ivan Pavlov (1901) was a prominent Russian physiologist who did Nobel Prize-winning research on digestion. In the late 1890s he was doing research and experimentation on canine salivary processes and digestion when he stumbled, partly by accident, onto what became the biggest discovery of his career. Ironically enough, it had little to do with digestion. He unwittingly trained his dog subjects to act in an unnatural way toward a previously neutral stimulus. Dogs do not normally salivate to the sound of a bell ringing. However, when Pavlov repeatedly paired a bell with a stimulus that would make the dog salivate (meat, for example), the animals soon began to salivate before the meat was displayed, because they had come to associate the bell with the appearance of the meat. In the dog’s mind, the bell became a signal for the meat. In doing so, Pavlov discovered classical conditioning, sometimes called Pavlovian conditioning. Classical conditioning explains how a neutral stimulus can acquire the capacity to elicit a response originally elicited by another stimulus.
In classical conditioning:
The UCS is a stimulus that elicits an unconditioned response without previous conditioning (Pavlov’s meat powder).
The UCR is an unlearned reaction to a UCS that occurs without previous conditioning (salivating to the meat powder).
The CS is a previously neutral stimulus that has acquired the capacity to elicit a conditioned response (the sound of a tone).
The CR is a learned reaction to a conditioned stimulus (salivating to the tone).

Terminology
Unconditioned Stimulus (UCS): the stimulus that naturally evokes the desired response.
Conditioned Stimulus (CS): originally a neutral stimulus (NS), that is, a stimulus that doesn’t naturally evoke the desired response; this is the stimulus a trainer must teach the subject to respond to.
Unconditioned Response (UCR): the unlearned response that the UCS evokes naturally. Any response the subject makes to the UCS is unconditioned; a conditioned response only occurs when the UCS is removed and the subject responds to the CS alone.
Conditioned Response (CR): once the UCS is removed, if the subject performs the desired behavior with only the CS present, we now have a CR. The subject has been trained to respond to the CS in a way that it normally would not (a dog salivating to the sound of a bell, for example).

There are four steps in the classical conditioning process:
1. The trainer must prove that the CS is neutral. In other words, the CS will not cause the subject to act in the way the trainer is attempting to train the subject to act (i.e., a bell will not, prior to training, cause a dog to salivate).
2. Present the CS and UCS together, back to back, causing a response from the subject: CS + UCS >>>>> UCR
3. Repeat step 2 until the association between the CS and the UCS (the bell and the meat) is fully ingrained.
4. Remove the UCS and present only the CS to the subject. If the subject continues to respond the same as in step 2, you now have a CR, so: CS >>>>> CR
See next slide for a demonstration.

Classical Conditioning: Terminology
Trial: any single pairing of a CS and a UCS.
Acquisition: the initial stage in learning, where the subject first acquires a learned response.
Stimulus contiguity: Pavlov and Watson’s notion that the classical conditioning association happens because the two stimuli (CS and UCS) occur closely together in time and space.
Stimulus contingency: Robert Rescorla put forth an alternative view of why the classical conditioning association happens. He said that Watson and Pavlov were correct in part, but that another factor is involved as well. Rescorla proposed that the learned association also happens because the CS reliably predicts the UCS (the dog learned, for example, that the bell predicted or announced that the meat would be along presently).
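Rescorla's idea that the CS must reliably predict the UCS can be made concrete with a small calculation. The sketch below is only an illustration and is not taken from the slides: it assumes a common way of quantifying predictiveness, the difference between the probability of the UCS when the CS is present and when it is absent. The function name `contingency` and the trial format are hypothetical.

```python
def contingency(trials):
    """Estimate how well a CS predicts a UCS.

    trials: list of (cs_present, ucs_present) boolean pairs.
    Returns P(UCS | CS) - P(UCS | no CS), an assumed illustrative measure:
    near 1.0 means the CS reliably announces the UCS, near 0.0 means the
    CS carries no information about the UCS.
    """
    with_cs = [ucs for cs, ucs in trials if cs]
    without_cs = [ucs for cs, ucs in trials if not cs]
    if not with_cs or not without_cs:
        raise ValueError("need trials both with and without the CS")
    p_ucs_given_cs = sum(with_cs) / len(with_cs)
    p_ucs_given_no_cs = sum(without_cs) / len(without_cs)
    return p_ucs_given_cs - p_ucs_given_no_cs


# Bell always followed by meat, and meat never arrives without the bell.
reliable_bell = [(True, True)] * 10 + [(False, False)] * 10

# Meat arrives half the time whether or not the bell rings.
uninformative_bell = ([(True, True)] * 5 + [(True, False)] * 5
                      + [(False, True)] * 5 + [(False, False)] * 5)

print(contingency(reliable_bell))       # 1.0
print(contingency(uninformative_bell))  # 0.0
```

In both examples the bell and the meat occur together on some trials, so contiguity alone cannot tell them apart; only the first bell actually predicts the meat.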

Classical Conditioning Paradigms
Simultaneous conditioning: CS and UCS begin and end at the same time.
Delayed conditioning: the CS begins first and the UCS begins just as the CS ends, so there is no gap between the two. This is the best of the paradigms.
Trace conditioning: CS begins and ends before the UCS is presented. There is a gap between the two, and the gap can be as long as the experimenter wants it to be. The longer the gap in time, the weaker the association.
Temporal conditioning: time becomes the CS.
Backward conditioning: UCS is presented before the CS.
Delayed conditioning promotes acquisition best. Trace conditioning can also be effective; ideally the delay between the end of the CS and the beginning of the UCS should be very brief, and less than a second is best. The worst option is backward conditioning; in effect, it reverses the stimuli, and the order of presentation prevents the desired association from forming.
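The timing rules that separate these paradigms can be sketched as a small classifier. This is a hedged illustration rather than a standard procedure: the function name `classify_paradigm` and its four timing parameters are hypothetical, and it assumes that "delayed" covers any case where the CS starts first and the UCS starts no later than the moment the CS ends. Temporal conditioning is left out because it has no discrete CS to time.

```python
def classify_paradigm(cs_on, cs_off, ucs_on, ucs_off):
    """Label a trial by the relative timing of the CS and UCS (in seconds).

    All four arguments are onset/offset times on a shared clock.
    The boundaries used here are assumptions made for illustration.
    """
    if ucs_on < cs_on:
        # The UCS comes first, so the CS cannot signal it.
        return "backward"
    if cs_on == ucs_on and cs_off == ucs_off:
        return "simultaneous"
    if ucs_on > cs_off:
        # The CS has already ended; a gap separates the two stimuli.
        return "trace"
    if cs_on < ucs_on <= cs_off:
        # The CS starts first and there is no gap before the UCS.
        return "delayed"
    return "unclassified"


print(classify_paradigm(0, 2, 0, 2))      # simultaneous
print(classify_paradigm(0, 2, 2, 3))      # delayed
print(classify_paradigm(0, 2, 2.5, 3.5))  # trace
print(classify_paradigm(1, 2, 0, 0.5))    # backward
```

For a trace trial, the size of the gap (`ucs_on - cs_off`) is the quantity the slide says should stay under about a second.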

Classical Conditioning in Everyday Life
Conditioned emotional responses can become a problem as people grow and develop. Often, experiences in life (conditioned stimuli) are accompanied by other events (unconditioned stimuli) that produce responses the person would not otherwise have (conditioned responses). For example, say you’re 6 years old and a dog comes up to you (CS). The dog then bites you (UCS). After this traumatic event, you might learn to fear dogs (CR). Conditioning with drug effects is also commonplace. Let’s say you take a pill (CS) and it makes you feel good (UCS); afterward, just seeing the pill can trigger anticipation of that feeling (CR).

Figure 6.3 Classical conditioning of a fear response

Figure 6.22 Conditioned taste aversion (Garcia Effect)

Processes in Classical Conditioning
Acquisition is the initial learning of a new behavior. Extinction occurs when the CS and UCS are no longer paired and the response to the CS weakens and then disappears. The learned response is still there somewhere in the mind of the subject, but it’s no longer active. We know this because of spontaneous recovery, in which an extinguished response reappears after a period with no exposure to the CS. The renewal effect occurs when a response is extinguished in a different environment from the one where it was acquired, and the extinguished response reappears when the subject is returned to the original environment where acquisition took place. This is one of the reasons why conditioned fears and phobias are difficult to extinguish permanently.

Processes in Classical Conditioning
Generalization occurs when conditioning generalizes to additional stimuli that are similar to the CS; for example, recall Watson and Rayner’s Little Albert, who was conditioned to fear a white rat but later came to be afraid of other furry white objects. Discrimination is the opposite of generalization; that is, the response is to a specific stimulus and there is no response to similar stimuli. Higher-order conditioning occurs when a CS functions as if it were a UCS to establish new conditioning. This process can be used to stack CSs on top of one another. For example, if after training the dogs Pavlov had presented a green light prior to the bell ringing, the dogs would have learned to salivate to the light before the bell or meat was presented.

Figure 6.10 Higher-order conditioning

The Fall from Grace of John Watson
By the late 1910s and the 1920s, Watson had become the most influential psychologist in the world. Freud was engrossed in political philosophy as he watched the growing threat of anti-Semitism and fascism begin to take root in Germany and Austria, leaving the door open for new ideas and fresh faces. In 1920, Watson and his assistant, Rosalie Rayner, conducted a landmark (and ethically challenged) experiment on classical conditioning and stimulus generalization. The Case of Baby Albert was both celebrated and criticized, and as more information came out, the criticism soon outweighed the praise.

Nine-month-old Albert was put on a mattress on a table in the middle of a room. A white laboratory rat was placed near Albert and he was allowed to play with it. After a few minutes, an unpleasant and loud sound was created behind Albert's back by striking a suspended steel bar with a hammer, and it was repeated each time the baby touched the rat. Albert responded to the noise by crying and showing fear. After several such pairings of the two stimuli, Albert was presented with only the rat. Upon seeing the rat, Albert got very distressed, crying and crawling away. Apparently, the infant associated the white rat with the noise. The rat, originally a neutral stimulus, had become a conditioned stimulus, and it was eliciting an emotional response (conditioned response) similar to the distress (unconditioned response) originally given to the noise (unconditioned stimulus). In further experiments, Albert seemed to generalize his response from the white rat to a rabbit, a furry dog, and even a Santa Claus mask with white cotton balls in the beard.

After the experiment was conducted in front of radio and print media, it became known that Albert was suffering from a terminal condition, and the idea of tormenting a baby with major health problems raised ethical questions. Watson's experimental design also came under question (only a single subject and no control subjects). Then other researchers, most notably Mary Cover Jones, showed that Watson could have demonstrated that classical conditioning worked with a positive paradigm rather than a negative one such as instilling a fear in a baby. Her Baby Peter experiment combined a rabbit, a fearful child, and the child’s mother bearing love and chocolate pudding to eradicate a fear rather than create one. Finally, it was revealed that Watson had been involved in an extramarital affair with Rosalie Rayner when his wife found their love letters and sent them to newspapers, where they were published widely. Johns Hopkins fired Watson and that was that.

Operant Conditioning
Edward L. Thorndike (1913) wrote about his experiments with cats (Cat in the Puzzle Box) in his doctoral dissertation. He discussed what he referred to as the law of effect, the idea that animals continue to act in ways that have provided favorable consequences for them in the past and are less likely to repeat behaviors that have led to unfavorable consequences in the past. He called this “instrumental learning”. This law later became the cornerstone of Skinner’s operant conditioning theory.

Operant Conditioning
For B.F. Skinner (1953), the law of effect was the root idea of his theory of operant conditioning. The other main ideas were the principles of reinforcement and punishment. Skinner’s principle of reinforcement holds that organisms tend to repeat those responses that are followed by favorable consequences, which he referred to as reinforcement. Skinner defined reinforcement as occurring when an event following a response increases an organism’s tendency to make that response.

Operant Conditioning
Operant chambers (Skinner boxes): Skinner created a prototype experimental procedure, using animals and an operant chamber or “Skinner box.” This is a small enclosure in which an animal can make a specific response that is recorded, while the consequences of the response are systematically controlled. Rats, for example, press a lever.
Emission of response: Because operant responses tend to be voluntary, they are said to be emitted rather than elicited.
Reinforcement contingencies are the circumstances, or rules, that determine whether responses lead to the presentation of reinforcers.
Cumulative recorders create a graphic record of responding and reinforcement in a Skinner box as a function of time (how many responses/reinforcements in a set amount of time).
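What a cumulative recorder plots can be approximated in a few lines: a running total of responses sampled at regular times, so that a steeper rise means a faster response rate. This is a minimal sketch under assumed inputs; the function name `cumulative_record`, the one-second sampling step, and the lever-press times are all hypothetical.

```python
def cumulative_record(response_times, duration, step=1.0):
    """Return (time, total responses so far) pairs sampled every `step` seconds.

    response_times: times (in seconds) at which the animal responded.
    duration: length of the session in seconds.
    """
    times = sorted(response_times)
    record = []
    count = 0
    i = 0
    n_steps = int(duration / step)
    for k in range(n_steps + 1):
        t = k * step
        # Count every response that has happened up to time t.
        while i < len(times) and times[i] <= t:
            count += 1
            i += 1
        record.append((t, count))
    return record


# Hypothetical lever presses during a 5-second session.
presses = [0.5, 1.2, 1.8, 4.0]
for t, total in cumulative_record(presses, duration=5):
    print(t, total)
```

The total never decreases; flat stretches (here between 2 and 3 seconds) are pauses in responding.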

Figure 6.12 Reinforcement in operant conditioning

Figure 6.13 Skinner box and cumulative recorder

The Strengthening and Weakening of Responses
Primary reinforcers satisfy biological needs and have value because of what they are. Secondary (conditioned) reinforcers have no real value in and of themselves, but can be used to obtain primary reinforcers.
Operant theorists distinguish between primary reinforcers, which are events that are inherently reinforcing because they satisfy biological needs, and secondary reinforcers, which are events that acquire reinforcing qualities by being associated with primary reinforcers. Primary reinforcers in humans include food, water, warmth, sex, and maybe affection expressed through hugging and close bodily contact. Secondary reinforcers in humans include things like money, good grades, attention, flattery, praise, and applause.

Reinforcement Used to increase or strengthen a response
Positive reinforcement occurs when a desirable behavior is followed by a rewarding stimulus.
Negative reinforcement occurs when a desirable behavior is followed by the removal of an aversive stimulus.
Some theorists have recently questioned the value of the distinction between positive and negative reinforcement. They argue that the distinction is ambiguous and unnecessary. For example, the behavior of rushing home to get out of the cold (negative reinforcement) could also be viewed as rushing home to enjoy the warmth (positive reinforcement).

Reinforcement
Negative reinforcement regulates escape and avoidance learning. In escape learning, an organism learns to perform a behavior that decreases or ends aversive stimulation (turning on the air conditioner when it is hot). In avoidance learning, an organism learns to prevent or avoid some aversive stimulation (turning on the air conditioner before it gets too hot).

Figure 6.18 Positive reinforcement versus negative reinforcement

Figure 6.19 Escape and avoidance learning

Punishment Used to Decrease or weaken a response
Positive Punishment (aka Punishment) occurs when an undesirable behavior is followed by an aversive stimulus.
Negative Punishment (aka Omission Training) occurs when an undesirable behavior is followed by the removal of a rewarding stimulus.
Punishment must be immediate and intense in order to have the desired effect. If you wait too long after the undesirable behavior or don’t deliver meaningful punishment when it occurs, it won’t work.
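The four consequence types defined on the reinforcement and punishment slides form a two-by-two grid: the kind of stimulus (rewarding or aversive) crossed with what happens to it after the behavior (presented or removed). The lookup below is a small illustrative sketch of that grid; the function name `classify_consequence` and the string labels are hypothetical choices, not terminology from the slides.

```python
def classify_consequence(stimulus_kind, change):
    """Name the operant procedure and its effect on the response.

    stimulus_kind: "rewarding" or "aversive"
    change: "presented" or "removed" (what happens right after the behavior)
    Returns (procedure name, effect on the response).
    """
    grid = {
        ("rewarding", "presented"): ("positive reinforcement", "strengthened"),
        ("aversive", "removed"): ("negative reinforcement", "strengthened"),
        ("aversive", "presented"): ("positive punishment", "weakened"),
        ("rewarding", "removed"): ("negative punishment", "weakened"),
    }
    return grid[(stimulus_kind, change)]


# Taking away a rewarding stimulus after misbehavior (omission training).
print(classify_consequence("rewarding", "removed"))
# Ending an aversive stimulus after the desired behavior.
print(classify_consequence("aversive", "removed"))
```

Reading the grid this way makes one point easy to check: "negative" always means a stimulus is removed, never that the response is weakened.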

Figure 6.20 Comparison of negative reinforcement and punishment

Problems With Using Punishment in Training
1. PUNISHMENT OFTEN FAILS TO STOP, AND CAN EVEN INCREASE THE OCCURRENCE OF, THE UNDESIRED RESPONSE. Since attention is one of the most potent rewards available, and since it is difficult to punish without paying attention to the offender, punishing may serve more as a reward than as a punishment.
2. PUNISHMENT AROUSES STRONG EMOTIONAL RESPONSES THAT MAY GENERALIZE. Once the strong emotional responses are aroused, the degree and direction of generalization are largely uncontrollable. The result may be excessive anxiety, apprehension, guilt, and self-punishment.

Reinforcement vs. Punishment Strengthening and Weakening of Responses
3. USING PUNISHMENT MODELS AGGRESSION. The meaning of "social power" is exemplified.
4. INTERNAL CONTROL OF BEHAVIOR IS NOT LEARNED. The offender may learn to inhibit the punished response during surveillance, but once surveillance ends there is no internal control mechanism to continue inhibiting the behavior.
5. PUNISHMENT CAN EASILY BECOME ABUSE. Most parents who abuse children do not intend to do the damage they inflict. Most of the damage and injury occurs when the parent loses control and goes beyond the boundaries of reasonable behavior.

Reinforcement vs. Punishment Strengthening and Weakening of Responses
6. PAIN IS STRONGLY ASSOCIATED WITH AGGRESSION. The pain of punishment often leads to a display of aggression against either the source of the pain or, in some cases, an innocent scapegoat.
7. PUNISHMENT WORKS BEST WHEN IT OCCURS EVERY TIME. While reward works best when given on an intermittent basis, punishment works best when delivered on a continuous basis. The degree of vigilance required to constantly monitor behavior so that every occurrence of the undesired behavior can be punished is rarely possible. The undesired behavior is, therefore, intermittently reinforced when it is not punished, and the behavior continues.

Operant Conditioning - Basic Processes
Shaping: gradually bringing the subject closer and closer to the desired behavior by rewarding successive approximations of the desired behavior.
Chaining: linking together individually shaped behaviors to form a more complex behavior.
Think of a child being potty trained. There are several steps that must be learned in sequence in order to perform the whole task properly. A parent will shape individual behaviors and then chain them together in a sequence, ending with washing hands.

Operant Conditioning - Basic Processes
Stimuli that precede a response can exert considerable influence over operant behavior, basically becoming “signals” that a reinforcer is coming. These signals are called discriminative stimuli. They are cues that influence operant behavior by indicating the probable consequences of a response (e.g., slow down when the highway is wet, ask Mom for a favor when she’s in a good mood). Discrimination occurs when a subject responds to one stimulus but not to another one similar to it, while generalization occurs when a new stimulus is responded to as if it were the original. For example, a cat runs to the sound of a can opener, which signals food, but not to the sound of the mixer; that is discrimination. Yet if you get a new can opener, the cat runs to it; that is generalization.

Table 6.1 Comparison of Basic Processes in Classical and Operant Conditioning

Schedules of Reinforcement
A schedule of reinforcement determines when a subject will receive a reinforcer. There are two general types of partial reinforcement. Ratio schedules are based on the ratio of behavior to reward, or how many desired behaviors the subject must perform to get a reward. Interval schedules are based on how much time must elapse before the subject is rewarded for performing the desired behavior. On a fixed schedule, the number of behaviors or the time interval doesn’t change, whereas on a variable schedule the number of behaviors or the time interval changes from one reward to the next.

Continuous reinforcement occurs when every instance of a designated response is reinforced. It is, in effect, a ratio schedule with a 1:1 ratio of behavior to reward: every time the subject does what the trainer wants it to do, the subject gets a reward. It produces faster acquisition but also faster extinction.
Intermittent, or partial, reinforcement occurs when a designated response is reinforced only some of the time. It produces greater resistance to extinction.
Ratio schedules require the organism to make the designated response a certain number of times to gain each reinforcer. A fixed-ratio schedule entails giving a reinforcer after a fixed number of non-reinforced responses. A variable-ratio schedule entails giving a reinforcer after a variable number of non-reinforced responses.
Interval schedules require a time period to pass between the presentation of reinforcers. A fixed-interval schedule entails reinforcing the first response that occurs after a fixed time interval has elapsed. A variable-interval schedule entails giving the reinforcer for the first response after a variable time interval has elapsed.
More than 50 years of research on these schedules has yielded an enormous amount of information about how organisms respond to different schedules.

Fixed Ratio: number of behaviors needed for reward stays constant.
Variable Ratio: number of behaviors needed for reward keeps changing.
Fixed Interval: elapsed time needed for reward stays constant.
Variable Interval: elapsed time needed for reward keeps changing.
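The four schedules can be sketched as simple rules that decide, response by response, whether a reinforcer is delivered. This is a rough illustration only: the function names, the one-response-per-second demo, and the way the variable schedules draw their requirements (uniformly around a mean, using Python's `random` module) are assumptions made for the example, not part of the definitions above.

```python
import random


def fixed_ratio(n):
    """Reinforce every n-th response."""
    count = 0

    def respond(t):
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False
    return respond


def variable_ratio(mean_n, rng):
    """Reinforce after a varying number of responses that averages mean_n."""
    count = 0
    target = rng.randint(1, 2 * mean_n - 1)

    def respond(t):
        nonlocal count, target
        count += 1
        if count >= target:
            count = 0
            target = rng.randint(1, 2 * mean_n - 1)
            return True
        return False
    return respond


def fixed_interval(seconds):
    """Reinforce the first response made after `seconds` have elapsed."""
    last = 0.0

    def respond(t):
        nonlocal last
        if t - last >= seconds:
            last = t
            return True
        return False
    return respond


def variable_interval(mean_seconds, rng):
    """Reinforce the first response after a varying wait averaging mean_seconds."""
    last = 0.0
    wait = rng.uniform(0, 2 * mean_seconds)

    def respond(t):
        nonlocal last, wait
        if t - last >= wait:
            last = t
            wait = rng.uniform(0, 2 * mean_seconds)
            return True
        return False
    return respond


# Demo: the subject responds once per second for 10 seconds.
response_times = range(1, 11)
fr5 = fixed_ratio(5)
fi4 = fixed_interval(4)
print([t for t in response_times if fr5(t)])  # [5, 10]
print([t for t in response_times if fi4(t)])  # [4, 8]
```

On the ratio schedules, responding faster earns reinforcers sooner; on the interval schedules, extra responses before the interval is up earn nothing. The variable versions take a seeded `random.Random` so that a run can be repeated.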

Observational Learning
Albert Bandura (1977, 1986) is credited with the theory of observational learning. In observational learning, vicarious conditioning occurs when a subject watches another organism (a model) be conditioned or exhibit a behavior. Observational learning can occur for both classical and operant conditioning. In order for observational learning to take place, four key processes must be at work. First, the organism must pay attention to the model. Second, it must retain the information observed. Third, it must be able to reproduce the behavior. Finally, an observed response is unlikely to be reproduced unless the organism is motivated to do so, i.e., it believes there will be a payoff or benefit gained. Bandura distinguished between acquisition (having the response in your repertoire) and performance (actually engaging in the behavior). Bandura asserts that reinforcement usually influences the performance of already acquired responses more than the acquisition of new responses.

Figure 6.25 Observational learning

The Media Violence Controversy
Studies demonstrate that exposure to TV and movie violence increases the likelihood of physical aggression, verbal aggression, aggressive thoughts, and aggressive emotions. The correlation between media violence and aggression is almost as strong as the correlation between smoking and cancer. Thus, controlling what movies, TV shows, and video games your children are exposed to can go a long way toward protecting them from developing a violent and aggressive nature.

Figure Comparison of the relationship between media violence and aggression to other correlations.

Cognitive Learning Theory
Cognitive learning theories developed in the 1920s and 1930s, and continue to be useful to this day. In fact, these theories exerted a primary influence on Albert Bandura’s movement away from traditional S-R Behaviorism and toward a more flexible and eclectic paradigm (Cognitive Behaviorism and Reciprocal Determinism). The two most prominent cognitive learning theorists were the great Gestalt psychologist Wolfgang Kohler and Edward C. (E.C.) Tolman.

Cognitive Learning Wolfgang Kohler and Insight learning
Kohler authored “The Mentality of Apes,” starring Sultan the chimpanzee. He demonstrated that learning often occurs as a “flash of insight,” whereby a subject, without any reinforcement, simply figures out the answer to a problem.
E.C. Tolman’s latent learning and cognitive maps: Tolman put rats in a maze and timed how quickly they made their way to the finish. After they finished, he put them back at the start and retimed them. With each successive trial, the rats’ times became faster, showing that they learned the maze simply by experiencing it over and over again. Tolman referred to this learning as the development of a “cognitive map,” or mental layout of the environment, which develops naturally, without any reinforcement.

Cognitive Learning
In the 1980s, Bandura became a Cognitive-Behaviorist, leaving strict behaviorism behind as he evolved theoretically. His theory of Reciprocal Determinism (aka Triadic Reciprocality) combined the two perspectives (cognitive psychology and behaviorism), and is considered by many to be his most important theory, even if it isn't as well known as his social learning theory. On the next slide is a detailed graphical description of Bandura’s theory of reciprocal determinism.

Reciprocal Determinism

Constraints on Conditioning
New research has greatly changed the way we think about conditioning, with both biological and cognitive influences having been discovered.
Instinctive drift occurs when an animal ignores training and drifts back to instinctive behavior (as when a well-trained horse sees or smells a wolf, rears up, throws the rider, and gallops away).
Conditioned taste aversion (aka the “Garcia Effect”): conditioned taste aversions can be readily acquired after only one trial, even when the stimuli are not contiguous (i.e., becoming ill occurs hours after eating a food), suggesting that there is a biological mechanism at work.

Constraints on Conditioning
Martin Seligman has pointed out that some phobias are more easily conditioned than others, suggesting the concept of preparedness, i.e., that we are biologically prepared to learn to fear objects or events that have inherent danger.
Signal relations theory (Rescorla) illustrates that the predictive value of a CS is an influential factor governing classical conditioning.
Response-outcome relations: when a response is followed by a desired outcome, it is more easily strengthened if it seems that it caused or predicted the outcome. You study for an exam while listening to Lil Wayne and make an A. But what is strengthened, your tendency to study or your tendency to listen to Lil Wayne?
