Clicker Training: Does it work? Evidence for and against the advantages of clicker training
Clicker training is misunderstood and misused Trainers assume they can just use the clicker – They do not realize that the click has to be conditioned: it is a conditioned reinforcer, not a primary reinforcer Have to add a cue for the response – The clicker marks the behavior; it does not elicit it! Fade the clicker as behavior becomes fluent – No longer need the marker – In a chain, a cue becomes a conditioned reinforcer
Cueing Basics: A fluent cue response is: – Precise: animal performs the behavior exactly as you had envisioned – Performed with low latency – Performed with optimal speed – Resistant to distraction – Performed at any distance from the handler – Performed for the duration required by the handler
Example of a chain: Each cue serves as: – A reinforcer for the prior behavior – A cue for the next behavior – A cue to keep going
Note that with a behavior chain Several responses are emitted before a C/T is given HOW the behaviors are taught is what is different – Shape each step – Often backward chain: start from end and work backward Work towards – Accuracy – Fluency
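A minimal sketch of the backward-chaining order described above, assuming a hypothetical three-behavior chain (the behavior names are illustrative only, not taken from the class). It lists the sub-chains to practice, starting with the final behavior alone and adding one earlier behavior at a time:

```python
def backward_chain_plan(chain):
    """Return the sub-chains to practice, in training order.

    Backward chaining starts with the final behavior alone, then
    prepends one earlier behavior at a time until the whole chain runs.
    """
    return [chain[i:] for i in range(len(chain) - 1, -1, -1)]


# Hypothetical chain; the behavior names are illustrative only.
chain = ["sit", "spin", "touch target"]
for step, sub_chain in enumerate(backward_chain_plan(chain), start=1):
    print(step, " -> ".join(sub_chain))
```

Each printed row is one training stage; the C/T comes only at the end of the row, so the already-trained ending is what reinforces each newly added behavior.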
Very few experimental reports on the value of animal training Just assume it works Lots of incidental evidence Does clicker training work? Is it better? Let’s look at 2 studies
Strategic Reinforcement What I learned from Michele Pouliot
Rewards = powerful tool EVERY reward is a “delivery event” – This event greatly influences behavior – Must be strategic in delivery of reinforcers That is: Each reward delivery should involve a plan!
Strategic Reinforcement does NOT ask for additional behavior After the click: “delivery event” must flow smoothly and NOT require more behavior before the dog receives his/her reward. Example: Click for sitting – Dog gets up – You require dog to sit back down to get the reward – This is requiring an EXTRA behavior How to avoid this? Strategic Reinforcement!
Defining Strategic Reinforcement TWO components: – Effective reward location – Effective delivery method = powerful training! Assists training progress Avoids hindering training progress Supports preventing undesired behavior! Lack of strategic reinforcement: The trainer – Has no reward strategy – The animal controls the delivery event = not so powerful training
Many Reasons for Using Strategic Reinforcement Safety: – Open hand vs. fingers accessible – Must use a reward strategy that includes a safe and comfortable method of reward delivery for both trainer and the dog Convenience: – Easy for the trainer to deliver – Easy for the dog to get the reward – But: doesn’t ask for additional behavior OR disrupt the target response
How to Introduce Reinforcement Strategy into your training Several considerations: – Your animal Your relationship with the animal Value of different reinforcers available Mechanics of giving/taking reward – You as a trainer Your ability The training goals Each behavior may have a different reward strategy
Reward Location Several Location strategies: – Location resets dog: places dog in best location for next repetition – Rewards where the dog was at the time of the click (adding value to that location) – Rewards at specific location (moves the dog) which supports the goal behavior – Rewards where the dog is at time of delivery (may be different location from where click happened)
Speed of Delivery Too fast: may startle the dog with the movement Too slow: dog may not relate the reward to the behavior Powerful reward “event” begins AFTER the click and is “smooth” from start to delivery
Delivery Strategies: Relocating Animal via Preset Rewards – Prompts the dog to move to the reward – Remote controlled feeding machine – Preplaced reward on a “trained” target (cue for taking reward) Reward Stationing: – Use a frequently used reward location – Place or position from which most behaviors are cued and/or most behaviors are reinforced – E.g.: head position, in front of handler, on platform Protected contact: – Physical set up limits animal’s access to reward – Reward delivery supports desired behavior: Waiting for delivery – E.g., remaining behind a barrier to deliver the reward: animal waits for you to approach barrier and deliver safely
Delivery Strategies Pizza Delivery! – Make it so there is NO NEED to TRAVEL to get the reward – Direct Delivery: Prompts the dog to remain at the “click” location Come and Get It! – Upon click: dog comes to reward at its location (usually handler) – Reset dog for next repetition: Initial movement towards goal behavior is heavily reinforced – Reset dog for Completion Point: final behavior position is more heavily reinforced “Longer” delivery Events: Distance Clicks – Extend “reward event” time – Reward is delivered to that distant location
Examples: Promoting a goal behavior: Down – Dog lies down, click – Where should you give reward to reinforce that position? Promoting goal behavior: Back up – Dog backs up, click – Where should you give reward to reinforce that position? Promoting heel position (dog on left side) – Dog heels, click – Dog moves to in front of you – Where should you reinforce to keep dog in position?
Examples: Promoting a goal behavior: Ignore distractions – Dog is walking along, not looking at distraction – Click for this behavior – Where should you give reward to reinforce that position? Promoting goal behavior: dog comes off A-frame, must make contact with lower part of board – Dog moves down ramp, hits yellow area – Where should you give reward to reinforce that position? Promoting 4-feet on platform – Dog gets x feet on platform, click – Where should you reinforce to keep dog in position?
Thorn, et al. 2006 Training shelter dogs Shelters are high stress environments – Very little training – Lots of inappropriate behavior inadvertently reinforced – Little time for staff to train Need for quick but effective training programs – Teach basic manners – Sit is the most basic
Three experiments Experiment 1: – 3 researchers: Handler, timer, “stranger” – Approach dog: record time to sit/start sit – Used verbal “good dog” and a treat Experiment 2: – Compared verbal vs. clicker – Verbal: waited, said “good dog” and treat – Clicker: waited, clicked, treat – 2 days of training for each: look at speed of training and retention Experiment 3: – Compared contingent vs. noncontingent reward – 3 days of 15-minute training with food/verbal reward for sit – 4 post-training conditions: Same trainer, same room Different trainer, same room Same trainer, different room Different trainer, different room
Results Latency to sit significantly decreased. All dogs were able to sit within 60 sec
Exp. 1, cont’d Mean latency of second session less than ½ of first session.
Experiment 2 Day 1 of training: No difference between clicker and verbal training – Significant decrease in latency to sit Day 2: verbal training appeared better than clicker training – dogs showed better retention – Dogs showed lower latency
Experiment 3 No differences between groups on first day Dogs in contingent trials sat significantly longer, and this increased across trials There was a day × treatment interaction – Dogs in noncontingent condition sat less – Dogs in contingent condition sat more
No differences in generalization tests No difference across four test conditions for the dogs in the contingent reward condition Shows that the dogs generalized the “sit” across new settings and new trainers
Summary: Data show that minimal training in shelter (10-15 minutes over 2 days) works! – Even with novice trainers – Dogs quickly learned sit – Able to retain new response Why did verbal work better than clickers? – Limited supply of verbal reward in shelter, so more valuable? – Negative association or no conditioning to clicker, but GOOD conditioning to the word “good” – Really this was a study about different MARKERS: “good dog” vs. click Power is not in the clicker itself, but in the procedure!
Summary: Shelter implemented new training policy: – All staff required to have dog sit when moving dog, feeding dog, interacting with the dog – Dog exposed to continuous training across settings Found that other behaviors also affected: – Decreased inappropriate responses such as barking, stress responses, jumping on cage – Better response to the dogs; increased adoption probability
Ferguson and Rosales-Ruiz (2001) Trailer loading = critical horse behavior Horses often do not like trailers – Small, dark, confined – Aversive methods often (usually) used – Too much negative reinforcement and punishment, which often escalates to increasingly aversive treatment
Horse Loading Behaviors Back up Move Forward Turn left/right Step up Loading problems = leading problems – Horse not going where led – Balks, turns away, etc.
Method 5 horses with poor loading history 2-horse straight load step-up trailer – Butt chain instead of butt bar – Side windows and rear doors left open – White inside and outside – Railroad tie used as extension of trailer deck Target: Red pot holder Reinforcers = typical horse treats 15 min training sessions
Method Baseline compared to training Loading behavior chain: horse approach trailer: 1.Within 3 meters (about 10 feet) 2.Within 1.5 meters 3.With head/neck in trailer 4.With front legs in trailer 5.With ½ body in trailer 6.With 3 legs in trailer 7.With 4 legs in trailer; less than 5 sec 8.4 legs in, allowed butt chain to be fastened, door to be closed.
Behaviors Recorded: Inappropriate responses/stress responses: – Amount of horse in trailer (using 8 step chain) – Freezing – Head toss – Standing – Turning Loading: – Getting into trailer (less than 5 sec) – Loading and staying in trailer Number of prompts New leads (re-approaches) Latency to respond to cue: – Within 5 sec – Greater than 5 sec – No response Also obtained interobserver agreement (IOA)
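A minimal sketch of how the latency categories and the IOA check above could be scored. The slides do not say how IOA was calculated, so the trial-by-trial percent agreement used here is an assumption, as is treating a latency of exactly 5 sec as "within 5 sec". All function names are hypothetical.

```python
CUTOFF_SEC = 5.0  # assumption: exactly 5 sec counts as "within 5 sec"


def classify_latency(latency_sec):
    """Bin a cue-response latency into the three recorded categories.

    Pass None when the horse never responded to the cue.
    """
    if latency_sec is None:
        return "no response"
    if latency_sec <= CUTOFF_SEC:
        return "within 5 sec"
    return "greater than 5 sec"


def percent_agreement(observer_a, observer_b):
    """Trial-by-trial IOA: agreements / total trials * 100 (assumed method)."""
    if len(observer_a) != len(observer_b) or not observer_a:
        raise ValueError("need two non-empty records of equal length")
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100.0 * agreements / len(observer_a)


# Hypothetical data: two observers scoring the same four trials.
obs_a = [classify_latency(t) for t in (2.1, 6.0, None, 4.8)]
obs_b = [classify_latency(t) for t in (2.3, 6.4, None, 5.5)]
print(percent_agreement(obs_a, obs_b))
```

In the example, the observers disagree only on the last trial, where one timed the response just under the cutoff and the other just over it.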
Procedure: Baseline: 1 day of repeated 5 min baselines Target training: 2 days; 20-30 trials/day – Touch target – Criteria: 80% of prompts Trailer training: – Trials to touch (just inside trailer) – Upon entry, lead back to start and another trial – Started at each horse’s baseline distance Added then faded trailer extension Trained to load on left/right sides Added limited hold with Fancy: gave several steps to move forward Multiple baseline design across horses
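A minimal sketch of the target-training criterion above, assuming "80% of prompts" means the horse touched the target on at least 80% of the prompts given in a session. The function name and the trial counts are hypothetical.

```python
def met_criterion(touches, prompts, criterion_pct=80):
    """True if target touches reached the criterion share of prompts.

    Uses integer arithmetic so a session at exactly the criterion passes.
    """
    if prompts <= 0:
        raise ValueError("a session needs at least one prompt")
    return touches * 100 >= criterion_pct * prompts


# Hypothetical sessions of 25 prompts each.
print(met_criterion(15, 25))  # 60% of prompts
print(met_criterion(20, 25))  # 80% of prompts
```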
Results All horses learned to target during first training session – First session: average of about 60% accuracy – Second session: average of 80% – Sammy took 3 sessions, but reached 90%
Red’s data: Red shaped quickly: reached criteria at each step of the chain before moving on to next Change in setting disrupted behavior, but recovered criteria quickly
All horses able to be loaded Baseline: no horse able to get beyond step 4 After initial target training: – Red: step 6, some 7 – Penny/Shadow: steps 5,6 mastered; some 7,8 – Sammy: steps 3 to 5 mastered; some 6-8 – Fancy: 5 to 6 mastered When added extension: all but Fancy reached criterion performance – Fancy outwitted researchers: could stretch to touch target even with extension – Had to add the limited hold condition to outwit him.
Combined horse data All horses maintained loading behaviors when extension removed Loading left and right and new trailers produced some disruption but quickly recovered All reached 90%
Inappropriate Responses Most common: – Standing – Turning – Head toss Immediately decreased with training – Note: not targeted – Suggests these are stress responses
Leads and Prompts During baseline: Few leads and LOTS of prompts During training: Fewer prompts Leads were about 1:1 with prompts
Summary Target training decreased inappropriate responses secondary to increasing trailer loading Target training established stimulus control – This allowed stimulus control during situations which usually elicited problem behaviors – Horses so busy focusing on target that they ignored poisoned cues
Summary Why is clicker training better? – Faster responding with fewer disruptive responses – Fewer avoidance responses – Happier horses and trainers – Changes in loading procedures (e.g., left vs. right position, new trailer) easily dealt with and overcome
WHY does target training work? Avoids learned helplessness – Dogs learn to “move” or “do something else” to get a reward when NOT clicked – Gives organism control over the environment – Opposite of learned helplessness (LH), where animals learn that their behavior has no power Not so much that it is “all positive”, as it is teaching the rule that you must either – Do what you just did to get the click – Or if that doesn’t work, do something different – “Not” behaving is not an option Teaches “creativity”, “persistence”, “resistance”
For this week: We will be working on some “heeling” for half the period Then will work on developing your behavior chain. – Begin working on shaping any new behaviors for your chain – Work on fluency/maintenance for old behavior – Don’t forget to use the shaping staircase if you need to: helps you figure out the steps for shaping