1 Research Methods Fall 2011
2 Science Helps Avoid Bias Biases confound our judgment Overconfidence Confirmation bias Self-fulfilling prophecies Belief perseverance Illusory correlations Availability heuristic Conjunction fallacy Seeing patterns when there are none
3 1. Our clinical judgment is biased –biases in perception and interpretation are pervasive (e.g., blinding, grading, replication effect sizes lower) –when figuring out causes of problems –when figuring out what is effective for individuals –change occurs for many reasons so difficult to detect specific causes with the naked eye –hard to see subtle, delayed, or slow change resulting from specific causes –“common sense” is often wrong –harmful treatments can seem helpful Is a Research Class Necessary?
4 Early in therapy, my BPD client got worse. Caused by therapy? What could be other causes? After 8 months, her behavior improved. …soon after her parents threatened to stop paying for therapy What could be other causes? BPD Case Study
5 2. Lists of ESTs are overly simplistic –research does tell us what is effective NOW for my particular client in this particular setting –largely ignores moderators –minimally addresses principles of change (mediators) –too many treatments on the list Is a Research Class Necessary?
6 Numerous therapists practice unvalidated and sometimes discredited methods Astrology, Tarot cards, palm reading Homeopathic remedies Primal Scream Therapy Sensory deprivation therapy Rebirthing Therapy Thought Field Therapy Facilitated Communication (autism) EMDR (SDPA article) Pseudo-Science is Easy to Believe
7 Science provides a systematic and (relatively) reliable approach to figuring out causes of important problems and change: Why Do I Care about Research?
8 Guidelines for physical health: Exercise, omega-3, anti-oxidants, vitamin D3 Red wine and alcohol healthy Flossing could make you live longer Breast cancer - soy and estrogen replacement Amalgam fillings Low calorie diets Cholesterol Would you get surgery from a doctor whose practice wasn’t based on reliable evidence? Why Do I Care about Research?
9 1. Dissertation (quicker, easier) 2. Research is fun! 3. Better understand disorders (causes) 4. Prediction of clinical outcomes 5. Improve effectiveness with clients –evidence-based practice –what needs to change (causes) –how to change it (causes) 6. Publish (to get internships & postdocs) Why a Research Class?
10 1.Cannot please everyone! 1.Testing is a drag! 2.Try to balance pace 2.Some students feel overwhelmed whereas others have said they have learned nothing new 1.need solid grasp of stats 3.Labeling/terminology is important This Class is Demanding!
11 Plaque on teeth correlated with plaque in arteries, therefore flossing could make you live longer (on TV news show) how to choose a topic - bring in an article the ideal experiment - a time machine RG mom said she has been worse since start of therapy RG better because of coercion "low calorie" foods can lead to weight gain (like a placebo effect) Garlic causes insomnia
17 It is problematic to ignore research Research findings can easily be misinterpreted or misused (be careful relying on experts or over-valuing statistical significance) Not all research is created equal
18 1. Many sources of bias (many subtle) 2. Confounds make results ambiguous 3. Results do not generalize Research Findings Can Easily Be Misinterpreted
19 Ways to Determine What Works Clinical observation and intuition Treatment research can reduce bias and ambiguity
20 Three basic designs Observational/correlational studies Non-randomized manipulations Randomized experiments Correlational studies never randomize!! Not all Research Designs are Equally Persuasive
21 Too many causes to untangle Hard to isolate a specific cause (poor internal validity) Non-randomized Studies Often Yield Ambiguous Answers Regarding Cause and Effect
22 1. Have a clear causal theory (if possible) 2. Causes must precede effects “cause” = independent variable (IV) “effect” = dependent variable (DV) OR the IV precedes (and predicts) the DV Studying Cause and Effect
23 Internal validity improves when you rule out confounds; for example, you improve internal validity when: –you include gender as covariate –you exclude men –you match non-randomized groups on gender –the supposed cause precedes and correlates with the effect Internal Validity of Non-Randomized Studies
24 1.Cross-sectional (one time data collection) –correlations among current events, experiences, behaviors, and constructs –retrospective: some measures rely on memory for prior events, behaviors, etc. 2.Prospective (longitudinal) Correlational Studies
25 PUT IN SLIDE ON PROSPECTIVE STUDY OF SHAME PREDICTING SUBSEQUENT SELF-INJURY, WHILE COVARYING BASELINE SELF-INJURY Prospective Correlational Studies
26 Percent Eventual Suicide of Persons at High Risk for Suicide Who Obtain Treatment vs. Refuse Treatment Motto (1976)
27 Percent Suicide for Contacted vs. Non-Contacted High Suicide Risk Persons Who Refuse Further Treatment Motto (1976) *=p<.05
28 If you believe that non-randomized studies are sufficient to evaluate treatment efficacy, then you have to admit that treatment-as-usual increases suicide among high risk individuals. If you don’t want to make that conclusion, then you need experimental research.
29 suicide treatment study estrogen therapy study critical incident stress debriefing cholesterol studies yields consistent findings Non-randomized Studies Often Yield Different Findings than RCTs
30 What conclusions can be made with what degree of confidence? 1. Is the IV really the IV? DV? 2. Is I.V. really a true cause of D.V.? (internal validity) –Alternative interpretations of findings? –Does the intervention work? 3. Why did the “cause” lead to the effect? –How does the I.V. cause the D.V.? –Why does intervention work (mediators) 4. For whom are the causes truly causes (what populations and settings)? Research Validity
31 Are the results “confused”? Is the I.V. confounded with another variable, and could this third variable be the main cause of the I.V. and the D.V.? Is the IV-DV relationship spurious? Does the I.V. cause the D.V. for the specific hypothesized reasons or are supposedly non-essential parts the primary causes? Confound = Confuse
32 Tell your Grandma! 1. State the IV-DV relationship and why you think the relationship exists 2. Identify a confound However, the IV-DV relationship “could be due to ___” 2a. State the IV-confound relationship 2b. State the confound-DV relationship 2c. Therefore, the IV-DV relationship may simply be due to the confound. What are Confounds?
33 Example: Why is hot weather associated with ice cream sales? Causal link or spurious correlation? Example: If darker-skinned people commit more crimes, what could be the reason? Causal link or spurious correlation? Example: Why did CBT group end up with less depression than group who got supportive counseling? Causal link? Or is outcome difference due to some other difference between the groups? Example: Why do hairier players score more goals? Causal link or spurious correlation? Confounds
34 Mediation 7. Treatment leads to changes in outcome (direct effect) 8. The direct effect for treatment diminishes (in magnitude and significance) when the mediator is entered in the analysis. Treatment is effective because it changes ME. ME = stressful events or negative cognitions
35 Time confounds Maturation confound Cognitive training to young people Cognitive training to old people –natural cognitive decline could mask benefit Internal Validity
36 Hairiness is associated with scoring more goals on the mixed-gender soccer team because having more hairs keeps muscles warmer. However, that may be due to gender, strength, and speed. Men are more often stronger and faster players. Stronger and faster players can score more. Therefore, hairiness per se may not cause the more effective performance. Psych medications are likely to be an important internal validity confound in a study of bipolar disorder. If the control group does not have equal amounts and types of meds then the difference between bipolar participants and participants in the control group could simply be caused by the difference in medications. Bipolar participants may have worse memory simply because they have more toxic medications in their body that cause memory impairment because they cause brain atrophy or heavy sedation. We have to acknowledge this possibility and hopefully we can rule it out by controlling for these variables. Homework #1
37 TWO MOODLE SUBMISSIONS ARE REQUIRED for this assignment. 1) Identify one specific plausible mediator. Describe the mediator clearly and completely. 2) Identify one specific plausible threat to internal validity. Describe the confound clearly and completely, including the direction of the effect. 3) Identify one specific plausible moderator of the relationship between the independent and dependent variables, and explain the moderation effect clearly and completely, including the direction of the associations. Study 1: Consider a single-group correlational study to assess the effects of level of childhood sexual abuse on adult interpersonal violent behavior, both measured as continuous variables. All participants had at least some history of abuse during childhood, and level of abuse was a combination of frequency and severity of prior abuse episodes, ranging from a single instance of an older child touching the participant's genitals over clothing, up to multiple rapes involving intercourse with an adult stranger and threats of violence. Level of violence was a combination of frequency and severity, ranging from a single instance of verbal cruelty or destroying the property of others, up to multiple physical assaults or murder. Study 2: Consider a two-group study designed to test the hypothesis that having a history of childhood sexual abuse (binary independent variable: none versus any) increases risk of physically assaulting another person (binary dependent variable: never versus at least one). The percentage of adult participants who have physically assaulted another person at least once will be compared between the abused and non-abused groups. Homework #1
38 Confounds vs. Mediators The label depends on your theory of the causal process –intrinsic or extrinsic part of your I.V.? If there is no main effect of the IV. –mediators cannot be examined (overall) –there can be moderator effects crossing regression lines mediators can be examined for subgroups
39 Internal Validity: Is the I.V. really a true cause of the D.V.? Construct Validity: Why did the “cause” lead to the effect? Depends on your theory Is explanatory variable an intrinsic part of I.V.? Internal vs. Construct Validity
40 Construct Validity Finding out: Why does an I.V. lead to a D.V.? Why does a manipulation cause a change? Why is an intervention effective?
41 Two Types of Construct Validity Construct validity of IV –some intrinsic part of the IV accounts for the IV-DV correlation or group difference Causal sequence (mediation) –the IV => mediator => DV –the mediator occurs after the IV –the mediator occurs before the DV
42 Construct Validity Examples: Ice Cream
43 Mediation 7. Treatment leads to changes in outcome (direct effect) 8. The direct effect for treatment diminishes (in magnitude and significance) when the mediator is entered in the analysis. Treatment is effective because it changes ME. ME = stressful events or negative cognitions
44 First state your specific theory of why the IV is related to the DV (intermediate cause) IV => M => DV Mediation requires: 1. IV is correlated with mediator 2. mediator is correlated with DV 3. The IV-DV correlation is reduced when the mediator is added into the prediction equation 4. A mediation test shows the IV-DV correlation reduction is statistically significant Mediation
45 Correlational mediation tests do not prove causal direction All pathways could be true: IV => M => DV M => IV => DV DV => M => IV DV => IV => M Mediation
46 Construct Validity Confounds 1. State your specific theory of why the IV is related to the DV (mediator) 2. Identify a competing theory of the IV-DV relationship (C.V. confound), e.g., how the relationship could be due to other intrinsic parts of the IV or DV (which may be more generic or broader characteristics of the IV or the DV). 3. Identify variables that share the confounds –include the variable(s) as covariate(s) –have those characteristics in control group
47 Validity vs. Confounds
48 Confound Examples Do BPD patients have more shame than normal controls? Does shame lead to self-injury? Does CBT reduce depression (compared to wait list control condition)? Does HRV biofeedback plus exposure lead to more reduction in fear than exposure alone? What are the IV and DV construct validity confounds?
49 Confound Examples Independent variables: Chinese (human, race, culture, Asian) shame (emotion, depression, anxiety, guilt) BPD (disorder, personality disorder, history of abuse, suicidality) CBT (support, hope, commitment, payment, problem-solving, new thinking, beh. activation)
50 Designing a Control Group 1.Figure out all the variables (characteristics or experiences) that are likely to influence your dependent variable other than your independent variable (confounds) 2.Include most important confounds in the control group 3.Figure out all the components of your independent variable (population, disorder, manipulation, treatment, etc.) 4.Decide what part of the independent variable you want to evaluate (broad or specific?) and include all other part(s) of the I.V. in the control group
51 Example: If people who get CBT truly get improved outcomes and placebo effects fully explain why, is that evidence that “CBT” works? Example: If people who get CBT truly get improved outcomes and number of stressful events during therapy fully explains why, is that evidence that “CBT” works? Internal vs. Construct Validity
52 Example: If people with BPD have worse suicidal behavior than normals and poor coping fully explains why, is that evidence that “BPD” per se is the cause? Example: If people with BPD have worse suicidal behavior than normals and number of stressful events during therapy fully explains why, is that evidence that “BPD” per se is the cause? Example: If people with BPD have worse suicidal behavior than normals and psych meds fully explains why, is that evidence that “BPD” per se is the cause? Internal vs. Construct Validity
53 Merge this with other slides Use a control group that has the characteristics that you want to rule out Adjust for independent variable confounds with covariates –Ex: Tangney "shame-free guilt" and "guilt-free shame" Include a DV that measures some non-essential aspect of the primary DV to show there is a stronger IV-DV association for the primary DV
54 Moderators Moderators are third variables that moderate or change the magnitude or direction of the relationship between two other variables
55 Moderation 5. Treatment is less effective for [men]. 6. Coping skills reduce the impact of stressors 6. Mindfulness skills reduce the impact of negative thoughts ME = stressful events or negative cognitions
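A moderator shows up as a difference in the size of the IV-DV effect across levels of a third variable. The sketch below uses invented data in which treatment lowers symptoms by 2 points for women and 0.5 points for men; the numbers, names, and seed are assumptions for illustration only.

```python
import random
from statistics import mean

random.seed(3)

# Hypothetical data: (male, treated, symptoms)
rows = []
for _ in range(4000):
    male = random.choice([0, 1])
    treated = random.choice([0, 1])
    effect = -0.5 if male else -2.0
    symptoms = 10 + effect * treated + random.gauss(0, 1)
    rows.append((male, treated, symptoms))

def treatment_effect(is_male):
    """Mean symptom difference (treated minus control) within one subgroup."""
    treated = [s for m, t, s in rows if m == is_male and t == 1]
    control = [s for m, t, s in rows if m == is_male and t == 0]
    return mean(treated) - mean(control)

effect_women = treatment_effect(0)
effect_men = treatment_effect(1)
interaction = effect_men - effect_women   # nonzero => gender moderates the effect

print(f"women: {effect_women:.2f}, men: {effect_men:.2f}, difference: {interaction:.2f}")
```

The difference between the two subgroup effects is the interaction term that a regression with an IV x moderator product would estimate.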
56 Confounds vs. Moderators Internal validity confounds and moderators are separate issues A variable can be both a confound and a moderator, but these are still separate issues –cannot enter as a covariate If there is no main effect of the IV. –mediators cannot be examined (overall) –there can be moderator effects crossing regression lines mediators can be examined for subgroups
57 Moderator Examples Resiliency (reduce causality) –Does abuse cause violence? Risk factors (increase causality) –Does smoking cause cancer? Do opiates cause euphoria? (Naltrexone) Does alcohol cause euphoria? Antabuse MBCT to prevent depression relapse BA best for severe depression Psychotropic meds depend on race (Kazdin) Acculturation
Mindfulness-Based CT
Study 1
# epis.   MBCT   TAU
1-2       54%    31%
>2        37%    66%
Study 2
# epis.   MBCT   TAU
1-2       50%    20%
>2        36%    78%
59 Moderator Example
60 Depends on your question: Why? – construct validity Generalize? – external validity Kazdin, pg 63 External vs. Construct Validity
61 Passage of time Testing – desensitizing to shame questions Testing – diary cards Instrumentation – July 4th arrests, suicides – what is “good” or “bad” to raters is relative to what has already been seen Confound Examples
62 Sources of Bias Hawthorne effects Demand effects Placebo effects –active placebo –placebo surgery Interpersonal expectancy effects (Rosenthal)
63 Participant Reactivity (Bias) A = Hawthorne; B = Demand effects C = Expectancy/Placebo; D = Rosenthal
64 Limiting Bias Forms of bias –Hawthorne effects –expectancy effects (e.g., placebo) –demand effects Bias protection –naive experimenters (therapists vs. assessors) no knowledge of experimental condition no knowledge of hypotheses –automated and standardized procedures
65 Reduce Bias Rosenthal Effect –naïve experimenters and interviewers check knowledge of subjects and hypotheses –scripted procedures (standardized) written or recorded instructions –balance bias: therapist allegiance in RCT –balance bias: plausible rationale for control group –measure expectancies Demand effects –withhold hypotheses check subjects’ beliefs about purpose of study (debriefing) –have indirect measures (not self-report)
66 Example: Mediators and Moderators Rosenthal Effect among Teachers Mediator: students persist more teachers provide more help (persist) teachers more effectively reinforce students teachers’ biased evaluations of students Moderator: student-teacher similarity student attractiveness and kindness teacher burnout
67 Example: Mediators and Moderators Rosenthal Effect among Researchers Mediator: subjects/patients persist more experimenter/therapist provides more prompting experimenter/therapist reinforce target behavior experimenter biased evaluations of subjects Moderator: subject-experimenter similarity (patient-therapist) patients’ fear of failure/success
68 1. ____ is a serious problem 2. Currently we do not adequately understand the problem (insufficient data) 3. Current treatments are insufficient because: 4. It is plausible that a missing piece is… 5. No study has yet tested… 6. This study is needed to address… Study Justification
69 1. Do not state null hypotheses. Instead: –show differences in correlations –test moderator effects –specify effect size and/or confidence intervals 2. State basic (non-jargon) idea/theory 3. Operationally define how test theory (measures) 4. State direction of effects clearly 5. Not too many, not too few Research Hypotheses
70 Dealing with I.V. Confounds Measure all plausible internal validity confounds and evaluate their role in the results Prevent confounds if possible –make subjects homogeneous on confound variable narrow inclusion criteria –use a control group that is equivalent on participant baseline characteristics randomize (stratified) to groups match subjects (case-control, quasi-experimental study) within-subjects design –standardize/yoke procedures/scripts –naive experimenters/interviewers to reduce bias check the blind Adjust for confounds with covariates
71 Within-Subjects Designs Within the same person: comparing multiple things comparing the effect of two manipulations Advantages: virtually eliminates person variable confounds increases statistical power can yield truer correlations Disadvantages time confounds order and carry-over effects (manipulations)
72 Within-Subjects Designs Multiple-treatment designs multiple manipulations Single-subject designs ABABAB multiple-baseline Repeated measures designs change over time covariation over time Small sample diary studies (N < 30) (Caspi, 1987)
73 Reasons for Parasuicide
Method of Analysis
                     Between-Ss       Within-Ss
Reason               NS     SA        NS     SA
Feeling Generation   54     21**      59     15***
Self-punishment      63     38*       59     51
Anger expression     63     24***     54     28**
74 Correlational Methods Research Question: What is the association between shame and suicide ideation? 1. Between-subjects: each subject has one shame score and one ideation score 2. Within-subjects: each subject has multiple pairs of shame and ideation scores –correlations per person (HLM) –small sample diary study (Caspi method)
75 Correlation: shame is correlated with SI BS: people with higher shame have more SI WS: when the shame (of individuals) increases their SI also increases Experiment: increasing shame increases SI BS : people who get a shame induction increase their SI more than people who do not WS: SI increases more when people get a shame induction than when they (same individuals) do something else Shame and Suicide Ideation
76 Within-Subjects Correlations
77 Within-Subjects Correlations: HLM 1. Regression lines for each subject 2. Compute the average regression line
78 Within-Subjects Correlations: HLM 1. One regression line for all subjects 2. Many IV-DV pairs per subject
80 Caspi (1987) Correlation Method Which correlation is larger?
81 Caspi (1987) Correlation Method 1. Collect many frequent measures of IVs and DVs –e.g., daily scores for at least several weeks 2. Data for all subjects in one regression equation 3. Remove between-subjects effect: –dummy code each participant and enter all dummy coded variables into regression 4. Test if today’s IV score predicts tomorrow’s DV score better than it predicts yesterday’s DV 5. Test if today’s IV score predicts tomorrow’s DV score when covarying today’s DV score
82 Randomized Experiments Step 1: Select an intervention or manipulation that simulates a cause in the natural world (independent variable) Step 2: Select a randomization method and verify that groups are comparable on confounds Step 3: Verify that the intended independent variable actually occurred sufficiently (manipulation check or treatment adherence)
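Step 2 might look like the sketch below: shuffle within each stratum, alternate assignment, then compare the groups at baseline. The participant list, the severity strata, and the baseline scores are hypothetical, and this is only one of several reasonable randomization schemes.

```python
import random
from statistics import mean

random.seed(5)

# Hypothetical participants: (id, severity stratum, baseline score)
participants = [(i, random.choice(["low", "medium", "high"]), random.gauss(50, 10))
                for i in range(120)]

def stratified_assign(people):
    """Shuffle within each stratum, then alternate treatment/control."""
    assignment = {}
    strata = {}
    for pid, stratum, _ in people:
        strata.setdefault(stratum, []).append(pid)
    for ids in strata.values():
        random.shuffle(ids)
        for k, pid in enumerate(ids):
            assignment[pid] = "treatment" if k % 2 == 0 else "control"
    return assignment

groups = stratified_assign(participants)

# Check that the groups are comparable on the stratum and on a baseline variable
for arm in ("treatment", "control"):
    members = [p for p in participants if groups[p[0]] == arm]
    counts = {s: sum(1 for p in members if p[1] == s) for s in ("low", "medium", "high")}
    baseline = mean([p[2] for p in members])
    print(arm, counts, f"baseline mean = {baseline:.1f}")
```

Stratifying guarantees balance only on the stratifying variable; other baseline variables can still differ by chance and need to be checked.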
83 Randomized Experiments Analog studies = simulations
Independent Variables
Emotions: mood induction
Social exclusion: computer simulation
Attribution bias: ambiguous aggression scenarios
Jury decisions: vignettes of criminal trials
Malingering: instructions to fake malingering
Suppression: suppression instructions
Worry: worry instructions
Self-Injury: enduring cold-pressor pain
84 Randomized Experiments Analog studies = simulations
Dependent Variables (samples of behavior)
Aggression: electric shock (self-harm studies too)
Aggression: point subtraction penalties
Self-harm: cold pressor task, electric shock
Stigma attitude: electric shock (to patients)
Persistence: time with unsolvable anagrams
Impulsivity: gambling games
Binge eating: snack food left in room
85 Dealing with I.V. Confounds Measure all plausible internal validity confounds and evaluate their role in the results Prevent confounds if possible –make subjects homogeneous on confound variable narrow inclusion criteria –use a control group that is equivalent on participant baseline characteristics randomize (stratified) to groups match subjects (case-control, quasi-experimental study) within-subjects design –standardize/yoke procedures/scripts –naive experimenters/interviewers to reduce bias check the blind Adjust for confounds with covariates
86 1. The subjects differ (selection bias) at different levels of the I.V. –baseline levels of the D.V. (severity) –demographics (gender, age, ethnicity, SES) –differential drop outs 2. Subjects’ experiences in study differ –demand/expectancy effects experimental group more hopeful control group demoralized or competitive –amount of treatment received (or practice) Internal Validity Confounds
87 Randomization Failure Probability that at least one confound will occur due to chance: 22.6% if 5 confounds are tested 40.1% if 10 confounds are tested 64.2% if 20 confounds are tested (assumes p<.05 per confound and that all confounds are independent of each other)
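The percentages on this slide follow from 1 - (1 - alpha)^k for k independent tests. A short check, under the slide's own assumptions of alpha = .05 and independence (the function name is just illustrative):

```python
def prob_at_least_one(k, alpha=0.05):
    """Chance that at least one of k independent tests is 'significant' by chance."""
    return 1 - (1 - alpha) ** k

for k in (5, 10, 20):
    print(k, f"{prob_at_least_one(k):.3f}")
```

This reproduces the slide's figures to within rounding. If the confounds are correlated with each other, the true probability is lower than this formula suggests.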
88 Must Check if Randomization Worked! Not rare that baseline differences on at least one variable emerge due to chance! A stratum (level) may be too big Subjects do not fill all strata (levels) –In a DBT study, stratified randomization failed because only “medium” severity subjects entered study. More severe “medium” severity subjects were in control group.
89 Effect Sizes Pearson r indicates the magnitude of association between two continuous variables Cohen’s d indicates the magnitude of association between a binary variable and a continuous variable
90 Effect Sizes
           correlation        t-test
           Pearson r (r²)     Cohen’s d
Small      .10 (.01)          .20
Medium     .25 (.08)          .50
Large      >.38               >.80
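A minimal sketch of how the two effect sizes are computed and related. `cohens_d` uses the pooled standard deviation; `d_to_r` uses the standard conversion r = d / sqrt(d² + 4), which assumes two groups of equal size. The function names are illustrative, not from the slides.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * stdev(group_a) ** 2
                  + (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

def d_to_r(d):
    """Convert d to r, assuming two groups of equal size."""
    return d / sqrt(d ** 2 + 4)

for d in (0.2, 0.5, 0.8):
    print(d, round(d_to_r(d), 2))
```

The converted values land close to the r benchmarks in the table, which suggests the table's r column was chosen to line up with the conventional d cutoffs.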
91 Effect Sizes
92 Problems of Multiple Tests: Inflated Type I error rate If alpha level is set at .05, the chance of finding at least one Type I error is: 22.6% if five statistical tests are done 40.1% if ten statistical tests are done 64.2% if 20 statistical tests are done
93 Test which cars can get you from San Diego to Los Angeles the fastest: red or blue? Why (mediator)? (knowledge of) faster route push gas harder (no fear of police or crash) car accelerates faster car has more horse power When (moderator)? number of stops (because of acceleration) number of hills (because of horse power) Example
94 Nonspecific Predictors of Outcome
95 Passive recruiting (e.g., posting flyers) will result in a very biased sample. People who respond to ads are different than the general public. Instead go to places in person and approach people. Go to a mall and give people 5 dollars in advance. External Validity
96 Problems with Self-Report measure inequivalence contaminates static group comparison studies questions mean different things to different people (concept inequivalence) –intimacy example NEED BETTER EXAMPLE people use number scales differently (metric inequivalence) –Italians vs. Irish pain ratings –BPD more emotional than APD? Solutions: randomization balances out the differences within-group correlations
97 Best Self-Report Methods Current observable states –to avoid memory bias –to avoid unnecessary inferences …measured in relevant contexts to ensure activation of schema –mood induction –priming procedures –experience sampling in natural contexts Interviews
98 Are Women More Emotional? YES, when comparing global retrospective self-reports of women vs. men NO, when comparing average emotional states measured by experience sampling Barrett et al. (1998). Are women the "more emotional" sex?
99 Survival Plot for Shame
100 Why does trait shame not predict self-injury? Shame Variability
101 Shame Variability
102 Avoid the Problems of Shared Method Variance and Socially Desirable Responding
103 Alternatives to Self-Report Your research proposal should include at least one: informants (e.g., spouse or significant others) behavioral samples –Behavioral Approach/Avoidance Test –observational coding (e.g., FACS) –unobtrusive behaviors (e.g., Bargh studies) –Davison ATSS psychophysiology performance tests (reaction time measures) –semantic priming –stroop –IAT
104 Alternatives to Self-Report PASAT Exclusion computer program PSAP Electric shock concentration
105 Implicit Association Test
106 Implicit Association Test
107 Implicit Association Test
108 Implicit Association Test
109 Implicit Association Test
110 Semantic Priming The driver stepped on the… GAS The driver stepped on the… BRIDGE They said it was the… BRIDGE
111 Semantic Priming He was less stressed when he had the… BEER They said it was the… BEER
112 Semantic Priming 1. I deserve…PUNISHMENT 2. I deserve…WATER 3. I deserve…PRAISE 4. A criminal deserves…PUNISHMENT 5. I injure myself for…PUNISHMENT 6. They said it was for…PUNISHMENT RT: 1<2<3, 1=4<6, 4=5<6
113 Semantic Priming I injure myself for…RELIEF I injure myself for…EXPERIENCE Aspirin can provide…RELIEF
114 Schema Activation Schema: “Black people are dangerous” Situation: police officer sees person standing up from behind an object in an alley Motor response: ??
115 Schema Activation Schema: “Old people are slow and sickly” Priming: see “old” words Motor response: slower walking down hall Schema: “Interrupting is rude, helping is nice” Situation: describing a nice friend Motor response: offer help to someone else
116 Valid Coding/Interviewing Training select extra material not to be analyzed talk through examples code separately and confirm inter-rater reliability with Kappa, ICC, or Pearson Have primary rater be naïve to hypotheses/subjects Verify reliability of coded analysis variables inter-rater reliability (>20% of data) intra-rater reliability Re-train if necessary use material that will not be analyzed do not reveal discrepancies for real data that must be re-coded
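For categorical codes, inter-rater reliability with Kappa corrects raw agreement for the agreement expected by chance. A small sketch of Cohen's kappa for two raters follows; the ten coded segments are invented for illustration.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' categorical codes."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical codes from two raters for ten interview segments
a = ["shame", "shame", "anger", "none", "shame", "anger", "none", "none", "shame", "anger"]
b = ["shame", "anger", "anger", "none", "shame", "anger", "none", "shame", "shame", "anger"]
print(round(cohens_kappa(a, b), 2))
```

Here the raters agree on 8 of 10 segments (80%), but kappa is about .70 because some of that agreement would be expected by chance.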
117 Manipulation Checks Verify that the experimental manipulation worked (as intended) to know mediation subjects paid attention and retained important information subjects actually had the targeted emotional experience subjects complied with instructions the intervention was delivered correctly (integrity / adherence / fidelity)
118 Statistical Issues Missing data Type I versus Type II errors –data snooping (fishing) Power analyses Maximizing power
119 Type II Errors are Ubiquitous Most studies are underpowered to detect anything but large effect sizes A statistically non-significant result does not mean no correlation or no difference –medium-sized effects are often not statistically significant –most studies cannot detect small effects
120 Ways to Increase Power increase sample size increase group differences (effect size) within-subjects (use both pre- and post-tests, control stimuli) increase alpha (e.g., a .10 Type I error rate) one-tailed tests (for a priori hypotheses) be parsimonious –in primary hypotheses and analyses –only have two groups decrease variability –standardize procedures and use scripts –do not counterbalance unless necessary –homogeneous sample (narrow inclusion) –use reliable measures –clean your data
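The effect of sample size, effect size, and alpha on power can be approximated with a few lines. This sketch uses a normal approximation for a two-tailed comparison of two equal-sized groups and ignores the negligible opposite tail; an exact calculation would use the noncentral t distribution. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-tailed, two-group comparison."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = d * sqrt(n_per_group / 2)
    return 1 - z.cdf(z_crit - noncentrality)

for d in (0.2, 0.5, 0.8):
    for n in (20, 64, 200):
        print(f"d = {d}, n per group = {n}: power = {approx_power(d, n):.2f}")
```

With 64 per group, a medium effect (d = .50) has roughly 80% power, while a small effect (d = .20) is detected only about one time in five, which is the sense in which most studies cannot detect small effects.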
121 Controversy Position 1: Dodo Bird (e.g., Wampold) all therapy benefit due to common factors specific techniques make no difference Position 2: RCTs are irrelevant to real world Poor external validity (too many exclusions) Position 3: RCT methods do not identify best treatments (because they compare to TAU or waitlist). Other research strategies better for figuring out active treatment ingredients (Sprenkle, Davis, & Lebow)
122 Controversy: The Dodo Bird In a diverse literature it is vital to consider these specific issues for specific studies 1.Credibility of the results (or opinion) 2.What specific problem? 3.What specific treatments?
123 Controversy: The Dodo Bird All therapy benefit due to common factors and specific techniques make no difference. Based on metaanalyses that inappropriately average across studies with diverse disorders, treatment comparisons, and methodologies. Analogy: Ask all San Diegans “Do pills effectively treat sore throat, cough, & nasal congestion?"
124 Goals of Treatment Research To find out most effective treatments: The treatment DID cause change (efficacy) The treatment causes change (specificity) The treatment causes meaningful and lasting change The treatment works in the real world –for whom? The reasons why treatment works
125 Treatment Research efficacious – when a treatment is better than no treatment or is comparable to another treatment with established efficacy specific – when a treatment is better than a placebo control condition or another credible treatment effective – when a treatment is shown to improve outcomes in real-world settings
126 NIMH Stage Model 1. treatment development, single-subject and single-group designs, predictors of treatment success, small pilot RCTs 2. RCTs with sufficient power showing –2a) efficacy (high internal validity) –2b) specificity (more than generic therapy) 3. RCTs with high external validity (may lose some internal validity) -may be quasi-experimental studies 4. Mechanisms and mediators
127 Kazdin Stage Model 1. treatment development and small pilot RCTs 2. RCTs with sufficient power and mediation correlational analyses –2a) efficacy (high internal validity) –2b) specificity (more than generic therapy) –2c) component analysis studies 3. RCTs with high external validity
128 Levels of Empirical Support Level 5: Efficacious and Specific Level 4: Efficacious Level 3: Probably Efficacious Level 2: Possibly Efficacious Level 1: Not empirically supported
129 Effectiveness When treatments are shown to be: 1. effective for common patient populations –high compliance 2. effective when delivered by common therapists in common settings –high acceptability and compliance –easy to disseminate and train 3. cost-effective
130 What Does Work and Why? External Validity EXTERNAL VALIDITY True experiments with high variability and common people (few exclusions) True experiments in common settings with common therapists True experiments with various subgroups of patients Quasi-experimental designs
131 What Does Work and Why? Construct Validity CONSTRUCT VALIDITY Rigorous control groups –rule out that therapy was received –rule out amount of therapy received –rule out placebo/expectancy/demand effects –rule out therapist characteristics/differences –do manipulation checks Dismantling studies –identify active treatment components –(experimental) psychopathology research
132 Treatment Research: Gold Standard Randomized Controlled Trial
133 Two Uses of Control Groups Internal validity –have equivalent groups treated equally EXCEPT the intervention Construct validity –have equivalent groups treated equally EXCEPT the most important active ingredients of an intervention
134 Control Group Hierarchy Control group is/includes: components of active treatment (e.g., behavioral activation but not cognitive restructuring) comparable morale/confidence/allegiance comparable "quality" of treatment (e.g., experience/expertise) attention placebo (comparable amount of treatment, modes, and relationship) treatment as usual only measures (no treatment or wait list)
135 Why Did the Treatment Work? Whatever parts of the primary treatment that are not in the control group must be acknowledged as the possible reasons why the treatment improved outcomes.
136 Did the Treatment Work? You cannot conclude that a treatment is effective unless the change in the treatment group is better than the change in a no-treatment group You cannot conclude that a treatment is effective if it is only compared to another treatment Sometimes plausible treatments interfere with natural recovery or are iatrogenic –Ex: CISD, BPD process groups
137 Control Groups
138 Control Groups
139 Control Groups If the favored treatment showed significant change, while the control group showed nonsignificant change, DO NOT conclude that treatment is superior to control!
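The warning above can be illustrated with a small numeric sketch. The change scores below are hypothetical (not from any study), and the critical values are the standard two-tailed .05 values of t for 7 and 14 degrees of freedom. The treatment group's change is significant on its own and the control group's is not, yet the direct between-group comparison of change is not significant.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def one_sample_t(changes):
    """t statistic testing whether the mean change differs from zero."""
    return mean(changes) / math.sqrt(sample_var(changes) / len(changes))

def two_sample_t(a, b):
    """Pooled-variance t statistic comparing mean change between two groups."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * sample_var(a) + (nb - 1) * sample_var(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

# Hypothetical pre-minus-post change scores (positive = improvement).
treatment_change = [6, 1, 7, 0, 5, 2, 8, 3]
control_change = [5, -2, 6, -3, 4, 0, 7, -1]

T_CRIT_DF7 = 2.365   # two-tailed .05 critical t, df = 7 (within each group)
T_CRIT_DF14 = 2.145  # two-tailed .05 critical t, df = 14 (between groups)

t_treatment = one_sample_t(treatment_change)
t_control = one_sample_t(control_change)
t_between = two_sample_t(treatment_change, control_change)

print(f"treatment change: t = {t_treatment:.2f}, significant = {t_treatment > T_CRIT_DF7}")
print(f"control change:   t = {t_control:.2f}, significant = {t_control > T_CRIT_DF7}")
print(f"between groups:   t = {t_between:.2f}, significant = {t_between > T_CRIT_DF14}")
```

Only the third test addresses whether the treatment outperformed the control.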
140 RCT Analyses Analyses must show between-group differences in change H1: CBT will improve depression more than will TAU. H1a: The treatment-by-time interaction effect in HLM will show that the CBT group will have larger reductions in BDI scores than will the TAU group.
141 RCT Analyses This is WRONG H1: CBT will have lower depression scores at post treatment than TAU.
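A short sketch of why a post-treatment-only hypothesis can mislead. The BDI scores are hypothetical, and the groups are deliberately given unequal baselines, which can happen by chance in small randomized samples. CBT ends with the higher post-treatment mean even though it produced the larger reduction.

```python
def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical BDI scores (higher = more depressed); not data from any study.
cbt = {"pre": [32, 28, 30, 34, 26], "post": [22, 18, 20, 24, 16]}
tau = {"pre": [24, 22, 26, 25, 23], "post": [18, 16, 20, 19, 17]}

# Post-only comparison: ignores where each group started.
post_difference = mean(cbt["post"]) - mean(tau["post"])

# Change comparison: the descriptive analogue of a treatment-by-time interaction.
cbt_change = mean(cbt["pre"]) - mean(cbt["post"])
tau_change = mean(tau["pre"]) - mean(tau["post"])
difference_in_change = cbt_change - tau_change

print("post-treatment mean, CBT minus TAU:", post_difference)
print("reduction in CBT:", cbt_change, "reduction in TAU:", tau_change)
print("difference in reduction, CBT minus TAU:", difference_in_change)
```

The sketch is descriptive only; an actual test of the interaction would need a model such as the HLM analysis named in H1a.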
142 These Results don’t Test H1
143 RCT Longitudinal Analyses Repeated Measures ANOVA cannot be used with subjects with missing data (dropouts) –missing data for dropouts can be imputed (e.g., LOCF method), but results can be misleading. HLM is a better way to handle missing data (it uses all available observations rather than imputing), although differential dropout can still bias results. –test dropout-by-treatment interaction effect
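For readers unfamiliar with LOCF (last observation carried forward), a minimal sketch follows. The function name and the scores are illustrative assumptions. Each missing score is replaced with the most recent observed score, which is why the result can mislead: a dropout is treated as if no further change occurred.

```python
def locf(scores):
    """Fill missing values (None) with the last observed value.

    Leading missing values stay None because there is nothing to carry forward.
    """
    filled, last = [], None
    for score in scores:
        if score is not None:
            last = score
        filled.append(last)
    return filled

# Hypothetical patient who was improving (30 -> 24) and dropped out after week 2.
observed = [30, 24, None, None]
print(locf(observed))  # [30, 24, 24, 24]
```

If this patient would have kept improving, LOCF understates the benefit; if the patient relapsed after leaving, LOCF overstates it.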
144 Influence of Dropouts True and complete data (if there were no dropouts)
145 Influence of Dropouts
146 Influence of Dropouts
147 Influence of Dropouts Observed data (missing data due to dropouts)
148 Influence of Dropouts
149 Observed data (missing data due to dropouts)
150 Analyze attrition: number of inquiries, appointments, inclusion criteria met, started study, completed study Report reasons for exclusion and dropout. Compare dropouts to completers for each treatment group Analyze treatment-by-time-by-completer three-way interaction effect Prevent study attrition despite tx drop-outs –do Intent-To-Treat analyses External Validity
151 Clarkin et al IV: DBT vs. Transference-Focused PsychoTx vs. dynamic supportive tx Sample: 30 BPD patients in each group “no differences between groups in demographics or psychopathology” No statistically significant outcome differences Large difference would not be statistically significant: d=0.52, r=.25, 10% vs. 40% ITT analyses “did not show different results” Therapists were monitored and rated for adherence.
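The d and r values quoted above are two expressions of the same effect size. A minimal sketch of the standard conversion, assuming two groups of equal size (the function name is illustrative):

```python
import math

def d_to_r(d):
    """Convert Cohen's d to a point-biserial r, assuming equal group sizes."""
    return d / math.sqrt(d ** 2 + 4)

r = d_to_r(0.52)
print(round(r, 2))  # 0.25
```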
152 Clarkin et al [Table: OASM scores (pre, post, decrease) for TFP and DBT on Irritability, Anger, Verbal Assault, and Direct Assault; values not preserved in transcript]
153 Clarkin et al. 2007
154 Clinical Significance Does the IV really make a meaningful difference in people’s lives? Studies need to show if findings are large or clinically meaningful (vs. small or trivial) –functioning rather than just symptoms –statistical effect size –IV and DV scores need to indicate whether “severe” or “good” or “large” vs. “small” (i.e., binary)
155 Clinical Significance Correlational studies need to show: –at least some clinically severe scores on the IV and the DV (overall, a wide range of scores is best) –the correlation represented as percentages of severe outcomes (DV) for severe and non-severe IV groups Group comparison studies need to ensure that clinical groups are severe (e.g., DSM diagnosis) Treatment studies need to show that participants start off as severe and end in good shape
156 Clinical Significance Most studies fail to show that findings are large or meaningful or the problems are severe. Examples: mood induction manipulation check Rorschach correlations restricted emotionality (E. Rogers) Williams Syndrome
157 Clinical Significance To examine the C.S. of a correlation between continuous variables, make the variables binary in a meaningful way. Example: Rorschach aggression scale (RAS) and overt aggressive behavior (OAB). Never assault / prior assault: High-RAS 10% / 90%; Low-RAS 90% / 10%
158 Clinical Significance IV: insult (anger) vs. neutral DV: pre-frontal brain activity Manipulation check: “…indicate to what extent they felt each feeling during the experiment (1 = not at all; 5 = extremely)” “subjects in the insult cond. reported more anger (M = 2.0) than did subjects in the no-insult condition (M = 1.4), (p <.01)”
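A minimal sketch of the dichotomizing step described above. The scores and cutoffs are hypothetical; in practice the cutoffs should mark clinically meaningful severity, and the function assumes both the high and low groups contain at least one case.

```python
def percent_severe_by_group(iv, dv, iv_cut, dv_cut):
    """Percent of cases with a severe DV score among high-IV and low-IV cases."""
    high = [d for i, d in zip(iv, dv) if i >= iv_cut]
    low = [d for i, d in zip(iv, dv) if i < iv_cut]

    def pct_severe(group):
        return 100 * sum(d >= dv_cut for d in group) / len(group)

    return {"high_iv": pct_severe(high), "low_iv": pct_severe(low)}

# Hypothetical aggression-scale scores (IV) and counts of assaults (DV).
scale_scores = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
assault_counts = [0, 0, 1, 0, 0, 2, 3, 0, 4, 5]

table = percent_severe_by_group(scale_scores, assault_counts, iv_cut=7, dv_cut=1)
print(table)  # {'high_iv': 80.0, 'low_iv': 20.0}
```

Reporting "80% of high scorers vs. 20% of low scorers had a prior assault" conveys the size of the relationship more directly than a correlation coefficient alone.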
159 Clinical Significance IV: Williams Syndrome vs. normal controls DV: sociability (hyper-sociability?) Manipulation check: Results of WS subjects from the approachability test will be compared to those of the two normal control groups by computing a clinically significant cutoff score for the WS group. The cutoff score will be obtained with the Jacobson et al. formulas. Also, two independent raters, who are blind to the study objectives and identities of subjects, will code responses of the Sociability Questionnaire into four categories: shy, social (highest social and high social), and in-between (tested with chi-square). If more than 50% of WS subjects and less than 25% of the NC group are classified in the most social category, based on a previous study (Doyle et al., 2004), then sociability of WS subjects is considered to be clinically significant for this study
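The slide does not say which formula the proposal has in mind. Assuming it is the commonly used cutoff c from Jacobson and Truax, the cutoff lies between the clinical and normal means, with each mean weighted by the other group's standard deviation. The sketch below uses hypothetical means and SDs.

```python
def cutoff_c(clinical_mean, clinical_sd, normal_mean, normal_sd):
    """Jacobson-Truax cutoff c: weighted midpoint between two distributions.

    Each mean is weighted by the OTHER group's SD, so the cutoff sits
    closer to the mean of the group with the smaller spread.
    """
    return (normal_sd * clinical_mean + clinical_sd * normal_mean) / (
        normal_sd + clinical_sd
    )

# Hypothetical values: clinical group M = 30, SD = 6; normal group M = 10, SD = 4.
c = cutoff_c(clinical_mean=30, clinical_sd=6, normal_mean=10, normal_sd=4)
print(c)  # 18.0
```

A score on the normal side of c is more likely to belong to the normal distribution than to the clinical one.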
160 Limits to Methods for Confounds Randomization is sometimes artificial (external validity) –not choosing a treatment is artificial –some I.V.s (e.g., emotions) must be simulated (artificial) Standardized/yoked procedures/scripts are artificial Stratified randomization –may not end up with even distribution among levels, which can prevent stratification from working Non-randomized matching is limited: –can only match on a couple variables –regression to different means – matching sometimes creates this new confound Use of covariates is limited: –homogeneity of regression – the regression lines must be parallel for different levels of the confound variable –reduces statistical power
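The homogeneity-of-regression requirement above can be checked descriptively by fitting the covariate-outcome slope separately in each group. The data below are hypothetical, and the sketch only compares slopes; a formal check would test the group-by-covariate interaction.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical data: covariate = baseline severity, outcome = improvement.
severity = [1, 2, 3, 4]
improvement_treatment = [2.0, 4.0, 6.0, 8.0]
improvement_control = [5.0, 5.5, 6.0, 6.5]

slope_treatment = ols_slope(severity, improvement_treatment)
slope_control = ols_slope(severity, improvement_control)
print("treatment slope:", slope_treatment)
print("control slope:", slope_control)
```

Clearly unequal slopes mean the regression lines are not parallel, so adjusting for the covariate with a single common slope would misrepresent at least one group. In that case the variable is acting as a moderator rather than a simple covariate.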
161 Confounds vs. Moderators Internal validity confounds and moderators are separate issues A variable can be both a confound and a moderator, but these are still separate issues –cannot enter as a covariate If there is no main effect of the IV: –there can be confounds to covary –there can be moderator effects (crossing regression lines)
162 Confounds vs. Mediators The label depends on your theory of the causal process –intrinsic or extrinsic part of your I.V.? If there is no main effect of the IV: –there can be confounds to covary –overall mediators cannot be examined –mediators can be examined for subgroups
163 Internal vs. External Validity Increase internal validity: homogeneous sample rigid interventions (e.g., session duration) scripted interactions (to reduce bias) no choice of intervention (random) Increase external validity: heterogeneous sample (few exclusions) flexible interventions natural interactions in natural settings choice of intervention
164 NIMH Stage Model 1. treatment development and small pilot RCTs 2. RCTs with sufficient power showing –efficacy (high internal validity) –specificity (more than generic therapy) 3. RCTs with high external validity (may lose some internal validity) -may be quasi-experimental studies 4. Mechanisms and mediators
165 Kazdin Stage Model 1. treatment development and small pilot RCTs 2a. RCTs with sufficient power and high internal validity and test of mediation 2b. RCT component analysis studies 3. RCTs with high external validity
166 Why Emphasize Mechanisms of Change? to distill interventions to their most potent components (maximally efficient treatments) to facilitate treatment matching (i.e., moderators that are baseline characteristics of the mediator variable) to facilitate implementation in normal clinical contexts (generalization) by highlighting ways that the form can be adapted while maintaining the key change processes (function). –should not rigidly apply manuals!! (CBT vs. IPT)
167 Evidence for Mechanisms of Change Strong association Gradient (more change ingredient, more change occurs) –dose-response relation Specificity (other plausible variables do not show mediation) Experiment (try to manipulate change mechanisms) –component analysis studies Temporal relation (rarely established) Replication Plausibility and coherence (credible change process)
168 Evidence for Mechanisms of Change Component analyses have more internal validity than mediation correlations Examples of misleading severity confounds In DBT study, amount of therapy in DBT condition was not correlated with outcomes It is possible that in CT study amount of BA vs. CR would not be related to outcome
169 Component Analysis Studies Also called dismantling studies Test if a component is necessary by comparing the full treatment to the treatment when that component is removed Add common factors to the reduced condition to rule them out as explanation why component is necessary Confirm precise mediational causal process in correlational analyses
170 Component Analysis Studies Mediational (shared variance) analyses such as the Sobel test typically test causal processes that may account for treatment group differences Shared causal processes may be shown by: –within-group mediational analyses –simple correlational analyses
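A minimal sketch of the Sobel test statistic for an indirect (mediated) effect. Here a is the path from treatment to mediator, b is the path from mediator to outcome controlling for treatment, and se_a and se_b are their standard errors. The coefficient values are hypothetical.

```python
import math

def sobel_z(a, se_a, b, se_b):
    """Sobel z for the indirect effect a*b (first-order standard error)."""
    return (a * b) / math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)

# Hypothetical path estimates and standard errors.
z = sobel_z(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(round(z, 2))  # 3.12
```

The z value is compared with the standard normal distribution, so an absolute value above 1.96 is significant at the two-tailed .05 level. The test assumes a normally distributed indirect effect, which is one reason bootstrap intervals are often preferred in small samples.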
171 Component Analysis Studies
172 Component Analysis Studies
173 Component Analysis Studies Also called dismantling studies 2 groups needed to answer the question if one component is necessary 3 groups needed to answer the question if two components are necessary 4 groups are needed to answer the question if two components are necessary and how much they matter
174 HRV Biofeedback Component Analysis Study Two-group RCT 1. Slow breathing + (fake feedback) 2. Slow breathing + HRV visual feedback Can test if the HRV feedback is useless
175 EMDR Component Analysis Study Two-group RCT EMDR vs. EMDR - EM (11 out of 13 studies) –exposure + eye = exposure PE vs. EMDR Power and effect size?
176 First DBT Study Validity in the first DBT study was criticized because DBT patients received many more hours of therapy than TAU. Follow-up data analyses indicated that there was no correlation between number of hours and outcome When treatment hours were entered as covariates, DBT still had superior outcomes However, treatment hours could still account for why DBT subjects did better simply because the worse people got the most treatment Need to covary severity and hours in comparing DBT to TAU
177 DBT Replication Study Controlled for: 1. therapist expertise 2. therapist allegiance to treatment provided 3. clinical supervision group 4. prestige of DBT 5. psychotherapy 6. treatment affordability and hours 7. therapist gender, training/degree, and clinical experience
178 DBT Component Analysis Study Three-group RCT 1. DBT (individual + group skills training) 2. Individual DBT + activities group 3. DBT group skills training + case manag.
179 Cognitive Therapy Mediation Studies 1) CT is based on cognitive theory –thinking causes emotions and behavior 2) Change in CT is associated with cognitive change as hypothesized –concurrent change correlations
180 Component Analysis Study Cognitive therapy for depression is composed of cognitive restructuring (CR) and behavioral activation (BA) Removing CR does not reduce its effectiveness BA is as effective as BA+CR
181 Component Analysis Study 5 interpretations of change process BA works better because –thinking is irrelevant –BA is better at changing thinking –it improves environment and thinking CR and BA are both effective and redundant –both change thinking –both change environment (reinf + punish)
182 Component Analysis Study If honey does not improve a sugar-sweetened dessert, is honey less tasty? Three groups are needed Could CR be as effective as CR+BA?
183 New Behavior Changes Cognition “On the one hand, explanations of change processes are becoming more cognitive. On the other hand, it is performance-based treatments that are proving most powerful in effecting psychological changes. Regardless of the method involved, the treatments implemented through actual performance achieve results consistently superior to those in which fears are eliminated to cognitive representations of threat” (Bandura, 1977, p. 78)
184 Activity Scheduling in Cognitive Therapy Pleasurable activities “Nothing is meaningful or worthwhile” Mastery activities (self-efficacy) “I am incapable of doing anything” Behavioral experiments “I am incapable of that” “It won’t work out”
185 New thinking prompts new behaviors that lead to more reinforcers and fewer punishers, which changes depressive affect
186 New behaviors lead to more reinforcers and fewer punishers, which changes belief, which changes depressive affect
187 Amount of smoking causes cancer which causes lower quality of life H1: Amount of smoking correlated with QOL Entire sample has cancer H2: Exercise moderates the correlation between cancer and QOL. Analysis of constant variables
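The "constant variable" problem above can be shown directly: if everyone in the sample has cancer, cancer status has zero variance and cannot correlate with, or be moderated in its relation to, anything. The sketch uses hypothetical data, and the helper returns None when a correlation is undefined.

```python
import math

def pearson_r(x, y):
    """Pearson correlation, or None when either variable has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # a constant cannot covary with anything
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical sample in which every participant has cancer.
has_cancer = [1, 1, 1, 1, 1]
smoking = [30, 10, 40, 5, 20]
quality_of_life = [40, 55, 35, 60, 50]

print("cancer vs. QOL:", pearson_r(has_cancer, quality_of_life))
print("smoking vs. QOL:", pearson_r(smoking, quality_of_life))
```

H1 remains testable in this sample because smoking varies, but H2 is not, because the cancer-QOL correlation it refers to does not exist when cancer is constant.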
188 Do Not Covary your Main Effect For Williams Syndrome children, visual-motor skills will predict daily living skills above and beyond intelligence (IQ) –hand-eye coordination is measured by the WAIS performance tests Paul Paris class proposal example
191 Race vs. Ethnicity vs. Culture race = biology ethnicity is one aspect of culture culture is learned race does not always correspond with ethnicity one’s culture is combination of one’s family culture and mainstream culture (acculturation) NIMH categories: Latino is only an ethnicity
192 Internal Validity is Usually a Higher Priority than External Validity Ethnic minorities are usually underrepresented and researchers do not make extra effort to recruit them Consequence: conclusions often cannot be made about the relevance for ethnic minorities (ethnicity can be a moderator)
193 Sue (1999) Sue argues that we should make extra effort to recruit ethnic minorities to increase generalizability for social justice However: we cannot assume generalizability from a main effect since ethnicity can be a moderator it is often not feasible to have enough power to test moderator effects in a heterogeneous sample therefore, findings from heterogeneous samples are often ambiguous
194 Ethical Issues informed consent (vs. thoughtless compliance or coercion) –Milgram and Zimbardo studies –obtained by the therapist deception vs. withholding full rationale debriefing –mood improvement protocol –verify subjects understand and are back to normal confidentiality vs. anonymity