1
Correlation and regression
Week 4
2
Associational research
Looks at the relationship between two variables Usually continuous variables No manipulation of IV Correlation coefficient shows relationship between 2 variables Regression: equation used to predict outcome value based on predictor value Multiple regression: same, but uses more than 1 predictor
3
What is a correlation? Know that statistical model is:
outcome_i = (model) + error_i
For correlation, this can be expressed as: outcome_i = (b × x_i) + error_i
Simplified: outcome is predicted from predictor variable and some error
b = Pearson product-moment correlation, or r
4
Covariance Covariance: extent to which 2 variables covary with one another Shows how much deviation with one variable is associated with deviation in the second variable
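To make the idea concrete, here is a minimal Python sketch (using hypothetical scores, not the table from the slides) that computes a sample covariance by hand and checks it against numpy:

```python
import numpy as np

# Hypothetical scores for two variables measured on the same cases
x = np.array([5, 4, 4, 6, 8])
y = np.array([8, 9, 10, 13, 15])

# Covariance: average cross-product of deviations from each variable's mean,
# using N - 1 in the denominator (sample covariance)
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

print(cov_xy)                      # hand calculation
print(np.cov(x, y, ddof=1)[0, 1])  # same value from numpy's covariance matrix
```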
5
Covariance example
6
Covariance example
7
Covariance Positive covariance: As one variable deviates from mean, other variable deviates in same direction Negative covariance: As one variable deviates from mean, other variable deviates in opposite direction Problem with covariance: depends on scales variables measured on Canโt be compared across measures Need standardized covariance to compare across measures
8
Correlation Standardized measure of covariance
Known as Pearson's product-moment correlation, r
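As a rough illustration (hypothetical data again), dividing the covariance by the product of the two standard deviations gives Pearson's r:

```python
import numpy as np

x = np.array([5, 4, 4, 6, 8])     # hypothetical predictor scores
y = np.array([8, 9, 10, 13, 15])  # hypothetical outcome scores

# r = covariance standardized by the product of the two standard deviations
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
r = cov_xy / (x.std(ddof=1) * y.std(ddof=1))

print(round(r, 3))                        # hand-rolled Pearson's r
print(round(np.corrcoef(x, y)[0, 1], 3))  # numpy gives the same value
```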
9
Correlation example From previous table:
10
Correlation Values range from -1 to +1
+1: perfect positive correlation: as one variable increases, other increases by proportionate amount -1: perfect negative correlation: as one variable increases, other decreases by proportionate amount 0: no relationship. As one variable changes, other stays the same
11
Positive correlation
12
Negative correlation
13
Small correlation
14
Correlation significance
Significance tested using t-statistic: t_r = r√(N − 2) / √(1 − r²)
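A quick sketch of that test, assuming a hypothetical r of .30 from N = 50 cases (scipy is used only to turn t into a two-tailed p-value):

```python
import numpy as np
from scipy import stats

r, N = 0.30, 50  # hypothetical correlation and sample size

# t for a correlation has N - 2 degrees of freedom
t = r * np.sqrt(N - 2) / np.sqrt(1 - r**2)
p = 2 * stats.t.sf(abs(t), df=N - 2)  # two-tailed p-value

print(round(t, 3), round(p, 4))
```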
15
Correlation and causality
Correlation DOES NOT imply causality!!! Only shows us that 2 variables are related to one another Why correlation doesn't show causality: 3rd variable problem: some other variable (not measured) responsible for observed relationship No way to determine directionality: does a cause b, or does b cause a?
16
Before running a correlation…
17
Bivariate correlation in SPSS
18
Note on pairwise & listwise deletion
Pairwise deletion: removes cases from analysis on an analysis-by-analysis basis 3 variables: A, B, & C Correlation matrix between A, B, & C Case 3 is missing data on variable B, but not on A or C Case 3 will be excluded from correlation between B & C, and A & B, but not from correlation between A & C Advantage: keep more of your data Disadvantage: not all analyses will include the same cases: can bias results
19
Note on pairwise & listwise deletion
Listwise deletion: removes cases from analysis if they are missing data on any variable under consideration 3 variables: A, B, & C Correlation matrix between A, B, & C Case 3 is missing data on variable B, but not on A or C Case 3 will be excluded from correlation between B & C, A & B, and A & C Advantage: less prone to bias Disadvantage: don't get to keep as much data Usually a better option than pairwise
20
Correlation output
21
Interpreting correlations
Look at statistical significance Also, look at size of correlation: +/- .10: small correlation +/- .30: medium correlation +/- .50: large correlation
22
Coefficient of determination, R2
Amount of variance in one variable shared by other variable Example: pretend R2 between cognitive ability and job performance is .25 Interpretation: 25% of variance in cognitive ability shared by variance in job performance Slightly incorrect but easier way to think of it: 25% of the variance in job performance is accounted for by cognitive ability
23
Spearman's correlation coefficient
Also called Spearman's rho (ρ) Non-parametric Based on ranked, not interval or ratio, data Good for minimizing effect of outliers and getting around normality issues Ranks data (lowest to highest score) Then, uses Pearson's r formula on ranked data
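A minimal sketch of that logic with made-up scores: ranking both variables and then applying Pearson's r reproduces scipy's built-in Spearman's rho.

```python
import numpy as np
from scipy import stats

# Hypothetical scores; note the outlier in x
x = np.array([2, 5, 1, 4, 30])
y = np.array([10, 22, 8, 15, 25])

# Spearman's rho: convert each variable to ranks, then apply Pearson's r
rho_by_hand = stats.pearsonr(stats.rankdata(x), stats.rankdata(y))[0]
rho_builtin = stats.spearmanr(x, y)[0]

print(rho_by_hand, rho_builtin)  # identical values
```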
24
Kendall's tau (τ) Non-parametric correlation Also ranks data
Better than Spearman's rho if: Small data set Large number of tied ranks More accurate representation of correlation in population than Spearman's rho
27
Point-biserial correlations
Used when one of the two variables is a truly dichotomous variable (male/female, dead/alive) In SPSS: Code one category of dichotomous variable as 0, and the other as 1 Run normal Pearson's r Example: point-biserial correlation of .25 between species (0=cat & 1=dog) and time spent on the couch Interpretation: a one unit increase in the category (i.e., from cats to dogs) is associated with a .25 unit increase in time spent on couch
28
Biserial correlation Used when one variable is a "continuous dichotomy" Example: passing exam vs. failing exam Knowledge of subject is continuous variable: some people pass exam with higher grade than others Formula to convert point-biserial to biserial: r_b = (r_pb × √(p1 × p2)) / y P1 = proportion of cases in category 1 P2 = proportion of cases in category 2 y is from z-table: find value roughly equivalent to split between largest and smallest proportion See table on p. 887 in book
29
Biserial correlation Example:
Correlation between time spent studying for medical boards and outcome of test (pass/fail) was __; __% of test takers passed. r_b = __ = .46
30
Partial correlation Correlation between two variables when the effect of a third variable has been held constant Controls for effect of third variable on both variables Rationale: if third variable correlated (shares variance) with 2 variables of interest, correlation between these 2 variables won't be accurate unless effect of 3rd variable is controlled for
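One common way to get a partial correlation is directly from the three zero-order correlations. A minimal sketch (the numbers below are hypothetical, not from the slides):

```python
import numpy as np

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation between x and y with the third variable z partialled out of both."""
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical zero-order correlations: x and y are the variables of interest,
# z is the third variable being controlled for
print(round(partial_corr(r_xy=-0.44, r_xz=-0.71, r_yz=0.40), 3))
```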
32
Partial correlation Obtain by going to Analyze > Correlate > Partial
Choose variables of interest to correlate Choose variable to control
33
Semi-partial (part) correlations
Partial correlation: control for effect that 3rd variable has on both variables Semi-partial correlation: control for effect that 3rd variable has on one variable Useful for predicting outcome using combination of predictors
34
Calculating effect size
Can square Pearson's correlation to get R²: proportion of variance shared by variables Can also square Spearman's rho to get R²_s: proportion of variance in ranks shared by variables Can't square Kendall's tau to get proportion of variance shared by variables
35
Regression Used to predict value of one variable (outcome) from value of another variable (predictor) Linear relationship: Y_i = (b_0 + b_1X_i) + e_i
Y_i = outcome
b_0 = intercept: value of outcome (Y) when predictor (X) = 0
b_1 = slope of line: shows direction & strength of relationship
X_i = value of predictor (x)
e_i = deviation of predicted outcome from actual outcome
36
Regression b_0 and b_1 are regression coefficients
Negative b_1: negative relationship between predictor and criterion Positive b_1: positive relationship between predictor and criterion Will sometimes see β_0 and β_1 instead: these are standardized regression coefficients Put values in standard deviation units
37
Regression
38
Regression Regression example:
Pretend we have the following regression equation: Exam grade (Y) = 45 + 3.5 × (Hours spent studying) + error If we know that someone spends 10 hours studying for the test, what is the best prediction of their exam grade we can make? Exam grade = 45 + (3.5 × 10) = 80
39
Estimating model Difference between actual outcome and outcome predicted by the model
40
Estimating model Total error in model = Σ(observed_i − model_i)²
Called sum of squared residuals (SSR) Large SSR: Model not a good fit to data; small = good fit Ordinary least squares (OLS) regression: used to define model that minimizes sum of squared residuals
41
Estimating model Total sum of squares (SST): Total sum of squared differences between observed data and mean value of Y Model sum of squares (SSM): Improvement in prediction as result of using regression model rather than mean
43
Estimating model Proportion of improvement due to use of model rather than mean: R² = SS_M / SS_T
Also is indicator of variance shared by predictor and outcome F-ratio: statistical test for determining whether model describes data significantly better than mean: F = MS_M / MS_R
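A small worked sketch of these quantities for a one-predictor model, using made-up data (k = number of predictors):

```python
import numpy as np

# Hypothetical data: hours studied (x) and exam grade (y)
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 10.0])
y = np.array([52.0, 57.0, 66.0, 70.0, 78.0, 81.0])

# OLS estimates for the one-predictor model y_hat = b0 + b1*x
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total variation around the mean of y
ssm = np.sum((y_hat - y.mean()) ** 2)  # improvement from using the model
ssr = np.sum((y - y_hat) ** 2)         # residual (unexplained) variation

k, n = 1, len(y)
r_squared = ssm / sst                      # R^2 = SS_M / SS_T
f_ratio = (ssm / k) / (ssr / (n - k - 1))  # F = MS_M / MS_R
print(round(r_squared, 3), round(f_ratio, 2))
```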
44
Individual predictors
b should be significantly different from 0 0 would indicate that for every 1 unit change in x, y wouldn't change Can test difference between b and null hypothesis (b = 0) using t-test: t = b_observed / SE_b
46
Multiple regression Week 5
47
Outliers in regression
Outlier can affect regression coefficient
48
Outliers in regression
Residual: difference between actual value of outcome and predicted value of outcome Large residuals: poorly-fitting regression model Small residuals: regression model good fit Unstandardized residual: difference between actual and predicted outcome value, measured in same units as outcome Standardized residual: Residuals converted to z-scores Studentized residual: unstandardized residual divided by estimate of standard deviation
49
Influential cases Influential case: value that strongly influences regression model parameter estimates Cook's distance: measure of overall influence of case on the model Values larger than 1 = problem Leverage: shows influence of observed value of outcome variable over predicted values of outcome Average leverage = (k + 1)/n, where k is number of predictors and n is sample size Problematic values: (3(k + 1)/n)
50
Influential cases DFBETA: compares regression coefficient when case is excluded from the model to regression coefficient when case is included in the model Problematic if absolute values larger than 2/√n Mahalanobis distance: measures distance of case from mean of predictor variable(s) Chi square distribution with degrees of freedom equal to number of predictors Significant value = problem
51
Independent errors Durbin-Watson test: tests whether adjacent residuals are correlated Value of 2: residuals uncorrelated Value larger than 2: negative correlation between residuals Value smaller than 2: positive correlation between residuals Values greater than 3 or less than 1 problematic
52
Assumptions of linear regression models
Additivity and linearity: outcome linearly related to additive combination of predictors Independent errors: uncorrelated residuals Homoscedasticity: at all levels of predictor, should be equal variance of residuals Normally distributed errors (residuals) Predictors uncorrelated with external variables: external variables = variables not included in model that influence outcome variable
53
Assumptions of linear regression models
Predictors must be quantitative, or categorical with only 2 categories Can dummy-code variables if more than 2 categories Outcomes quantitative and continuous No perfect multicollinearity: No perfect linear relationship between predictor pairs Non-zero variance: predictors need to vary
55
Multiple regression Incorporates multiple predictors into regression model Predictors should be chosen based on theory/previous research Not useful to chuck lots of random predictors into model to see what happens
56
Semi-partial correlation
Foundation of multiple regression Measures relationship between predictor and outcome, controlling for relationship between that predictor and other predictors in the model Shows unique contribution of predictor in explaining variance in outcome
57
Reasons for multiple regression
Want to explain greater amount of variance in outcome "What factors influence adolescent drug use? Can we predict it better?" Want to look at set of predictors in relation to outcome Very useful: human behavior rarely determined by just one thing "How much do recruiter characteristics and procedural justice predict job satisfaction once hired?"
58
Reasons for multiple regression
Want to see if adding another predictor (or set of predictors) will improve prediction above and beyond known set of predictors "Will adding a job knowledge test to current battery of selection tests improve prediction of job performance?" Want to see if predictor(s) significantly related to outcome after controlling for effect of other predictors "Is need for cognition related to educational attainment, after controlling for socioeconomic status?"
59
Entering predictors Hierarchical regression Forced entry Stepwise
Known predictors entered into model first New/untested predictors added into models next Good for assessing incremental validity Forced entry All predictors forced into model at same time Stepwise DON'T USE IT! Adds predictors based upon amount of variance explained Atheoretical & capitalizes on error/chance variation
60
Multicollinearity Perfect collinearity: one predictor has perfect correlation with another predictor Can't get unique estimates of regression coefficients: both variables share same variance Lower levels of multicollinearity common
61
Multicollinearity Problems with multicollinearity:
Untrustworthy bs due to increase in standard error (more variable across samples) Limits R: If two variables highly correlated, they share a lot of variance. Each will then account for very little unique variance in the outcome Adding predictor to model that's correlated strongly with existing predictor won't increase R by much, even if on its own it's strongly related to outcome Can't determine importance of predictors: since variance shared between predictors, which accounts for more variance in outcome?
62
Multicollinearity Example: You're trying to predict social anxiety using emotional intelligence and number of friends as predictors What if emotional intelligence and number of friends are related?
63
Multicollinearity [Venn diagram: Emotional intelligence and Number of friends overlap with Social anxiety; the overlapping region is variance in the outcome that both predictors explain]
64
Multicollinearity Could have high R accompanied by very small bs
Variance inflation factor (VIF): evaluates linear relationship between predictor and other predictors Largest VIF greater than 10: problem Average VIF greater than 1: problem Calculate this by adding up VIF values across predictors, and then dividing by number of predictors Tolerance: reciprocal of VIF (1/VIF) Below .10: major problem Below .20: potential problem
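A rough sketch of how VIF and tolerance are computed: regress each predictor on the remaining predictors and take 1/(1 − R²). The data below are simulated, with two deliberately correlated predictors.

```python
import numpy as np

def vif_values(X):
    """VIF for each column of a predictor matrix X (rows = cases, columns = predictors)."""
    n, p = X.shape
    vifs = []
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        r2 = 1 - resid.var() / X[:, j].var()  # R^2 from regressing predictor j on the others
        vifs.append(1 / (1 - r2))
    return np.array(vifs)

rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = 0.8 * a + rng.normal(scale=0.5, size=200)  # b strongly related to a
c = rng.normal(size=200)                       # c roughly independent
X = np.column_stack([a, b, c])

vifs = vif_values(X)
print(np.round(vifs, 2))      # VIF per predictor
print(round(vifs.mean(), 2))  # average VIF
print(np.round(1 / vifs, 2))  # tolerance = 1/VIF
```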
65
Multicollinearity Many psychological variables are slightly correlated
Likely to run into big multicollinearity problems if you include 2 predictors measuring the same, or very similar, constructs Examples: Cognitive ability and problem-solving 2 different conscientiousness measures Job knowledge and a situational interview Scores on 2 different anxiety measures
66
Homoscedasticity Can plot zpred (standardized predicted values of DV based on model) against zresid (standardized residuals)
67
Homoscedasticity Should look like a random scatter of values
68
Multiple regression in SPSS
69
Multiple regression in SPSS
70
Multiple regression in SPSS
71
Regression output R: Correlation between actual outcome values, and values predicted by regression model R2: Proportion of variance in outcome predicted by model Adjusted R2: estimate of value in population (adjusted for shrinkage that tends to occur in cross-validated model due to sampling error)
72
Regression output F-test: compares variance explained by model to variance unaccounted for by model (error) Shows whether predictions based on model are more accurate than predictions made using mean
73
Regression output Beta (b) values: change in outcome associated with a one-unit change in a predictor Standardized beta (β) values: beta values expressed in standard deviation units
74
Practice time! The following tables show the results of a regression model predicting Excel training performance using 5 variables: self-efficacy (Setotal), Excel use (Rexceluse), Excel formula use (Rformulause), cognitive ability (WPTQ), and task-switching IAT score (TSA_score)
75
Interpret this…
76
And this…
77
And finally this
78
Moderation Week 6 and 7
79
Categorical variables
When categorical variable has 2 categories (male/female, dead/alive, employed/not employed), can put it directly into regression When categorical variable has more than 2 categories (freshman/sophomore/junior/senior, entry level/first line supervisor/manager), can't input it directly into regression model Have to dummy code categorical variable
80
Categorical variables
Dummy variables: represent group membership using zeroes and ones Have to create a series of new variables Number of variables=number of categories - 1 Example: freshman/sophomore/junior/senior
81
Categorical variables
Eight steps in creating and using dummy coded variables in regression: 1. Count number of groups in variable and subtract 1 2. Create as many new variables as needed based on step 1 3. Choose one of groups as baseline to compare all other groups against Usually this will be the control group or the majority group 4. Assign values of 0 to all members of baseline group for all dummy variables
82
Categorical variables
5. For first dummy variable, assign 1 to members of the first group that you want to compare against baseline group. Members of all other groups get a 0. 6. For second dummy variable, assign 1 to all members of second group you want to compare against baseline group. Members of all other groups get a 0. 7. Repeat this for all dummy variables. 8. When running regression, put all dummy variables in same block
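As a sketch of the same steps in code (hypothetical class-standing data; pandas is assumed), with freshman as the baseline group:

```python
import pandas as pd

# Hypothetical categorical variable with 4 categories
df = pd.DataFrame({"standing": ["freshman", "sophomore", "junior", "senior",
                                "sophomore", "freshman", "senior"]})

# 4 categories -> 3 dummy variables; dropping the freshman column makes
# freshman the baseline group (coded 0 on every dummy variable)
dummies = pd.get_dummies(df["standing"], prefix="d", dtype=int).drop(columns="d_freshman")

print(pd.concat([df, dummies], axis=1))
```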
83
Categorical variables
Example: One variable with 4 categories: Freshman, sophomore, junior, senior.
84
Categorical variables
85
Categorical variables
86
Categorical variables
87
Categorical variables
88
Categorical variables
Each dummy variable is included in the regression output Regression coefficient for each dummy variable shows change in outcome that results when moving from baseline (0) to category being compared (1): difference in outcome between baseline group and other group Example: Compared to freshmen, seniors' attitudes towards college scores are 1.94 points higher Significant t-value: group coded as 1 for that dummy variable significantly different on outcome than baseline group
89
Moderation Relationship between 2 variables depends on the level of a third variable Interaction between predictors in model
90
Moderation Many research questions deal with moderation!
Example: In I/O psychology, moderation important for evaluating predictive invariance Does the relationship between a selection measure and job performance vary depending on demographic group (Male vs. female, White vs. Black, etc.)? Example: In clinical/counseling, moderation important for evaluating risk for mental illness Does the relationship between exposure to a stressful situation and subsequent mental illness diagnosis vary depending on the individual's social support network?
91
Moderation How to test for moderation in SPSS: include the predictor and the moderator, and create a third variable that multiplies the moderator and the predictor All 3 variables predict the outcome
92
Moderation Y_i = b_0 + b_1A_i + b_2B_i + b_3AB_i + e_i
Basic regression equation with minor change: the AB_i term Outcome depends on: Intercept (b_0) Score on variable A (b_1A_i), and relationship between variable A and Y Score on variable B (b_2B_i), and relationship between variable B and Y Interaction (multiplication) between scores on variables A and B (b_3AB_i), and relationship between AB and Y
93
Moderation Moderator variables can be either categorical (low conscientiousness/high conscientiousness; male vs. female, etc.) or continuous (conscientiousness scores from 1-7) Categorical: can visualize interaction as two different regression lines, one for each group, which vary in slope (and possibly in intercept)
94
Moderation
95
Moderation Continuous moderator: visualize in 3-dimensional space: more complex relationship between moderator and predictor variable Slope of one predictor changes as values of moderator change Pick a few values of moderator and generate graphs for easier interpretation Simple slopes analysis: picks out a few levels of the predictor and moderator and looks at the slopes
96
Moderation Prior to analysis, need to grand-mean center predictors
Doing so makes interactions easier to interpret (this is why we center) Regression coefficients show relationship between predictor and criterion when other predictor equals 0 Not all variables have meaningful 0 in context of study: age, intelligence, etc. Could end up trying to interpret effects based on non-existing score (such as the level of job performance for person with intelligence score of 0) Once interactions are factored in, interpretation becomes increasingly problematic Also reduces nonessential multicollinearity (i.e., correlations due to the way that the variables were scaled) Categorical moderator: center the continuous predictor only Continuous moderator: center both predictor and moderator Interpreting regression coefficients when there is no meaningful zero skews our predictions Without centering, you may end up predicting outcomes based on non-existent score Centering will put variables on new metric with a mean of 0 It doesn't help much with essential multicollinearity, but any nuisance correlation (due to scaling) will be taken care of through centering
97
Moderation Grand mean centering: subtract mean of variable from all scores on that variable Centered variables used to calculate interaction term Creates interaction variable Don't center categorical predictors Just make sure it is scaled 0 and 1 Don't center outcome/dependent variable Centering only applies to predictors Only center continuous predictor variables
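A minimal sketch of centering and building the interaction term (hypothetical variable names: video game hours as predictor, callous-unemotional traits as moderator):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
df = pd.DataFrame({"video": rng.normal(15, 5, 100),     # hypothetical predictor
                   "callous": rng.normal(20, 6, 100)})  # hypothetical moderator

# Grand-mean center the continuous predictor and moderator only
df["video_c"] = df["video"] - df["video"].mean()
df["callous_c"] = df["callous"] - df["callous"].mean()

# Interaction term = product of the two centered variables
df["video_x_callous"] = df["video_c"] * df["callous_c"]

print(df[["video_c", "callous_c"]].mean().round(3))  # both centered means are ~0
```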
98
Moderation For centered variable, value of 0 represents the mean value on the predictor Since transformation is linear, doesn't change regression model substantially Interpretation of regression coefficients easier Without centering: interaction = how outcome changes with one-unit increase in moderator when predictor = 0 With centering: interaction = how outcome changes with one-unit increase in moderator when predictor = mean 0 is mean of our new variable Centering doesn't change SD of variable Allows us to compare regression lines at the mean Small interaction term: not much of a difference Big interaction: moderator makes a big difference in terms of outcome Simple slopes analysis creates graph to allow us to see relationship between predictor and outcome at different levels of the moderator
99
Grand mean centering Run descriptives for all variables to center
Centered video games variable: Transform > Compute variable > Create new variable Video_centered > subtract the mean
100
Moderation Steps for moderation in SPSS:
1. Grand-mean center continuous predictor(s) 2. Enter both predictor variables into first block 3. Enter interaction term in second block Doing it this way makes it easier to look at R² change 4. Run regression and look at results 5. If interaction term significant: Categorical predictor: Line graph between predictor and DV, with a different line for each category Continuous predictor: Simple slopes analysis Hierarchical regression: Predictor in first block > include centered rather than raw variable > block 2 interaction term (new variable created by multiplying predictor and moderator together) Run regression and look at results If the interaction term is significant, you're not finished (a significant p value indicates another step): simple slopes analysis
101
Simple slopes analysis
Basic idea: values of outcome (Y) calculated for different levels of predictor and moderator: low, medium, and high Usually defined as -1 SD, mean, + 1 SD Recommend using online calculator for these (can be done by hand, but it's a pain)
102
Simple slopes analysis
Example: Aggression = (.17 × video) + (.76 × callous) + (.027 × (video × callous)) For 1 SD below the mean on video games at low levels of callous unemotionality: (.17 × __) + (.76 × __) + (.027 × (__ × __)) = 33.29 Would do this 8 more times so that you had values of aggression at low, medium, and high levels of callous unemotionality and video game playing Value of outcome is calculated for different levels of the predictor and the moderator (low = 1 SD below mean, med = mean, high = 1 SD above mean) Plug in values for low, med, and high PROCESS is an add-on available at afhayes.com or processmacro.org for personal copies of SPSS
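A sketch of those plug-in calculations, treating all numeric values other than the slide's coefficients as hypothetical placeholders (including an assumed intercept, which the slide does not show):

```python
import numpy as np

# Coefficients from the slide's equation; the intercept (b0) is assumed for illustration
b0, b_video, b_callous, b_inter = 30.0, 0.17, 0.76, 0.027

# Hypothetical centered values at -1 SD, the mean, and +1 SD
video_vals = np.array([-5.0, 0.0, 5.0])    # video game playing
callous_vals = np.array([-6.0, 0.0, 6.0])  # callous unemotionality

# Predicted aggression for all 9 combinations of predictor and moderator level
for c in callous_vals:
    preds = b0 + b_video * video_vals + b_callous * c + b_inter * video_vals * c
    print(f"callous = {c:+.0f}: {np.round(preds, 2)}")
```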
103
Simple slopes analysis
Every possible interaction calculated
104
Creating interaction term
Multiply the predictor and moderator together (use centered variables, unless the variable is categorical)
105
Entering variables 1st block predictor and moderator
2nd block interaction term
106
Entering variables Hierarchical option to see how much R2 changes
107
Output .027 is saying that when the predictor is at its mean, there is a difference of .027 in aggression as we move from one level of callousness to the next level of callousness Positive: the higher callousness is, the stronger the relationship between video game playing and aggression Negative: as callousness decreases, weaker relationship between video game playing and aggression Next step is simple slopes analysis because of the significance
108
Simple slopes analysis
For people low in callousness, aggression levels didn't really change For people high in callousness, aggression increases If you have a significant interaction, always do a simple slopes analysis: makes life easier!
109
Rescale graph Right click > Select Format Axis > Change min value to 0
110
Interaction between Attractiveness (predictor) and Support (outcome) with Gender as moderator
First step: center variables Mean of attractiveness (descriptives) Centered mean: Name variable > move attract into numeric expression box and subtract the mean of Attractiveness > OK Interaction term: Transform > Compute Variable > rename Attract_x_gender > move attract_cent > multiplication sign * > move gender over: Attract_centered * Gender Can now run moderated analysis Regression: Analyze > Regression > Linear > Support as dependent (don't center the dependent!!!) > 1st block predictors only (centered attract and gender) > Click Next > move over interaction term into 2nd block > OK Running regression with 3 different things predicting support (attraction, gender, and interaction between attract and gender) Significant interaction: regression lines of groups cross at some point. Lines that cross or are distinctly not parallel mean a significant interaction Slope analysis is next step!
112
Research Designs Comparing Groups
Week 8
113
Quasi-experimental designs
114
Quasi-experiments No random assignment
Goal is still to investigate relationship between proposed causal variable and an outcome What they have: Manipulation of cause to force it to happen before outcome Assess covariation of cause and effect What they don't have: full ability to rule out alternative explanations (limited in this regard) But design features can improve this
115
One group posttest only design
Problems: No pretest: did anything change? No control group: what would have happened if IV not manipulated? Doesn't control for threats to internal validity X O1 Do not use this!! Only measures one group Notation: X = manipulation, O = observation
116
One group posttest only design
Example: An organization implemented a new pay-for- performance system, which replaced its previous pay-by- seniority system. A researcher was brought in after this implementation to administer a job satisfaction survey
117
One group pretest-posttest design
Adding pretest allows assessment of whether change occurred Major threats to internal validity: Maturation: change of participants due to natural causes History: change due to historical event (recession, etc.) Testing: desensitizing participants to the test, using the same pretest for posttest O1 X O2 O1=pretest, O2=post-test
118
One group pretest-posttest design
Example: An organization wanted to implement a new pay-for-performance system to replace its pay-by- seniority system. A researcher was brought in to administer a job satisfaction questionnaire before the pay system change, and again after the pay system change
119
Removed treatment design
Treatment given, and then removed 4 measurements of DV: 2 pretests, and 2 posttests If treatment affects DV, DV should go back to its pre-treatment level after treatment removed Unlikely that threat to validity would follow this same pattern Problem: assumes that treatment can be removed with no lingering effects May not be possible or ethical (i.e., ethical conundrum: taking away schizophrenic patients' medication; possibility conundrum: therapy for depression, benefits would still be experienced) O1 X O2 O3 O4 X = treatment was removed
120
Removed treatment design
Example: A researcher wanted to evaluate whether exposure to TV reduced memory capacity. Participants first completed a memory recall task, then completed the same task while a TV plays a sitcom in the background. After a break, participants again complete the memory task while the TV plays in the background, then complete it again with the TV turned off.
121
Repeated treatment design
O1 X O2 O3 O4 Treatment introduced, removed, and then re-introduced Threat to validity would have to follow same schedule of introduction and removal-very unlikely Problem: treatment effects may not go away immediately Very good at controlling for threats to validity
122
Repeated treatment design
Example: A researcher wanted to investigate whether piped-in classical music decreased employee stress. She administered a stress survey, and then piped in music. One week later, stress was measured again. The music was then removed, and stress was measured again one week later. The music was then piped in again, and stress was measured a final time one week later.
123
Posttest-only with nonequivalent groups
NR X O1 O2 Participants not randomly assigned to groups One group receives treatment, one does not DV measured for both groups Big validity threat: selection NR = not randomly assigned
124
Posttest-only with nonequivalent groups
Example: An organization wants to implement a policy against checking email after 6pm in an effort to reduce work-related stress. The organization assigns their software development department to implement the new policy, while the sales department does not implement the new policy. After 2 months, employees in both departments complete a work stress scale.
125
Untreated control group with pretest and posttest
NR O1 X O2 Pretest and posttest data gathered on same experimental units Pretest allows for assessment of selection bias Also allows for examination of attrition Same as previous, just giving each group pretest
126
Untreated control group with pretest and posttest
Example: A community is experimenting with a new outpatient treatment program for meth addicts. Current treatment recipients had the option to participate (experimental group) or not participate (control group). Current daily use of meth was collected for all individuals. Those in the experimental group completed the new program, while those in the control group did not. Following the program, participants in both groups were asked to provide estimates of their current daily use of meth.
127
Switching replications
NR O1 X O2 O3 Treatment eventually administered to group that originally served as control Problems: May not be possible to remove treatment from one group Can lead to compensatory rivalry Switching the treatment
128
Switching replications
Example: An organization implemented a new reward program to reduce absences. After a month of no absences, employees were… The manufacturing organization from the previous scenario removed the reward program from the Ohio plant, and implemented it in the Michigan plant. Absences were gathered and compared 1 month later.
129
Reversed-treatment control group
Control group given treatment that should have opposite effect of that given to treatment group Rules out many potential validity threats Problems: may not be feasible (pay/performance, whatโs the opposite?) or ethical NR O1 X+ O2 X-
130
Reversed-treatment control group
Example: A researcher wanted to investigate the effect of mood on academic test performance. All participants took a pre-test of critical reading ability. The treatment group was put in a setting which stimulated positive mood (calming music, lavender scent, tasty snacks) while the control group was put in a setting which stimulated negative mood (annoying childrenโs show music, sulfur scent, no snacks). Participants then completed the critical reading test again in their respective settings.
131
Randomized experimental designs
132
Randomized experimental designs
Participants randomly assigned to groups Random assignment: any procedure that assigns units to conditions based on chance alone, where each unit has a nonzero probability of being assigned to any condition NOT random sampling! Random sampling concerns how sample obtained Random assignment concerns how sample assigned to different experimental conditions
133
Why random assignment? Researchers in natural sciences can rigorously control extraneous variables People are tricky. Social scientists can't exert much control. Can't mandate specific level of cognitive ability, exposure to violent TV in childhood, attitude towards women, etc. Random assignment to conditions reduces chances that some unmeasured third variable led to observed covariation between presumed cause and effect
134
Why random assignment? Example: what if you assigned all participants who signed up in the morning to be in the experimental group for a memory study, and all those who signed up in the afternoon to be in the control group? And those who signed up in the morning had an average age of 55 and those who signed up in the afternoon had an average age of 27? Could difference between experimental and control groups be attributed to manipulation?
135
Random assignment Since participants randomly assigned to conditions, expectation that groups are equal prior to experimental manipulations Any observed difference attributable to experimental manipulation, not third variable Doesn't prevent all threats to validity Just ensures they're distributed equally across conditions so they aren't confounded with treatment
136
Random assignment Doesn't ensure groups are equal
Just ensures expectation that they are equal No obvious reason why they should differ But they still could Example: By random chance, average age of control group may be higher than average age of experimental group
137
Random assignment Random assignment guarantees equality of groups, on average, over many experiments Does not guarantee that any one experiment which uses random assignment will have equivalent groups Within any one study, groups likely to differ due to sampling error But, if random assignment process was conducted over infinite number of groups, average of all means for treatment and control groups would be equal
138
Random assignment If groups do differ despite random assignment, those differences will affect results of study But, any differences due to chance, not to way in which individuals assigned to conditions Confounding variables unlikely to correlate with treatment condition
139
Posttest-only control group design
X O Random assignment to conditions (R) Experimental group given treatment/IV manipulation (X) Outcome measured for both groups (O)
140
Posttest-only control group design
Example: Participants assigned to control group (no healthy eating seminar) or treatment group (90 minute healthy eating seminar) 6 months later, participants given questionnaire assessing healthy eating habits Scores on questionnaire compared for control group and treatment group
141
Problems with posttest-only control group design
No pretest If attrition occurs, can't see if those who left were any different than those who completed study No pretest makes it difficult to assess change on outcome
142
Pretest-posttest control group design
X O Randomly assigned to conditions Given pretest (P) measuring outcome variable One group given treatment/IV manipulation Outcome measured for both groups Variation: can randomly assign after pretest
143
Pretest-posttest control group design
Example: Randomly assign undergraduate student participants to control group and treatment group Give pretest on attitude towards in-state tuition for undocumented students Control group watches video about history of higher education for 20 minutes, while treatment group watches video explaining challenges faced by undocumented students in obtaining college degree Give posttest on attitude towards in-state tuition for undocumented students
144
Factorial designs Have 2 or more independent variables
Naming logic: # of levels in IV1 x # of levels in IV2 x … x # of levels in IV X 3 advantages: Require fewer participants since each participant receives treatment related to 2 or more IVs Treatment combinations can be evaluated Interactions can be tested
145
Factorial designs R XA1B1 O XA1B2 XA2B1 XA2B2 For 2x2 design:
Randomly assign to conditions (there are 4) Each condition represents 1 of 4 possible IV combinations Measure outcome Variables: A and B Levels: 1 and 2 First row: XA1B1 is level 1 of A and level 1 of B Second row: XA1B2 is level 1 of A and level 2 of B
146
Factorial designs Example:
2 IVs of interest: room temperature (cool/hot) and noise level (quiet/noisy) DV = number of mistakes made in basic math calculations Randomly assign to 1 of 4 groups: Quiet/cool Quiet/hot Noisy/cool Noisy/hot Measure number of mistakes made in math calculations Compare means across groups using factorial ANOVA
147
Factorial designs 2 things we can look for with these designs:
Main effects: average effects of IV across treatment levels of other IV Did participants do worse in the noisy than quiet conditions? Did participants do worse in the hot than cool conditions? Main effect can be misleading if there is a moderator variable Interaction: Relationship between one IV and DV depends on level of other IV Noise level positively related to number of errors made, but only if room hot When looking at main effect of one variable, you ignore the other variable: looking at each IV on its own and how it relates to DV
148
Within-subjects randomized experimental design
Participants randomly assigned to either order 1 or order 2 Participants in order 1 receive condition 1, then condition 2 Participants in order 2 receive condition 2, then condition 1 Having different orders prevents order effects Having participants in more than 1 condition reduces error variance R Order 1 Condition 1 O1 Condition 2 O2 Order 2 Same people are in both conditions: the groups in the two conditions are equivalent because all people experience both conditions; reduces selection bias and increases statistical power
149
Within-subjects randomized experimental design
Example: Participants randomly assigned to order 1 or order 2 Participants in order 1 reviewed resumes with the applicantโs picture attached and made hiring recommendations. They then reviewed resumes without pictures and made hiring recommendations. Participants in order 2 reviewed resumes without pictures and made hiring recommendations. They then reviewed resumes with the applicantโs picture attached and made hiring recommendations. Very important to counterbalance the order. Do not want participants doing tasks or experiencing treatment in the exact same order.
150
Data analysis
151
With 2 groups Need to compare 2 group means to determine if they are significantly different from one another If groups independent, use independent samples t-test If participants in one group are different from the participants in the other group If repeated measures design, use repeated measures t-test
152
With 3 or more groups Still need to compare group means to determine if they are significantly different If only 1 IV, use a one-way ANOVA If 2 or more IVs, use a factorial ANOVA If groups are not independent, use repeated measures ANOVA
153
Design practice Research question:
Does answering work-related communication (emails, phone calls) after normal working hours affect work-life balance? Design BOTH a randomized experiment AND a quasi-experiment to evaluate your research question For each design (random and quasi): Operationalize variables and develop a hypothesis(es) Name and explain the experimental design as it will be used to test your hypothesis(es) Name and explain one threat to internal validity in your design This was worth extra credit
154
Comparing means Week 9
155
Comparing means 2 primary ways to evaluate mean differences between groups: t-tests ANOVAs Which one you use will depend on how many groups you want to compare, and how many IVs you have 2 groups, 1 IV, 1 DV: t-test 3 or more groups, 1 or more IVs, 1 DV: ANOVA One-way ANOVA if only 1 IV Factorial ANOVA if 2 or more IVs
156
t-tests Used to compare means on one DV between 2 groups
Do men and women differ in their levels of job autonomy? Do students who take a class online and students who take the same class face-to-face have different scores on the final test? Do individuals report higher levels of positive affect in the morning than they report in the evening? Do individuals given a new anti-anxiety medication report different levels of anxiety than individuals given a placebo? Plenty of situations where we need to compare two groups on the DV
157
t-tests 2 different options for t-tests:
Independent samples t-test: individuals in group 1 are not the same as individuals in group 2 Do self-reported organizational citizenship behaviors differ between men and women? Repeated measures t-test: individuals in group 1 are the same as individuals in group 2 Do individuals report different levels of job satisfaction when surveyed on Friday than they do when surveyed on Monday?
158
A note on creating groups
Beware of dichotomizing a continuous variable in order to make 2 groups Example: everyone who scored a 50% or below on a test goes in group 1, and everyone who scored 51% or higher goes in group 2 Causes several problems People with very similar scores around cut point may end up in separate groups Reduces statistical power Increases chances of spurious effects Relevant for t-tests and ANOVA Ex: satisfaction with course and test scores, want to compare high to low test scores (dichotomized test scores): artificially dichotomizing a continuous variable can create problems 1. People with very similar scores can end up in different groups 2. Reduces statistical power (anytime you work with a categorical variable, you will reduce statistical power): difficult to find significant result 3. Does not fit with the way the variable was collected t-tests are good when you have a categorical variable on its own: do not dichotomize a variable just so you can do a t-test
159
t-tests and the linear model
t-test is just linear model with one binary predictor variable: Y_i = b_0 + b_1x_1 + e_i Predictor has 2 categories (male/female, control/experimental) Dummy variable: 0 = baseline group, 1 = experimental/comparison group b_0 is equal to mean of group coded 0 b_1 is equal to difference between group means A t-test is no different from a regression model with one predictor that has 2 categories
160
Rationale for t - test 2 sample means collected-need to see how much they differ If samples from same population, expect means to be roughly equivalent Large differences unlikely to occur due to chance When we do a t-test, we compare difference between sample means to difference we would expect if null hypothesis was true (difference = 0)
161
Rationale for t-test Standard error = gauge of differences between means likely to occur due to chance alone Small standard error: expect similar means if both samples from same population Large standard error: expect somewhat different means even if both samples from same population t-test evaluates whether observed difference between means is larger than would be expected, based on standard error, if samples from same population If there is a difference between our means and it's large enough that it would be significant, then we would reject the null Standard error: Gauge of the difference between means if the change was due to chance alone Small SE: Similar means if both samples came from the same population Large SE: Even if samples came from same population, we would expect to see differences between the means; usually happens with very small samples or poorly measured variables
162
Rationale for t-test Top half of equation = model
Bottom half of equation = error
163
Independent samples t-test
Use when each sample contains different individuals Look at ratio of between-group difference in means to estimate of total standard error for both groups Variance sum law: variance of difference between 2 independent variables = sum of their variances Use sample standard deviations to calculate standard error for each population's sampling distribution
164
Independent samples t-test
Assuming that sample sizes are equal: t = (x̄_1 − x̄_2) / √(s²_1/n_1 + s²_2/n_2)
Top half: difference between means Bottom half: each sample's variance divided by its sample size Top half: (x̄_1) mean of group 1 − (x̄_2) mean of group 2 Bottom half: (s²_1) variance for sample 1 / (n_1) sample size for sample 1 + (s²_2) variance for sample 2 / (n_2) sample size for sample 2, all under a square root
165
Independent samples t-test
If sample sizes are not equal, need to use pooled variance, which weights variance for each sample to account for sample size differences Pooled variance: s²_p = [(n_1 − 1)s²_1 + (n_2 − 1)s²_2] / (n_1 + n_2 − 2) Important: Sample that is bigger would have an undue influence on the variance estimate if you didn't weight the samples
166
Independent samples t-test
Equation for independent samples t-test with different sample sizes: t = (x̄_1 − x̄_2) / √(s²_p/n_1 + s²_p/n_2) Top: differences between groups Bottom: error (pooled variance divided by each sample size)
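A minimal sketch with simulated groups, computing the pooled-variance t by hand and checking it against scipy's independent samples t-test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Hypothetical scores for two independent groups of different sizes
g1 = rng.normal(4.0, 2.0, 40)
g2 = rng.normal(2.5, 2.0, 55)

n1, n2 = len(g1), len(g2)
s1, s2 = g1.var(ddof=1), g2.var(ddof=1)

# Pooled variance weights each sample's variance by its degrees of freedom
sp2 = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
t = (g1.mean() - g2.mean()) / np.sqrt(sp2 / n1 + sp2 / n2)

print(round(t, 3))
print(stats.ttest_ind(g1, g2, equal_var=True))  # same t, plus the p-value
```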
167
Paired samples/repeated measures t-test
Use when same people are in both samples Average difference between scores at measurement 1 and measurement 2: D̄ Shows systematic variation between measurements Difference that we would expect between measurements if null hypothesis true: μ_D Since null hypothesis says that difference = 0, this cancels out Measure of error = standard error of differences: s_D/√N Use anytime you have the same people in the same sample (D̄ = average difference between scores) μ_D = 0, which cancels out of the equation
168
Paired samples/repeated measures t-test
t = (D̄ − μ_D) / (s_D/√N); μ_D = 0 and cancels out (what we would expect to see if the null were true)
169
Assumptions of t-tests
Both types of t-tests are parametric and assume normality of sampling distribution For repeated measures, refers to sampling distribution of differences Data on DV have to be measured at interval level Can't be nominal or ordinal Independent samples t-test assumes variances of each population equivalent (homogeneity of variance) Also assumes scores in each sample independent of scores in other sample
170
Assumptions of t-tests
Independent samples t-tests will automatically do Levene's test for you If Levene's not significant, homogeneity of variance assumption met: interpret first line of output (equal variances assumed) If Levene's is significant, homogeneity of variance assumption not met: interpret second line of output (equal variances not assumed)
171
Independent samples t-test example
DV = Number of items skipped on ability test Group 1: Took test in unproctored setting Group 2: took test in proctored setting
172
Independent samples t-test example
173
Independent samples t-test example
DV into test variable IV into grouping variable spot (do not worry about ??) Pop-up window asking how groups were coded > Select continue
174
Independent samples t-test
1st line of output: statistics broken up by groups Next line is actual t-test: 1st two boxes are Levene's test (DO NOT INTERPRET AS T-TEST) If Levene's test is significant, interpret 2nd line of output (equal variances not assumed): t = 7.650, df = __, p < .001 (use 2nd significance value; p is never equal to 0, so report SPSS's p = .000 as p < .001) Use first three boxes on 2nd line AFTER Levene's test
175
Analyze > compare means > independent sample t-test
Day 1 hygiene score over into test variables box > move gender into grouping variable box > Select define groups and set Group 1 as 0 and Group 2 as 1 > Select continue > select OK
176
Look at Levene's test: if non-significant, look at 1st line for t-test
Difference is significant, p < .05 A negative t value: the test subtracts the Female mean from the Male mean, so whatever the second group was had a higher mean than the first group (hint: look at the group means in the group statistics) Women had higher hygiene scores than men df = (n1 + n2) − 2
177
Independent samples t-test
Need to report effect size Can convert t to r: r = √(t² / (t² + df)) = √((7.65 × 7.65) / ((7.65 × 7.65) + df)) = .184 Values taken from Slide 21 Independent samples t-test (proctored v. unproctored)
178
Independent samples t-test
More commonly use d: d = (x̄_1 − x̄_2) / s_2 d = (__ − __)/1.431 = 0.23 Note on d: Book shows d calculation using only 1 sd In practice, more common to use pooled standard deviation Interpretation (Cohen, 1988): .20 = small, .50 = medium, .80 = large Negative d means that x̄_2 larger than x̄_1 Values taken from Slide 21 Independent samples t-test (proctored v. unproctored) SPSS will not calculate d for you (tip: if you run lots of t-tests, use an Excel spreadsheet that will calculate Cohen's d)
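A small helper for d with the pooled standard deviation (hypothetical scores; this mirrors the common practice noted above rather than the book's single-SD version):

```python
import numpy as np

def cohens_d(g1, g2):
    """Cohen's d using the pooled standard deviation."""
    n1, n2 = len(g1), len(g2)
    sp = np.sqrt(((n1 - 1) * np.var(g1, ddof=1) + (n2 - 1) * np.var(g2, ddof=1))
                 / (n1 + n2 - 2))
    return (np.mean(g1) - np.mean(g2)) / sp

# Hypothetical scores for two groups
g1 = np.array([5.1, 4.8, 6.0, 5.5, 4.9, 5.7])
g2 = np.array([4.6, 4.4, 5.2, 4.9, 4.3, 4.8])
print(round(cohens_d(g1, g2), 2))
```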
179
Repeated measures t-test example
DV = Perceptions of procedural justice Measurement 1: Participants took one type of Implicit Association Test (task-switching ability) Measurement 2: Participants took traditional cognitive ability test (WPT-Q) Rationale behind repeated measures is the same as independent t-test (difference is sample content) Repeated used because procedural justice perceptions were measured for both measurements
180
Repeated measures t-test example
Paired samples t-test is the same as repeated measures t-test
181
Repeated measures t-test example
Variable 1 is scores for time 1 (move over 1st) and variable 2 is scores for time 2 (move over 2nd) > Select ok
182
Repeated measures t-test example
Paired Samples Correlations: looking at the correlation between time 1 and time 2 for the group; correlation is strong here Really interested in the Paired Samples Test: Levene's test not necessary because our means came from the same sample 1st column is mean difference between measurements: tells us that 2nd measurement had a higher mean t-value is -9.74, meaning 2nd measure had higher mean df is a little bit different because we only have 1 sample (df = N − 1)
183
Repeated measures t-test effect sizes
Still need to calculate effect sizes Problem with r in repeated measures t-test: tends to over-estimate effect size Better off using d with repeated measures designs: better estimate of effect size Formula for repeated measures d = (D̄ − μ_D)/s r not the best choice for repeated measures design because of overestimation of effect size
184
Comparing the t-tests If you have the same people in both groups, ALWAYS use repeated measures t-test (or you violate one of the assumptions of the independent t-test) Non-independence of errors violates assumptions of independent samples t-test Power is higher in repeated measures t-test Reduces error variance by quite a bit since same participants are in both samples Different people in groups brings a certain amount of error because they bring their own idiosyncrasies. Increase random error with independent t-test Repeated measures t-test allows us to make do with fewer participants
185
Use one participant per row
186
Fear scores were higher (saw the real spider) for the second measure than the first measure
When writing up results where you have the same participants in both conditions, make it very clear that it was a repeated measures design and that the same people were used for measure 1 and measure 2
187
One-way ANOVA ANOVA = analysis of variance
One-way ANOVA allows us to compare means on a single DV across more than 2 groups
188
Why we need ANOVA Doing multiple t-tests (control vs. group 1, control vs. group 2, etc.) on data inflates the Type I error rate beyond acceptable levels Familywise error rate assuming ฮฑ = .05 for each test: 1 โ (.95)n n = number of comparisons being made So, with 3 comparisons, overall ฮฑ = .143 With 4 comparisons, overall ฮฑ = .185 ฮฑ = .05, psychology standard Multiple comparisons using same DV, increase error rate Greatly increases chances of Type I error if you do a bunch a t-tests
189
ANOVA and the linear model
Mathematically, ANOVA and regression are the same thing! ANOVA output: F-ratio: comparison of systematic to unsystematic variance Same as F ratio in regression: shows improvement in prediction of outcome gained by using model as compared to just using mean Only difference between ANOVA and regression: predictor is categorical variable with more than 2 categories Exactly the same as using dummy variables in regression Linear model with # of predictors equal to number of groups - 1
190
ANOVA and the linear model
Intercept (b0) will be equal to the mean of the baseline group (group coded as 0 in all dummy variables) Regression coefficient b1 will be equal to the difference in means between baseline group and group 1 Regression coefficient b2 will be equal to the difference in means between baseline group and group 2
191
F ratio F = systematic variance / unsystematic variance (error)
Systematic variance, in ANOVA, is mean differences between groups Null hypothesis: group means are same In this case, systematic variance would be small Thus, F would be small
192
ANOVA logic Simplest model we can fit to data is grand mean (of DV)
We try to improve on this prediction by creating a more complex model Parameters include intercept (b0) and one or more regression coefficients (b1, b2, etc.) Bigger regression coefficients = bigger differences between groups If between group differences large, model better fit to data than grand mean If model fit is better than grand mean, then between-group differences are significant
193
Total sum of squares (SST)
This shows the total amount of variation within the data Grand mean on DV subtracted from each observation's value on DV; these deviations are squared and summed Total degrees of freedom for SST: N-1 Total amount of variation around our grand mean in our data.
194
Model sum of squares (SSM)
This shows how much variance the linear model explains Calculate difference between mean of each group and grand mean, square this value (each value), then multiply it by the number of participants in the group Add the values for each group together Degrees of freedom: k − 1, where k is number of groups Variance explained by group membership
195
Residual sum of squares (SSR)
This shows differences in scores that aren't explained by model (i.e., aren't explained by between-group differences) Calculated by subtracting the group mean from each score, squaring this value, and then adding all of the values together Degrees of freedom = N − k, where k = number of groups and N is overall sample size Error part of ANOVA: unsystematic variance Any variance not accounted for by group membership
196
Mean squares To get a mean square value, divide sum of squares value by its degrees of freedom Mean square model (MSM) = SSM/(k − 1) Mean square residual (MSR) = SSR/(N − k)
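A worked sketch of the whole decomposition on made-up data for three groups, checked against scipy's one-way ANOVA:

```python
import numpy as np
from scipy import stats

# Hypothetical DV scores for three groups
groups = [np.array([2.1, 2.5, 1.9, 2.4, 2.8]),
          np.array([2.9, 3.1, 2.6, 3.4, 3.0]),
          np.array([3.6, 3.2, 3.9, 3.5, 3.8])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
k, n = len(groups), len(all_scores)

sst = np.sum((all_scores - grand_mean) ** 2)                       # total
ssm = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)   # model
ssr = sum(np.sum((g - g.mean()) ** 2) for g in groups)             # residual

msm, msr = ssm / (k - 1), ssr / (n - k)
print(round(msm / msr, 3))      # F = MSM / MSR
print(stats.f_oneway(*groups))  # same F, plus the p-value
```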
197
F ratio Calculated using mean square values:
F = MSM/MSR Degrees of freedom for F: (k − 1), (N − k) If F is statistically significant, group means differ by more than they would if null hypothesis were true F is omnibus test: only tells you whether group means differ significantly: there's a difference somewhere Doesn't tell you which means differ from one another Need post-hoc tests to determine this F-test problematic because it only tells us that group means differ significantly, doesn't tell us which groups differed significantly Post-hoc tests let you look at which groups differed significantly from one another
198
Post-hoc tests Pairwise comparisons to compare all groups to one another All incorporate correction so that Type I error rate is controlled (at about .05) Example: Bonferroni correction (very conservative): use significance level (usually .05) divided by the number of comparisons: α/n, where n is number of comparisons So, if we have 3 groups and we want to keep α at .05 across all comparisons, each comparison will have α = .017 Restricts alpha level for each comparison
199
Post-hoc tests Lots of options for post hoc tests in SPSS
Some notes on the more common ones: Least significant difference (LSD): doesn't control Type I error very well Bonferroni's and Tukey's: control Type I error rate, but lack statistical power (too conservative) REGWQ: controls Type I error and has high power, but only works if sample sizes equal across groups Games-Howell: less control of Type I error, but good for unequal sample sizes and unequal variance across groups Dunnett's T3: good control of Type I error, works if unequal variance across groups
200
Assumptions of ANOVA Homogeneity of variance: can check with Levene's test If Levene's significant and homogeneity of variance assumption violated, need to use corrected F ratio Brown-Forsythe F Welch's F Provided group sizes equal, ANOVA works ok if normality assumption violated somewhat If group sizes not equal, ANOVA biased if data non-normal Non-parametric alternative to ANOVA: Kruskal-Wallis test (book covers in detail)
201
Steps for doing ANOVA
202
Effect sizes for ANOVA R2: SSM/SST
When applied to ANOVA, value called eta squared, η² Somewhat biased because it's based on sample only: doesn't adjust for looking at effect size in population SPSS reports partial eta squared, but only for factorial ANOVA: SSB/(SSB + SSE) Better effect size measure for ANOVA: omega-squared (ω²; SPSS will not calculate it for you):
ω² = (SSM − (dfM)(MSR)) / (SST + MSR)
Eta squared: not the preferred method for ANOVA Top: model variance − error variance Bottom: error
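A sketch of the omega-squared calculation; every number below is a placeholder, only the structure of the formula comes from the slide:

```python
def omega_squared(ss_m, ss_t, ms_r, df_m):
    """Omega-squared effect size for a one-way ANOVA."""
    return (ss_m - df_m * ms_r) / (ss_t + ms_r)

# Hypothetical ANOVA table values
ss_m, ss_r, df_m, df_r = 21.49, 722.0, 2, 597
ss_t = ss_m + ss_r
ms_r = ss_r / df_r

print(round(omega_squared(ss_m, ss_t, ms_r, df_m), 3))
```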
203
One-way ANOVA in SPSS IV: Counterproductive work behavior (CWB) scale that varied in its response anchors: control, infrequent, & frequent DV: self-reported CWB
204
One-way ANOVA in SPSS
205
One-way ANOVA in SPSS IV into Factor box
Select post-hoc… leads into next slide
206
One-way ANOVA in SPSS
207
One-way ANOVA in SPSS Main ANOVA: f-value
Look at post-hoc tests to see which groups significantly differ from one another People on the frequent scale reported more CWB than the traditional scale (1st grouping/top) Frequent scale more CWB than infrequent scale (2nd grouping/middle) Duplicate values at the bottom of Multiple Comparisons box
208
One-way ANOVA in SPSS Calculating omega-squared:
ω² = (21.49 − __) / __ = .025 Suggestions for interpreting ω²: .01 = small .06 = medium .14 = large
209
Analyze > Compare means > One-way ANOVA
210
Do not report the same comparison twice.
Write down the differences so you won't mistakenly list comparisons twice.