1
Correlation and regression
Week 4
2
Associational research
Looks at the relationship between two variables Usually continuous variables No manipulation of IV Correlation coefficient shows relationship between 2 variables Regression: equation used to predict outcome value based on predictor value Multiple regression: same, but uses more than 1 predictor
3
What is a correlation? Know that statistical model is:
outcome_i = (model) + error_i
For correlation, this can be expressed as: outcome_i = (b × x_i) + error_i
Simplified: outcome is predicted from predictor variable and some error
b = Pearson product-moment correlation, or r
4
Covariance Covariance: extent to which 2 variables covary with one another Shows how much deviation with one variable is associated with deviation in the second variable
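To make the idea concrete, here is a minimal Python sketch (using hypothetical scores, not the table from the slides) that computes a sample covariance by hand and checks it against numpy:

```python
import numpy as np

# Hypothetical scores for two variables measured on the same cases
x = np.array([5, 4, 4, 6, 8])
y = np.array([8, 9, 10, 13, 15])

# Covariance: average cross-product of deviations from each variable's mean,
# using N - 1 in the denominator (sample covariance)
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

print(cov_xy)                      # hand calculation
print(np.cov(x, y, ddof=1)[0, 1])  # same value from numpy's covariance matrix
```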
5
Covariance example
6
Covariance example
7
Covariance Positive covariance: As one variable deviates from mean, other variable deviates in same direction Negative covariance: As one variable deviates from mean, other variable deviates in opposite direction Problem with covariance: depends on scales variables measured on Canโt be compared across measures Need standardized covariance to compare across measures
8
Correlation Standardized measure of covariance
Known as Pearson's product-moment correlation, r
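As a rough illustration (hypothetical data again), dividing the covariance by the product of the two standard deviations gives Pearson's r:

```python
import numpy as np

x = np.array([5, 4, 4, 6, 8])     # hypothetical predictor scores
y = np.array([8, 9, 10, 13, 15])  # hypothetical outcome scores

# r = covariance standardized by the product of the two standard deviations
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
r = cov_xy / (x.std(ddof=1) * y.std(ddof=1))

print(round(r, 3))                        # hand-rolled Pearson's r
print(round(np.corrcoef(x, y)[0, 1], 3))  # numpy gives the same value
```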
9
Correlation example From previous table:
10
Correlation Values range from -1 to +1
+1: perfect positive correlation: as one variable increases, other increases by proportionate amount -1: perfect negative correlation: as one variable increases, other decreases by proportionate amount 0: no relationship. As one variable changes, other stays the same
11
Positive correlation
12
Negative correlation
13
Small correlation
14
Correlation significance
Significance tested using t-statistic: t_r = r√(N − 2) / √(1 − r²)
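A quick sketch of that test, assuming a hypothetical r of .30 from N = 50 cases (scipy is used only to turn t into a two-tailed p-value):

```python
import numpy as np
from scipy import stats

r, N = 0.30, 50  # hypothetical correlation and sample size

# t for a correlation has N - 2 degrees of freedom
t = r * np.sqrt(N - 2) / np.sqrt(1 - r**2)
p = 2 * stats.t.sf(abs(t), df=N - 2)  # two-tailed p-value

print(round(t, 3), round(p, 4))
```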
15
Correlation and causality
Correlation DOES NOT imply causality!!! Only shows us that 2 variables are related to one another Why correlation doesn't show causality: 3rd variable problem: some other variable (not measured) responsible for observed relationship No way to determine directionality: does a cause b, or does b cause a?
16
Before running a correlation…
17
Bivariate correlation in SPSS
18
Note on pairwise & listwise deletion
Pairwise deletion: removes cases from analysis on an analysis-by-analysis basis 3 variables: A, B, & C Correlation matrix between A, B, & C Case 3 is missing data on variable B, but not on A or C Case 3 will be excluded from correlation between B & C, and A & B, but not from correlation between A & C Advantage: keep more of your data Disadvantage: not all analyses will include the same cases: can bias results
19
Note on pairwise & listwise deletion
Listwise deletion: removes cases from analysis if they are missing data on any variable under consideration 3 variables: A, B, & C Correlation matrix between A, B, & C Case 3 is missing data on variable B, but not on A or C Case 3 will be excluded from correlation between B & C, A & B, and A & C Advantage: less prone to bias Disadvantage: don't get to keep as much data Usually a better option than pairwise
20
Correlation output
21
Interpreting correlations
Look at statistical significance Also, look at size of correlation: +/- .10: small correlation +/- .30: medium correlation +/- .50: large correlation
22
Coefficient of determination, R2
Amount of variance in one variable shared by other variable Example: pretend R2 between cognitive ability and job performance is .25 Interpretation: 25% of variance in cognitive ability shared by variance in job performance Slightly incorrect but easier way to think of it: 25% of the variance in job performance is accounted for by cognitive ability
23
Spearman's correlation coefficient
Also called Spearman's rho (ρ) Non-parametric Based on ranked, not interval or ratio, data Good for minimizing effect of outliers and getting around normality issues Ranks data (lowest to highest score) Then, uses Pearson's r formula on ranked data
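A minimal sketch of that logic with made-up scores: ranking both variables and then applying Pearson's r reproduces scipy's built-in Spearman's rho.

```python
import numpy as np
from scipy import stats

# Hypothetical scores; note the outlier in x
x = np.array([2, 5, 1, 4, 30])
y = np.array([10, 22, 8, 15, 25])

# Spearman's rho: convert each variable to ranks, then apply Pearson's r
rho_by_hand = stats.pearsonr(stats.rankdata(x), stats.rankdata(y))[0]
rho_builtin = stats.spearmanr(x, y)[0]

print(rho_by_hand, rho_builtin)  # identical values
```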
24
Kendall's tau (τ) Non-parametric correlation Also ranks data
Better than Spearman's rho if: Small data set Large number of tied ranks More accurate representation of correlation in population than Spearman's rho
27
Point-biserial correlations
Used when one of the two variables is a truly dichotomous variable (male/female, dead/alive) In SPSS: Code one category of dichotomous variable as 0, and the other as 1 Run normal Pearson's r Example: point-biserial correlation of .25 between species (0=cat & 1=dog) and time spent on the couch Interpretation: a one unit increase in the category (i.e., from cats to dogs) is associated with a .25 unit increase in time spent on couch
28
Biserial correlation Used when one variable is a "continuous dichotomy" Example: passing exam vs. failing exam Knowledge of subject is continuous variable: some people pass exam with higher grade than others Formula to convert point-biserial to biserial: r_b = (r_pb × √(p1 × p2)) / y P1 = proportion of cases in category 1 P2 = proportion of cases in category 2 y is from z-table: find value roughly equivalent to split between largest and smallest proportion See table on p. 887 in book
29
Biserial correlation Example:
Correlation between time spent studying for medical boards and outcome of test (pass/fail) was __; __% of test takers passed. r_b = __ = .46
30
Partial correlation Correlation between two variables when the effect of a third variable has been held constant Controls for effect of third variable on both variables Rationale: if third variable correlated (shares variance) with 2 variables of interest, correlation between these 2 variables won't be accurate unless effect of 3rd variable is controlled for
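One common way to get a partial correlation is directly from the three zero-order correlations. A minimal sketch (the numbers below are hypothetical, not from the slides):

```python
import numpy as np

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation between x and y with the third variable z partialled out of both."""
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical zero-order correlations: x and y are the variables of interest,
# z is the third variable being controlled for
print(round(partial_corr(r_xy=-0.44, r_xz=-0.71, r_yz=0.40), 3))
```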
32
Partial correlation Obtain by going to Analyze > Correlate > Partial
Choose variables of interest to correlate Choose variable to control
33
Semi-partial (part) correlations
Partial correlation: control for effect that 3rd variable has on both variables Semi-partial correlation: control for effect that 3rd variable has on one variable Useful for predicting outcome using combination of predictors
34
Calculating effect size
Can square Pearson's correlation to get R²: proportion of variance shared by variables Can also square Spearman's rho to get R²_s: proportion of variance in ranks shared by variables Can't square Kendall's tau to get proportion of variance shared by variables
35
Regression Used to predict value of one variable (outcome) from value of another variable (predictor) Linear relationship: Y_i = (b_0 + b_1X_i) + e_i
Y_i = outcome
b_0 = intercept: value of outcome (Y) when predictor (X) = 0
b_1 = slope of line: shows direction & strength of relationship
X_i = value of predictor (x)
e_i = deviation of predicted outcome from actual outcome
36
Regression b_0 and b_1 are regression coefficients
Negative b_1: negative relationship between predictor and criterion Positive b_1: positive relationship between predictor and criterion Will sometimes see β_0 and β_1 instead: these are standardized regression coefficients Put values in standard deviation units
37
Regression
38
Regression Regression example:
Pretend we have the following regression equation: Exam grade (Y) = 45 + 3.5 × (Hours spent studying) + error If we know that someone spends 10 hours studying for the test, what is the best prediction of their exam grade we can make? Exam grade = 45 + (3.5 × 10) = 80
39
Estimating model Difference between actual outcome and outcome predicted by the model
40
Estimating model Total error in model = Σ(observed_i − model_i)²
Called sum of squared residuals (SSR) Large SSR: Model not a good fit to data; small = good fit Ordinary least squares (OLS) regression: used to define model that minimizes sum of squared residuals
41
Estimating model Total sum of squares (SST): Total sum of squared differences between observed data and mean value of Y Model sum of squares (SSM): Improvement in prediction as result of using regression model rather than mean
43
Estimating model Proportion of improvement due to use of model rather than mean: R² = SS_M / SS_T
Also is indicator of variance shared by predictor and outcome F-ratio: statistical test for determining whether model describes data significantly better than mean: F = MS_M / MS_R
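A small worked sketch of these quantities for a one-predictor model, using made-up data (k = number of predictors):

```python
import numpy as np

# Hypothetical data: hours studied (x) and exam grade (y)
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 10.0])
y = np.array([52.0, 57.0, 66.0, 70.0, 78.0, 81.0])

# OLS estimates for the one-predictor model y_hat = b0 + b1*x
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total variation around the mean of y
ssm = np.sum((y_hat - y.mean()) ** 2)  # improvement from using the model
ssr = np.sum((y - y_hat) ** 2)         # residual (unexplained) variation

k, n = 1, len(y)
r_squared = ssm / sst                      # R^2 = SS_M / SS_T
f_ratio = (ssm / k) / (ssr / (n - k - 1))  # F = MS_M / MS_R
print(round(r_squared, 3), round(f_ratio, 2))
```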
44
Individual predictors
b should be significantly different from 0 0 would indicate that for every 1 unit change in x, y wouldn't change Can test difference between b and null hypothesis (b = 0) using t-test: t = b_observed / SE_b
46
Multiple regression Week 5
47
Outliers in regression
Outlier can affect regression coefficient
48
Outliers in regression
Residual: difference between actual value of outcome and predicted value of outcome Large residuals: poorly-fitting regression model Small residuals: regression model good fit Unstandardized residual: difference between actual and predicted outcome value, measured in same units as outcome Standardized residual: Residuals converted to z-scores Studentized residual: unstandardized residual divided by estimate of standard deviation
49
Influential cases Influential case: value that strongly influences regression model parameter estimates Cook's distance: measure of overall influence of case on the model Values larger than 1 = problem Leverage: shows influence of observed value of outcome variable over predicted values of outcome Average leverage = (k + 1)/n, where k is number of predictors and n is sample size Problematic values: (3(k + 1)/n)
50
Influential cases DFBETA: compares regression coefficient when case is excluded from the model to regression coefficient when case is included in the model Problematic if absolute values larger than 2/√n Mahalanobis distance: measures distance of case from mean of predictor variable(s) Chi square distribution with degrees of freedom equal to number of predictors Significant value = problem
51
Independent errors Durbin-Watson test: tests whether adjacent residuals are correlated Value of 2: residuals uncorrelated Value larger than 2: negative correlation between residuals Value smaller than 2: positive correlation between residuals Values greater than 3 or less than 1 problematic
52
Assumptions of linear regression models
Additivity and linearity: outcome linearly related to additive combination of predictors Independent errors: uncorrelated residuals Homoscedasticity: at all levels of predictor, should be equal variance of residuals Normally distributed errors (residuals) Predictors uncorrelated with external variables: external variables = variables not included in model that influence outcome variable
53
Assumptions of linear regression models
Predictors must be quantitative, or categorical with only 2 categories Can dummy-code variables if more than 2 categories Outcomes quantitative and continuous No perfect multicollinearity: No perfect linear relationship between predictor pairs Non-zero variance: predictors need to vary
55
Multiple regression Incorporates multiple predictors into regression model Predictors should be chosen based on theory/previous research Not useful to chuck lots of random predictors into model to see what happens
56
Semi-partial correlation
Foundation of multiple regression Measures relationship between predictor and outcome, controlling for relationship between that predictor and other predictors in the model Shows unique contribution of predictor in explaining variance in outcome
57
Reasons for multiple regression
Want to explain greater amount of variance in outcome "What factors influence adolescent drug use? Can we predict it better?" Want to look at set of predictors in relation to outcome Very useful: human behavior rarely determined by just one thing "How much do recruiter characteristics and procedural justice predict job satisfaction once hired?"
58
Reasons for multiple regression
Want to see if adding another predictor (or set of predictors) will improve prediction above and beyond known set of predictors "Will adding a job knowledge test to current battery of selection tests improve prediction of job performance?" Want to see if predictor(s) significantly related to outcome after controlling for effect of other predictors "Is need for cognition related to educational attainment, after controlling for socioeconomic status?"
59
Entering predictors Hierarchical regression Forced entry Stepwise
Known predictors entered into model first New/untested predictors added into models next Good for assessing incremental validity Forced entry All predictors forced into model at same time Stepwise DON'T USE IT! Adds predictors based upon amount of variance explained Atheoretical & capitalizes on error/chance variation
60
Multicollinearity Perfect collinearity: one predictor has perfect correlation with another predictor Can't get unique estimates of regression coefficients: both variables share same variance Lower levels of multicollinearity common
61
Multicollinearity Problems with multicollinearity:
Untrustworthy bs due to increase in standard error (more variable across samples) Limits R: If two variables highly correlated, they share a lot of variance. Each will then account for very little unique variance in the outcome Adding predictor to model that's correlated strongly with existing predictor won't increase R by much, even if on its own it's strongly related to outcome Can't determine importance of predictors: since variance shared between predictors, which accounts for more variance in outcome?
62
Multicollinearity Example: You're trying to predict social anxiety using emotional intelligence and number of friends as predictors What if emotional intelligence and number of friends are related?
63
Multicollinearity [Venn diagram: Emotional intelligence and Number of friends overlap with Social anxiety; the overlapping region is variance in the outcome that both predictors explain]
64
Multicollinearity Could have high R accompanied by very small bs
Variance inflation factor (VIF): evaluates linear relationship between predictor and other predictors Largest VIF greater than 10: problem Average VIF greater than 1: problem Calculate this by adding up VIF values across predictors, and then dividing by number of predictors Tolerance: reciprocal of VIF (1/VIF) Below .10: major problem Below .20: potential problem
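A rough sketch of how VIF and tolerance are computed: regress each predictor on the remaining predictors and take 1/(1 − R²). The data below are simulated, with two deliberately correlated predictors.

```python
import numpy as np

def vif_values(X):
    """VIF for each column of a predictor matrix X (rows = cases, columns = predictors)."""
    n, p = X.shape
    vifs = []
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        r2 = 1 - resid.var() / X[:, j].var()  # R^2 from regressing predictor j on the others
        vifs.append(1 / (1 - r2))
    return np.array(vifs)

rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = 0.8 * a + rng.normal(scale=0.5, size=200)  # b strongly related to a
c = rng.normal(size=200)                       # c roughly independent
X = np.column_stack([a, b, c])

vifs = vif_values(X)
print(np.round(vifs, 2))      # VIF per predictor
print(round(vifs.mean(), 2))  # average VIF
print(np.round(1 / vifs, 2))  # tolerance = 1/VIF
```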
65
Multicollinearity Many psychological variables are slightly correlated
Likely to run into big multicollinearity problems if you include 2 predictors measuring the same, or very similar, constructs Examples: Cognitive ability and problem-solving 2 different conscientiousness measures Job knowledge and a situational interview Scores on 2 different anxiety measures
66
Homoscedasticity Can plot zpred (standardized predicted values of DV based on model) against zresid (standardized residuals)
67
Homoscedasticity Should look like a random scatter of values
68
Multiple regression in SPSS
69
Multiple regression in SPSS
70
Multiple regression in SPSS
71
Regression output R: Correlation between actual outcome values, and values predicted by regression model R2: Proportion of variance in outcome predicted by model Adjusted R2: estimate of value in population (adjusted for shrinkage that tends to occur in cross-validated model due to sampling error)
72
Regression output F-test: compares variance explained by model to variance unaccounted for by model (error) Shows whether predictions based on model are more accurate than predictions made using mean
73
Regression output Beta (b) values: change in outcome associated with a one-unit change in a predictor Standardized beta (β) values: beta values expressed in standard deviation units
74
Practice time! The following tables show the results of a regression model predicting Excel training performance using 5 variables: self-efficacy (Setotal), Excel use (Rexceluse), Excel formula use (Rformulause), cognitive ability (WPTQ), and task-switching IAT score (TSA_score)
75
Interpret this…
76
And this…
77
And finally this
78
Moderation Week 6 and 7
79
Categorical variables
When categorical variable has 2 categories (male/female, dead/alive, employed/not employed), can put it directly into regression When categorical variable has more than 2 categories (freshman/sophomore/junior/senior, entry level/first line supervisor/manager), can't input it directly into regression model Have to dummy code categorical variable
80
Categorical variables
Dummy variables: represent group membership using zeroes and ones Have to create a series of new variables Number of variables=number of categories - 1 Example: freshman/sophomore/junior/senior
81
Categorical variables
Eight steps in creating and using dummy coded variables in regression: 1. Count number of groups in variable and subtract 1 2. Create as many new variables as needed based on step 1 3. Choose one of groups as baseline to compare all other groups against Usually this will be the control group or the majority group 4. Assign values of 0 to all members of baseline group for all dummy variables
82
Categorical variables
5. For first dummy variable, assign 1 to members of the first group that you want to compare against baseline group. Members of all other groups get a 0. 6. For second dummy variable, assign 1 to all members of second group you want to compare against baseline group. Members of all other groups get a 0. 7. Repeat this for all dummy variables. 8. When running regression, put all dummy variables in same block
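As a sketch of the same steps in code (hypothetical class-standing data; pandas is assumed), with freshman as the baseline group:

```python
import pandas as pd

# Hypothetical categorical variable with 4 categories
df = pd.DataFrame({"standing": ["freshman", "sophomore", "junior", "senior",
                                "sophomore", "freshman", "senior"]})

# 4 categories -> 3 dummy variables; dropping the freshman column makes
# freshman the baseline group (coded 0 on every dummy variable)
dummies = pd.get_dummies(df["standing"], prefix="d", dtype=int).drop(columns="d_freshman")

print(pd.concat([df, dummies], axis=1))
```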
83
Categorical variables
Example: One variable with 4 categories: Freshman, sophomore, junior, senior.
84
Categorical variables
85
Categorical variables
86
Categorical variables
87
Categorical variables
88
Categorical variables
Each dummy variable is included in the regression output Regression coefficient for each dummy variable shows change in outcome that results when moving from baseline (0) to category being compared (1): difference in outcome between baseline group and other group Example: Compared to freshmen, seniors' attitudes towards college scores are 1.94 points higher Significant t-value: group coded as 1 for that dummy variable significantly different on outcome than baseline group
89
Moderation Relationship between 2 variables depends on the level of a third variable Interaction between predictors in model
90
Moderation Many research questions deal with moderation!
Example: In I/O psychology, moderation important for evaluating predictive invariance Does the relationship between a selection measure and job performance vary depending on demographic group (Male vs. female, White vs. Black, etc.)? Example: In clinical/counseling, moderation important for evaluating risk for mental illness Does the relationship between exposure to a stressful situation and subsequent mental illness diagnosis vary depending on the individual's social support network?
91
Moderation How to test for moderation in SPSS: include the predictor and the moderator, and create a third variable that multiplies the moderator and the predictor All 3 variables predict the outcome
92
Moderation Y_i = b_0 + b_1A_i + b_2B_i + b_3AB_i + e_i
Basic regression equation with minor change: the AB_i term Outcome depends on: Intercept (b_0) Score on variable A (b_1A_i), and relationship between variable A and Y Score on variable B (b_2B_i), and relationship between variable B and Y Interaction (multiplication) between scores on variables A and B (b_3AB_i), and relationship between AB and Y
93
Moderation Moderator variables can be either categorical (low conscientiousness/high conscientiousness; male vs. female, etc.) or continuous (conscientiousness scores from 1-7) Categorical: can visualize interaction as two different regression lines, one for each group, which vary in slope (and possibly in intercept)
94
Moderation
95
Moderation Continuous moderator: visualize in 3-dimensional space: more complex relationship between moderator and predictor variable Slope of one predictor changes as values of moderator change Pick a few values of moderator and generate graphs for easier interpretation Simple slopes analysis: picks out a few levels of the predictor and moderator and looks at the slopes
96
Moderation Prior to analysis, need to grand-mean center predictors
Doing so makes interactions easier to interpret (this is why we center) Regression coefficients show relationship between predictor and criterion when other predictor equals 0 Not all variables have meaningful 0 in context of study: age, intelligence, etc. Could end up trying to interpret effects based on non-existing score (such as the level of job performance for person with intelligence score of 0) Once interactions are factored in, interpretation becomes increasingly problematic Also reduces nonessential multicollinearity (i.e., correlations due to the way that the variables were scaled) Categorical moderator: center the continuous predictor only Continuous moderator: center both predictor and moderator Interpreting regression coefficients when there is no meaningful zero skews our predictions Without centering, you may end up predicting outcomes based on non-existent score Centering will put variables on new metric with a mean of 0 It doesn't help much with essential multicollinearity, but any nuisance correlation (due to scaling) will be taken care of through centering
97
Moderation Grand mean centering: subtract mean of variable from all scores on that variable Centered variables used to calculate interaction term Creates interaction variable Don't center categorical predictors Just make sure it is scaled 0 and 1 Don't center outcome/dependent variable Centering only applies to predictors Only center continuous predictor variables
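A minimal sketch of centering and building the interaction term (hypothetical variable names: video game hours as predictor, callous-unemotional traits as moderator):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
df = pd.DataFrame({"video": rng.normal(15, 5, 100),     # hypothetical predictor
                   "callous": rng.normal(20, 6, 100)})  # hypothetical moderator

# Grand-mean center the continuous predictor and moderator only
df["video_c"] = df["video"] - df["video"].mean()
df["callous_c"] = df["callous"] - df["callous"].mean()

# Interaction term = product of the two centered variables
df["video_x_callous"] = df["video_c"] * df["callous_c"]

print(df[["video_c", "callous_c"]].mean().round(3))  # both centered means are ~0
```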
98
Moderation For centered variable, value of 0 represents the mean value on the predictor Since transformation is linear, doesn't change regression model substantially Interpretation of regression coefficients easier Without centering: interaction = how outcome changes with one-unit increase in moderator when predictor = 0 With centering: interaction = how outcome changes with one-unit increase in moderator when predictor = mean 0 is mean of our new variable Centering doesn't change SD of variable Allows us to compare regression lines at the mean Small interaction term: not much of a difference Big interaction: moderator makes a big difference in terms of outcome Simple slopes analysis creates graph to allow us to see relationship between predictor and outcome at different levels of the moderator
99
Grand mean centering Run descriptives for all variables to center
Centered video games variable: Transform > Compute variable > Create new variable Video_centered > subtract the mean
100
Moderation Steps for moderation in SPSS:
1. Grand-mean center continuous predictor(s) 2. Enter both predictor variables into first block 3. Enter interaction term in second block Doing it this way makes it easier to look at R² change 4. Run regression and look at results 5. If interaction term significant: Categorical predictor: Line graph between predictor and DV, with a different line for each category Continuous predictor: Simple slopes analysis Hierarchical regression: Predictor in first block > include centered rather than raw variable > block 2 interaction term (new variable created by multiplying predictor and moderator together) Run regression and look at results If the interaction term is significant, you're not finished (a significant p value indicates another step): simple slopes analysis
101
Simple slopes analysis
Basic idea: values of outcome (Y) calculated for different levels of predictor and moderator: low, medium, and high Usually defined as -1 SD, mean, + 1 SD Recommend using online calculator for these (can be done by hand, but it's a pain)
102
Simple slopes analysis
Example: Aggression = (.17 × video) + (.76 × callous) + (.027 × (video × callous)) For 1 SD below the mean on video games at low levels of callous unemotionality: (.17 × __) + (.76 × __) + (.027 × (__ × __)) = 33.29 Would do this 8 more times so that you had values of aggression at low, medium, and high levels of callous unemotionality and video game playing Value of outcome is calculated for different levels of the predictor and the moderator (low = 1 SD below mean, med = mean, high = 1 SD above mean) Plug in values for low, med, and high PROCESS is an add-on available at afhayes.com or processmacro.org for personal copies of SPSS
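A sketch of those plug-in calculations, treating all numeric values other than the slide's coefficients as hypothetical placeholders (including an assumed intercept, which the slide does not show):

```python
import numpy as np

# Coefficients from the slide's equation; the intercept (b0) is assumed for illustration
b0, b_video, b_callous, b_inter = 30.0, 0.17, 0.76, 0.027

# Hypothetical centered values at -1 SD, the mean, and +1 SD
video_vals = np.array([-5.0, 0.0, 5.0])    # video game playing
callous_vals = np.array([-6.0, 0.0, 6.0])  # callous unemotionality

# Predicted aggression for all 9 combinations of predictor and moderator level
for c in callous_vals:
    preds = b0 + b_video * video_vals + b_callous * c + b_inter * video_vals * c
    print(f"callous = {c:+.0f}: {np.round(preds, 2)}")
```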
103
Simple slopes analysis
Every possible interaction calculated
104
Creating interaction term
Multiply the predictor and moderator together (use centered variables, unless the variable is categorical)
105
Entering variables 1st block predictor and moderator
2nd block interaction term
106
Entering variables Hierarchical option to see how much R2 changes
107
Output .027 is saying that when the predictor is at its mean, there is a difference of .027 in aggression as we move from one level of callousness to the next level of callousness Positive: the higher callousness is, the stronger the relationship between video game playing and aggression Negative: as callousness decreases, weaker relationship between video game playing and aggression Next step is simple slopes analysis because of the significance
108
Simple slopes analysis
For people low in callousness, aggression levels didn't really change For people high in callousness, aggression increases If you have a significant interaction, always do a simple slopes analysis: makes life easier!
109
Rescale graph Right click > Select Format Axis > Change min value to 0
110
Interaction between Attractiveness (predictor) and Support (outcome) with Gender as moderator
First step: center variables Mean of attractiveness (descriptives) Centered mean: Name variable > move attract into numeric expression box and subtract the mean of Attractiveness > OK Interaction term: Transform > Compute Variable > rename Attract_x_gender > move attract_cent > multiplication sign * > move gender over: Attract_centered * Gender Can now run moderated analysis Regression: Analyze > Regression > Linear > Support as dependent (don't center the dependent!!!) > 1st block predictors only (centered attract and gender) > Click Next > move over interaction term into 2nd block > OK Running regression with 3 different things predicting support (attraction, gender, and interaction between attract and gender) Significant interaction: regression lines of groups cross at some point. Lines that cross or are distinctly not parallel mean a significant interaction Slope analysis is next step!
112
Research Designs Comparing Groups
Week 8
113
Quasi-experimental designs
114
Quasi-experiments No random assignment
Goal is still to investigate relationship between proposed causal variable and an outcome What they have: Manipulation of cause to force it to happen before outcome Assess covariation of cause and effect What they don't have: full ability to rule out alternative explanations (limited in this regard) But design features can improve this
115
One group posttest only design
Problems: No pretest: did anything change? No control group: what would have happened if IV not manipulated? Doesn't control for threats to internal validity X O1 Do not use this!! Only measures one group Notation: X = manipulation, O = observation
116
One group posttest only design
Example: An organization implemented a new pay-for- performance system, which replaced its previous pay-by- seniority system. A researcher was brought in after this implementation to administer a job satisfaction survey
117
One group pretest-posttest design
Adding pretest allows assessment of whether change occurred Major threats to internal validity: Maturation: change of participants due to natural causes History: change due to historical event (recession, etc.) Testing: desensitizing participants to the test, using the same pretest for posttest O1 X O2 O1=pretest, O2=post-test
118
One group pretest-posttest design
Example: An organization wanted to implement a new pay-for-performance system to replace its pay-by- seniority system. A researcher was brought in to administer a job satisfaction questionnaire before the pay system change, and again after the pay system change
119
Removed treatment design
Treatment given, and then removed 4 measurements of DV: 2 pretests, and 2 posttests If treatment affects DV, DV should go back to its pre-treatment level after treatment removed Unlikely that threat to validity would follow this same pattern Problem: assumes that treatment can be removed with no lingering effects May not be possible or ethical (i.e., ethical conundrum: taking away schizophrenic patients' medication; possibility conundrum: therapy for depression, benefits would still be experienced) O1 X O2 O3 O4 X = treatment was removed
120
Removed treatment design
Example: A researcher wanted to evaluate whether exposure to TV reduced memory capacity. Participants first completed a memory recall task, then completed the same task while a TV plays a sitcom in the background. After a break, participants again complete the memory task while the TV plays in the background, then complete it again with the TV turned off.
121
Repeated treatment design
O1 X O2 O3 O4 Treatment introduced, removed, and then re-introduced Threat to validity would have to follow same schedule of introduction and removal-very unlikely Problem: treatment effects may not go away immediately Very good at controlling for threats to validity
122
Repeated treatment design
Example: A researcher wanted to investigate whether piped-in classical music decreased employee stress. She administered a stress survey, and then piped in music. One week later, stress was measured again. The music was then removed, and stress was measured again one week later. The music was then piped in again, and stress was measured a final time one week later.
123
Posttest-only with nonequivalent groups
NR X O1 O2 Participants not randomly assigned to groups One group receives treatment, one does not DV measured for both groups Big validity threat: selection NR = not randomly assigned
124
Posttest-only with nonequivalent groups
Example: An organization wants to implement a policy against checking email after 6pm in an effort to reduce work-related stress. The organization assigns their software development department to implement the new policy, while the sales department does not implement the new policy. After 2 months, employees in both departments complete a work stress scale.
125
Untreated control group with pretest and posttest
NR O1 X O2 Pretest and posttest data gathered on same experimental units Pretest allows for assessment of selection bias Also allows for examination of attrition Same as previous, just giving each group pretest
126
Untreated control group with pretest and posttest
Example: A community is experimenting with a new outpatient treatment program for meth addicts. Current treatment recipients had the option to participate (experimental group) or not participate (control group). Current daily use of meth was collected for all individuals. Those in the experimental group completed the new program, while those in the control group did not. Following the program, participants in both groups were asked to provide estimates of their current daily use of meth.
127
Switching replications
NR O1 X O2 O3 Treatment eventually administered to group that originally served as control Problems: May not be possible to remove treatment from one group Can lead to compensatory rivalry Switching the treatment
128
Switching replications
Example: An organization implemented a new reward program to reduce absences. After a month of no absences, employees were… The manufacturing organization from the previous scenario removed the reward program from the Ohio plant, and implemented it in the Michigan plant. Absences were gathered and compared 1 month later.
129
Reversed-treatment control group
Control group given treatment that should have opposite effect of that given to treatment group Rules out many potential validity threats Problems: may not be feasible (pay/performance, whatโs the opposite?) or ethical NR O1 X+ O2 X-
130
Reversed-treatment control group
Example: A researcher wanted to investigate the effect of mood on academic test performance. All participants took a pre-test of critical reading ability. The treatment group was put in a setting which stimulated positive mood (calming music, lavender scent, tasty snacks) while the control group was put in a setting which stimulated negative mood (annoying childrenโs show music, sulfur scent, no snacks). Participants then completed the critical reading test again in their respective settings.
131
Randomized experimental designs
132
Randomized experimental designs
Participants randomly assigned to groups Random assignment: any procedure that assigns units to conditions based on chance alone, where each unit has a nonzero probability of being assigned to any condition NOT random sampling! Random sampling concerns how sample obtained Random assignment concerns how sample assigned to different experimental conditions
133
Why random assignment? Researchers in natural sciences can rigorously control extraneous variables People are tricky. Social scientists can't exert much control. Can't mandate specific level of cognitive ability, exposure to violent TV in childhood, attitude towards women, etc. Random assignment to conditions reduces chances that some unmeasured third variable led to observed covariation between presumed cause and effect
134
Why random assignment? Example: what if you assigned all participants who signed up in the morning to be in the experimental group for a memory study, and all those who signed up in the afternoon to be in the control group? And those who signed up in the morning had an average age of 55 and those who signed up in the afternoon had an average age of 27? Could difference between experimental and control groups be attributed to manipulation?
135
Random assignment Since participants randomly assigned to conditions, expectation that groups are equal prior to experimental manipulations Any observed difference attributable to experimental manipulation, not third variable Doesn't prevent all threats to validity Just ensures they're distributed equally across conditions so they aren't confounded with treatment
136
Random assignment Doesn't ensure groups are equal
Just ensures expectation that they are equal No obvious reason why they should differ But they still could Example: By random chance, average age of control group may be higher than average age of experimental group
137
Random assignment Random assignment guarantees equality of groups, on average, over many experiments Does not guarantee that any one experiment which uses random assignment will have equivalent groups Within any one study, groups likely to differ due to sampling error But, if random assignment process was conducted over infinite number of groups, average of all means for treatment and control groups would be equal
138
Random assignment If groups do differ despite random assignment, those differences will affect results of study But, any differences due to chance, not to way in which individuals assigned to conditions Confounding variables unlikely to correlate with treatment condition
139
Posttest-only control group design
X O Random assignment to conditions (R) Experimental group given treatment/IV manipulation (X) Outcome measured for both groups (O)
140
Posttest-only control group design
Example: Participants assigned to control group (no healthy eating seminar) or treatment group (90 minute healthy eating seminar) 6 months later, participants given questionnaire assessing healthy eating habits Scores on questionnaire compared for control group and treatment group
141
Problems with posttest-only control group design
No pretest If attrition occurs, can't see if those who left were any different than those who completed study No pretest makes it difficult to assess change on outcome
142
Pretest-posttest control group design
X O Randomly assigned to conditions Given pretest (P) measuring outcome variable One group given treatment/IV manipulation Outcome measured for both groups Variation: can randomly assign after pretest
143
Pretest-posttest control group design
Example: Randomly assign undergraduate student participants to control group and treatment group Give pretest on attitude towards in-state tuition for undocumented students Control group watches video about history of higher education for 20 minutes, while treatment group watches video explaining challenges faced by undocumented students in obtaining college degree Give posttest on attitude towards in-state tuition for undocumented students
144
Factorial designs Have 2 or more independent variables
Naming logic: # of levels in IV1 x # of levels in IV2 x … x # of levels in IV X 3 advantages: Require fewer participants since each participant receives treatment related to 2 or more IVs Treatment combinations can be evaluated Interactions can be tested
145
Factorial designs R XA1B1 O XA1B2 XA2B1 XA2B2 For 2x2 design:
Randomly assign to conditions (there are 4) Each condition represents 1 of 4 possible IV combinations Measure outcome Variables: A and B Levels: 1 and 2 First row: XA1B1 is level 1 of A and level 1 of B Second row: XA1B2 is level 1 of A and level 2 of B
146
Factorial designs Example:
2 IVs of interest: room temperature (cool/hot) and noise level (quiet/noisy) DV = number of mistakes made in basic math calculations Randomly assign to 1 of 4 groups: Quiet/cool Quiet/hot Noisy/cool Noisy/hot Measure number of mistakes made in math calculations Compare means across groups using factorial ANOVA
147
Factorial designs 2 things we can look for with these designs:
Main effects: average effects of IV across treatment levels of other IV Did participants do worse in the noisy than quiet conditions? Did participants do worse in the hot than cool conditions? Main effect can be misleading if there is a moderator variable Interaction: Relationship between one IV and DV depends on level of other IV Noise level positively related to number of errors made, but only if room hot When looking at main effect of one variable, you ignore the other variable: looking at each IV on its own and how it relates to DV
148
Within-subjects randomized experimental design
Participants randomly assigned to either order 1 or order 2 Participants in order 1 receive condition 1, then condition 2 Participants in order 2 receive condition 2, then condition 1 Having different orders prevents order effects Having participants in more than 1 condition reduces error variance R Order 1 Condition 1 O1 Condition 2 O2 Order 2 Same people are in both conditions: the groups in the two conditions are equivalent because all people experience both conditions; reduces selection bias and increases statistical power
149
Within-subjects randomized experimental design
Example: Participants randomly assigned to order 1 or order 2 Participants in order 1 reviewed resumes with the applicantโs picture attached and made hiring recommendations. They then reviewed resumes without pictures and made hiring recommendations. Participants in order 2 reviewed resumes without pictures and made hiring recommendations. They then reviewed resumes with the applicantโs picture attached and made hiring recommendations. Very important to counterbalance the order. Do not want participants doing tasks or experiencing treatment in the exact same order.
150
Data analysis
151
With 2 groups Need to compare 2 group means to determine if they are significantly different from one another If groups independent, use independent samples t-test If participants in one group are different from the participants in the other group If repeated measures design, use repeated measures t-test
152
With 3 or more groups Still need to compare group means to determine if they are significantly different If only 1 IV, use a one-way ANOVA If 2 or more IVs, use a factorial ANOVA If groups are not independent, use repeated measures ANOVA
153
Design practice Research question:
Does answering work-related communication (emails, phone calls) after normal working hours affect work-life balance? Design BOTH a randomized experiment AND a quasi-experiment to evaluate your research question For each design (random and quasi): Operationalize variables and develop a hypothesis(es) Name and explain the experimental design as it will be used to test your hypothesis(es) Name and explain one threat to internal validity in your design This was worth extra credit
154
Comparing means Week 9
155
Comparing means 2 primary ways to evaluate mean differences between groups: t-tests ANOVAs Which one you use will depend on how many groups you want to compare, and how many IVs you have 2 groups, 1 IV, 1 DV: t-test 3 or more groups, 1 or more IVs, 1 DV: ANOVA One-way ANOVA if only 1 IV Factorial ANOVA if 2 or more IVs
156
t-tests Used to compare means on one DV between 2 groups
Do men and women differ in their levels of job autonomy? Do students who take a class online and students who take the same class face-to-face have different scores on the final test? Do individuals report higher levels of positive affect in the morning than they report in the evening? Do individuals given a new anti-anxiety medication report different levels of anxiety than individuals given a placebo? Plenty of situations where we need to compare two groups on the DV
157
t-tests 2 different options for t-tests:
Independent samples t-test: individuals in group 1 are not the same as individuals in group 2 Do self-reported organizational citizenship behaviors differ between men and women? Repeated measures t-test: individuals in group 1 are the same as individuals in group 2 Do individuals report different levels of job satisfaction when surveyed on Friday than they do when surveyed on Monday?
158
A note on creating groups
Beware of dichotomizing a continuous variable in order to make 2 groups Example: everyone who scored a 50% or below on a test goes in group 1, and everyone who scored 51% or higher goes in group 2 Causes several problems People with very similar scores around cut point may end up in separate groups Reduces statistical power Increases chances of spurious effects Relevant for t-tests and ANOVA Ex: satisfaction with course and test scores, want to compare high to low test scores (dichotomized test scores): artificially dichotomizing a continuous variable can create problems 1. People with very similar scores can end up in different groups 2. Reduces statistical power (anytime you work with a categorical variable, you will reduce statistical power): difficult to find significant result 3. Does not fit with the way the variable was collected t-tests are good when you have a categorical variable on its own: do not dichotomize a variable just so you can do a t-test
159
t-tests and the linear model
t-test is just linear model with one binary predictor variable: Y_i = b_0 + b_1x_1 + e_i Predictor has 2 categories (male/female, control/experimental) Dummy variable: 0 = baseline group, 1 = experimental/comparison group b_0 is equal to mean of group coded 0 b_1 is equal to difference between group means A t-test is no different from a regression model with one predictor that has 2 categories
160
Rationale for t - test 2 sample means collected-need to see how much they differ If samples from same population, expect means to be roughly equivalent Large differences unlikely to occur due to chance When we do a t-test, we compare difference between sample means to difference we would expect if null hypothesis was true (difference = 0)
161
Rationale for t-test Standard error = gauge of differences between means likely to occur due to chance alone Small standard error: expect similar means if both samples from same population Large standard error: expect somewhat different means even if both samples from same population t-test evaluates whether observed difference between means is larger than would be expected, based on standard error, if samples from same population If there is a difference between our means and it's large enough that it would be significant, then we would reject the null Standard error: Gauge of the difference between means if the change was due to chance alone Small SE: Similar means if both samples came from the same population Large SE: Even if samples came from same population, we would expect to see differences between the means; usually happens with very small samples or poorly measured variables
162
Rationale for t-test Top half of equation = model
Bottom half of equation = error
163
Independent samples t-test
Use when each sample contains different individuals Look at ratio of between-group difference in means to estimate of total standard error for both groups Variance sum law: variance of difference between 2 independent variables = sum of their variances Use sample standard deviations to calculate standard error for each population's sampling distribution
164
Independent samples t-test
Assuming that sample sizes are equal: t = (x̄_1 − x̄_2) / √(s²_1/n_1 + s²_2/n_2)
Top half: difference between means Bottom half: each sample's variance divided by its sample size Top half: (x̄_1) mean of group 1 − (x̄_2) mean of group 2 Bottom half: (s²_1) variance for sample 1 / (n_1) sample size for sample 1 + (s²_2) variance for sample 2 / (n_2) sample size for sample 2, all under a square root
165
Independent samples t-test
If sample sizes are not equal, need to use pooled variance, which weights variance for each sample to account for sample size differences Pooled variance: s²_p = [(n_1 − 1)s²_1 + (n_2 − 1)s²_2] / (n_1 + n_2 − 2) Important: Sample that is bigger would have an undue influence on the variance estimate if you didn't weight the samples
166
Independent samples t-test
Equation for independent samples t-test with different sample sizes: t = (x̄_1 − x̄_2) / √(s²_p/n_1 + s²_p/n_2) Top: differences between groups Bottom: error (pooled variance divided by each sample size)
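A minimal sketch with simulated groups, computing the pooled-variance t by hand and checking it against scipy's independent samples t-test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Hypothetical scores for two independent groups of different sizes
g1 = rng.normal(4.0, 2.0, 40)
g2 = rng.normal(2.5, 2.0, 55)

n1, n2 = len(g1), len(g2)
s1, s2 = g1.var(ddof=1), g2.var(ddof=1)

# Pooled variance weights each sample's variance by its degrees of freedom
sp2 = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
t = (g1.mean() - g2.mean()) / np.sqrt(sp2 / n1 + sp2 / n2)

print(round(t, 3))
print(stats.ttest_ind(g1, g2, equal_var=True))  # same t, plus the p-value
```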
167
Paired samples/repeated measures t-test
Use when same people are in both samples Average difference between scores at measurement 1 and measurement 2: D̄ Shows systematic variation between measurements Difference that we would expect between measurements if null hypothesis true: μ_D Since null hypothesis says that difference = 0, this cancels out Measure of error = standard error of differences: s_D/√N Use anytime you have the same people in the same sample (D̄ = average difference between scores) μ_D = 0, which cancels out of the equation
168
Paired samples/repeated measures t-test
t = (D̄ − μ_D) / (s_D/√N); μ_D = 0 and cancels out (what we would expect to see if the null were true)
169
Assumptions of t-tests
Both types of t-tests are parametric and assume normality of sampling distribution For repeated measures, refers to sampling distribution of differences Data on DV have to be measured at interval level Can't be nominal or ordinal Independent samples t-test assumes variances of each population equivalent (homogeneity of variance) Also assumes scores in each sample independent of scores in other sample
170
Assumptions of t-tests
Independent samples t-tests will automatically do Levene's test for you If Levene's not significant, homogeneity of variance assumption met: interpret first line of output (equal variances assumed) If Levene's is significant, homogeneity of variance assumption not met: interpret second line of output (equal variances not assumed)
171
Independent samples t-test example
DV = Number of items skipped on ability test Group 1: Took test in unproctored setting Group 2: took test in proctored setting
172
Independent samples t-test example
173
Independent samples t-test example
DV into test variable IV into grouping variable spot (do not worry about ??) Pop-up window asking how groups were coded > Select continue
174
Independent samples t-test
1st line of output: statistics broken up by groups Next line is actual t-test: 1st two boxes are Levene's test (DO NOT INTERPRET AS T-TEST) If Levene's test is significant, interpret 2nd line of output (equal variances not assumed): t = 7.650, df = __, p < .001 (use 2nd significance value; p is never equal to 0, so report SPSS's p = .000 as p < .001) Use first three boxes on 2nd line AFTER Levene's test
175
Analyze > compare means > independent sample t-test
Day 1 hygiene score over into test variables box > move gender into grouping variable box > Select define groups and set Group 1 as 0 and Group 2 as 1 > Select continue > select OK
176
Look at Levene's test: if non-significant, look at 1st line for t-test
Difference is significant, p < .05 A negative t value: the test subtracts the Female mean from the Male mean, so whatever the second group was had a higher mean than the first group (hint: look at the group means in the group statistics) Women had higher hygiene scores than men df = (n1 + n2) − 2
177
Independent samples t-test
Need to report effect size Can convert t to r: r = √(t² / (t² + df)) = √((7.65 × 7.65) / ((7.65 × 7.65) + df)) = .184 Values taken from Slide 21 Independent samples t-test (proctored v. unproctored)
178
Independent samples t-test
More commonly use d: d = (x̄_1 − x̄_2) / s_2 d = (__ − __)/1.431 = 0.23 Note on d: Book shows d calculation using only 1 sd In practice, more common to use pooled standard deviation Interpretation (Cohen, 1988): .20 = small, .50 = medium, .80 = large Negative d means that x̄_2 larger than x̄_1 Values taken from Slide 21 Independent samples t-test (proctored v. unproctored) SPSS will not calculate d for you (tip: if you run lots of t-tests, use an Excel spreadsheet that will calculate Cohen's d)
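A small helper for d with the pooled standard deviation (hypothetical scores; this mirrors the common practice noted above rather than the book's single-SD version):

```python
import numpy as np

def cohens_d(g1, g2):
    """Cohen's d using the pooled standard deviation."""
    n1, n2 = len(g1), len(g2)
    sp = np.sqrt(((n1 - 1) * np.var(g1, ddof=1) + (n2 - 1) * np.var(g2, ddof=1))
                 / (n1 + n2 - 2))
    return (np.mean(g1) - np.mean(g2)) / sp

# Hypothetical scores for two groups
g1 = np.array([5.1, 4.8, 6.0, 5.5, 4.9, 5.7])
g2 = np.array([4.6, 4.4, 5.2, 4.9, 4.3, 4.8])
print(round(cohens_d(g1, g2), 2))
```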
179
Repeated measures t-test example
DV = Perceptions of procedural justice Measurement 1: Participants took one type of Implicit Association Test (task-switching ability) Measurement 2: Participants took traditional cognitive ability test (WPT-Q) Rationale behind repeated measures is the same as independent t-test (difference is sample content) Repeated used because procedural justice perceptions were measured for both measurements
180
Repeated measures t-test example
Paired samples t-test is the same as repeated measures t-test
181
Repeated measures t-test example
Variable 1 is scores for time 1 (move over 1st) and variable 2 is scores for time 2 (move over 2nd) > Select ok
182
Repeated measures t-test example
Paired Samples Correlations: looking at the correlation between time 1 and time 2 for the group; correlation is strong here Really interested in the Paired Samples Test: Levene's test not necessary because our means came from the same sample 1st column is mean difference between measurements: tells us that 2nd measurement had a higher mean t-value is -9.74, meaning 2nd measure had higher mean df is a little bit different because we only have 1 sample (df = N − 1)
183
Repeated measures t-test effect sizes
Still need to calculate effect sizes Problem with r in repeated measures t-test: tends to over-estimate effect size Better off using d with repeated measures designs: better estimate of effect size Formula for repeated measures d = (D̄ − μ_D)/s r not the best choice for repeated measures design because of overestimation of effect size
184
Comparing the t-tests If you have the same people in both groups, ALWAYS use repeated measures t-test (or you violate one of the assumptions of the independent t-test) Non-independence of errors violates assumptions of independent samples t-test Power is higher in repeated measures t-test Reduces error variance by quite a bit since same participants are in both samples Different people in groups brings a certain amount of error because they bring their own idiosyncrasies. Increase random error with independent t-test Repeated measures t-test allows us to make do with fewer participants
185
Use one participant per row
186
Fear scores were higher (saw the real spider) for the second measure than the first measure
When writing up results where you have the same participants in both conditions, make it very clear that it was a repeated measures design and that the same people were used for measure 1 and measure 2
187
One-way ANOVA ANOVA = analysis of variance
One-way ANOVA allows us to compare means on a single DV across more than 2 groups
188
Why we need ANOVA Doing multiple t-tests (control vs. group 1, control vs. group 2, etc.) on data inflates the Type I error rate beyond acceptable levels Familywise error rate assuming ฮฑ = .05 for each test: 1 โ (.95)n n = number of comparisons being made So, with 3 comparisons, overall ฮฑ = .143 With 4 comparisons, overall ฮฑ = .185 ฮฑ = .05, psychology standard Multiple comparisons using same DV, increase error rate Greatly increases chances of Type I error if you do a bunch a t-tests
189
ANOVA and the linear model
Mathematically, ANOVA and regression are the same thing! ANOVA output: F-ratio: comparison of systematic to unsystematic variance Same as F ratio in regression: shows improvement in prediction of outcome gained by using model as compared to just using mean Only difference between ANOVA and regression: predictor is categorical variable with more than 2 categories Exactly the same as using dummy variables in regression Linear model with # of predictors equal to number of groups - 1
190
ANOVA and the linear model
Intercept (b0) will be equal to the mean of the baseline group (group coded as 0 in all dummy variables) Regression coefficient b1 will be equal to the difference in means between baseline group and group 1 Regression coefficient b2 will be equal to the difference in means between baseline group and group 2
191
F ratio F = systematic variance / unsystematic variance (error)
Systematic variance, in ANOVA, is mean differences between groups Null hypothesis: group means are same In this case, systematic variance would be small Thus, F would be small
192
ANOVA logic Simplest model we can fit to data is grand mean (of DV)
We try to improve on this prediction by creating a more complex model Parameters include intercept (b0) and one or more regression coefficients (b1, b2, etc.) Bigger regression coefficients = bigger differences between groups If between group differences large, model better fit to data than grand mean If model fit is better than grand mean, then between-group differences are significant
193
Total sum of squares (SST)
This shows the total amount of variation within the data Grand mean on DV subtracted from each observation's value on DV; these deviations are squared and summed Total degrees of freedom for SST: N-1 Total amount of variation around our grand mean in our data.
194
Model sum of squares (SSM)
This shows how much variance the linear model explains Calculate difference between mean of each group and grand mean, square this value (each value), then multiply it by the number of participants in the group Add the values for each group together Degrees of freedom: k − 1, where k is number of groups Variance explained by group membership
195
Residual sum of squares (SSR)
This shows differences in scores that aren't explained by model (i.e., aren't explained by between-group differences) Calculated by subtracting the group mean from each score, squaring this value, and then adding all of the values together Degrees of freedom = N − k, where k = number of groups and N is overall sample size Error part of ANOVA: unsystematic variance Any variance not accounted for by group membership
196
Mean squares To get a mean square value, divide sum of squares value by its degrees of freedom Mean square model (MSM) = SSM/(k − 1) Mean square residual (MSR) = SSR/(N − k)
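A worked sketch of the whole decomposition on made-up data for three groups, checked against scipy's one-way ANOVA:

```python
import numpy as np
from scipy import stats

# Hypothetical DV scores for three groups
groups = [np.array([2.1, 2.5, 1.9, 2.4, 2.8]),
          np.array([2.9, 3.1, 2.6, 3.4, 3.0]),
          np.array([3.6, 3.2, 3.9, 3.5, 3.8])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
k, n = len(groups), len(all_scores)

sst = np.sum((all_scores - grand_mean) ** 2)                       # total
ssm = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)   # model
ssr = sum(np.sum((g - g.mean()) ** 2) for g in groups)             # residual

msm, msr = ssm / (k - 1), ssr / (n - k)
print(round(msm / msr, 3))      # F = MSM / MSR
print(stats.f_oneway(*groups))  # same F, plus the p-value
```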
197
F ratio Calculated using mean square values:
F = MSM/MSR Degrees of freedom for F: (k − 1), (N − k) If F is statistically significant, group means differ by more than they would if null hypothesis were true F is omnibus test: only tells you whether group means differ significantly: there's a difference somewhere Doesn't tell you which means differ from one another Need post-hoc tests to determine this F-test problematic because it only tells us that group means differ significantly, doesn't tell us which groups differed significantly Post-hoc tests let you look at which groups differed significantly from one another
198
Post-hoc tests Pairwise comparisons to compare all groups to one another All incorporate correction so that Type I error rate is controlled (at about .05) Example: Bonferroni correction (very conservative): use significance level (usually .05) divided by the number of comparisons: α/n, where n is number of comparisons So, if we have 3 groups and we want to keep α at .05 across all comparisons, each comparison will have α = .017 Restricts alpha level for each comparison
199
Post-hoc tests Lots of options for post hoc tests in SPSS
Some notes on the more common ones: Least significant difference (LSD): doesn't control Type I error very well Bonferroni's and Tukey's: control Type I error rate, but lack statistical power (too conservative) REGWQ: controls Type I error and has high power, but only works if sample sizes equal across groups Games-Howell: less control of Type I error, but good for unequal sample sizes and unequal variance across groups Dunnett's T3: good control of Type I error, works if unequal variance across groups
200
Assumptions of ANOVA Homogeneity of variance: can check with Levene's test If Levene's significant and homogeneity of variance assumption violated, need to use corrected F ratio Brown-Forsythe F Welch's F Provided group sizes equal, ANOVA works ok if normality assumption violated somewhat If group sizes not equal, ANOVA biased if data non-normal Non-parametric alternative to ANOVA: Kruskal-Wallis test (book covers in detail)
201
Steps for doing ANOVA
202
Effect sizes for ANOVA R2: SSM/SST
When applied to ANOVA, value called eta squared, η² Somewhat biased because it's based on sample only: doesn't adjust for looking at effect size in population SPSS reports partial eta squared, but only for factorial ANOVA: SSB/(SSB + SSE) Better effect size measure for ANOVA: omega-squared (ω²; SPSS will not calculate it for you):
ω² = (SSM − (dfM)(MSR)) / (SST + MSR)
Eta squared: not the preferred method for ANOVA Top: model variance − error variance Bottom: error
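A sketch of the omega-squared calculation; every number below is a placeholder, only the structure of the formula comes from the slide:

```python
def omega_squared(ss_m, ss_t, ms_r, df_m):
    """Omega-squared effect size for a one-way ANOVA."""
    return (ss_m - df_m * ms_r) / (ss_t + ms_r)

# Hypothetical ANOVA table values
ss_m, ss_r, df_m, df_r = 21.49, 722.0, 2, 597
ss_t = ss_m + ss_r
ms_r = ss_r / df_r

print(round(omega_squared(ss_m, ss_t, ms_r, df_m), 3))
```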
203
One-way ANOVA in SPSS IV: Counterproductive work behavior (CWB) scale that varied in its response anchors: control, infrequent, & frequent DV: self-reported CWB
204
One-way ANOVA in SPSS
205
One-way ANOVA in SPSS IV into Factor box
Select post-hoc… leads into next slide
206
One-way ANOVA in SPSS
207
One-way ANOVA in SPSS Main ANOVA: f-value
Look at post-hoc tests to see which groups significantly differ from one another People on the frequent scale reported more CWB than the traditional scale (1st grouping/top) Frequent scale more CWB than infrequent scale (2nd grouping/middle) Duplicate values at the bottom of Multiple Comparisons box
208
One-way ANOVA in SPSS Calculating omega-squared:
ω² = (21.49 − __) / __ = .025 Suggestions for interpreting ω²: .01 = small .06 = medium .14 = large
209
Analyze > Compare means > One-way ANOVA
210
Do not report the same comparison twice.
Write down the differences so you won't mistakenly list comparisons twice.