# Research Support Center Chongming Yang

## Presentation on theme: "Research Support Center Chongming Yang"— Presentation transcript:

Research Support Center Chongming Yang
SPSS Workshop Research Support Center Chongming Yang

Causal Inference If A, then B, under condition C
If A, 95% Probability B, under condition C

Student T Test (William S. Gossett’s pen name = student)
Assumptions Small Sample Normally Distributed t distributions: t = [ x - μ ] / [ s / sqrt( n ) ] df = degrees of freedom=number of independent observations

Type of T Tests One sample Two independent samples Paired
test against a specific (population) mean Two independent samples compare means of two independent samples that represent two populations Paired compare means of repeated samples

One Sample T Test Conceputally convert sample mean to t score and examine if t falls within acceptable region of distribution

Two Independent Samples

Paired Observation Samples
d = difference value between first and second observations

Multiple Group Issues Groups A B C comparisons
AB AC BC Joint Probability that one differs from another .95*.95*.95 = .91

Analysis of Variance (ANOVA)
Completely randomized groups Compare group variances to infer group mean difference Sources of Total Variance Within Groups Between Groups F distribution SSB = between groups sum squares SSW = within groups sum squares

Fisher-Snedecor Distribution

F Test Null hypothesis: 𝑥 1 = 𝑥 2 = 𝑥 3 ...= 𝑥 𝑛
Given df1 and df2, and F value, Determine if corresponding probability is within acceptable distribution region

Issues of ANOVA Indicates some group difference
Does not reveal which two groups differ Needs other tests to identify specific group difference Hypothetical comparisons Contrast No Hypothetical comparisons Post Hoc ANOVA has been replaced by multiple regressions, which can also be replaced by General Linear Modeling (GLM)

Multiple Linear Regression
Causes 𝑥 cab be continuous or categorical Effect 𝑦 is continuous measure Mild causal terms  predictors Objective  identify important 𝑥

Assumptions of Linear Regression
Y and X have linear relations Y is continuous or interval & unbounded expected or mean of  = 0  = normally distributed not correlated with predictors Predictors should not be highly correlated No measurement error in all variables

Least Squares Solution
Choose 𝛽 0 , 𝛽 1 , 𝛽 2 , 𝛽 3 ,... 𝛽 𝑘 to minimize the sum of square of difference between observed 𝑦 𝑖 and model estimated/predicted 𝑦 𝑖 Through solving many equations

Explained Variance in 𝑦

Standard Error of 𝛽

T Test significant of 𝛽 t = 𝛽 / SE𝛽
If t > a critical value & p <.05 Then 𝛽 is significantly different from zero

Confidence Intervals of 𝛽

Standardized Coefficient (𝛽𝑒𝑡𝑎)
Make 𝛽s comparable among variables on the same scale (standardized scores)

Interpretation of 𝛽 If x increases one unit, y increases 𝛽 unit, given other values of X

Model Comparisons Complete Model: Reduced Model: Test F = Msdrop / MSE
MS = mean square MSE = mean square error

Variable Selection Select significant from a pool of predictors
Stepwise undesirable, see Forward Backward (preferable)

Dummy-coding of Nominal 𝑥
R = Race(1=white, 2=Black, 3=Hispanic, 4=Others) R d1 d2 d3 Include all dummy variables in the model, even if not every one is significant.

Interaction Create a product term X2X3
Include X2 and X3 even effects are not significant Interpret interaction effect: X2 effect depends on the level of X3.

Plotting Interaction Write out model with main and interaction effects, Use standardized coefficient Plug in some plausible numbers of interacting variables and calculate y Use one X for X dimension and Y value for the Y dimension See examples

Diagnostic Linear relation of predicted and observed (plotting
Collinearity Outliers Normality of residuals (save residual as new variable)

Repeated Measures (MANOVA, GLM)
Measure(s) repeated over time Change in individual cases (within)? Group differences (between, categorical x)? Covariates effects (continuous x)? Interaction between within and between variables?

Assumptions Normality
Sphericity: Variances are equal across groups so that Total sum of squares can be partitioned more precisely into Within subjects Between subjects Error

Model 𝜇 = grand mean 𝜋 𝑖 = constant of individual i
𝜏 𝑗 = constant of jth treatment 𝜀 𝑖𝑗 = error of i under treatment j 𝜋𝜏 = interaction

F Test of Effects F = MSbetween / Mswithin (simple repeated)
F = Mstreatment / Mserror (with treatment) F = Mswithin / Msinteraction (with interaction)

Four Types Sum-Squares
Type I  balanced design Type II  adjusting for other effects Type III  no empty cell unbalanced design Type VI  empty cells

Exercise Copy data to spss syntax window, select and run Run Repeated measures GLM