# Assumptions


“Essentially, all models are wrong, but some are useful” (George E. P. Box)

Your model has to be wrong… but that’s o.k. if it’s illuminating!

Linear Model Assumptions:

- Absence of collinearity
- Normality of errors
- Homoskedasticity of errors
- No influential data points
- Independence

Absence of Collinearity (Baayen 2008: 182)

Where does collinearity come from? Most often, from correlated predictor variables. (Demo)
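The slide's demo is not preserved in the transcript. As a stand-in, here is a minimal sketch of what correlated predictors do to a linear model. The data are simulated and every variable name (`x1`, `x2_coll`, `mdl_coll`, and so on) is made up for illustration; it is not the original demo.

```r
set.seed(42)                          # arbitrary seed, for reproducibility
n  <- 200
x1 <- rnorm(n)
x2_indep <- rnorm(n)                  # unrelated to x1
x2_coll  <- x1 + rnorm(n, sd = 0.1)   # almost a copy of x1

# Same true model in both cases: y depends on both predictors
y_indep <- 1 + 2 * x1 + 2 * x2_indep + rnorm(n)
y_coll  <- 1 + 2 * x1 + 2 * x2_coll  + rnorm(n)

mdl_indep <- lm(y_indep ~ x1 + x2_indep)
mdl_coll  <- lm(y_coll  ~ x1 + x2_coll)

# Correlation between the two collinear predictors
cor_coll <- cor(x1, x2_coll)

# Standard error of the x1 slope without and with collinearity
se_indep <- coef(summary(mdl_indep))["x1", "Std. Error"]
se_coll  <- coef(summary(mdl_coll))["x1", "Std. Error"]

# Variance inflation factor for x1, computed by hand as 1 / (1 - R^2),
# where R^2 comes from regressing x1 on the other predictor
vif_x1 <- 1 / (1 - summary(lm(x1 ~ x2_coll))$r.squared)

print(c(cor = cor_coll, se_indep = se_indep, se_coll = se_coll, vif = vif_x1))
```

With nearly identical predictors, the model cannot tell which one carries the effect, so the standard error of each slope grows even though the overall fit stays good.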

What to do?


Baayen (2008: 189-190)

Leave-one-out influence diagnostics: DFbeta (…and much more)
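A minimal sketch of a leave-one-out check, using base R's `dfbeta()` on simulated data. The planted outlier and all variable names are hypothetical; the slides do not show which data or code were used.

```r
set.seed(1)
x <- rnorm(30)
y <- 2 * x + rnorm(30)
x[30] <- 6
y[30] <- -10                     # plant one extreme, hypothetical data point

xmdl <- lm(y ~ x)

# dfbeta(): one row per data point, one column per coefficient;
# each entry is how much that coefficient changes when the point is left out
dfb <- dfbeta(xmdl)
worst <- which.max(abs(dfb[, "x"]))   # point with the largest effect on the slope

# The same quantity by hand for the flagged point: refit without it
slope_all <- unname(coef(xmdl)["x"])
slope_loo <- unname(coef(lm(y[-worst] ~ x[-worst]))[2])

print(worst)
print(c(slope_all = slope_all, slope_without_point = slope_loo))
```

If dropping a single point changes the slope substantially (or flips its sign), the result rests on that point and should be reported as such.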

Winter & Matlock (2013)


Normality of Error

The error (not the data!) is assumed to be normally distributed. So, the residuals should be normally distributed.

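Histogram of the residuals (✔):

    xmdl = lm(y ~ x)          # fit the linear model
    hist(residuals(xmdl))     # histogram of its residuals

Q-Q plot of the residuals (✔):

    qqnorm(residuals(xmdl))   # sample quantiles against normal quantiles
    qqline(residuals(xmdl))   # reference line

The same Q-Q plot commands for a model whose residuals are not normally distributed (✗).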
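The plots themselves are not preserved here, so the following sketch reproduces the ✔ and ✗ cases side by side with simulated data. The data, the exponential error distribution, and the helper `skewness()` are assumptions made for illustration only.

```r
set.seed(123)
n <- 200
x <- runif(n, 0, 10)

y_ok   <- 1 + 0.5 * x + rnorm(n)            # normally distributed errors
y_skew <- 1 + 0.5 * x + rexp(n, rate = 1)   # right-skewed errors

mdl_ok   <- lm(y_ok ~ x)
mdl_skew <- lm(y_skew ~ x)

# Q-Q plots side by side: points should hug the line in the left panel only
op <- par(mfrow = c(1, 2))
qqnorm(residuals(mdl_ok), main = "Normal errors")
qqline(residuals(mdl_ok))
qqnorm(residuals(mdl_skew), main = "Skewed errors")
qqline(residuals(mdl_skew))
par(op)

# Sample skewness of the residuals, a rough numeric companion to the plots
skewness <- function(r) mean((r - mean(r))^3) / sd(r)^3
skew_ok   <- skewness(residuals(mdl_ok))
skew_skew <- skewness(residuals(mdl_skew))

print(c(skew_ok = skew_ok, skew_skew = skew_skew))
```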


Homoskedasticity of Error

The error (not the data!) is assumed to have equal variance across the predicted values. So, the residuals should have equal variance across the predicted values.
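One common way to check this is a residual plot: residuals on the y-axis against fitted values on the x-axis. The sketch below uses simulated data; the variable names and the helper `spread_ratio()` are hypothetical and only meant to put a rough number next to the picture.

```r
set.seed(7)
n <- 300
x <- runif(n, 1, 10)

y_homo   <- 2 + 3 * x + rnorm(n, sd = 2)         # constant error spread
y_hetero <- 2 + 3 * x + rnorm(n, sd = 0.5 * x)   # error spread grows with x

mdl_homo   <- lm(y_homo ~ x)
mdl_hetero <- lm(y_hetero ~ x)

# Residual plots: an even band is fine, a funnel shape is not
op <- par(mfrow = c(1, 2))
plot(fitted(mdl_homo), residuals(mdl_homo), main = "Equal variance")
abline(h = 0, lty = 2)
plot(fitted(mdl_hetero), residuals(mdl_hetero), main = "Unequal variance")
abline(h = 0, lty = 2)
par(op)

# Crude numeric check: residual SD in the upper half of the fitted values
# divided by the residual SD in the lower half (about 1 if variance is equal)
spread_ratio <- function(mdl) {
  upper <- fitted(mdl) > median(fitted(mdl))
  sd(residuals(mdl)[upper]) / sd(residuals(mdl)[!upper])
}
ratio_homo   <- spread_ratio(mdl_homo)
ratio_hetero <- spread_ratio(mdl_hetero)

print(c(ratio_homo = ratio_homo, ratio_hetero = ratio_hetero))
```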

What to do if normality/homoskedasticity is violated?

- Either: nothing + report the violation
- Or: report the violation + transformations

Two types of transformations:

- Linear transformations leave the shape of the distribution intact (centering, scaling).
- Nonlinear transformations do change the shape of the distribution.
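A small sketch of the difference, using skewness as a stand-in for "shape". The simulated variable `rt` and the choice of the log as the nonlinear transformation are assumptions for illustration; the slides do not name a specific nonlinear transformation.

```r
set.seed(99)
rt <- rlnorm(500, meanlog = 6, sdlog = 0.5)   # hypothetical right-skewed data

# Sample skewness: 0 for a symmetric distribution, positive for a right tail
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

rt_centered <- rt - mean(rt)            # linear: centering
rt_scaled   <- as.numeric(scale(rt))    # linear: centering + dividing by the SD
rt_log      <- log(rt)                  # nonlinear: log transform

skew_raw      <- skewness(rt)
skew_centered <- skewness(rt_centered)
skew_scaled   <- skewness(rt_scaled)
skew_log      <- skewness(rt_log)

print(c(raw = skew_raw, centered = skew_centered,
        scaled = skew_scaled, log = skew_log))
```

Centering and scaling only shift and stretch the axis, so the skewness is unchanged; the log compresses the right tail and brings the skewness close to zero for data like these.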

Before transformation

After transformation: still bad… but better!


Assumptions and the plots used to check them:

- Normality of errors: (histogram of residuals), Q-Q plot of residuals
- Homoskedasticity of errors: residual plot


What is independence?

Common experimental data: each subject responds to several items (Item #1, Item ...), with several repetitions of each (Rep 1, Rep 2, Rep 3).

Pseudoreplication = disregarding dependencies

Subject1Item1, Subject1Item2, Subject1Item3, … Subject2Item1, Subject2Item2, Subject3Item3, …

Treating such observations as independent has been called the “pooling fallacy” (Machlis et al. 1985) and “pseudoreplication” (Hurlbert 1984).

Hierarchical data is everywhere:

- Typological data (e.g., Bell 1978, Dryer 1989, Perkins 1989; Jaeger et al., 2011)
- Organizational data
- Classroom data

German, French, English, Spanish, Italian, Swedish, Norwegian, Finnish, Hungarian, Turkish, Romanian

Hierarchical data is everywhere: Class 1, Class 2

Hierarchical data is everywhere: Intraclass Correlation (ICC)
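The ICC expresses how much of the total variance is due to differences between groups (here, classes). A minimal sketch on simulated classroom data, using the one-way ANOVA estimator for a balanced design; the data, the names (`class_num`, `score`), and the choice of this particular estimator are assumptions, since the slides do not say how the ICC was computed.

```r
set.seed(2024)
n_class <- 40     # number of classes (hypothetical)
n_per   <- 10     # students per class (balanced design)

class_num    <- rep(seq_len(n_class), each = n_per)
class_id     <- factor(class_num)
class_effect <- rnorm(n_class, sd = 1)   # shared by everyone in a class

# Between-class SD = 1 and within-class SD = 1, so the true ICC is 0.5
score <- 50 + class_effect[class_num] + rnorm(n_class * n_per, sd = 1)

# One-way ANOVA: mean squares between and within classes
aov_tab <- anova(lm(score ~ class_id))
msb <- aov_tab["class_id", "Mean Sq"]
msw <- aov_tab["Residuals", "Mean Sq"]

# ANOVA estimator of the ICC for n_per observations per class
icc <- (msb - msw) / (msb + (n_per - 1) * msw)
print(icc)
```

An ICC near 0 means observations within a class are no more alike than observations from different classes; the further it is from 0, the more the independence assumption is violated when class membership is ignored.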

Simulation for 16 subjects: Type I error rate for pseudoreplication and items analysis
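The details of the simulation on this slide are not preserved. The sketch below is a different, simplified setup with the same aim: 16 subjects, a between-subject condition with no true effect, and many observations per subject. It compares a t-test on all observations (pseudoreplication) with a t-test on one mean per subject. All parameter values and names are assumptions.

```r
set.seed(16)
n_subj <- 16      # subjects, as on the slide
n_obs  <- 20      # observations per subject (assumed)
n_sim  <- 2000    # number of simulated experiments (assumed)

# Between-subject condition: 8 subjects in A, 8 in B, and no true difference
group <- rep(c("A", "B"), each = n_subj / 2)

one_sim <- function() {
  subj_intercept <- rnorm(n_subj, sd = 1)   # each subject has its own baseline
  y    <- rep(subj_intercept, each = n_obs) + rnorm(n_subj * n_obs, sd = 1)
  subj <- rep(seq_len(n_subj), each = n_obs)
  cond <- rep(group, each = n_obs)

  # Pseudoreplication: treat all 320 observations as independent
  p_pseudo <- t.test(y ~ cond)$p.value

  # Respecting the dependency: one mean per subject, 16 data points
  subj_mean <- as.numeric(tapply(y, subj, mean))
  p_means   <- t.test(subj_mean ~ group)$p.value

  c(pseudo = p_pseudo, means = p_means)
}

p_vals <- replicate(n_sim, one_sim())

# Share of simulations with p < 0.05 although the null hypothesis is true
type1 <- rowMeans(p_vals < 0.05)
print(type1)
```

Because there is no real effect, any honest test should reject in about 5% of simulations. The pseudoreplicated analysis rejects far more often, while the analysis on subject means stays near the nominal rate.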

Interpretational Problem: What’s the population for inference?

Violating the independence assumption makes the p-value… meaningless.


That’s it (for now)
