Assumptions. “Essentially, all models are wrong, but some are useful” George E.P. Box Your model has to be wrong… … but that’s o.k. if it’s illuminating!

Presentation transcript:

1 Assumptions

2 “Essentially, all models are wrong, but some are useful” (George E.P. Box). Your model has to be wrong… but that’s o.k. if it’s illuminating!

3 Linear Model Assumptions: Absence of Collinearity; Normality of Errors; Homoskedasticity of Errors; No influential data points; Independence

5 Absence of Collinearity Baayen (2008: 182)

7 Where does collinearity come from? Most often, from correlated predictor variables. (Demo)
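The demo itself is not part of the transcript. As a stand-in, here is a minimal sketch in base R with simulated data; all variable names and parameter values are assumptions made for illustration. It fits the same kind of model twice, once with an unrelated second predictor and once with a second predictor that is nearly a copy of the first, and compares the standard error of the `x1` coefficient.

```r
# Sketch with simulated data (not the deck's demo): correlated predictors
# inflate the standard errors of the coefficients.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2_indep <- rnorm(n)                 # unrelated to x1
x2_coll  <- x1 + rnorm(n, sd = 0.1)  # nearly a copy of x1

y_indep <- 1 + 2 * x1 + 2 * x2_indep + rnorm(n)
y_coll  <- 1 + 2 * x1 + 2 * x2_coll  + rnorm(n)

mdl_indep <- lm(y_indep ~ x1 + x2_indep)
mdl_coll  <- lm(y_coll  ~ x1 + x2_coll)

# Standard error of the x1 coefficient in each model
se_indep <- summary(mdl_indep)$coefficients["x1", "Std. Error"]
se_coll  <- summary(mdl_coll)$coefficients["x1", "Std. Error"]

# Variance inflation factor for x1: 1 / (1 - R^2),
# with R^2 from regressing x1 on the other predictor
vif_x1 <- 1 / (1 - summary(lm(x1 ~ x2_coll))$r.squared)

print(c(se_indep = se_indep, se_coll = se_coll, vif_x1 = vif_x1))
```

In the collinear fit the model cannot tell the two predictors apart, so the estimate for `x1` becomes much less certain even though the data were generated the same way.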

8 What to do?

10 Baayen (2008: )

15 DFbeta (…and much more) Leave-one-out Influence Diagnostics
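A minimal sketch of a leave-one-out check, assuming simulated data with one deliberately extreme observation (the data and the position of the outlier are invented for illustration). It uses `dfbeta()` from base R's stats package to find the point that shifts the slope the most, then refits the model without that point.

```r
# Sketch with simulated data: find the observation with the largest
# leave-one-out influence on the slope.
set.seed(2)
x <- rnorm(30)
y <- 0.5 * x + rnorm(30)

# Hypothetical extreme observation, added as data point 31
x <- c(x, 6)
y <- c(y, -10)

xmdl <- lm(y ~ x)

# One row per data point, one column per coefficient
infl <- dfbeta(xmdl)
most_influential <- unname(which.max(abs(infl[, "x"])))
print(most_influential)

# Refit without that point and compare the slopes
xmdl_loo <- lm(y[-most_influential] ~ x[-most_influential])
slope_full <- unname(coef(xmdl)[2])
slope_loo  <- unname(coef(xmdl_loo)[2])
print(c(slope_full = slope_full, slope_loo = slope_loo))
```

If a single point moves the slope this much, it is worth reporting the model both with and without it.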

16 Winter & Matlock (2013)

18 Normality of Error: The error (not the data!) is assumed to be normally distributed. So, the residuals should be normally distributed.

19 xmdl = lm(y ~ x); hist(residuals(xmdl)) ✔

20 qqnorm(residuals(xmdl)); qqline(residuals(xmdl)) ✔

21 qqnorm(residuals(xmdl)); qqline(residuals(xmdl)) ✗
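The plots behind the ✔ and ✗ are not in the transcript. The following sketch reproduces the contrast with simulated data; the data-generating choices (normal versus exponential errors) and the model names are assumptions, not the deck's data.

```r
# Sketch with simulated data: Q-Q plots for a model with normal errors
# and for a model with right-skewed errors.
set.seed(3)
x <- rnorm(100)
y_ok  <- 1 + 2 * x + rnorm(100)           # normal errors
y_bad <- 1 + 2 * x + rexp(100, rate = 1)  # skewed errors

mdl_ok  <- lm(y_ok ~ x)
mdl_bad <- lm(y_bad ~ x)

par(mfrow = c(1, 2))
qqnorm(residuals(mdl_ok), main = "Normal errors")
qqline(residuals(mdl_ok))
qqnorm(residuals(mdl_bad), main = "Skewed errors")
qqline(residuals(mdl_bad))
```

In the first panel the points should stay close to the line; in the second they should bend away from it in the upper tail.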

23 Homoskedasticity of Error: The error (not the data!) is assumed to have equal variance across the predicted values. So, the residuals should have equal variance across the predicted values.
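A residual plot (residuals against fitted values) is the usual way to look at this. The sketch below uses simulated data, so every name and parameter in it is an assumption: one model has constant error variance, the other has an error SD that grows with `x`.

```r
# Sketch with simulated data: residual plots with and without
# equal error variance.
set.seed(4)
x <- runif(200, 1, 10)
y_homo   <- 1 + 2 * x + rnorm(200, sd = 1)        # constant error SD
y_hetero <- 1 + 2 * x + rnorm(200, sd = 0.5 * x)  # error SD grows with x

mdl_homo   <- lm(y_homo ~ x)
mdl_hetero <- lm(y_hetero ~ x)

par(mfrow = c(1, 2))
plot(fitted(mdl_homo), residuals(mdl_homo), main = "Equal variance")
abline(h = 0, lty = 2)
plot(fitted(mdl_hetero), residuals(mdl_hetero), main = "Unequal variance")
abline(h = 0, lty = 2)

# Crude numeric summary: does the size of the residuals grow
# with the fitted values?
spread_homo   <- cor(abs(residuals(mdl_homo)), fitted(mdl_homo))
spread_hetero <- cor(abs(residuals(mdl_hetero)), fitted(mdl_hetero))
print(c(spread_homo = spread_homo, spread_hetero = spread_hetero))
```

The first panel should look like a shapeless band around zero; the second should fan out as the fitted values increase.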

28 What to do if normality/homoskedasticity is violated? Either: do nothing and report the violation. Or: report the violation and apply transformations.

29 Two types of transformations. Linear transformations (centering, scaling) leave the shape of the distribution intact. Nonlinear transformations do change the shape of the distribution.
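The difference can be checked numerically. This sketch assumes a hypothetical right-skewed variable `rt` (simulated, log-normal) and a simple moment-based skewness helper; neither comes from the deck. Centering and scaling leave the skewness unchanged, while the log transform reduces it.

```r
# Sketch with simulated data: linear transformations keep the shape,
# a nonlinear (log) transformation changes it.
set.seed(5)
rt <- rlnorm(500, meanlog = 6, sdlog = 0.5)  # hypothetical right-skewed values

# Simple moment-based skewness (helper defined here for illustration)
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3

rt_centered <- rt - mean(rt)          # linear: centering
rt_scaled   <- as.numeric(scale(rt))  # linear: centering and dividing by the SD
rt_log      <- log(rt)                # nonlinear

print(round(c(raw      = skew(rt),
              centered = skew(rt_centered),
              scaled   = skew(rt_scaled),
              log      = skew(rt_log)), 3))
```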

32 Before transformation

33 After transformation: still bad… but better!

34 Assumptions: Absence of Collinearity; Normality of Errors; Homoskedasticity of Errors; No influential data points; Independence

35 Assumptions and their diagnostic plots. Normality of Errors: (histogram of residuals), Q-Q plot of residuals. Homoskedasticity of Errors: residual plot.

36 Assumptions: Absence of Collinearity; No influential data points; Independence; Normality of Errors; Homoskedasticity of Errors

39 What is independence?

40 Common experimental data [diagram labels: Subject; Item #1, Item...; Rep 1, Rep 2, Rep 3]

41 Common experimental data [same diagram]. Pseudoreplication = Disregarding Dependencies

42 Subject1Item1, Subject1Item2, Subject1Item3, … Subject2Item1, Subject2Item2, Subject3Item3, … Machlis et al. (1985): “pooling fallacy”; Hurlbert (1984): “pseudoreplication”

43 Hierarchical data is everywhere: typological data (e.g., Bell 1978, Dryer 1989, Perkins 1989; Jaeger et al., 2011); organizational data; classroom data

44 German, French, English, Spanish, Italian, Swedish, Norwegian, Finnish, Hungarian, Turkish, Romanian

46 Hierarchical data is everywhere: Class 1, Class 2

50 Hierarchical data is everywhere: Intraclass Correlation (ICC)
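The slide only names the ICC. As one possible illustration, the sketch below computes the one-way ANOVA estimator, ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), for simulated classroom data with k students per class. The data, the variable names, and the choice of this particular estimator (which assumes equal class sizes) are assumptions, not something the deck specifies.

```r
# Sketch with simulated data: ICC from a one-way ANOVA, balanced classes.
set.seed(6)
n_class <- 20   # number of classes
k       <- 10   # students per class

class_id     <- factor(rep(seq_len(n_class), each = k))
class_effect <- rnorm(n_class, sd = 1)  # shared by everyone in a class
score <- 50 + class_effect[as.integer(class_id)] + rnorm(n_class * k, sd = 1)

# Mean squares between classes (row 1) and within classes (row 2)
ms  <- summary(aov(score ~ class_id))[[1]][["Mean Sq"]]
msb <- ms[1]
msw <- ms[2]

icc <- (msb - msw) / (msb + (k - 1) * msw)
print(round(icc, 3))
```

With equal between-class and within-class SDs, as simulated here, the true ICC is 0.5: half of the variance in scores is due to class membership, so students in the same class are far from independent observations.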

51 Simulation for 16 subjects [plot: Type I error rate; pseudoreplication vs. items analysis]
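The simulation's code is not in the transcript, so the following is only a sketch of the general idea under assumed settings: 16 subjects in two groups, 10 items each, subject and residual SDs of 1, and no true group effect. It compares a t-test that treats every data point as independent (pseudoreplication) with a t-test on one mean per subject. The aggregation by subject is a stand-in and not necessarily the analysis shown on the slide.

```r
# Sketch under assumed settings: Type I error rate with and without
# pseudoreplication when there is no true effect.
set.seed(7)
n_sims  <- 500
n_subj  <- 16
n_items <- 10

group <- rep(c("A", "B"), each = n_subj / 2)  # between-subjects condition
p_pseudo <- numeric(n_sims)
p_agg    <- numeric(n_sims)

for (i in seq_len(n_sims)) {
  subj_effect <- rnorm(n_subj, sd = 1)  # each subject has their own baseline
  d <- data.frame(
    subject = rep(seq_len(n_subj), each = n_items),
    group   = rep(group, each = n_items)
  )
  d$y <- subj_effect[d$subject] + rnorm(nrow(d), sd = 1)

  # Pseudoreplication: all 160 data points treated as independent
  p_pseudo[i] <- t.test(y ~ group, data = d)$p.value

  # One mean per subject, so the unit of analysis is the subject
  subj_means <- aggregate(y ~ subject + group, data = d, FUN = mean)
  p_agg[i] <- t.test(y ~ group, data = subj_means)$p.value
}

type1_pseudo <- mean(p_pseudo < 0.05)
type1_agg    <- mean(p_agg < 0.05)
print(c(pseudoreplication = type1_pseudo, by_subject_means = type1_agg))
```

Because the null hypothesis is true in every simulated data set, both rates should be near 0.05. Under these settings the pseudoreplicated analysis rejects far more often than that.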

52 Interpretational Problem: What’s the population for inference?

53 Violating the independence assumption makes the p-value… meaningless

56 That’s it (for now)

