1 Experimental Statistics - week 10 Chapter 11: Linear Regression and Correlation Note: Homework Due Thursday.

2 2-Factor with Repeated Measures -- Model terms: type, subject within type, time, type-by-time interaction. NOTES: type and time are both fixed effects in the current example - we say "subject is nested within type" - Expected Mean Squares given on page 1032

3 2-Factor Repeated Measures – ANOVA Output

The GLM Procedure
Dependent Variable: conc

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             17   57720.50000      3395.32353    110.87    <.0001
Error             32   980.00000        30.62500
Corrected Total   49   58700.50000

R-Square   Coeff Var   Root MSE   conc Mean
0.983305   6.978545    5.533986   79.30000

Source          DF   Type III SS   Mean Square   F Value   Pr > F
type             1   40.50000      40.50000      1.32      0.2587
subject(type)    8   3920.00000    490.00000     16.00     <.0001
time             4   34288.00000   8572.00000    279.90    <.0001
type*time        4   19472.00000   4868.00000    158.96    <.0001

4 2-Factor Repeated Measures – Expected Mean Squares and Mixed-Model Tests

The GLM Procedure

Source          Type III Expected Mean Square
type            Var(Error) + 5 Var(subject(type)) + Q(type,type*time)
subject(type)   Var(Error) + 5 Var(subject(type))
time            Var(Error) + Q(time,type*time)
type*time       Var(Error) + Q(type*time)

Tests of Hypotheses for Mixed Model Analysis of Variance
Dependent Variable: conc

Source   DF   Type III SS   Mean Square   F Value   Pr > F
* type    1   40.500000     40.500000     0.08      0.7810
Error     8   3920.000000   490.000000
Error: MS(subject(type))
* This test assumes one or more other fixed effects are zero.

Source          DF   Type III SS   Mean Square   F Value   Pr > F
subject(type)    8   3920.000000   490.000000    16.00     <.0001
* time           4   34288         8572.000000   279.90    <.0001
type*time        4   19472         4868.000000   158.96    <.0001
Error: MS(Error) 32   980.000000    30.625000
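These F ratios can be reproduced directly from the mean squares in the output above; the following is a quick Python check (not part of the original SAS workflow):

```python
# Mean squares taken from the SAS output above.
ms_type, ms_subject, ms_time, ms_interaction, ms_error = 40.5, 490.0, 8572.0, 4868.0, 30.625

# type is tested against subject(type); the remaining effects against MSE.
f_type = ms_type / ms_subject
f_subject = ms_subject / ms_error
f_time = ms_time / ms_error
f_interaction = ms_interaction / ms_error

print(round(f_type, 2), round(f_subject, 2), round(f_time, 2), round(f_interaction, 2))
# 0.08 16.0 279.9 158.96
```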

5 NOTE: Since the time x type interaction is significant, and since these are fixed effects, we DO NOT test main effects – we compare cell means (using MSE).

Cell Means
Time:   .5    1     2     3    4
C       37   63    85   140   76
T       55   81   134    80   42

6 The write-up related to the SAS output should be something like the following. Note that even though we get a significant variance component due to subject (within group), I did not estimate the variance component itself. (I did not give this particular variance-component estimation formula.) Note also that since there is a significant interaction between the fixed effects type and time, we do not test the main effects.

7 Dealing with Normality/Equal Variance Issues
Normalizing Transformations: - log - square root - Box-Cox transformations
Note: the normalizing transformations sometimes also produce variance stabilization
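As an illustration of the first bullet, here is a small Python sketch (hypothetical data, not from the course) showing a strongly right-skewed sample becoming symmetric after a log transform:

```python
import math

def skewness(xs):
    # Moment-based sample skewness: m3 / m2**1.5.
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

raw = [math.exp(v) for v in (0, 1, 2, 3, 4)]   # right-skewed
logged = [math.log(x) for x in raw]            # symmetric: 0, 1, 2, 3, 4

print(round(skewness(raw), 2), round(skewness(logged), 2))
```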

8 Nonparametric "ANOVA"
Mann-Whitney U – for comparing 2 samples
Kruskal-Wallis Test – for comparing >2 samples
Friedman's Test – nonparametric alternative to randomized complete block / 1-factor repeated measures design
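As a sketch of the Kruskal-Wallis statistic (Python, hypothetical data, and assuming no ties — real data with ties needs average ranks and a tie correction):

```python
def kruskal_wallis_h(groups):
    # Rank the pooled observations (1-based; assumes no ties),
    # then sum the ranks within each group.
    pooled = sorted((value, gi) for gi, g in enumerate(groups) for value in g)
    rank_sums = [0.0] * len(groups)
    for rank, (_, gi) in enumerate(pooled, start=1):
        rank_sums[gi] += rank
    n = len(pooled)
    return 12.0 / (n * (n + 1)) * sum(
        rs ** 2 / len(g) for rs, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)

# Three well-separated hypothetical samples -> large H.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 1))  # 7.2
```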

9 Histogram – displays the distribution of 1 variable
Scatter Diagram (Scatterplot) – displays the joint distribution of 2 variables; plots data as "points" in the x-y plane

12 Association Between Two Variables – indicates that knowing one helps in predicting the other
Linear Association – our interest in this course – points "swarm" about a line
Correlation Analysis – measures the strength of linear association

15 Regression Analysis
We want to predict the dependent variable (Y) - also called the response variable - using the independent variable (X) - also called the explanatory variable or predictor variable.
More than one independent variable – Multiple Regression

16 11.7 Correlation Analysis

17 Correlation Coefficient - measures linear association
-1 = perfect negative linear relationship
 0 = no linear relationship
+1 = perfect positive linear relationship

18 Positive Correlation - high values of one variable are associated with high values of the other
Examples: - father's height, son's height - daily grade, final grade
r = 0.93 for the plot shown

19 EXAMS I and II

20 Negative Correlation - high with low, low with high
Examples: - car age, selling price - days absent, final grade
r = -0.89 for the plot shown

22 Zero Correlation - no linear relationship
Example: - height, IQ score
r = 0.0 for the plot shown

24 [Scatterplots with r = -.75, 0, .5, .99]

26 Calculating the Correlation Coefficient

27 Notation:
x̄ = Σx/n,  ȳ = Σy/n
S_xx = Σ(x - x̄)²,  S_yy = Σ(y - ȳ)²,  S_xy = Σ(x - x̄)(y - ȳ)
So --
r = S_xy / sqrt(S_xx · S_yy)

28 The data below are the study times and the test scores on an exam given over the material covered during the two weeks. Find r.

Study Time (hours) (X)   Exam Score (Y)
10                        92
15                        81
12                        84
20                        74
 8                        85
16                        80
14                        84
22                        80

29
DATA one;
 INPUT time score;
 DATALINES;
10 92
15 81
12 84
20 74
8 85
16 80
14 84
22 80
;
PROC CORR; Var score time;
 TITLE 'Study Time by Score';
RUN;
PROC PLOT; PLOT time*score; RUN;
PROC GPLOT; PLOT time*score; RUN;

30 Study Time by Score

The CORR Procedure
2 Variables: score time

Simple Statistics
Variable   N   Mean       Std Dev   Sum         Minimum    Maximum
score      8   82.50000   5.18239   660.00000   74.00000   92.00000
time       8   14.62500   4.74906   117.00000    8.00000   22.00000

Pearson Correlation Coefficients, N = 8
Prob > |r| under H0: Rho=0

          score      time
score     1.00000   -0.77490
                     0.0239
time     -0.77490    1.00000
          0.0239
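The reported correlation can be verified by hand; a minimal Python re-computation of r (mirroring, not replacing, PROC CORR):

```python
import math

time = [10, 15, 12, 20, 8, 16, 14, 22]
score = [92, 81, 84, 74, 85, 80, 84, 80]

n = len(time)
mx, my = sum(time) / n, sum(score) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(time, score))
sxx = sum((x - mx) ** 2 for x in time)
syy = sum((y - my) ** 2 for y in score)

r = sxy / math.sqrt(sxx * syy)
print(round(r, 5))  # -0.7749, matching SAS's -0.77490
```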

31 [PROC PLOT output: ASCII scatterplot of score (74–92) vs. time (8–22); legend: A = 1 obs, B = 2 obs, etc.]

33 Testing Statistical Significance of Correlation Coefficient
Test Statistic: t = r·sqrt(n - 2) / sqrt(1 - r²), with df = n - 2
Rejection Region: t > t_α/2 or t < -t_α/2

34 Correlation Between Study Time and Score
H0: There is No Correlation Between Study Time and Score
Ha: There is a Correlation Between Study Time and Score
Rejection Region: |t| > t_.025,6 = 2.447
Test Statistic: t = -3.00 (from the PROC CORR output)
P-value: 0.0239
Conclusion: Reject H0 – there is a significant correlation between study time and score
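The test statistic follows from r alone; a short Python check (the p-value is omitted since it needs a t-distribution CDF):

```python
import math

r, n = -0.77490, 8        # r from the PROC CORR output
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(round(t, 2))  # -3.0, matching the SAS t value of -3.00
```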

36 Properties of Correlation
- Correlation measures the strength of the linear relationship between two variables.
- Correlation requires that both variables be quantitative.
- r does not change when we change the units of measurement of x, y, or both.
- Correlation makes no distinction between explanatory and response variables.
- The correlation coefficient is not resistant to outliers.
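Two of these properties - unit invariance and sensitivity to outliers - can be demonstrated with a short Python sketch (hypothetical data):

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(x, y)
r_rescaled = pearson_r([2.54 * xi + 10 for xi in x], y)  # change of units
print(abs(r - r_rescaled) < 1e-9)  # True: r ignores units

r_outlier = pearson_r(x + [6.0], y + [0.0])  # one wild point added
print(r > 0.99, r_outlier < 0.5)  # True True: not resistant to outliers
```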

37 Math vs Reading Scores

The CORR Procedure
Pearson Correlation Coefficients, N = 20
Prob > |r| under H0: Rho=0

          math      reading
math      1.00000   0.87207
                    <.0001
reading   0.87207   1.00000
          <.0001

38 Math vs Reading Scores with Outlier

The CORR Procedure
Pearson Correlation Coefficients, N = 20
Prob > |r| under H0: Rho=0

          math      reading
math      1.00000   0.27198
                    0.2460
reading   0.27198   1.00000
          0.2460

39 Pearson Correlation Coefficients, N = 14
Prob > |r| under H0: Rho=0

          math       reading
math      1.00000    -0.1973
                      0.5194
reading   -0.1973    1.00000
           0.5194

40 Pearson Correlation Coefficients, N = 14
Prob > |r| under H0: Rho=0

          math      reading
math      1.00000   0.53211
                    0.0502
reading   0.53211   1.00000
          0.0502

41 [Scatterplot: Divorce Rate (per 1000) vs. % in prison on Drug Offenses]

42 IMPORTANT NOTE: Correlation DOES NOT Imply Causation
- strong association between 2 variables is not enough to justify conclusions about cause and effect
- best way to get evidence that X causes Y is through a controlled experiment

43 11.1-5 Regression Analysis

45 Goal of Regression Analysis: Predict Y from knowledge of X
For data such as the Father-Son data, it seems reasonable to assume a model of the form
E(Y | x) = β0 + β1·x
i.e. the conditional means of Y given x follow a straight line

46 Alternative mathematical expression for the "regression model":
Y = β0 + β1·x + ε, where the error ε has mean 0
In practice, we want to estimate this line from the data.

53 Which line is "closest" to the points?

54 Criterion for measuring "closeness" --- the sum of squared vertical distances from the points to the line.
Regression (Least Squares) Line --- the line for which this sum-of-squared distance is a minimum.
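This minimality can be seen numerically; a small Python sketch (hypothetical points and perturbations) confirms that nudging either coefficient away from the least-squares solution only increases the sum of squared distances:

```python
def sse(xs, ys, b0, b1):
    # Sum of squared vertical distances from the points to the line b0 + b1*x.
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.8]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx

best = sse(x, y, b0, b1)
worse = [sse(x, y, b0 + d0, b1 + d1)
         for d0 in (-0.5, 0.0, 0.5) for d1 in (-0.2, 0.0, 0.2)
         if (d0, d1) != (0.0, 0.0)]
print(all(w > best for w in worse))  # True: least squares is the minimum
```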

55 Notation
Theoretical Model: Y = β0 + β1·x + ε
Regression line (estimated from data): ŷ = β̂0 + β̂1·x

56 Data: (x1, y1), (x2, y2), …, (xn, yn). For a candidate line we write
SSE = Σ [yi - (β̂0 + β̂1·xi)]²
i.e. the sum of squared vertical distances from the points to the line.

57 NOTE: minimizing the sum of squared errors over β̂0 and β̂1 - this is a calculus problem (set the partial derivatives to zero and solve)

58 Least Squares Estimates
β̂1 = S_xy / S_xx        β̂0 = ȳ - β̂1·x̄
Computation Formula: S_xy = Σxy - (Σx)(Σy)/n,  S_xx = Σx² - (Σx)²/n

59 The data below are the study times and the test scores on an exam given over the material covered during the two weeks. Find the equation of the regression line for predicting exam score from study time.

Study Time (hours) (X)   Exam Score (Y)
10                        92
15                        81
12                        84
20                        74
 8                        85
16                        80
14                        84
22                        80

60
PROC GLM; MODEL score=time; RUN;

The GLM Procedure
Dependent Variable: score

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1   112.8883610      112.8883610   9.02      0.0239
Error              6   75.1116390       12.5186065
Corrected Total    7   188.0000000

R-Square   Coeff Var   Root MSE   score Mean
0.600470   4.288684    3.538164   82.50000

Source   DF   Type I SS     Mean Square   F Value   Pr > F
time      1   112.8883610   112.8883610   9.02      0.0239

Source   DF   Type III SS   Mean Square   F Value   Pr > F
time      1   112.8883610   112.8883610   9.02      0.0239

                            Standard
Parameter   Estimate        Error        t Value   Pr > |t|
Intercept   94.86698337     4.30408629   22.04     <.0001
time        -0.84560570     0.28159265   -3.00     0.0239
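The parameter estimates in the output can be reproduced with the computational formulas (slope = S_xy/S_xx, intercept = ȳ - slope·x̄); a quick Python check:

```python
time = [10, 15, 12, 20, 8, 16, 14, 22]
score = [92, 81, 84, 74, 85, 80, 84, 80]

n = len(time)
sx, sy = sum(time), sum(score)
sxy = sum(x * y for x, y in zip(time, score)) - sx * sy / n
sxx = sum(x * x for x in time) - sx ** 2 / n

slope = sxy / sxx                      # SAS: -0.84560570
intercept = sy / n - slope * sx / n    # SAS: 94.86698337
print(round(slope, 4), round(intercept, 3))  # -0.8456 94.867
```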

