6-1 Introduction To Empirical Models Based on the scatter diagram, it is probably reasonable to assume that the mean of the random variable Y is.



4 6-1 Introduction To Empirical Models

5 Based on the scatter diagram, it is probably reasonable to assume that the mean of the random variable Y is related to x by the following straight-line relationship: $E(Y \mid x) = \mu_{Y|x} = \beta_0 + \beta_1 x$, where the slope $\beta_1$ and intercept $\beta_0$ of the line are called regression coefficients. The simple linear regression model is given by $Y = \beta_0 + \beta_1 x + \epsilon$, where $\epsilon$ is the random error term.


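7 We think of the regression model as an empirical model. Suppose that the mean and variance of $\epsilon$ are 0 and $\sigma^2$, respectively. Then $E(Y \mid x) = E(\beta_0 + \beta_1 x + \epsilon) = \beta_0 + \beta_1 x$. The variance of Y given x is $V(Y \mid x) = V(\beta_0 + \beta_1 x + \epsilon) = \sigma^2$.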

8 The true regression model is a line of mean values: $\mu_{Y|x} = \beta_0 + \beta_1 x$, where $\beta_1$ can be interpreted as the change in the mean of Y for a unit change in x. Also, the variability of Y at a particular value of x is determined by the error variance, $\sigma^2$. This implies there is a distribution of Y-values at each x and that the variance of this distribution is the same at each x.
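The idea of a distribution of Y-values at each x, with a mean on the regression line and the same variance everywhere, can be illustrated with a small simulation. This is only a sketch in Python (not the SAS used later in the deck); the parameter values `BETA0`, `BETA1`, `SIGMA` and the helper `simulate_y` are hypothetical and chosen purely for illustration, and normal errors are assumed.

```python
import random
import statistics

# Hypothetical parameter values, not taken from the slides.
BETA0, BETA1, SIGMA = 2.0, 3.0, 1.5

def simulate_y(x, beta0, beta1, sigma, n, seed=0):
    """Draw n observations Y = beta0 + beta1*x + eps, with eps ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for _ in range(n)]

for x in (0.5, 1.0, 2.0):
    ys = simulate_y(x, BETA0, BETA1, SIGMA, n=20000)
    # The sample mean should track beta0 + beta1*x; the sample variance
    # should stay close to sigma^2 regardless of x.
    print(x, BETA0 + BETA1 * x, statistics.mean(ys), statistics.variance(ys))
```

At every x the sample mean moves along the line while the sample variance stays near `SIGMA ** 2`, which is the constant-variance property described above.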


10 A Multiple Linear Regression Model:

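11 6-2 Simple Linear Regression 6-2.1 Least Squares Estimation The case of simple linear regression considers a single regressor or predictor x and a dependent or response variable Y. The observation Y at each level of x is a random variable, and its expected value is $E(Y \mid x) = \beta_0 + \beta_1 x$. We assume that each observation, Y, can be described by the model $Y = \beta_0 + \beta_1 x + \epsilon$, where $\epsilon$ is a random error with mean zero and variance $\sigma^2$.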

12 6-2 Simple Linear Regression 6-2.1 Least Squares Estimation Suppose that we have n pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. The method of least squares is used to estimate the parameters $\beta_0$ and $\beta_1$ by minimizing the sum of the squares of the vertical deviations in Figure 6-6.
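As a minimal sketch of the least squares idea, the following Python code (used here in place of the deck's SAS) computes the closed-form estimates for the salt and roadway-area data of Example 6-1. The function name `least_squares` and the variable names are illustrative assumptions; the formulas are the standard ones, $\hat{\beta}_1 = S_{xy}/S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$.

```python
# Salt concentration (y) and roadway area (x) from Example 6-1.
SALT = [3.8, 5.9, 14.1, 10.4, 14.6, 14.5, 15.1, 11.9, 15.5, 9.3,
        15.6, 20.8, 14.6, 16.6, 25.6, 20.9, 29.9, 19.6, 31.3, 32.7]
AREA = [0.19, 0.15, 0.57, 0.40, 0.70, 0.67, 0.63, 0.47, 0.75, 0.60,
        0.78, 0.81, 0.78, 0.69, 1.30, 1.05, 1.52, 1.06, 1.74, 1.62]

def least_squares(x, y):
    """Return (b0, b1) minimizing the sum of squared vertical deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx           # slope estimate
    b0 = y_bar - b1 * x_bar    # intercept estimate
    return b0, b1

b0, b1 = least_squares(AREA, SALT)
print(f"intercept = {b0:.5f}, slope = {b1:.5f}")
```

The result can be compared with the Parameter Estimates table in the PROC REG output for Example 6-1.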

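13 6-2 Simple Linear Regression 6-2.1 Least Squares Estimation Using Equation 6-8, the n observations in the sample can be expressed as $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, $i = 1, 2, \ldots, n$. The sum of the squares of the deviations of the observations from the true regression line is $L = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$.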
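A sketch of the standard derivation that follows from this criterion: with $L(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$, setting both partial derivatives to zero gives the normal equations, whose solution is the pair of least squares estimates. The shorthand $S_{xx}$ and $S_{xy}$ is introduced here for convenience and is an assumption about notation.

```latex
\begin{align*}
\left.\frac{\partial L}{\partial \beta_0}\right|_{\hat\beta_0,\hat\beta_1}
  &= -2\sum_{i=1}^{n}\bigl(y_i-\hat\beta_0-\hat\beta_1 x_i\bigr) = 0, \\
\left.\frac{\partial L}{\partial \beta_1}\right|_{\hat\beta_0,\hat\beta_1}
  &= -2\sum_{i=1}^{n}\bigl(y_i-\hat\beta_0-\hat\beta_1 x_i\bigr)x_i = 0. \\[4pt]
\text{Normal equations:}\quad
n\hat\beta_0 + \hat\beta_1\sum_{i=1}^{n}x_i &= \sum_{i=1}^{n}y_i, \\
\hat\beta_0\sum_{i=1}^{n}x_i + \hat\beta_1\sum_{i=1}^{n}x_i^2 &= \sum_{i=1}^{n}x_i y_i. \\[4pt]
\text{Solution:}\quad
\hat\beta_1 &= \frac{S_{xy}}{S_{xx}}, \qquad
\hat\beta_0 = \bar y - \hat\beta_1 \bar x, \\
S_{xx} &= \sum_{i=1}^{n}(x_i-\bar x)^2, \qquad
S_{xy} = \sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y).
\end{align*}
```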


22 6-2 Simple Linear Regression Sums of Squares and Cross-products Matrix The sums of squares and cross-products matrix is a convenient way to summarize the quantities needed to do the hand calculations in regression. It also plays a key role in the internal calculations of the computer. It is output by PROC REG and PROC GLM if the XPX option is included on the MODEL statement. The elements are
X'X         Intercept        X                  Y
Intercept   n                $\sum x_i$         $\sum y_i$
X           $\sum x_i$       $\sum x_i^2$       $\sum x_i y_i$
Y           $\sum y_i$       $\sum x_i y_i$     $\sum y_i^2$
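To make the layout of this matrix concrete, here is a small Python sketch (standing in for the SAS XPX option) that builds the same 3 x 3 array of sums from the Example 6-1 data. The function name `crossproducts` is a hypothetical helper, not part of any library.

```python
# Salt concentration (y) and roadway area (x) from Example 6-1.
SALT = [3.8, 5.9, 14.1, 10.4, 14.6, 14.5, 15.1, 11.9, 15.5, 9.3,
        15.6, 20.8, 14.6, 16.6, 25.6, 20.9, 29.9, 19.6, 31.3, 32.7]
AREA = [0.19, 0.15, 0.57, 0.40, 0.70, 0.67, 0.63, 0.47, 0.75, 0.60,
        0.78, 0.81, 0.78, 0.69, 1.30, 1.05, 1.52, 1.06, 1.74, 1.62]

def crossproducts(x, y):
    """Sums of squares and cross-products, ordered Intercept, x, y."""
    cols = [[1.0] * len(x), list(x), list(y)]
    # Entry (i, j) is the sum over observations of column i times column j.
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

xpx = crossproducts(AREA, SALT)
for label, row in zip(("Intercept", "area", "salt"), xpx):
    print(f"{label:<10}" + "".join(f"{v:>12.4f}" for v in row))
```

The printed rows can be checked against the Model Crossproducts table in the PROC REG output.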


24 6-2 Simple Linear Regression Regression Assumptions and Model Properties


26 6-2 Simple Linear Regression Regression and Analysis of Variance


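28 6-2 Simple Linear Regression 6-2.2 Testing Hypothesis in Simple Linear Regression Use of t-Tests Suppose we wish to test $H_0: \beta_1 = \beta_{1,0}$ against $H_1: \beta_1 \neq \beta_{1,0}$. An appropriate test statistic would be $T_0 = (\hat{\beta}_1 - \beta_{1,0}) / \sqrt{\hat{\sigma}^2 / S_{xx}}$, where $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$ and $\hat{\sigma}^2$ is the error mean square.

29 6-2 Simple Linear Regression 6-2.2 Testing Hypothesis in Simple Linear Regression Use of t-Tests We would reject the null hypothesis if $|t_0| > t_{\alpha/2, n-2}$.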

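30 6-2 Simple Linear Regression 6-2.2 Testing Hypothesis in Simple Linear Regression Use of t-Tests Suppose we wish to test $H_0: \beta_0 = \beta_{0,0}$ against $H_1: \beta_0 \neq \beta_{0,0}$. An appropriate test statistic would be $T_0 = (\hat{\beta}_0 - \beta_{0,0}) / \sqrt{\hat{\sigma}^2 \left[ 1/n + \bar{x}^2 / S_{xx} \right]}$, where $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$ and $\hat{\sigma}^2$ is the error mean square.

31 6-2 Simple Linear Regression 6-2.2 Testing Hypothesis in Simple Linear Regression Use of t-Tests We would reject the null hypothesis if $|t_0| > t_{\alpha/2, n-2}$.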

32 6-2 Simple Linear Regression 6-2.2 Testing Hypothesis in Simple Linear Regression Use of t-Tests An important special case of the hypotheses of Equation 6-23 is $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$. These hypotheses relate to the significance of regression. Failure to reject $H_0$ is equivalent to concluding that there is no linear relationship between x and Y.
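A minimal Python sketch of the significance-of-regression t-test, applied to the Example 6-1 data (Python is used instead of the deck's SAS). The helper `slope_test` is hypothetical, and the critical value `T_CRIT` is a hardcoded table value for $t_{0.025, 18}$ under the assumption $\alpha = 0.05$; it is not computed here.

```python
import math

# Salt concentration (y) and roadway area (x) from Example 6-1.
SALT = [3.8, 5.9, 14.1, 10.4, 14.6, 14.5, 15.1, 11.9, 15.5, 9.3,
        15.6, 20.8, 14.6, 16.6, 25.6, 20.9, 29.9, 19.6, 31.3, 32.7]
AREA = [0.19, 0.15, 0.57, 0.40, 0.70, 0.67, 0.63, 0.47, 0.75, 0.60,
        0.78, 0.81, 0.78, 0.69, 1.30, 1.05, 1.52, 1.06, 1.74, 1.62]

# Assumed: tabulated upper 2.5% point of t with n - 2 = 18 degrees of freedom.
T_CRIT = 2.101

def slope_test(x, y, beta1_null=0.0):
    """Return (t0, se_b1) for H0: beta1 = beta1_null."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = sse / (n - 2)             # error mean square
    se_b1 = math.sqrt(sigma2_hat / s_xx)   # standard error of the slope
    return (b1 - beta1_null) / se_b1, se_b1

t0, se_b1 = slope_test(AREA, SALT)
reject_h0 = abs(t0) > T_CRIT
print(f"t0 = {t0:.2f}, se(b1) = {se_b1:.5f}, reject H0: {reject_h0}")
```

The statistic and standard error can be compared with the `area` row of the Parameter Estimates table in the SAS output.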


36 The Analysis of Variance Approach

37 6-2 Simple Linear Regression 6-2.2 Testing Hypothesis in Simple Linear Regression The Analysis of Variance Approach
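A minimal sketch of the analysis-of-variance test for significance of regression, assuming the standard decomposition of the total sum of squares into regression and error parts and the statistic $F_0 = MS_R / MS_E$ with 1 and $n-2$ degrees of freedom. It is written in Python rather than SAS, uses the Example 6-1 data, and the function name `anova_table` is hypothetical.

```python
# Salt concentration (y) and roadway area (x) from Example 6-1.
SALT = [3.8, 5.9, 14.1, 10.4, 14.6, 14.5, 15.1, 11.9, 15.5, 9.3,
        15.6, 20.8, 14.6, 16.6, 25.6, 20.9, 29.9, 19.6, 31.3, 32.7]
AREA = [0.19, 0.15, 0.57, 0.40, 0.70, 0.67, 0.63, 0.47, 0.75, 0.60,
        0.78, 0.81, 0.78, 0.69, 1.30, 1.05, 1.52, 1.06, 1.74, 1.62]

def anova_table(x, y):
    """Return the sums of squares, mean squares, F statistic and R-square."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)   # corrected total, n - 1 df
    ss_reg = (s_xy / s_xx) * s_xy                   # regression, 1 df
    ss_err = ss_total - ss_reg                      # error, n - 2 df
    ms_err = ss_err / (n - 2)
    return {
        "ss_reg": ss_reg, "ss_err": ss_err, "ss_total": ss_total,
        "ms_err": ms_err, "f0": ss_reg / ms_err, "r_square": ss_reg / ss_total,
    }

table = anova_table(AREA, SALT)
for name, value in table.items():
    print(f"{name:>9}: {value:.5f}")
```

In simple linear regression this F statistic equals the square of the t statistic for the slope, so the two tests lead to the same conclusion.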

38 6-2 Simple Linear Regression 6-2.3 Confidence Intervals in Simple Linear Regression
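A sketch of two-sided confidence intervals on the slope and intercept, assuming the usual form "estimate plus or minus $t_{\alpha/2, n-2}$ times standard error". The code is Python (not the deck's SAS), uses the Example 6-1 data, and hardcodes `T_CRIT` as the table value of $t_{0.025, 18}$ for an assumed 95% level; `coefficient_intervals` is a hypothetical helper name.

```python
import math

# Salt concentration (y) and roadway area (x) from Example 6-1.
SALT = [3.8, 5.9, 14.1, 10.4, 14.6, 14.5, 15.1, 11.9, 15.5, 9.3,
        15.6, 20.8, 14.6, 16.6, 25.6, 20.9, 29.9, 19.6, 31.3, 32.7]
AREA = [0.19, 0.15, 0.57, 0.40, 0.70, 0.67, 0.63, 0.47, 0.75, 0.60,
        0.78, 0.81, 0.78, 0.69, 1.30, 1.05, 1.52, 1.06, 1.74, 1.62]

# Assumed: tabulated t_{0.025, 18} for a 95% interval with n = 20.
T_CRIT = 2.1009

def coefficient_intervals(x, y, t_crit):
    """Return ((lo, hi) for the intercept, (lo, hi) for the slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se_b1 = math.sqrt(mse / s_xx)
    se_b0 = math.sqrt(mse * (1.0 / n + x_bar ** 2 / s_xx))
    return ((b0 - t_crit * se_b0, b0 + t_crit * se_b0),
            (b1 - t_crit * se_b1, b1 + t_crit * se_b1))

ci_intercept, ci_slope = coefficient_intervals(AREA, SALT, T_CRIT)
print("intercept:", ci_intercept)
print("slope:    ", ci_slope)
```

Neither interval contains zero, which agrees with the small p-values reported for both coefficients in the SAS output.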


42 6-2.4 Prediction of Future Observations
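A sketch contrasting the confidence interval on the mean response with the prediction interval for a future observation, evaluated at `area = 1.25`, the extra point added to the data set in Example 6-1. The standard formulas are assumed: the mean-response variance is $\hat{\sigma}^2 [1/n + (x_0 - \bar{x})^2 / S_{xx}]$, and the prediction variance adds one more $\hat{\sigma}^2$. The code is Python rather than SAS; `intervals_at` is a hypothetical helper and `T_CRIT` is a hardcoded table value of $t_{0.025, 18}$.

```python
import math

# Salt concentration (y) and roadway area (x) from Example 6-1.
SALT = [3.8, 5.9, 14.1, 10.4, 14.6, 14.5, 15.1, 11.9, 15.5, 9.3,
        15.6, 20.8, 14.6, 16.6, 25.6, 20.9, 29.9, 19.6, 31.3, 32.7]
AREA = [0.19, 0.15, 0.57, 0.40, 0.70, 0.67, 0.63, 0.47, 0.75, 0.60,
        0.78, 0.81, 0.78, 0.69, 1.30, 1.05, 1.52, 1.06, 1.74, 1.62]

# Assumed: tabulated t_{0.025, 18} for 95% limits with n = 20.
T_CRIT = 2.1009

def intervals_at(x, y, x0, t_crit):
    """Return (fitted value, mean-response CI, prediction interval) at x0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    y0_hat = b0 + b1 * x0
    leverage = 1.0 / n + (x0 - x_bar) ** 2 / s_xx
    half_mean = t_crit * math.sqrt(mse * leverage)          # mean response
    half_pred = t_crit * math.sqrt(mse * (1.0 + leverage))  # new observation
    return (y0_hat,
            (y0_hat - half_mean, y0_hat + half_mean),
            (y0_hat - half_pred, y0_hat + half_pred))

y0_hat, ci_mean, pi_new = intervals_at(AREA, SALT, 1.25, T_CRIT)
print(f"fit = {y0_hat:.4f}")
print(f"95% CL mean    = ({ci_mean[0]:.4f}, {ci_mean[1]:.4f})")
print(f"95% CL predict = ({pi_new[0]:.4f}, {pi_new[1]:.4f})")
```

The prediction interval is always wider than the interval on the mean, because it also accounts for the variability of the new observation itself. The values can be compared with observation 21 in the CLM/CLI output.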


45 6-2 Simple Linear Regression 6-2.5 Checking Model Adequacy Fitting a regression model requires several assumptions:
1. Errors are uncorrelated random variables with mean zero;
2. Errors have constant variance; and
3. Errors are normally distributed.
The analyst should always consider the validity of these assumptions to be doubtful and conduct analyses to examine the adequacy of the model.

46 6-2 Simple Linear Regression 6-2.5 Checking Model Adequacy The residuals from a regression model are $e_i = y_i - \hat{y}_i$, where $y_i$ is an actual observation and $\hat{y}_i$ is the corresponding fitted value from the regression model. Analysis of the residuals is frequently helpful in checking the assumption that the errors are approximately normally distributed with constant variance, and in determining whether additional terms in the model would be useful.
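A sketch of how the residuals and their studentized versions can be computed for the Example 6-1 data, in Python rather than SAS. The studentized form assumed here divides each residual by $\sqrt{\hat{\sigma}^2 (1 - h_{ii})}$ with leverage $h_{ii} = 1/n + (x_i - \bar{x})^2 / S_{xx}$, which is the definition that reproduces the "Student Residual" column of the SAS output; `residual_analysis` is a hypothetical helper name.

```python
import math

# Salt concentration (y) and roadway area (x) from Example 6-1.
SALT = [3.8, 5.9, 14.1, 10.4, 14.6, 14.5, 15.1, 11.9, 15.5, 9.3,
        15.6, 20.8, 14.6, 16.6, 25.6, 20.9, 29.9, 19.6, 31.3, 32.7]
AREA = [0.19, 0.15, 0.57, 0.40, 0.70, 0.67, 0.63, 0.47, 0.75, 0.60,
        0.78, 0.81, 0.78, 0.69, 1.30, 1.05, 1.52, 1.06, 1.74, 1.62]

def residual_analysis(x, y):
    """Return (residuals, studentized residuals, SSE) for the fitted line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    mse = sse / (n - 2)
    studentized = []
    for xi, e in zip(x, residuals):
        h_ii = 1.0 / n + (xi - x_bar) ** 2 / s_xx   # leverage of observation i
        studentized.append(e / math.sqrt(mse * (1.0 - h_ii)))
    return residuals, studentized, sse

residuals, studentized, sse = residual_analysis(AREA, SALT)
# Flag observations whose studentized residual exceeds 2 in absolute value.
flagged = [i + 1 for i, r in enumerate(studentized) if abs(r) > 2.0]
print("SSE =", round(sse, 5), "flagged observations:", flagged)
```

The cutoff of 2 is only a common rule of thumb, not something stated in the slides; flagged points deserve a closer look rather than automatic removal.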


52 6-2 Simple Linear Regression Example 6-1
OPTIONS NOOVP NODATE NONUMBER LS=140;
/* Read (salt, area) pairs; @@ allows several pairs per input line */
DATA ex61;
  INPUT salt area @@;
  LABEL salt='Salt Conc' area='Roadway area';
CARDS;
3.8 0.19  5.9 0.15  14.1 0.57  10.4 0.4   14.6 0.7
14.5 0.67 15.1 0.63 11.9 0.47  15.5 0.75  9.3 0.6
15.6 0.78 20.8 0.81 14.6 0.78  16.6 0.69  25.6 1.3
20.9 1.05 29.9 1.52 19.6 1.06  31.3 1.74  32.7 1.62
;
ods graphics on;
/* XPX prints the crossproducts matrix; R prints the residual analysis */
PROC REG DATA=EX61;
  MODEL SALT=AREA/XPX R;
/* Append a new point (area = 1.25, salt missing) to obtain intervals there */
DATA EX61N; AREA=1.25; OUTPUT;
DATA EX61N1; SET EX61 EX61N;
ods graphics off;
PROC REG DATA=EX61N1;
  MODEL SALT=AREA/CLM CLI;
  /* CLM for 100(1-alpha)% confidence limits for the expected value of the
     dependent variable, CLI for 100(1-alpha)% confidence limits for an
     individual predicted value */
  TITLE 'CIs FOR MEAN RESPONSE AND FUTURE OBSERVATION';
RUN; QUIT;

53 6-2 Simple Linear Regression

Linear Regression of SALT vs AREA
The REG Procedure
Model: MODEL1
Dependent Variable: salt Salt Conc

Number of Observations Read    20
Number of Observations Used    20

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1       1130.14924    1130.14924    352.46   <.0001
Error             18         57.71626       3.20646
Corrected Total   19       1187.86550

Root MSE          1.79066    R-Square   0.9514
Dependent Mean   17.13500    Adj R-Sq   0.9487
Coeff Var        10.45030

Parameter Estimates
Variable    Label          DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   Intercept       1              2.67655          0.86800      3.08     0.0064
area        Roadway area    1             17.54667          0.93463     18.77     <.0001

Linear Regression of SALT vs AREA
The REG Procedure
Model: MODEL1

Model Crossproducts X'X X'Y Y'Y
Variable    Label          Intercept       area       salt
Intercept   Intercept             20      16.48      342.7
area        Roadway area       16.48    17.2502    346.793
salt        Salt Conc          342.7    346.793    7060.03

54 6-2 Simple Linear Regression

The SAS System
The REG Procedure
Model: MODEL1
Dependent Variable: salt Salt Conc

Output Statistics
       Dependent   Predicted   Std Error               Std Error    Student                  Cook's
Obs     Variable       Value   Mean Predict  Residual   Residual   Residual   -2-1 0 1 2          D
  1       3.8000      6.0104      0.7152      -2.2104      1.642     -1.346   |  **|    |     0.172
  2       5.9000      5.3085      0.7464       0.5915      1.628      0.363   |    |    |     0.014
  3      14.1000     12.6781      0.4655       1.4219      1.729      0.822   |    |*   |     0.025
  4      10.4000      9.6952      0.5633       0.7048      1.700      0.415   |    |    |     0.009
  5      14.6000     14.9592      0.4168      -0.3592      1.741     -0.206   |    |    |     0.001
  6      14.5000     14.4328      0.4255       0.0672      1.739     0.0386   |    |    |     0.000
  7      15.1000     13.7309      0.4395       1.3691      1.736      0.789   |    |*   |     0.020
  8      11.9000     10.9235      0.5194       0.9765      1.714      0.570   |    |*   |     0.015
  9      15.5000     15.8365      0.4063      -0.3365      1.744     -0.193   |    |    |     0.001
 10       9.3000     13.2045      0.4518      -3.9045      1.733     -2.253   |****|    |     0.173
 11      15.6000     16.3629      0.4025      -0.7629      1.745     -0.437   |    |    |     0.005
 12      20.8000     16.8893      0.4006       3.9107      1.745      2.241   |    |****|     0.132
 13      14.6000     16.3629      0.4025      -1.7629      1.745     -1.010   |  **|    |     0.027
 14      16.6000     14.7837      0.4195       1.8163      1.741      1.043   |    |**  |     0.032
 15      25.6000     25.4872      0.5985       0.1128      1.688     0.0668   |    |    |     0.000
 16      20.9000     21.1005      0.4527      -0.2005      1.732     -0.116   |    |    |     0.000
 17      29.9000     29.3475      0.7639       0.5525      1.620      0.341   |    |    |     0.013
 18      19.6000     21.2760      0.4571      -1.6760      1.731     -0.968   |   *|    |     0.033
 19      31.3000     33.2077      0.9451      -1.9077      1.521     -1.254   |  **|    |     0.304
 20      32.7000     31.1021      0.8449       1.5979      1.579      1.012   |    |**  |     0.147

Sum of Residuals                        0
Sum of Squared Residuals         57.71626
Predicted Residual SS (PRESS)    70.97373


59

CIs FOR MEAN RESPONSE AND FUTURE OBSERVATION
The REG Procedure
Model: MODEL1
Dependent Variable: salt Salt Conc

Number of Observations Read                   21
Number of Observations Used                   20
Number of Observations with Missing Values     1

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1       1130.14924    1130.14924    352.46   <.0001
Error             18         57.71626       3.20646
Corrected Total   19       1187.86550

Root MSE          1.79066    R-Square   0.9514
Dependent Mean   17.13500    Adj R-Sq   0.9487
Coeff Var        10.45030

Parameter Estimates
Variable    Label          DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   Intercept       1              2.67655          0.86800      3.08     0.0064
area        Roadway area    1             17.54667          0.93463     18.77     <.0001

60 6-2 Simple Linear Regression

CIs FOR MEAN RESPONSE AND FUTURE OBSERVATION
The REG Procedure
Model: MODEL1
Dependent Variable: salt Salt Conc

Output Statistics
       Dependent   Predicted   Std Error
Obs     Variable       Value   Mean Predict      95% CL Mean          95% CL Predict      Residual
  1       3.8000      6.0104      0.7152      4.5079    7.5129      1.9594   10.0614      -2.2104
  2       5.9000      5.3085      0.7464      3.7404    6.8767      1.2328    9.3843       0.5915
  3      14.1000     12.6781      0.4655     11.7002   13.6561      8.7911   16.5652       1.4219
  4      10.4000      9.6952      0.5633      8.5117   10.8788      5.7514   13.6390       0.7048
  5      14.6000     14.9592      0.4168     14.0835   15.8350     11.0966   18.8218      -0.3592
  6      14.5000     14.4328      0.4255     13.5389   15.3267     10.5660   18.2996       0.0672
  7      15.1000     13.7309      0.4395     12.8075   14.6544      9.8572   17.6047       1.3691
  8      11.9000     10.9235      0.5194      9.8322   12.0147      7.0064   14.8406       0.9765
  9      15.5000     15.8365      0.4063     14.9829   16.6902     11.9789   19.6942      -0.3365
 10       9.3000     13.2045      0.4518     12.2553   14.1538      9.3246   17.0845      -3.9045
 11      15.6000     16.3629      0.4025     15.5173   17.2086     12.5070   20.2189      -0.7629
 12      20.8000     16.8893      0.4006     16.0477   17.7310     13.0343   20.7444       3.9107
 13      14.6000     16.3629      0.4025     15.5173   17.2086     12.5070   20.2189      -1.7629
 14      16.6000     14.7837      0.4195     13.9023   15.6652     10.9198   18.6477       1.8163
 15      25.6000     25.4872      0.5985     24.2297   26.7447     21.5206   29.4538       0.1128
 16      20.9000     21.1005      0.4527     20.1495   22.0516     17.2201   24.9809      -0.2005
 17      29.9000     29.3475      0.7639     27.7427   30.9523     25.2575   33.4375       0.5525
 18      19.6000     21.2760      0.4571     20.3156   22.2364     17.3933   25.1587      -1.6760
 19      31.3000     33.2077      0.9451     31.2221   35.1934     28.9538   37.4616      -1.9077
 20      32.7000     31.1021      0.8449     29.3271   32.8772     26.9424   35.2619       1.5979
 21            .     24.6099      0.5647     23.4236   25.7962     20.6652   28.5545            .

Sum of Residuals                        0
Sum of Squared Residuals         57.71626
Predicted Residual SS (PRESS)    70.97373

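61 6-2 Simple Linear Regression 6-2.6 Correlation and Regression The sample correlation coefficient between X and Y is $R = S_{xy} / \sqrt{S_{xx} S_{yy}}$, where $S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$, $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$, and $S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$.

62 6-2 Simple Linear Regression 6-2.6 Correlation and Regression The sample correlation coefficient is also closely related to the slope in a linear regression model: $\hat{\beta}_1 = R \sqrt{S_{yy} / S_{xx}}$, where $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$ and $S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$.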


65 6-2.6 Correlation and Regression It is often useful to test the hypotheses $H_0: \rho = 0$ against $H_1: \rho \neq 0$. The appropriate test statistic for these hypotheses is $T_0 = R \sqrt{n-2} / \sqrt{1 - R^2}$. Reject $H_0$ if $|t_0| > t_{\alpha/2, n-2}$.
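A minimal Python sketch (in place of PROC CORR) of the sample correlation coefficient and the test of zero correlation for the Example 6-1 data. It assumes the usual statistic $t_0 = r \sqrt{n-2} / \sqrt{1 - r^2}$; the helper name `correlation_test` is hypothetical and `T_CRIT` is a hardcoded table value of $t_{0.025, 18}$.

```python
import math

# Salt concentration (y) and roadway area (x) from Example 6-1.
SALT = [3.8, 5.9, 14.1, 10.4, 14.6, 14.5, 15.1, 11.9, 15.5, 9.3,
        15.6, 20.8, 14.6, 16.6, 25.6, 20.9, 29.9, 19.6, 31.3, 32.7]
AREA = [0.19, 0.15, 0.57, 0.40, 0.70, 0.67, 0.63, 0.47, 0.75, 0.60,
        0.78, 0.81, 0.78, 0.69, 1.30, 1.05, 1.52, 1.06, 1.74, 1.62]

# Assumed: tabulated t_{0.025, 18} for a two-sided test at alpha = 0.05.
T_CRIT = 2.101

def correlation_test(x, y):
    """Return (r, t0) for the test of H0: rho = 0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = s_xy / math.sqrt(s_xx * s_yy)
    t0 = r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)
    return r, t0

r, t0 = correlation_test(AREA, SALT)
print(f"r = {r:.5f}, t0 = {t0:.2f}, reject H0: {abs(t0) > T_CRIT}")
```

This t statistic coincides with the one for testing a zero slope in the regression of salt on area, so the two tests are equivalent in simple linear regression.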

66 6-2 Simple Linear Regression Example 6-1 (Continued)
OPTIONS NOOVP NODATE NONUMBER LS=80;
/* Read (salt, area) pairs; @@ allows several pairs per input line */
DATA ex61;
  INPUT salt area @@;
  LABEL salt='Salt Conc' area='Roadway area';
CARDS;
3.8 0.19  5.9 0.15  14.1 0.57  10.4 0.4   14.6 0.7
14.5 0.67 15.1 0.63 11.9 0.47  15.5 0.75  9.3 0.6
15.6 0.78 20.8 0.81 14.6 0.78  16.6 0.69  25.6 1.3
20.9 1.05 29.9 1.52 19.6 1.06  31.3 1.74  32.7 1.62
;
/* Pearson correlation between the two variables */
PROC CORR DATA=EX61;
  VAR SALT AREA;
  TITLE 'Correlation between SALT and AREA';
/* Regression with the crossproducts matrix printed */
PROC REG;
  MODEL salt=area/XPX;
  TITLE 'Linear Regression of SALT vs AREA';
RUN; QUIT;

67 6-2 Simple Linear Regression

Correlation between SALT and AREA
The CORR Procedure
2 Variables: salt area

Simple Statistics
Variable    N       Mean    Std Dev         Sum    Minimum    Maximum   Label
salt       20   17.13500    7.90691   342.70000    3.80000   32.70000   Salt Conc
area       20    0.82400    0.43954    16.48000    0.15000    1.74000   Roadway area

Pearson Correlation Coefficients, N = 20
Prob > |r| under H0: Rho=0
                   salt       area
salt            1.00000    0.97540
Salt Conc                   <.0001
area            0.97540    1.00000
Roadway area     <.0001

68 6-2 Simple Linear Regression

Linear Regression of SALT vs AREA
The REG Procedure
Model: MODEL1

Model Crossproducts X'X X'Y Y'Y
Variable    Label          Intercept       area       salt
Intercept   Intercept             20      16.48      342.7
area        Roadway area       16.48    17.2502    346.793
salt        Salt Conc          342.7    346.793    7060.03

The REG Procedure
Model: MODEL1
Dependent Variable: salt Salt Conc

Number of Observations Read    20
Number of Observations Used    20

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1       1130.14924    1130.14924    352.46   <.0001
Error             18         57.71626       3.20646
Corrected Total   19       1187.86550

Root MSE          1.79066    R-Square   0.9514
Dependent Mean   17.13500    Adj R-Sq   0.9487
Coeff Var        10.45030

Parameter Estimates
Variable    Label          DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   Intercept       1              2.67655          0.86800      3.08     0.0064
area        Roadway area    1             17.54667          0.93463     18.77     <.0001

