
2 Chapter Outline
11-1 EMPIRICAL MODELS
11-2 SIMPLE LINEAR REGRESSION
11-3 PROPERTIES OF THE LEAST SQUARES ESTIMATORS
11-4 SOME COMMENTS ON USES OF REGRESSION (CD ONLY)
11-5 HYPOTHESIS TESTS IN SIMPLE LINEAR REGRESSION
11-5.1 Use of t-Tests
11-5.2 Analysis of Variance Approach to Test Significance of Regression
11-6 CONFIDENCE INTERVALS
11-6.1 Confidence Intervals on the Slope and Intercept
11-6.2 Confidence Interval on the Mean Response
11-7 PREDICTION OF NEW OBSERVATIONS
11-8 ADEQUACY OF THE REGRESSION MODEL
11-8.1 Residual Analysis
11-8.2 Coefficient of Determination (R²)
11-9 TRANSFORMATIONS TO A STRAIGHT LINE
11-11 CORRELATION

3 EMPIRICAL MODELS
Empirical: based on experiment, observation, or experience rather than theory (e.g., "they provided considerable empirical evidence to support their argument").
Many problems in engineering and science involve exploring the relationships between two or more variables. Regression analysis is a statistical technique that is very useful for these types of problems. For example, in a chemical process, suppose that the yield of the product is related to the process-operating temperature. Regression analysis can be used to build a model to predict yield at a given temperature level. This model can also be used for process optimization, such as finding the temperature level that maximizes yield, or for process control purposes.
As an illustration, consider the data in Table 11-1. In this table y is the purity of oxygen produced in a chemical distillation process, and x is the percentage of hydrocarbons present in the main condenser of the distillation unit. Figure 11-1 presents a scatter diagram of these data.

4 Scatter diagram of Table 11-1. This is simply a graph on which each (xi, yi) pair is represented as a point plotted in a two-dimensional coordinate system. This scatter diagram was produced by Minitab, with an option selected that shows dot diagrams of the x and y variables along the top and right margins of the graph, respectively, making it easy to see the distributions of the individual variables (box plots or histograms could also be selected). Inspection of the scatter diagram indicates that, although no simple curve will pass exactly through all the points, there is a strong indication that the points lie scattered randomly around a straight line.

5 EMPIRICAL MODELS
Therefore, it is probably reasonable to assume that the mean of the random variable Y is related to x by the following straight-line relationship:
E(Y | x) = β0 + β1 x
where the slope β1 and intercept β0 of the line are called regression coefficients. While the mean of Y is a linear function of x, the actual observed value y does not fall exactly on a straight line. The appropriate way to generalize this to a probabilistic linear model is to assume that the expected value of Y is a linear function of x, but that for a fixed value of x the actual value of Y is determined by the mean value function (the linear model) plus a random error term:
Y = β0 + β1 x + ε
where ε is the random error term. We will call this model the simple linear regression model, because it has only one independent variable, or regressor. Sometimes a model like this will arise from a theoretical relationship. At other times, we will have no theoretical knowledge of the relationship between x and y, and the choice of the model is based on inspection of a scatter diagram, such as we did with the oxygen purity data. We then think of the regression model as an empirical model.

6 Derivation Suppose that the mean

7

8 Conclusion

9 SIMPLE LINEAR REGRESSION: Another Look

10 Figure 11-3 Deviations of the data from the estimated regression model.

11 Method Of Least Squares.
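The least squares estimates can be derived by minimizing the sum of squared deviations of the observations from the line; the following is the usual textbook derivation, reconstructed here since the slide's equations did not survive:

```latex
% Minimize the sum of squared deviations
L(\beta_0,\beta_1) = \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right)^2
% Setting the partial derivatives to zero gives the normal equations:
\frac{\partial L}{\partial \beta_0} = -2\sum_{i=1}^{n}\left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right) = 0
\qquad
\frac{\partial L}{\partial \beta_1} = -2\sum_{i=1}^{n}\left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right)x_i = 0
% Solving the two equations simultaneously:
\hat\beta_1 = \frac{S_{xy}}{S_{xx}}
            = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum x_i\right)\left(\sum y_i\right)}
                   {\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2},
\qquad
\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}
```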

12

13 The residual e_i = y_i − ŷ_i describes the error in the fit of the model to the ith observation y_i.

14 Computations for the oxygen purity data of Table 11-1 (n = 20):

 x_i    y_i      x_i*y_i    x_i^2
 0.99   90.01     89.1099   0.9801
 1.02   89.05     90.8310   1.0404
 1.15   91.43    105.1445   1.3225
 1.29   93.74    120.9246   1.6641
 1.46   96.73    141.2258   2.1316
 1.36   94.45    128.4520   1.8496
 0.87   87.59     76.2033   0.7569
 1.23   91.77    112.8771   1.5129
 1.55   99.42    154.1010   2.4025
 1.40   93.65    131.1100   1.9600
 1.19   93.54    111.3126   1.4161
 1.15   92.52    106.3980   1.3225
 0.98   90.56     88.7488   0.9604
 1.01   89.54     90.4354   1.0201
 1.11   89.85     99.7335   1.2321
 1.20   90.39    108.4680   1.4400
 1.26   93.25    117.4950   1.5876
 1.32   93.41    123.3012   1.7424
 1.43   94.98    135.8214   2.0449
 0.95   87.33     82.9635   0.9025

15 Example (homework!): add 2 to every value in column 1 (x) and 2 to every value in column 2 (y), refit the model, and compare the resulting equation with the original one.
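The least squares fit for the Table 11-1 data, together with the shift experiment suggested in this homework slide, can be sketched in Python (a minimal sketch; `least_squares` is a helper written here for illustration):

```python
# Oxygen purity data (Table 11-1): x = hydrocarbon level (%), y = purity (%)
x = [0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
     1.19, 1.15, 0.98, 1.01, 1.11, 1.20, 1.26, 1.32, 1.43, 0.95]
y = [90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65,
     93.54, 92.52, 90.56, 89.54, 89.85, 90.39, 93.25, 93.41, 94.98, 87.33]

def least_squares(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum(xi**2 for xi in x) - n * xbar**2
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

b0, b1 = least_squares(x, y)   # fitted line: yhat = 74.283 + 14.947 x

# Homework: add 2 to every x and every y, then compare the fitted equations.
b0_shift, b1_shift = least_squares([xi + 2 for xi in x], [yi + 2 for yi in y])
```

Shifting both variables by a constant leaves the slope unchanged, since Sxx and Sxy depend only on deviations from the means; only the intercept moves, to b0 + 2 − 2·b1.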

16

17

18

19 SS_T = Σ(y_i − ȳ)² is the total sum of squares of the response variable y.

20 Example
The table below gives wear, y, versus oil viscosity, x:

x:  1.6   9.4   15.5   20    22    35.5   43    40.5   33
y:  240   181   193    155   172   110    113   75     94

a) Fit the simple linear regression model using the least squares method
b) Find an estimate of σ²
c) Predict wear when viscosity x = 30
d) Obtain the fitted value of y when x = 22 and calculate the corresponding residual
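A sketch of the four parts in Python (assuming the x–y pairing recovered from the slide's table; variable names are illustrative):

```python
# Wear (y) versus oil viscosity (x) data from the example
x = [1.6, 9.4, 15.5, 20, 22, 35.5, 43, 40.5, 33]
y = [240, 181, 193, 155, 172, 110, 113, 75, 94]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum(xi**2 for xi in x) - n * xbar**2
sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar

# a) Least squares fit: yhat ≈ 234.07 - 3.51 x
b1 = sxy / sxx
b0 = ybar - b1 * xbar

# b) Estimate of sigma^2: SSE / (n - 2)
sst = sum(yi**2 for yi in y) - n * ybar**2
sse = sst - b1 * sxy
sigma2_hat = sse / (n - 2)

# c) Predicted wear at viscosity x = 30
y_at_30 = b0 + b1 * 30

# d) Fitted value and residual at x = 22 (observed y = 172)
y_at_22 = b0 + b1 * 22
residual_22 = 172 - y_at_22
```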

21

22 Properties of the least square estimators

23 Hypothesis test in simple linear regression
An important part of assessing the adequacy of a linear regression model is testing statistical hypotheses about the model parameters and constructing certain confidence intervals. Hypothesis testing in simple linear regression is discussed in this section, and Section 11-6 presents methods for constructing confidence intervals. To test hypotheses about the slope and intercept of the regression model, we must make the additional assumption that the error component in the model, ε, is normally distributed. Thus, the complete assumptions are that the errors are normally and independently distributed with mean zero and variance σ², abbreviated NID(0, σ²).

24

25

26 Hypothesis test in simple linear regression
We would reject the null hypothesis if |t0| > t(α/2, n−2).
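The t-test of H0: β1 = 0 for the oxygen purity data can be sketched as follows (a minimal sketch using summary quantities computed from Table 11-1, with the critical value t(0.025, 18) = 2.101 taken from a t table):

```python
# Summary quantities for the oxygen purity data (n = 20)
n = 20
sxx, sxy = 0.68088, 10.17744   # corrected sum of squares / cross-products
sst = 173.377                  # total corrected sum of squares of y

b1 = sxy / sxx                 # estimated slope
sse = sst - b1 * sxy           # error sum of squares
sigma2_hat = sse / (n - 2)     # estimate of sigma^2
se_b1 = (sigma2_hat / sxx) ** 0.5   # standard error of the slope

t0 = b1 / se_b1                # test statistic for H0: beta1 = 0
t_crit = 2.101                 # t(0.025, 18) from a t table
reject = abs(t0) > t_crit      # True here: the slope is significant
```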

27 Special Case: H0: β1 = 0
Accepting this null hypothesis is equivalent to concluding that there is no linear relationship between x and y.

28 Reject the null hypothesis

29 Example

30

31

32 Other example

33

34

35 Regression analysis is used to investigate and model the relationship between a response variable and one or more predictors. Minitab provides least squares, nonlinear, orthogonal, partial least squares, and logistic regression procedures:
· Use least squares procedures when your response variable is continuous.
· Use nonlinear regression when you cannot adequately model the relationship with linear parameters.
· Use orthogonal regression when the response and predictor both contain measurement error.
· Use partial least squares regression when your predictors are highly correlated or outnumber your observations.
· Use logistic regression when your response variable is categorical.
Both least squares and logistic regression methods estimate parameters in the model so that the fit of the model is optimized. Least squares methods minimize the sum of squared errors to obtain parameter estimates, whereas Minitab's logistic regression obtains maximum likelihood estimates of the parameters. Partial least squares (PLS) extracts linear combinations of the predictors to minimize prediction error. See Partial Least Squares Overview for more information.
Use the table below to select a procedure:

36
Use...               | To...                                                                 | Response type | Estimation method
Regression           | perform simple or multiple least squares regression                   | continuous    | least squares
General Regression   | perform simple, multiple, or polynomial least squares regression with continuous and categorical predictors, with no need to create indicator variables | continuous | least squares
Stepwise             | perform stepwise, forward selection, or backward elimination to identify a useful subset of predictors | continuous | least squares
Best Subsets         | identify subsets of the predictors based on the maximum R² criterion  | continuous    | least squares
Fitted Line Plot     | perform linear and polynomial regression with a single predictor and plot a regression line through the data | continuous | least squares
Nonlinear Regression | perform simple or multiple regression using the nonlinear function of your choice | continuous | least squares
Orthogonal Regression| perform orthogonal regression with one response and one predictor     | continuous    | orthogonal
PLS                  | perform regression with ill-conditioned data                          | continuous    | biased, non-least squares
Binary Logistic      | perform logistic regression on a response with only two possible values, such as presence or absence | categorical | maximum likelihood
Ordinal Logistic     | perform logistic regression on a response with three or more possible values that have a natural order, such as none, mild, or severe | categorical | maximum likelihood
Nominal Logistic     | perform logistic regression on a response with three or more possible values that have no natural order, such as sweet, salty, or sour | categorical | maximum likelihood

37 New lecture

38 Analysis of Variance Approach to Test Significance of Regression A method called the analysis of variance can be used to test for significance of regression. The procedure partitions the total variability in the response variable into meaningful components as the basis for the test. The analysis of variance identity is as follows:

39 Analysis of Variance Approach to Test Significance of Regression
A method called the analysis of variance can be used to test the significance of regression. The analysis of variance identity can be written as follows:
SS_T = SS_R + SS_E
where
SS_R : regression sum of squares
SS_E : error sum of squares
SS_T : total corrected sum of squares

40

41 Analysis of Variance Approach to Test Significance of Regression

42 Example

43 Example: find the ANOVA table
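An ANOVA-table sketch for the oxygen purity data, using the same summary quantities computed from Table 11-1 (a minimal sketch; F0 is compared with f(α, 1, n−2) from an F table):

```python
# ANOVA for significance of regression, oxygen purity data (n = 20)
n = 20
sxx, sxy = 0.68088, 10.17744
sst = 173.377

b1 = sxy / sxx
ssr = b1 * sxy        # regression sum of squares (1 degree of freedom)
sse = sst - ssr       # error sum of squares (n - 2 degrees of freedom)
msr = ssr / 1
mse = sse / (n - 2)
f0 = msr / mse        # F statistic; compare with f(alpha, 1, n-2)

print("Source      SS        df   MS        F")
print(f"Regression  {ssr:8.3f}   1   {msr:8.3f}  {f0:6.2f}")
print(f"Error       {sse:8.3f}  {n - 2:2d}   {mse:8.3f}")
print(f"Total       {sst:8.3f}  {n - 1:2d}")
```

Note that for simple linear regression F0 equals the square of the t statistic for the slope.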

44

45

46 Confidence Intervals on the Slope and Intercept
In addition to point estimates of the slope and intercept, it is possible to obtain confidence interval estimates of these parameters. The width of these confidence intervals is a measure of the overall quality of the regression line. If the error terms ε_i in the regression model are normally and independently distributed, confidence intervals on β0 and β1 can be based on the t distribution with n − 2 degrees of freedom.
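A 95% confidence interval on the slope for the oxygen purity data can be sketched as follows (summary quantities from Table 11-1; t(0.025, 18) = 2.101 from a t table):

```python
# 95% confidence interval on the slope, oxygen purity data (n = 20)
n = 20
sxx, sxy = 0.68088, 10.17744
sst = 173.377

b1 = sxy / sxx
sse = sst - b1 * sxy
se_b1 = (sse / (n - 2) / sxx) ** 0.5   # standard error of the slope
t_crit = 2.101                         # t(0.025, 18) from a t table

lower = b1 - t_crit * se_b1
upper = b1 + t_crit * se_b1            # roughly 12.18 <= beta1 <= 17.71
```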

47 Example

48 Confidence Intervals on the Mean Response

49

50

51

52 11-7 Prediction Of New Observations

53 Prediction of New Observations
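A 95% prediction interval on a new purity observation at x0 = 1.00 can be sketched as follows (a minimal sketch using the Table 11-1 summary quantities; the prediction interval is wider than the confidence interval on the mean response because it includes the variance of the new observation):

```python
# 95% prediction interval for a new observation at x0 = 1.00 (n = 20)
n = 20
xbar, ybar = 1.196, 92.1605
sxx, sxy = 0.68088, 10.17744
sst = 173.377

b1 = sxy / sxx
b0 = ybar - b1 * xbar
sigma2_hat = (sst - b1 * sxy) / (n - 2)

x0 = 1.00
y0_hat = b0 + b1 * x0                  # point prediction
t_crit = 2.101                         # t(0.025, 18) from a t table
half = t_crit * (sigma2_hat * (1 + 1/n + (x0 - xbar)**2 / sxx)) ** 0.5
lower, upper = y0_hat - half, y0_hat + half   # about (86.83, 91.63)
```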

54

55

56 11-8 ADEQUACY OF THE REGRESSION MODEL
Fitting a regression model requires several assumptions.
1. Estimation of the model parameters requires the assumption that the errors are uncorrelated random variables with mean zero and constant variance.
2. Tests of hypotheses and interval estimation require that the errors be normally distributed.
3. In addition, we assume that the order of the model is correct; that is, if we fit a simple linear regression model, we are assuming that the phenomenon actually behaves in a linear or first-order manner.
The analyst should always consider the validity of these assumptions to be doubtful and conduct analyses to examine the adequacy of the model that has been tentatively entertained. In this section we discuss methods useful in this respect.

57 11-8.1 Residual Analysis
The residuals from a regression model are e_i = y_i − ŷ_i, i = 1, 2, …, n, where y_i is an actual observation and ŷ_i is the corresponding fitted value from the regression model. Analysis of the residuals is frequently helpful in checking the assumption that the errors are approximately normally distributed with constant variance, and in determining whether additional terms in the model would be useful. As an approximate check of normality, the experimenter can construct a frequency histogram of the residuals or a normal probability plot of residuals. Many computer programs will produce a normal probability plot of residuals, and since the sample sizes in regression are often too small for a histogram to be meaningful, the normal probability plotting method is preferred.
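Computing the residuals for the oxygen purity fit can be sketched as follows (the Table 11-1 data are re-listed so the sketch is self-contained):

```python
# Residuals e_i = y_i - yhat_i for the oxygen purity fit (Table 11-1)
x = [0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
     1.19, 1.15, 0.98, 1.01, 1.11, 1.20, 1.26, 1.32, 1.43, 0.95]
y = [90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65,
     93.54, 92.52, 90.56, 89.54, 89.85, 90.39, 93.25, 93.41, 94.98, 87.33]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum(xi**2 for xi in x) - n * xbar**2
sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar
b1 = sxy / sxx
b0 = ybar - b1 * xbar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
# Least squares forces the residuals to sum to (essentially) zero;
# a histogram or normal probability plot of `residuals` checks normality.
```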

58

59 Patterns of residual plots:
a) Satisfactory
b) Funnel: variance increases with x
c) Double bow: inequality of variance
d) Nonlinear: model inadequacy

60 Example

61

62 For the oxygen purity regression model, R² = 0.877; the model accounts for 87.7% of the variability in the data.

63 11-8.2 Coefficient of Determination (R²)
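The R² = 0.877 quoted on the earlier slide can be reproduced from the regression and total sums of squares (summary quantities from Table 11-1):

```python
# Coefficient of determination R^2 = SSR / SST, oxygen purity data
sxx, sxy = 0.68088, 10.17744
sst = 173.377

ssr = (sxy / sxx) * sxy     # regression sum of squares
r2 = ssr / sst              # equivalently 1 - SSE/SST; about 0.877
```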

64

65

66 11-9 TRANSFORMATIONS TO A STRAIGHT LINE We occasionally find that the straight-line regression model is inappropriate because the true regression function is nonlinear. Sometimes nonlinearity is visually determined from the scatter diagram, and sometimes, because of prior experience or underlying theory, we know in advance that the model is nonlinear. Occasionally, a scatter diagram will exhibit an apparent nonlinear relationship between Y and x. In some of these situations, a nonlinear function can be expressed as a straight line by using a suitable transformation. Such nonlinear models are called intrinsically linear.
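A standard illustration of an intrinsically linear model is the exponential function; taking logarithms reduces it to a straight line (this is the usual textbook example, not data from these slides):

```latex
Y = \beta_0 \, e^{\beta_1 x} \, \varepsilon
% Taking the natural logarithm of both sides:
\ln Y = \ln \beta_0 + \beta_1 x + \ln \varepsilon
% which is a simple linear regression of ln y on x,
% with intercept ln(beta_0), slope beta_1, and error term ln(epsilon).
```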

67

68

69 11-11 CORRELATION
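The sample correlation coefficient can be computed from the same quantities used for the regression; a sketch for the oxygen purity data (summary quantities from Table 11-1):

```python
# Sample correlation coefficient r for the oxygen purity data
sxx, sxy = 0.68088, 10.17744
syy = 173.377                  # corrected sum of squares of y (= SS_T)

r = sxy / (sxx * syy) ** 0.5   # r^2 equals the coefficient of determination R^2
```

For simple linear regression, r carries the sign of the estimated slope and r² equals R².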

