Probability & Statistics for Engineers & Scientists, by Walpole, Myers, Myers & Ye ~ Chapter 11 Notes Class notes for ISE 201 San Jose State University.
1 Probability & Statistics for Engineers & Scientists, by Walpole, Myers, Myers & Ye ~ Chapter 11 Notes. Class notes for ISE 201, San Jose State University, Industrial & Systems Engineering Dept. Steve Kennedy
2 Simple Linear Regression
If there is a linear relationship between an independent variable $x$ and a dependent variable $Y$, then
$$Y = \alpha + \beta x + \varepsilon,$$
where $\alpha$ is the intercept and $\beta$ is the slope of the linear relationship, and $\varepsilon$ is the random error, assumed to be normally distributed with mean 0 and variance $\sigma^2$.
Residuals: Given regression data points $(x_i, y_i)$, $i = 1, 2, \ldots, n$, if $\hat{y}_i = a + b x_i$ is the estimate of $y_i$ using the linear model, then the residual $e_i$ is given by
$$e_i = y_i - \hat{y}_i.$$
The residual for each data point is the signed distance of the point from the line in the $y$ direction.
We will use the "least squares" technique to choose $a$ and $b$ so that the sum of the squares of the residuals is minimized.
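As a small illustration of the residual definition, the sketch below computes the residuals and their sum of squares for one candidate line. The data values and the coefficients `a` and `b` are made up for illustration and are not from the class notes; the line shown is just a candidate, not the least squares fit.

```python
def residuals(xs, ys, a, b):
    """Return e_i = y_i - (a + b * x_i) for each data point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Illustrative data (assumed, not from the notes).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.5, 5.5, 8.0]

# A hypothetical candidate line y = a + b x.
a, b = 0.5, 1.8

e = residuals(xs, ys, a, b)

# Sum of squared residuals for this candidate line.
sse = sum(r * r for r in e)
print(e, sse)
```

A positive residual means the point lies above the line, and a negative one means it lies below. Different choices of `a` and `b` give different values of `sse`.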
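3 Least Squares Method
We wish to find $a$ and $b$ to minimize the sum of the squares of the errors (residuals), SSE:
$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2.$$
To minimize, differentiate SSE with respect to $a$ and $b$, and set each result to 0. This generates two simultaneous equations (called normal equations) in two unknowns:
$$n a + b \sum x_i = \sum y_i, \qquad a \sum x_i + b \sum x_i^2 = \sum x_i y_i.$$
Solving for $a$ and $b$, we get
$$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \quad \text{and} \quad a = \bar{y} - b\,\bar{x}.$$
$a$ and $b$ are the coefficients of the "best fit" straight line through the data points, that is, the line that minimizes SSE.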
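The closed-form least squares solution can be sketched in a few lines of plain Python. The function name `fit_line` and the data are assumptions made for illustration. The slope is computed as the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept follows from the sample means.

```python
def fit_line(xs, ys):
    """Least squares intercept a and slope b for the line y = a + b x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


# Illustrative data (assumed, not from the notes).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]

a, b = fit_line(xs, ys)

# At the least squares solution the residuals sum to (numerically) zero,
# which is the first normal equation.
resid_sum = sum(y - (a + b * x) for x, y in zip(xs, ys))
print(a, b, resid_sum)
```

For this data the fit is approximately a = 0.22 and b = 1.96. The function assumes the x values are not all identical, since the denominator would otherwise be zero.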
4 Coefficient of Determination ($R^2$)
The coefficient of determination $R^2$ is a measure of the proportion of variability explained by the fitted model, and thus a measure of the quality of the linear fit.
Recall from the previous slide that SSE is the sum of the squares of the errors (residuals), or the amount of variation unexplained by the straight line.
SST, the total sum of squares, is the total variability in the data: $SST = \sum (y_i - \bar{y})^2$.
Then $R^2$ (in simple linear regression, the square of the correlation coefficient) is defined as
$$R^2 = 1 - \frac{SSE}{SST}.$$
$R^2$ tells us the fraction of the total variation in the data explained by the straight line relationship.
If $R^2 \approx 1$, all points are very close to the line.
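A minimal sketch of the R² calculation follows. The helper name `r_squared` and the data are assumptions for illustration. The coefficients a = 0.22 and b = 1.96 are the least squares values for this particular made-up data set.

```python
def r_squared(ys, y_hats):
    """Coefficient of determination: 1 - SSE / SST."""
    y_bar = sum(ys) / len(ys)
    # Unexplained variation: squared residuals about the fitted values.
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))
    # Total variation: squared deviations about the mean of y.
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - sse / sst


# Illustrative data (assumed, not from the notes).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]

# Least squares line for this data set.
a, b = 0.22, 1.96
y_hats = [a + b * x for x in xs]

r2 = r_squared(ys, y_hats)
print(r2)
```

Here the points lie almost exactly on a line, so the result is very close to 1. The function assumes the y values are not all equal, since SST would otherwise be zero.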
5 Data Transformations for Regression
If the relationship between the variables is other than linear, we can first transform either the dependent or independent variable or both, and then perform a linear regression on the transformed variables. If, for example, we have:
Exponential: If $y = \alpha e^{\beta x}$, use $y^* = \ln y$, and regress $y^*$ against $x$.
Power: If $y = \alpha x^{\beta}$, use $y^* = \ln y$ and $x^* = \ln x$, and regress $y^*$ against $x^*$.
Reciprocal: If $y = \alpha + \beta(1/x)$, use $x^* = 1/x$, and regress $y$ against $x^*$.
Hyperbolic: If $y = x/(\alpha + \beta x)$, use $y^* = 1/y$ and $x^* = 1/x$, and regress $y^*$ against $x^*$.
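The exponential and power cases can be sketched as follows. The data are generated noise-free from known parameters so that the fit recovers them. All names and values are assumptions for illustration. Note that fitting on the log scale minimizes squared errors in ln y rather than in y.

```python
import math


def fit_line(xs, ys):
    """Least squares intercept a and slope b for y = a + b x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


xs = [1.0, 2.0, 3.0, 4.0, 5.0]

# Exponential model y = alpha * exp(beta * x) with alpha = 2, beta = 0.5.
ys_exp = [2.0 * math.exp(0.5 * x) for x in xs]
# ln y = ln(alpha) + beta * x, so regress ln y against x.
a, b = fit_line(xs, [math.log(y) for y in ys_exp])
alpha_exp, beta_exp = math.exp(a), b

# Power model y = alpha * x**beta with alpha = 3, beta = 1.5.
ys_pow = [3.0 * x ** 1.5 for x in xs]
# ln y = ln(alpha) + beta * ln x, so regress ln y against ln x.
a, b = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys_pow])
alpha_pow, beta_pow = math.exp(a), b

print(alpha_exp, beta_exp, alpha_pow, beta_pow)
```

In both cases the fitted intercept estimates ln α, so α is recovered by exponentiating it, while the fitted slope estimates β directly. The log transformations require y > 0 (and x > 0 for the power model).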
6 Multiple Linear Regression
In a multiple linear regression model, we have $k$ independent variables, $x_1, x_2, \ldots, x_k$. The model is
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon.$$
The least-squares estimates of the coefficients can be calculated as with simple linear regression, except that there are $k + 1$ simultaneous equations to solve (use matrix inversion).
$R^2$ still describes the quality of the fit.
Multiple linear regression can also be used to calculate the least squares coefficients for a polynomial model of the form
$$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_r x^r + \varepsilon$$
by first calculating the square, cube, etc., of the independent variable and then doing a multiple linear regression on those columns.
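The sketch below builds the k + 1 normal equations, written in matrix form as (XᵀX) b = Xᵀy, and solves them. Instead of the explicit matrix inversion the slide mentions, it uses Gaussian elimination with partial pivoting, which gives the same solution. The function names and the data are assumptions for illustration. The example fits a quadratic by treating x and x² as two independent variables.

```python
def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | rhs], copied so the inputs are not modified.
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def fit_multiple(rows, ys):
    """Least squares coefficients [b0, b1, ..., bk] for rows of [x1, ..., xk]."""
    # Prepend a column of ones for the intercept term.
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    Xty = [sum(x[i] * y for x, y in zip(X, ys)) for i in range(p)]
    return solve(XtX, Xty)


# Noise-free quadratic y = 1 + 2x + 3x^2 (illustrative data).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x + 3.0 * x ** 2 for x in xs]

# Polynomial regression: use x and x^2 as the two independent variables.
coefs = fit_multiple([[x, x ** 2] for x in xs], ys)
print(coefs)
```

Because the data are exactly quadratic, the recovered coefficients are approximately 1, 2, and 3. The sketch assumes the columns are not linearly dependent; otherwise a pivot is zero and the system has no unique solution.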