# Probability & Statistics for Engineers & Scientists, by Walpole, Myers, Myers & Ye ~ Chapter 11 Notes Class notes for ISE 201 San Jose State University.

Class notes for ISE 201, San Jose State University, Industrial & Systems Engineering Dept. Steve Kennedy

Simple Linear Regression
If there is a linear relationship between an independent variable $x$ and a dependent variable $Y$, then

$$Y = \alpha + \beta x + \varepsilon,$$

where $\alpha$ is the intercept and $\beta$ is the slope of the linear relationship, and $\varepsilon$ is the random error, assumed to be normally distributed with mean 0 and variance $\sigma^2$.

Residuals: Given regression data points $[(x_i, y_i),\ i = 1, 2, \ldots, n]$, if $\hat{y}_i = a + b x_i$ is the estimate of $y_i$ using the linear model, then the residual $e_i$ is given by

$$e_i = y_i - \hat{y}_i.$$

The residual for each data point is the distance of the point from the line in the $y$ direction. We will use the "least squares" technique to minimize the sum of the squares of the residuals.

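Least Squares Method
We wish to find $a$ and $b$ to minimize the sum of the squares of the errors (residuals), SSE:

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2.$$

To minimize, differentiate with respect to $a$ and $b$, and set each result to 0. This generates two simultaneous equations (called the normal equations) in two unknowns. Solving for $a$ and $b$, we get

$$b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

and

$$a = \bar{y} - b\,\bar{x}.$$

Here $a$ and $b$ are the coefficients of the "best fit" straight line through the data points, the line that minimizes SSE.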
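As a rough illustration of the least squares solution, the sketch below computes $a$ and $b$ from the standard closed-form result of the normal equations (slope = $S_{xy}/S_{xx}$, intercept = $\bar{y} - b\bar{x}$). The function names `fit_line` and `sse` and the data values are made up for this example; they are not from the notes.

```python
def fit_line(xs, ys):
    """Return (a, b) minimizing the sum of squared residuals for y ~ a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # S_xy and S_xx: sums of cross-products and squares about the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


def sse(xs, ys, a, b):
    """Sum of squared residuals e_i = y_i - (a + b*x_i)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
print("intercept a =", a, "slope b =", b)
print("SSE at the fitted line =", sse(xs, ys, a, b))
```

Nudging `a` or `b` away from the fitted values and recomputing `sse` should never give a smaller result, which is a quick sanity check on the fit.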

Coefficient of Determination ($R^2$)
The coefficient of determination $R^2$ is a measure of the proportion of variability explained by the fitted model, and thus a measure of the quality of the linear fit. Recall from the previous slide that SSE is the sum of the squares of the errors (residuals), or the amount of variation unexplained by the straight line. SST, the total sum of squares, $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$, is the total variability in the data. Then $R^2$ (the square of the correlation coefficient) is defined as

$$R^2 = 1 - \frac{SSE}{SST}.$$

$R^2$ tells us the fraction of the total variation in the data explained by the straight line relationship. If $R^2 \approx 1$, all points are very close to the line.
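The definition can be checked numerically with a small sketch, assuming the usual form $R^2 = 1 - SSE/SST$ with $SST = \sum (y_i - \bar{y})^2$. The helper name `r_squared` and the toy values are illustrative only.

```python
def r_squared(ys, y_hats):
    """R^2 = 1 - SSE/SST for observed values ys and fitted values y_hats."""
    y_bar = sum(ys) / len(ys)
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))  # unexplained variation
    sst = sum((y - y_bar) ** 2 for y in ys)                # total variation
    return 1 - sse / sst


# Hypothetical observations.
ys = [1.0, 2.0, 3.0, 4.0]

perfect = r_squared(ys, ys)                 # fitted values equal the data
mean_only = r_squared(ys, [2.5] * len(ys))  # always predict the mean of ys
print(perfect, mean_only)
```

The two calls show the extremes: a line through every point leaves no unexplained variation, while predicting the mean everywhere explains none of it.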

Data Transformations for Regression
If the relationship between the variables is other than linear, we can first transform either the dependent or independent variable or both, and then perform a linear regression on the transformed variables. If, for example, we have:
Exponential: If $y = \alpha e^{\beta x}$, use $y^* = \ln y$, and regress $y^*$ against $x$.
Power: If $y = \alpha x^{\beta}$, use $y^* = \ln y$ and $x^* = \ln x$, and regress $y^*$ against $x^*$.
Reciprocal: If $y = \alpha + \beta(1/x)$, use $x^* = 1/x$, and regress $y$ against $x^*$.
Hyperbolic: If $y = x/(\alpha + \beta x)$, use $y^* = 1/y$ and $x^* = 1/x$, and regress $y^*$ against $x^*$.
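To make the exponential case concrete, here is a minimal sketch that assumes the model has the form $y = \alpha e^{\beta x}$, so that $\ln y = \ln\alpha + \beta x$ is linear in $x$. The data are generated from made-up values $\alpha = 2$ and $\beta = 0.5$ with no noise, and `fit_line` is a hypothetical helper that applies the closed-form simple linear regression result.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept and slope for y ~ a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


# Noise-free data from y = 2 * exp(0.5 * x); the constants are assumptions.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

# Transform the dependent variable, then regress y* = ln y against x.
y_star = [math.log(y) for y in ys]
a, b = fit_line(xs, y_star)

# Undo the transformation: the intercept is ln(alpha), the slope is beta.
alpha_hat = math.exp(a)
beta_hat = b
print("alpha =", alpha_hat, "beta =", beta_hat)
```

With real, noisy data the recovered values would only approximate the underlying constants, and least squares is then minimizing error on the $\ln y$ scale rather than on the original $y$ scale.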

Multiple Linear Regression
In a multiple linear regression model, we have $k$ independent variables, $x_1, x_2, \ldots, x_k$. The model is

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon.$$

The least-squares estimates of the coefficients can be calculated as with simple linear regression, except that there are $k + 1$ simultaneous equations to solve (use matrix inversion). $R^2$ still describes the goodness of the linear relationship.

Multiple linear regression can also be used to calculate the least squares coefficients for a polynomial model of the form

$$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_r x^r + \varepsilon$$

by first calculating the square, cube, etc., of the independent variable and then doing a multiple linear regression.
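The polynomial case can be sketched as follows, assuming a quadratic model and noise-free, made-up data from $y = 1 + 2x + 3x^2$. The code builds the $k + 1$ normal equations $(X^{\mathsf T}X)\,b = X^{\mathsf T}y$ and solves them by Gaussian elimination rather than forming an explicit matrix inverse; the function names are hypothetical, and in practice a linear algebra library would normally do this step.

```python
def solve_linear_system(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | rhs], copied so the inputs are not modified.
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def fit_multiple(rows, ys):
    """Least-squares coefficients [b0, b1, ..., bk] for y ~ b0 + b1*x1 + ... + bk*xk."""
    X = [[1.0] + list(row) for row in rows]  # leading 1 carries the intercept
    p = len(X[0])                            # p = k + 1 unknowns
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical noise-free data from y = 1 + 2x + 3x^2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]

# Treat x and x^2 as two separate independent variables.
rows = [[x, x ** 2] for x in xs]
b0, b1, b2 = fit_multiple(rows, ys)
print(b0, b1, b2)
```

The same `fit_multiple` call handles an ordinary multiple regression: each inner list in `rows` would then hold the $k$ measured independent variables for one observation instead of powers of a single $x$.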
