Linear Regression (C7-9 BVD)
* Explanatory variable goes on the x-axis.
* Response variable goes on the y-axis.
* Don't forget labels and scale.
* On the calculator: Statplot, 1st option, specify lists, Zoom 9.
* Direction: positive slope or negative slope
* Unusual points: outliers, influential points
* Shape: straight or curved
* Scatter: weak, moderate, strong
* Correlation measures the strength of the linear association between two variables: $r = \frac{\sum z_x z_y}{n - 1}$
* r is always between -1 and 1, inclusive. A negative r means a negative direction; a positive r means a positive direction. An r close to 0 indicates a weaker linear association, and an r near 1 or -1 indicates a stronger one. If r is exactly 1 or -1, the data are exactly linear.
* Correlation is strongly affected by outliers: always look at the scatterplot and residual plot to make sure a linear model makes sense!
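The z-score formula for r can be checked by hand on a small data set. The following is a minimal sketch, assuming made-up data (hours studied and test scores); the function name `correlation` is illustrative only.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson r as the sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    z_products = [
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)
    ]
    return sum(z_products) / (n - 1)

# Made-up illustrative data, not from the presentation
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
r = correlation(hours, scores)
print(round(r, 3))  # about 0.981: a strong positive linear association
```

Swapping `hours` and `scores` gives the same r, since the formula treats x and y symmetrically.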
* Use a "hat" over the y variable ($\hat{y}$) to indicate a model's prediction rather than an actual data value.
* Use words for the y and x variables.
* y = mx + b, but statisticians may use $b_1$ for the slope and $b_0$ for the y-intercept, and the calculator may use b for the slope and a for the y-intercept.
* To find a predicted y, plug x into the equation and find $\hat{y}$!
* Extrapolation, or using your model to make predictions for x's far from the x's used to make the model, is likely to lead to inaccurate or worthless predictions.
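Prediction and the extrapolation warning can be sketched together. This example assumes a hypothetical fitted line (predicted score = 46 + 6 × hours, built from hours between 1 and 5) and, as a simplification, flags any x outside the observed range; how far is "too far" is a judgment call that depends on context.

```python
def predict(x, b0, b1, x_min, x_max):
    """Return (y_hat, is_extrapolation) for the line y_hat = b0 + b1 * x.

    x_min and x_max are the smallest and largest x values used to fit
    the model; anything outside that range is flagged as extrapolation.
    """
    y_hat = b0 + b1 * x
    is_extrapolation = x < x_min or x > x_max
    return y_hat, is_extrapolation

# Hypothetical model and data range, for illustration only
print(predict(3, 46, 6, 1, 5))   # (64, False)
print(predict(20, 46, 6, 1, 5))  # (166, True): far outside the data, do not trust
```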
* Residual = actual y value for a given x minus the y value the model predicts for that x: $e = y - \hat{y}$
* A positive residual means the model underestimated; a negative residual means the model overestimated.
* Least squares regression is a procedure that finds the line of best fit by minimizing the sum of the squares of the residuals.
* In the calculator: LinReg L1, L2, Y1 (make sure diagnostics are on to get r and $r^2$).
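What the calculator's LinReg does can be approximated in a few lines. The sketch below uses the standard closed-form least-squares solution on made-up data; the function names are illustrative and are not the calculator's.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (b0, b1) for the line y_hat = b0 + b1 * x that minimizes
    the sum of squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def residuals(xs, ys, b0, b1):
    """Residual = actual y minus predicted y, for each data point."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Made-up illustrative data
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
b0, b1 = least_squares(hours, scores)
resid = residuals(hours, scores, b0, b1)
print(b0, b1)  # 46.0 6.0
print(resid)   # [0.0, 2.0, -3.0, 0.0, 1.0]
```

The second point has a positive residual (the line underestimated it) and the third a negative one (the line overestimated it). The residuals of a least-squares line always sum to zero.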
* When doing linear regression, always check the scatterplot (DUSS: direction, unusual points, shape, scatter) to make sure the shape is basically straight, and ALSO check the residuals plot for any patterns.
* The RESID list is generated automatically by the calculator whenever you run LinReg. It is under 2nd Stat Edit. Be careful: it is overwritten every time you run a new LinReg. Graph it by changing the Y-list in Statplot to RESID.
* You can also store the RESID list in another list, such as L3, for easy viewing of the residuals.
* The standard deviation of the residuals (1-Var Stats would work) represents the approximate size of a "typical" prediction error (how far off the model will typically be for a given x).
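The "typical prediction error" can be computed from a residual list directly. The sketch below assumes a made-up list of residuals and shows two versions: the n - 1 version that a one-variable summary reports, and the n - 2 version usually quoted as the regression standard error. The two are close for large n, which is why the one-variable shortcut is a reasonable approximation.

```python
from math import sqrt
from statistics import stdev

# Made-up residuals from a hypothetical fit (they sum to zero)
resid = [0.0, 2.0, -3.0, 0.0, 1.0]
n = len(resid)

# One-variable summary: sample standard deviation, dividing by n - 1
s_one_var = stdev(resid)

# Regression standard error: divides by n - 2, because two parameters
# (slope and intercept) were estimated from the data
s_e = sqrt(sum(e ** 2 for e in resid) / (n - 2))

print(round(s_one_var, 3))  # 1.871
print(round(s_e, 3))        # 2.16
```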
* If you have summary statistics (r, plus the mean and standard deviation of x and y) instead of data, use these formulas to find the line of best fit:
* $b_1 = r \frac{s_y}{s_x}$
* $b_0 = \bar{y} - b_1 \bar{x}$
* If you want to switch explanatory and response, you can use these equations with the x and y means and standard deviations reversed to find the new model.
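A short worked example of the summary-statistic formulas, including the switch of explanatory and response. All numbers here are hypothetical, chosen only to keep the arithmetic simple.

```python
def line_from_summary(r, mean_x, s_x, mean_y, s_y):
    """Return (b0, b1) for the regression of y on x from summary statistics."""
    b1 = r * s_y / s_x
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical summary statistics
r, mean_x, s_x, mean_y, s_y = 0.8, 10.0, 2.0, 50.0, 5.0

# Regression of y on x: y_hat = 30 + 2x
b0, b1 = line_from_summary(r, mean_x, s_x, mean_y, s_y)

# Regression of x on y: reverse the roles of the means and standard deviations
c0, c1 = line_from_summary(r, mean_y, s_y, mean_x, s_x)

print(round(b0, 2), round(b1, 2))  # 30.0 2.0
print(round(c0, 2), round(c1, 2))  # -6.0 0.32
```

Note that the reversed model is not the algebraic inverse of the first line: solving y = 30 + 2x for x would give a slope of 0.5, not 0.32. The two slopes agree only when r is exactly 1 or -1.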
* Interpreting $r^2$: ___% of the variation in y (context) is accounted for by the regression line of y on x (context). The blank is $r^2$ written as a percentage.
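The percentage in that sentence is r² expressed as a percent, and it can be computed as one minus the ratio of residual variation to total variation. The sketch below assumes made-up data with predictions from a least-squares line (the identity with r² holds only for least-squares predictions).

```python
def r_squared(ys, y_hats):
    """Fraction of the variation in y accounted for by the predictions."""
    y_bar = sum(ys) / len(ys)
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))  # unexplained
    sst = sum((y - y_bar) ** 2 for y in ys)                      # total
    return 1 - sse / sst

# Made-up data; predictions come from the hypothetical line y_hat = 46 + 6x
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
predicted = [46 + 6 * x for x in hours]

r2 = r_squared(scores, predicted)
print(f"{r2:.1%} of the variation in score is accounted for by "
      "the regression line of score on hours.")  # 96.3% ...
```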
* Outliers are points far from the other points.
* Outliers that are far away in the y-direction but not the x-direction have large residuals and typically weaken the correlation (pull r toward 0).
* Points that are far away in the x-direction have high leverage: they pull the regression line toward them and can artificially raise or lower r. When removing such a point noticeably changes the line, it is called an influential point.
* Consider fitting the regression without influential points and discussing them separately.
* Also consider alternative models if appropriate.
* Do not assume that a large r means a linear model is THE best model for the data, or that it implies a cause/effect relationship between the variables.
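How much a single unusual point can move r is easy to check numerically. The sketch below uses a made-up five-point data set and adds one extra point in three different places; the specific values are illustrative only, and other data sets can behave differently.

```python
from math import sqrt

def pearson_r(points):
    """Pearson correlation for a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    syy = sum((y - y_bar) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)

# Made-up base data with a moderately strong positive association
base = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]

r_base = pearson_r(base)                      # no unusual points
r_y_outlier = pearson_r(base + [(3, 15)])     # far away in y only
r_x_in_line = pearson_r(base + [(20, 20)])    # far away in x, follows the trend
r_x_off_line = pearson_r(base + [(20, 0)])    # far away in x, against the trend

for label, r in [("base", r_base), ("y outlier", r_y_outlier),
                 ("x outlier in line", r_x_in_line),
                 ("x outlier off line", r_x_off_line)]:
    print(f"{label}: r = {r:.3f}")
```

In this example the point far away in y drops r sharply, the in-line point far away in x pushes r close to 1, and the off-line point far away in x flips the sign of r. That is why the scatterplot should be checked before trusting r.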