

Published by Armani Rockwell, modified over 5 years ago

Linear Regression (C7-9 BVD)

* Put the explanatory variable on the x-axis and the response variable on the y-axis.
* Don't forget labels and scales on both axes.
* On the calculator: in Stat Plot choose the 1st option, specify the lists, then press Zoom 9 (ZoomStat).
* Describe the scatterplot with DUSS:
  * Direction: positive slope or negative slope
  * Unusual points: outliers, influential points
  * Shape: straight or curved
  * Scatter: weak, moderate, or strong

* Correlation measures the strength and direction of the linear association between two quantitative variables: $r = \frac{\sum z_x z_y}{n-1}$.
* r is always between -1 and 1, inclusive. A negative r means a negative direction; a positive r means a positive direction. An r close to 0 indicates a weak linear association, and an r near 1 or -1 indicates a strong one. If r is exactly 1 or -1, all the points lie exactly on a straight line.
* Correlation is strongly affected by outliers: always look at the scatterplot and the residual plot to make sure a linear model makes sense!
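A minimal Python sketch of the z-score formula for r, assuming sample standard deviations (dividing by n - 1), which is what makes the n - 1 in the denominator work out. The function name `correlation` and the hours/score data are hypothetical, made up for illustration.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation r = sum(z_x * z_y) / (n - 1), using sample SDs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample SDs: divide by n - 1
    # Convert each value to a z-score, multiply the pairs, and add them up.
    total = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical data: hours studied (x) and test score (y).
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]
print(round(correlation(hours, score), 3))
```

For these made-up numbers r comes out near 0.98, a strong positive linear association; the scatterplot still has to be checked before trusting a linear model.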

* Put a "hat" over the y variable ($\hat{y}$) to show that it is the model's prediction rather than an actual data value.
* Write the equation with the variable names in context (words), not just x and y.
* Algebra writes y = mx + b, but statisticians usually write $\hat{y} = b_0 + b_1 x$, with $b_1$ the slope and $b_0$ the y-intercept; the calculator may use b for the slope and a for the y-intercept ($\hat{y} = a + bx$).
* To find a predicted y, plug the x-value into the equation and compute $\hat{y}$.
* Extrapolation (using the model to make predictions for x-values far from the x-values used to build it) is likely to produce inaccurate or worthless predictions.
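A small sketch of plugging x into a fitted line, with a crude extrapolation check. The model below (predicted score = 46 + 6(hours), built from 1 to 5 hours of study) is hypothetical, as are the function names; treating "outside the observed min/max" as extrapolation is a simplifying assumption, since the slide only says "far from" the data.

```python
def predict(b0, b1, x):
    """Return y-hat = b0 + b1 * x for a fitted line."""
    return b0 + b1 * x


def is_extrapolation(x, xs_used):
    """True if x falls outside the range of x-values used to build the model."""
    return x < min(xs_used) or x > max(xs_used)


# Hypothetical model: predicted score = 46 + 6(hours), built from 1 to 5 hours.
b0, b1 = 46, 6
hours_used = [1, 2, 3, 4, 5]

for h in (4.5, 20):
    note = "extrapolation!" if is_extrapolation(h, hours_used) else "within data range"
    print(f"hours = {h}: predicted score = {predict(b0, b1, h)} ({note})")
```

The model happily predicts a score of 166 for 20 hours, even though nothing in the data supports a prediction that far out.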

* Residual = actual y-value minus the y-value the model predicts for the same x: $e = y - \hat{y}$.
* A positive residual means the model underestimated; a negative residual means the model overestimated.
* Least squares regression finds the line of best fit by minimizing the sum of the squared residuals.
* On the calculator: LinReg L1, L2, Y1 (make sure diagnostics are on, via DiagnosticOn, to see r and r²).
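A minimal least squares sketch in Python, assuming raw (x, y) data; the function names and the hours/score data are hypothetical. The slope uses the standard closed form $b_1 = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sum (x-\bar{x})^2}$, and the residuals are computed as actual minus predicted, as defined above.

```python
from statistics import mean


def least_squares(xs, ys):
    """Return (b0, b1) for the line that minimizes the sum of squared residuals."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx       # slope
    b0 = my - b1 * mx    # intercept: the line passes through (x-bar, y-bar)
    return b0, b1


def residuals(xs, ys, b0, b1):
    """Residual = actual y - predicted y, for each data point."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def sum_sq(res):
    """Sum of squared residuals: the quantity least squares minimizes."""
    return sum(e ** 2 for e in res)


# Hypothetical data: hours studied (x) and test score (y).
xs = [1, 2, 3, 4, 5]
ys = [52, 60, 61, 70, 77]
b0, b1 = least_squares(xs, ys)
res = residuals(xs, ys, b0, b1)
print(b0, b1, res, sum_sq(res))
```

Here the fitted line is 46 + 6x and the residual at x = 2 is +2, so the model underestimated that score. Nudging either coefficient away from the fitted values makes the sum of squared residuals larger.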

* When doing linear regression, always check the scatterplot (DUSS) to make sure the shape is basically straight, and ALSO check the residual plot for any patterns.
* The calculator automatically generates the RESID list whenever you run LinReg; find it under 2nd LIST (2nd STAT), NAMES. Be careful: it is overwritten each time you run a new LinReg. To graph the residual plot, change the Ylist in Stat Plot to RESID.
* You can also store the RESID list in another list, such as L3, to view the residuals easily.
* The standard deviation of the residuals represents the approximate size of a "typical" prediction error (how far off the model will typically be for a given x). 1-Var Stats on the residuals gives a close approximation; the textbook formula $s_e = \sqrt{\frac{\sum e^2}{n-2}}$ divides by n - 2 instead of n - 1.
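A short Python comparison of the two ways to get a "typical" prediction error: the textbook $s_e$ (dividing by n - 2) and the sample standard deviation that 1-Var Stats reports (dividing by n - 1). The residual values and the name `residual_sd` are hypothetical.

```python
import math
from statistics import stdev


def residual_sd(res):
    """s_e = sqrt(sum of squared residuals / (n - 2))."""
    n = len(res)
    if n < 3:
        raise ValueError("need at least 3 residuals")
    return math.sqrt(sum(e * e for e in res) / (n - 2))


# Hypothetical residuals from a least squares fit (they sum to 0).
res = [0.0, 2.0, -3.0, 0.0, 1.0]
print(round(residual_sd(res), 3))  # textbook s_e, divides by n - 2
print(round(stdev(res), 3))        # like 1-Var Stats Sx, divides by n - 1
```

With only five points the two differ noticeably (about 2.16 versus 1.87); as n grows, the difference shrinks.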

* If you have summary statistics (the mean and standard deviation of x and y, plus r) instead of the raw data, use these formulas to find the line of best fit:
* $b_1 = r\frac{s_y}{s_x}$
* $b_0 = \bar{y} - b_1\bar{x}$
* To switch the explanatory and response variables, use the same formulas with the roles of x and y swapped (exchange the means and the standard deviations; r stays the same). The new model is not simply the original equation solved for x.
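The two summary-statistic formulas translate directly into Python. All the numbers below are hypothetical, and `line_from_summary` is an illustrative name. Swapping the x and y arguments gives the model for predicting x from y.

```python
def line_from_summary(r, mean_x, sd_x, mean_y, sd_y):
    """Return (b0, b1) for y-hat = b0 + b1 * x using summary statistics."""
    b1 = r * sd_y / sd_x
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical summary statistics.
r = 0.8
mean_x, sd_x = 10.0, 2.0
mean_y, sd_y = 50.0, 5.0

# Predict y from x.
print(line_from_summary(r, mean_x, sd_x, mean_y, sd_y))
# Predict x from y: swap the roles of x and y; r is unchanged.
print(line_from_summary(r, mean_y, sd_y, mean_x, sd_x))
```

The first model is about 30 + 2x. The reversed model has slope about 0.32, not 1/2 (the reciprocal of 2), which is why you refit instead of solving the original equation for x.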

* Interpreting R²: ___% (R² written as a percent) of the variation in y (context) is accounted for by the regression line of y on x (context).
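A Python sketch that computes R² as 1 - (sum of squared residuals)/(total sum of squares of y) and fills in the interpretation sentence. For a least squares line with one explanatory variable, this equals the square of r. The data and the context wording are hypothetical.

```python
from statistics import mean


def r_squared(xs, ys):
    """R^2 = 1 - (sum of squared residuals) / (total sum of squares of y)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - sse / syy


# Hypothetical data: hours studied (x) and test score (y).
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]
r2 = r_squared(hours, score)
print(f"{100 * r2:.1f}% of the variation in test score is accounted for "
      f"by the regression line of test score on hours studied.")
```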

* Outliers are points that stand far from the overall pattern of the other points.
* An outlier far from the line in the y-direction (but not unusual in x) has a large residual and typically weakens the correlation, lowering |r|.
* A point far from the other points in the x-direction has high leverage: it can pull the regression line toward itself and can make r either stronger or weaker than the rest of the data warrant. A point is influential if removing it changes the model substantially.
* Consider fitting the model both with and without influential points, and discuss those points separately.
* Also consider alternative models if appropriate.
* Do not assume that a large r means a linear model is THE best model for the data, or that it implies a cause-and-effect relationship between the variables.
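To see these effects numerically, the sketch below refits the same made-up data after adding one unusual point at a time. The base data, both added points, and the helper `fit_summary` are hypothetical; the comparison simply reports the slope and r with and without each point.

```python
import math
from statistics import mean


def fit_summary(xs, ys):
    """Return (slope, r) for the least squares line of y on x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Hypothetical base data plus two made-up unusual points.
xs = [1, 2, 3, 4, 5]
ys = [52, 60, 61, 70, 77]

cases = {
    "original data": (xs, ys),
    "outlier in y at x = 3": (xs + [3], ys + [95]),
    "high-leverage point at x = 20": (xs + [20], ys + [60]),
}
for label, (x, y) in cases.items():
    slope, r = fit_summary(x, y)
    print(f"{label}: slope = {slope:.3f}, r = {r:.3f}")
```

With these made-up numbers, the y-outlier leaves the slope at 6 but drops r sharply, while the single point at x = 20 flattens the slope almost to 0: it is influential.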
