Correlation with a Non-Linear Emphasis, Day 2
Correlation measures the strength of the linear association between 2 quantitative variables. Before you use correlation, you must check several conditions:
Quantitative Variables Condition: Are both variables quantitative? Straight Enough Condition: Is the form of the scatterplot straight enough that a linear relationship makes sense? If the relationship is not linear, the correlation will be misleading. Outlier Condition: Outliers can distort the correlation dramatically. If an outlier is present it is often good to report the correlation with and without that point.
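The Outlier Condition can be illustrated with a short Python sketch. The data below are hypothetical (five points on the line y = 2x plus one invented outlier), and `pearson_r` is a helper written for this illustration, not part of the original presentation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: five points on y = 2x, then one outlier at (6, 0).
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 0]

r_with = pearson_r(xs, ys)                 # all six points
r_without = pearson_r(xs[:-1], ys[:-1])    # outlier dropped

print(f"r with outlier:    {r_with:.3f}")
print(f"r without outlier: {r_without:.3f}")
```

In this made-up case a single point pulls r from a perfect 1 down to roughly 0.14, which is why reporting both values is useful.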
A hidden variable that stands behind a relationship and determines it by simultaneously affecting the other two variables is called a lurking (confounding) variable. Scatterplots and correlation coefficients NEVER prove causation.
Don’t ever assume the relationship is linear just because the correlation coefficient is high. In order to determine whether a relationship is linear or not linear, we must always look at the residual plot.
Residuals: A residual is the vertical distance between a data point and the graph of a regression equation, that is, the observed y-value minus the predicted y-value.
The residual is positive if the data point is above the graph, negative if the data point is below the graph, and 0 only when the graph passes through the data point.
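The sign rule can be sketched in Python as follows. The regression line y = 2x + 1 and the three data points are hypothetical, chosen only so that each case (above, on, below) appears once.

```python
def residual(x, y, predict):
    """Observed y minus the y-value the regression graph gives at x."""
    return y - predict(x)

def position(res):
    """Describe where the data point sits relative to the graph."""
    if res > 0:
        return "above the graph"
    if res < 0:
        return "below the graph"
    return "on the graph"

# Hypothetical regression line, for illustration only.
def line(x):
    return 2 * x + 1

# Hypothetical data points.
points = [(1, 4), (2, 5), (3, 6)]

results = [(x, y, residual(x, y, line)) for x, y in points]
for x, y, res in results:
    print(f"({x}, {y}): residual = {res:+d}, point is {position(res)}")
```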
What should you look for to tell if it is not linear? Sometimes a high “r” value for linear regression is deceptive. You must look at the scatterplot AND at the pattern the residuals make. If the residuals have a curved pattern, then the relationship is NOT linear.
To prove linearity: A scatterplot of the residuals vs. the x-values should be the most boring scatterplot you’ve ever seen. It shouldn’t have any interesting features, like a direction or a shape. It should stretch horizontally, with about the same amount of scatter throughout. It should show no bends. It should show no outliers.
Some Non-Linear Regression Shapes: [Figure: positive quadratic regression; negative quadratic regression]
More Non-Linear Regression Shapes: [Figure: positive exponential regression; negative exponential regression]
Quadratic and Exponential on GDC: [Figure: quadratic regression; exponential regression]
Example: The scatterplot could possibly be linear. You must check the residual pattern.
x: 5, 10, 15, 20, 45, 25, 60
y: 16.3, 9.7, 8.1, 4.2, 1.9, 3.4, 1.3
Change the Y-list to RESID after running a linear regression (2nd STAT, RESID). Notice the curved pattern in the residuals.
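The same residual check can be sketched off the calculator in Python. This is a minimal sketch, assuming the Example 1 table pairs up as (5, 16.3), (10, 9.7), (15, 8.1), (20, 4.2), (25, 3.4), (45, 1.9), (60, 1.3), listed here in order of x. `linear_fit` is a helper written for this illustration.

```python
def linear_fit(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Example 1 data (assumed pairing of the slide's table), sorted by x.
xs = [5, 10, 15, 20, 25, 45, 60]
ys = [16.3, 9.7, 8.1, 4.2, 3.4, 1.9, 1.3]

slope, intercept = linear_fit(xs, ys)

# Residual = observed y minus the y predicted by the fitted line.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

for x, res in zip(xs, residuals):
    sign = "+" if res > 0 else "-"
    print(f"x = {x:2d}   residual = {res:6.2f}   {sign}")
```

With this reading of the data, the residuals are positive at both ends and negative in between, which is the curved pattern the slide points out.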
NOTE: Just because the curved pattern in the residuals looks quadratic, we cannot conclude that the relationship is quadratic until we check the “r” values of other curved models and see how well each one fits the data. You should also consider “real-life” implications when deciding.
When you see that the residuals are curved, you must check the correlation coefficient for both the exponential and the quadratic regression and choose the stronger correlation. A check on the exponential regression yields an r-value of -0.956 (strong negative, but check out the quadratic).
This is a quadratic regression. Equation: y = 0.00946x² - 0.839x + 18.5, r = 0.966. This value is even stronger than the exponential.
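The comparison between the two curved models can be reproduced with a short Python sketch. It rests on three assumptions: the Example 1 table pairs up as in the lists below; the exponential "r" is the correlation between x and ln(y); and the quadratic "r" is the square root of R² (1 minus the ratio of residual to total variation). The quadratic coefficients are copied from the slide, not refitted.

```python
from math import log, sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Example 1 data (assumed pairing of the slide's table), sorted by x.
xs = [5, 10, 15, 20, 25, 45, 60]
ys = [16.3, 9.7, 8.1, 4.2, 3.4, 1.9, 1.3]

# Exponential model y = a * b**x: correlate x with ln(y).
r_exponential = pearson_r(xs, [log(y) for y in ys])

# Quadratic model, using the rounded coefficients given on the slide.
def quadratic(x):
    return 0.00946 * x ** 2 - 0.839 * x + 18.5

mean_y = sum(ys) / len(ys)
sse = sum((y - quadratic(x)) ** 2 for x, y in zip(xs, ys))  # unexplained
sst = sum((y - mean_y) ** 2 for y in ys)                    # total
r_quadratic = sqrt(1 - sse / sst)  # assumption: "r" taken as sqrt(R^2)

print(f"exponential r: {r_exponential:.3f}")
print(f"quadratic r:   {r_quadratic:.3f}")
```

Under these assumptions both values round to the ones on the slides, and the quadratic comes out slightly stronger in absolute value.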
Example 2: Is it linear?
x: 0, -3, -4, 3, 4, 5
y: 1, 0.125, 0.0625, 8, 16, 32
Look at the residuals: there is a curved pattern, so the relationship is NOT linear. It is either a positive quadratic or a positive exponential. Use the “r” value to help you decide.
And the winner is the exponential regression. Here is the equation you should use for predictions: y = 1(2)^x
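The exponential equation can be checked with a minimal Python sketch. It assumes the Example 2 table reads as the pairs (0, 1), (-3, 0.125), (-4, 0.0625), (3, 8), (4, 16), (5, 32), and it fits y = a * b^x by running a straight-line regression on ln(y), a common way to do exponential regression. `linear_fit` and `predict` are names introduced for this illustration.

```python
from math import exp, log

def linear_fit(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Example 2 data (assumed pairing of the slide's table).
xs = [0, -3, -4, 3, 4, 5]
ys = [1, 0.125, 0.0625, 8, 16, 32]

# ln(y) = ln(a) + x * ln(b), so fit a line to (x, ln y).
slope, intercept = linear_fit(xs, [log(y) for y in ys])
a = exp(intercept)  # initial value
b = exp(slope)      # growth factor

def predict(x):
    """Prediction from the fitted exponential model."""
    return a * b ** x

print(f"y = {a:.3f} * ({b:.3f})^x")
print(f"prediction at x = 6: {predict(6):.3f}")
```

Because every point in this table lies exactly on y = 2^x, the fit recovers a = 1 and b = 2 up to floating-point rounding.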