# Correlation with a Non-Linear Emphasis: Day 2

- Correlation measures the strength of the linear association between two quantitative variables.
- Before you use correlation, you must check several conditions:

- Quantitative Variables Condition: Are both variables quantitative?
- Straight Enough Condition: Is the form of the scatterplot straight enough that a linear relationship makes sense? If the relationship is not linear, the correlation will be misleading.
- Outlier Condition: Outliers can distort the correlation dramatically. If an outlier is present, it is often good to report the correlation both with and without that point.

- A hidden variable that stands behind a relationship and determines it by simultaneously affecting the other two variables is called a lurking (confounding) variable.
- Scatterplots and correlation coefficients NEVER prove causation.

- Never assume the relationship is linear just because the correlation coefficient is high.
- To determine whether a relationship is linear or not, we must always look at the residual plot.

Residuals

- A residual is the vertical distance between a data point and the graph of a regression equation, taken with a sign: the observed y-value minus the y-value the equation predicts at the same x.

The residual is

- positive if the data point is above the graph.
- negative if the data point is below the graph.
- 0 only when the graph passes through the data point.
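
The sign rule can be sketched in a few lines. The line y = 2x + 1 and the three points below are made up for illustration; they do not come from the slides.

```python
def residual(x, y, slope, intercept):
    """Observed y minus the y predicted by the line at the same x."""
    return y - (slope * x + intercept)

# Hypothetical line y = 2x + 1 and made-up points (not from the slides).
slope, intercept = 2.0, 1.0
points = [(1.0, 4.0), (2.0, 3.5), (3.0, 7.0)]

residuals = [residual(x, y, slope, intercept) for x, y in points]
for (x, y), e in zip(points, residuals):
    position = "above" if e > 0 else "below" if e < 0 else "on"
    print(f"({x}, {y}) is {position} the line, residual = {e}")
```

The first point sits above the line (residual 1.0), the second below it (residual -1.5), and the third exactly on it (residual 0.0).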

What should you look for to tell if it is not linear?

- Sometimes a high r-value for a linear regression is deceptive. You must look at the scatterplot AND at the pattern the residuals make.
- If the residuals have a curved pattern, then the relationship is NOT linear.

To check linearity

- A scatterplot of the residuals vs. the x-values should be the most boring scatterplot you've ever seen.
- It shouldn't have any interesting features, like a direction or a shape.
- It should stretch horizontally, with about the same amount of scatter throughout.
- It should show no bends.
- It should show no outliers.

Some Non-Linear Regression Shapes

- Positive Quadratic Regression
- Negative Quadratic Regression

More Non-Linear Regression Shapes

- Positive Exponential Regression
- Negative Exponential Regression

Example: The scatterplot could possibly be linear. You must check the residual pattern.

| x | y |
|---|---|
| 5 | 16.3 |
| 10 | 9.7 |
| 15 | 8.1 |
| 20 | 4.2 |
| 45 | 1.9 |
| 25 | 3.4 |
| 60 | 1.3 |

- After running a linear regression, change the Ylist of the stat plot to RESID (2nd STAT, then RESID).
- Notice the curved pattern in the residuals.
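
For readers without the calculator at hand, the same residual check can be sketched in plain Python. This is only an illustration using the seven (x, y) pairs from the example; the helper name `linear_fit` is an assumption of this sketch, not something from the slides.

```python
from math import sqrt

# The seven (x, y) pairs from the example.
xs = [5, 10, 15, 20, 45, 25, 60]
ys = [16.3, 9.7, 8.1, 4.2, 1.9, 3.4, 1.3]

def linear_fit(xs, ys):
    """Least-squares line y = slope*x + intercept, plus Pearson's r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / sqrt(sxx * syy)

slope, intercept, r = linear_fit(xs, ys)

# Sort by x so the sign pattern reads left to right, as on a residual plot.
residual_plot = sorted((x, y - (slope * x + intercept)) for x, y in zip(xs, ys))

print(f"linear r = {r:.3f}")
for x, e in residual_plot:
    print(f"x = {x:2d}  residual = {e:+.2f}")
```

The linear r comes out near -0.82, and the residuals are positive at the smallest and largest x but negative everywhere in between. That bend is the curved pattern the slide points to.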

Note

- Just because the curved pattern in the residuals looks quadratic, we cannot conclude that the relationship is quadratic until we check the r-values of other curved models and see how well each fits the data.
- You should also consider real-life implications when deciding.

- When you see that the residuals are curved, you must check the correlation coefficient for both the exponential and the quadratic regression and choose the stronger correlation.
- A check on the exponential regression yields an r-value of -0.956. (Strong negative, but check out the quadratic.)
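
The exponential r-value can be reproduced with a short sketch. It assumes the exponential regression is done the usual way, by fitting a straight line to the points (x, ln y), so that r is the ordinary correlation between x and ln y. The helper name `pearson_r` is hypothetical.

```python
from math import log, sqrt

# The seven (x, y) pairs from the example.
xs = [5, 10, 15, 20, 45, 25, 60]
ys = [16.3, 9.7, 8.1, 4.2, 1.9, 3.4, 1.3]

def pearson_r(us, vs):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(us)
    mean_u, mean_v = sum(us) / n, sum(vs) / n
    suu = sum((u - mean_u) ** 2 for u in us)
    svv = sum((v - mean_v) ** 2 for v in vs)
    suv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    return suv / sqrt(suu * svv)

# Assumption: exponential regression = linear regression on (x, ln y).
log_ys = [log(y) for y in ys]
r_exp = pearson_r(xs, log_ys)

print(f"exponential r = {r_exp:.3f}")
```

Under that assumption the sketch gives r = -0.956, in agreement with the value on the slide.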

This is a quadratic regression

- Equation: y = 0.00946x² - 0.839x + 18.5
- r = 0.966
- In absolute value, this is even stronger than the exponential's r = -0.956.
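
The quadratic fit can also be checked by hand. The sketch below solves the least-squares normal equations directly and assumes that the reported r is the square root of the quadratic model's R² (one minus residual variation over total variation); the helper name `quadratic_fit` is hypothetical.

```python
from math import sqrt

# The seven (x, y) pairs from the example.
xs = [5, 10, 15, 20, 45, 25, 60]
ys = [16.3, 9.7, 8.1, 4.2, 1.9, 3.4, 1.3]

def quadratic_fit(xs, ys):
    """Least-squares y = a*x**2 + b*x + c via the centered normal equations."""
    n = len(xs)
    qs = [x * x for x in xs]
    mean_x, mean_q, mean_y = sum(xs) / n, sum(qs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dq = [q - mean_q for q in qs]
    dy = [y - mean_y for y in ys]
    sxx = sum(u * u for u in dx)
    sxq = sum(u * v for u, v in zip(dx, dq))
    sqq = sum(v * v for v in dq)
    sxy = sum(u * w for u, w in zip(dx, dy))
    sqy = sum(v * w for v, w in zip(dq, dy))
    # Cramer's rule on the 2x2 system for b (x term) and a (x**2 term).
    det = sxx * sqq - sxq * sxq
    a = (sxx * sqy - sxq * sxy) / det
    b = (sqq * sxy - sxq * sqy) / det
    c = mean_y - a * mean_q - b * mean_x
    return a, b, c

a, b, c = quadratic_fit(xs, ys)

# Assumption: the slide's r is sqrt(R^2) for the quadratic model.
mean_y = sum(ys) / len(ys)
sse = sum((y - (a * x * x + b * x + c)) ** 2 for x, y in zip(xs, ys))
sst = sum((y - mean_y) ** 2 for y in ys)
r_quad = sqrt(1 - sse / sst)

print(f"y = {a:.5f}x^2 + {b:.3f}x + {c:.1f},  r = {r_quad:.3f}")
```

With these data the coefficients round to the slide's equation and r comes out at 0.966. Note that a quadratic always fits at least as well as a line on the same data, so the meaningful comparison here is quadratic against exponential.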

Example 2: Is it linear?

| x | y |
|---|---|
| 0 | 1 |
| -3 | 0.125 |
| -4 | 0.0625 |
| 3 | 8 |
| 4 | 16 |
| 5 | 32 |

Look at the residuals

- There is a curved pattern in the residuals. It is NOT linear; it is either quadratic or exponential (positive).
- Use the r-value to help you decide.

And the winner is

- Here is the equation you should use for predictions: y = 1(2)^x
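
As a quick check of that result, the sketch below fits y = a·b^x to the six Example 2 points by running a least-squares line through (x, ln y). The helper name `exponential_fit` is an assumption of this sketch.

```python
from math import exp, log

# The six (x, y) pairs from Example 2.
xs = [0, -3, -4, 3, 4, 5]
ys = [1, 0.125, 0.0625, 8, 16, 32]

def exponential_fit(xs, ys):
    """Fit y = a * b**x by a least-squares line through (x, ln y)."""
    n = len(xs)
    log_ys = [log(y) for y in ys]
    mean_x, mean_l = sum(xs) / n, sum(log_ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, log_ys))
    slope = sxl / sxx
    intercept = mean_l - slope * mean_x
    # Undo the logarithm: ln y = intercept + slope*x  ->  y = e^intercept * (e^slope)^x
    return exp(intercept), exp(slope)

a, b = exponential_fit(xs, ys)
print(f"y = {a:.3f} * {b:.3f}^x")
```

Every point in the table satisfies y = 2^x exactly, so the fit recovers a = 1 and b = 2 up to floating-point rounding.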