2 Scatter Plots and Correlation A scatter plot (or scatter diagram) is used to show the relationship between two variablesCorrelation analysis is used to measure strength of the association (linear relationship) between two variablesOnly concerned with strength of the relationshipNo causal effect is implied
3 Scatter Plot Examples Linear relationships Nonlinear relationships y y
6 Correlation Coefficient The sample correlation coefficient r is used to measure the strength of the linear relationship in the sample observations
7 Features of r Range between -1 and 1 The closer to -1, the stronger the negative linear relationshipThe closer to 1, the stronger the positive linear relationshipThe closer to 0, the weaker the linear relationship
8 Examples of Approximate r Values yyyxxxr = - 1r = - 0.6r = 0yyxxr = 0.3r = +1
9 Calculating the Correlation Coefficient Sample correlation coefficient:or the algebraic equivalent:where:r = Sample correlation coefficientn = Sample sizex = Value of the explanatory variabley = Value of the response variable
10 Calculation Example Tree Height Trunk Diameter y x xy y2 x2 35 8 280 12256449944124018127718972933619810893660137803600169211474511495202512151126122601144=321=73=3142=14111=713
11 Calculation Example (continued) r = 0.886 → relatively strong positive Tree Height,yr = → relatively strong positivelinear association between x and yTrunk Diameter, x
12 Introduction to Regression Analysis Regression analysis is used to:Predict the value of a response variable based on the value of at least one explanatory variableExplain the impact of changes in an explanatory variable on the response variableResponse variable: the variable we wish to explainExplanatory variable: the variable used to explain the response variable
13 Simple Linear Regression Model Only one explanatory variable, xRelationship between x and y is described by a linear functionChanges in y are assumed to be caused by changes in x
14 Types of Regression Models Positive Linear RelationshipRelationship NOT LinearNegative Linear RelationshipNo Relationship
15 RegressionFitting the data: finding the equation for the straight line that does the best job of reproducing the data.
16 RegressionResidual: Difference between measured and calculated Y-values – let’s call it ‘e’
17 RegressionThe sum of the residuals always equals 0.
18 Simple linear regression uses the least regression method Simple linear regression uses the least regression method. A least squares regression line is the line which produces the smallest value of the sum of the squares of the residuals
19 The Least Squares Equation The formula for b is:algebraic equivalent: