Scatter Diagrams and Linear Correlation


1 Scatter Diagrams and Linear Correlation
Chapters 1-3: single-variable data.
Examples of two variables: age of a person vs. time to master a cell phone task; grade point average vs. time spent studying; grade point average vs. time spent playing video games; amount of smoking vs. rate of lung cancer.
Scatter diagram: (x, y) data plotted as individual points.
x: explanatory (independent) variable.
y: response (dependent) variable.
A scatterplot of y vs. x shows the relationship between two quantitative variables measured on the same individuals.


3 Scatter Diagrams and Linear Correlation
Look at the overall pattern and for any striking deviations (outliers).
Describe the pattern by:
a) form: linear or curved
b) direction: positively associated (positive slope) or negatively associated (negative slope)
c) strength: how closely the points follow the form
Examples: age of a person vs. time to master a cell phone task; grade point average vs. time spent studying; grade point average vs. time spent playing video games; amount of smoking vs. rate of lung cancer.

4 Degrees of correlation

5 Scatter Diagrams and Linear Correlation
Tips for drawing a scatterplot:
Scale each axis: the intervals along an axis must be equal, but the two axes may use different scales.
Label both axes.
Adopt scales that use the entire grid (do not compress the plot into one corner of the grid).

6 Scatter Diagrams and Linear Correlation
Correlation coefficient (r): assesses the strength and direction of the linear relationship between x and y.
r is unitless, and -1 ≤ r ≤ 1.
r = -1 or r = 1: perfect correlation (all points lie exactly on a straight line).
The closer r is to 1 or -1, the better a line describes the relationship (the better the fit of the data).
r > 0: positive association (as x increases, y tends to increase).
r < 0: negative association (as x increases, y tends to decrease).
x and y are interchangeable in calculating r.
r does not change when either (or both) variables change units (inches to cm, or °F to °C).

7 Linear and non-linear correlations

8 Scatter Diagrams and Linear Correlation
$r = \frac{1}{n-1}\sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$
Using the TI-83: ex. p. 129 (number of police vs. muggings).
Cautions:
Association does not imply causation.
Lurking variables may play a role.
r is only good for linear models.
Correlation between averages is usually higher than correlation between individual points.
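
The correlation formula can be sketched directly: standardize each x and y value, multiply the z-scores pairwise, add the products, and divide by n - 1. This assumes s_x and s_y are sample standard deviations (divisor n - 1), which is what `statistics.stdev` computes. The helper name `pearson_r` and the data are hypothetical; this is not the p. 129 police/muggings example.

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """r = 1/(n-1) * sum of (z-score of x) * (z-score of y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (divide by n - 1)
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y))
    return total / (n - 1)

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))  # 0.7746
```

Points that lie exactly on a rising line, such as `pearson_r([1, 2, 3], [2, 4, 6])`, give r = 1, and a falling line gives r = -1.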

9 Scatter Diagrams and Linear Correlation
Facts:
There is no distinction between the x and y variables: the value of r is unaffected by switching x and y.
Both x and y must be quantitative.
r is only good for linear relationships.
r is not resistant to outliers.
Correlation (r) is not a complete description of two-variable data; the means and standard deviations of x and y should also be included.
HW: p. 131 #2, 4, 6, 8 a,b,c, 10 a,b,c, 12 a,b,c. For part "c", use a calculator to compute r.

10 4.2 Least Squares Regression
Method for finding a line (the best-fit line) that summarizes the relationship between two variables, x (explanatory) and y (response).
Use the line to predict the value of y for a given x.
There must be a specific response variable y and explanatory variable x (unlike r, the roles cannot be switched).

11 4.2 Least Squares Regression
Least squares regression line (LSRL): minimizes the sum of the squared errors in the y-values.
Error = observed value - predicted value = y - ŷ (y is the actual value; ŷ, read "y hat", is the predicted value).
The LSRL of y on x is the line that makes $\sum (y - \hat{y})^2$, the sum of the squares of the vertical distances from the data points to the fitted line, as small as possible.

12 4.2 Least Squares Regression
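LSRL equation: $\hat{y} = a + bx$, where $\hat{y}$ is the predicted value of y.
Slope: $b = r\,\frac{s_y}{s_x}$
y-intercept: $a = \bar{y} - b\bar{x}$
$\bar{x}$ and $\bar{y}$ are the means of all the x and y data, respectively; the point $(\bar{x}, \bar{y})$ always lies on the LSRL.
$s_x$ and $s_y$ are the standard deviations of the x and y data, and r is the correlation.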

13 4.2 Least Squares Regression
TI-83: enter the data into L1 (x) and L2 (y). Use STAT CALC and select 8:LinReg(a+bx) to get the least squares line.
Slope: important for interpreting the data; it is the amount by which the predicted y changes for each one-unit increase in x.
Intercept: may not be of practical importance in a given problem.

14 4.2 Least Squares Regression
Plotting the LSRL: use the formula ŷ = a + bx to find two points on the line, (x1, ŷ1) and (x2, ŷ2); make sure x1 and x2 are near opposite ends of the data.
Influential observations and outliers:
Influential point: extreme in the x-direction; removing an influential point changes the LSRL significantly.
Outlier: extreme in the y-direction; it usually does not change the LSRL significantly.

15 Coefficient of Determination
r² is the coefficient of determination.
r describes the strength and direction of a straight-line relationship.
r² is the fraction of the variation in the values of y that is explained by the LSRL of y on x.
r = 1 (or -1): r² = 1, perfect correlation; 100% of the variation is explained by the LSRL.
r = 0.7: r² = 0.49, so about 49% of the variation in y is explained by the LSRL.

16 Residuals
Residual: the difference between an observed value and its predicted value, residual = y - ŷ.
The mean of the least squares residuals is always 0.
Residual plot: a scatterplot of the regression residuals against the explanatory variable x. It is useful in assessing the fit of the regression line, i.e., whether the relationship really is a straight line:
Uniform scatter about 0: the linear fit is appropriate.
Curved pattern: the relationship is not linear.
Spread that increases (or decreases) with x: predictions of y will be less accurate where the spread is larger (for larger x when the spread increases).

