Presentation on theme: "Scatter Diagrams and Linear Correlation"— Presentation transcript:
1 Scatter Diagrams and Linear Correlation
Chapters 1-3: single-variable data.
Examples of two variables: age of person vs. time to master a cell phone task, grade point average vs. time studying, grade point average vs. time playing video games, amount of smoking vs. rate of lung cancer.
Scatter diagram: (x, y) data plotted as individual points.
x: explanatory variable (independent).
y: response variable (dependent).
Evaluating scatterplot data: plotting y vs. x values shows the relationship between 2 quantitative variables measured on the same individual.

3 Scatter Diagrams and Linear Correlation
Look at the overall pattern.
Any striking deviations (outliers)?
Describe by:
a) form (linear or curved)
b) direction: positively associated (+ slope) or negatively associated (- slope)
c) strength: how closely do the points follow the form?
Examples: age of person vs. time to master a cell phone task, grade point average vs. time studying, grade point average vs. time playing video games, amount of smoking vs. rate of lung cancer.

5 Scatter Diagrams and Linear Correlation
Tips for drawing a scatterplot
Scale the axes: the intervals along each axis must be uniform; the scale can be different for each axis.
Label both axes.
Adopt a scale that uses the entire grid (do not compress the plot into 1 corner of the grid).

6 Scatter Diagrams and Linear Correlation
Correlation coefficient (r)
Assesses the strength and direction of the linear relationship between x and y.
Unitless.
-1 ≤ r ≤ 1
r = -1 or r = 1: perfect correlation (all points lie exactly on the line).
The closer r is to 1 or -1, the better a line describes the relationship (better fit of the data).
r > 0: positive association between x and y.
r < 0: negative association between x and y.
x and y are interchangeable in calculating r.
r does not change if either (or both) variables change units (inches to cm, or °F to °C).

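The last two properties (x and y are interchangeable, and unit changes leave r unchanged) can be checked numerically. The sketch below is only an illustration: the data are made up, and `pearson_r` is a hypothetical helper that uses the sum-of-products form of r, which is algebraically equivalent to the usual definition.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation via r = Sxy / sqrt(Sxx * Syy) (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up paired measurements, for illustration only.
lengths_in = [60, 62, 65, 68, 71]   # inches
temps_f = [50, 59, 68, 77, 95]      # degrees Fahrenheit

# Unit changes: inches -> cm, Fahrenheit -> Celsius.
lengths_cm = [v * 2.54 for v in lengths_in]
temps_c = [(t - 32) * 5 / 9 for t in temps_f]

r_original = pearson_r(lengths_in, temps_f)
r_converted = pearson_r(lengths_cm, temps_c)
r_swapped = pearson_r(temps_f, lengths_in)

# All three values agree up to floating-point rounding.
print(r_original, r_converted, r_swapped)
```
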
8 Scatter Diagrams and Linear Correlation
r = [1/(n - 1)] Σ [(x - x̄)/s_x] [(y - ȳ)/s_y]
Using the TI-83: example on p. 129 (number of police vs. muggings).
Cautions:
Association does not imply causation.
Lurking variables may play a role.
r is only good for linear models.
Correlations between averages are higher than correlations between individual points.

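The formula for r can be sketched in code as the average product of standardized x and y values. This is a minimal illustration, assuming s_x and s_y are sample standard deviations (divisor n - 1), so the sum is also divided by n - 1; the data are made up and `correlation` is a hypothetical helper name.

```python
from statistics import mean, stdev  # stdev is the sample standard deviation (n - 1)

def correlation(xs, ys):
    """r = [1/(n - 1)] * sum of (standardized x) * (standardized y)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
print(round(r, 4))  # 0.7746
```
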
9 Scatter Diagrams and Linear Correlation
Facts
No distinction between the x and y variables: the value of r is unaffected by switching x and y.
Both x and y must be quantitative.
Only good for linear relationships.
Not resistant to outliers.
Correlation (r) is not a complete description of 2-variable data; the x and y standard deviations and means should be included.
HW: p. 131 2, 4, 6, 8 a,b,c, 10 a,b,c, 12 a,b,c. For "c" use a calculator to compute r.

10 4.2 Least Squares Regression
Method for finding a line (best fit) that summarizes the relationship between 2 variables, x (explanatory) and y (response).
Use the line to predict the value of y for a given x.
Must have a specific response variable y and explanatory variable x (cannot switch them, as with r).

11 4.2 Least Squares Regression
Least Squares Regression Line (LSRL)
Minimizes the sum of the squared errors (in the y-values).
Error = observed value - predicted value = y - ŷ (y is the actual value, ŷ is the predicted value; ŷ is called "y hat").
Quantity minimized: Σ(y - ŷ)²
The LSRL of y on x is the line that makes the sum of the squared vertical distances from the data points to the fitted line as small as possible.

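The least-squares idea can be checked numerically: the fitted line should have a smaller Σ(y - ŷ)² than any nearby line. This is a rough sketch on made-up data; `least_squares` and `sse` are hypothetical helper names, and the fit uses the standard closed-form slope Sxy/Sxx.

```python
def sse(a, b, xs, ys):
    """Sum of squared errors, sum of (y - y_hat)^2, for the line y_hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def least_squares(xs, ys):
    """Closed-form least-squares intercept a and slope b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares(xs, ys)
best = sse(a, b, xs, ys)

# Nudge the intercept and slope; every nudged line should fit worse.
nearby = [(a + da, b + db)
          for da in (-0.2, 0, 0.2)
          for db in (-0.1, 0, 0.1)
          if (da, db) != (0, 0)]
all_worse = all(sse(a2, b2, xs, ys) > best for a2, b2 in nearby)
print(round(best, 4), all_worse)
```
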
12 4.2 Least Squares Regression
LSRL equation: ŷ = a + bx
ŷ: predicted value of y.
Slope: b = r(s_y/s_x)
y-intercept: a = ȳ - b x̄
x̄ and ȳ are the means of all the x and y data, respectively; the point (x̄, ȳ) is on the LSRL.
s_x and s_y are the standard deviations of the x and y data.
r is the correlation.

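The slope and intercept formulas translate directly into code. This is a minimal sketch on made-up data, assuming sample standard deviations; `correlation` and `lsrl` are hypothetical helper names.

```python
from statistics import mean, stdev  # stdev is the sample standard deviation

def correlation(xs, ys):
    """Correlation r from standardized values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

def lsrl(xs, ys):
    """Return (a, b) for y_hat = a + b*x, with b = r*(s_y/s_x) and a = y_bar - b*x_bar."""
    b = correlation(xs, ys) * stdev(ys) / stdev(xs)
    a = mean(ys) - b * mean(xs)
    return a, b

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = lsrl(xs, ys)
print(round(a, 4), round(b, 4))  # 2.2 0.6

# The point (x_bar, y_bar) lies on the fitted line.
y_hat_at_mean = a + b * mean(xs)
print(round(y_hat_at_mean, 4), mean(ys))
```
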
13 4.2 Least Squares Regression
TI-83: enter the data into L1, L2 (x, y).
Use STAT CALC and select #8: LinReg(a+bx) to get the required best-fit line.
Slope: important for interpreting the data. It is the rate of change of y for each unit increase in x.
Intercept: may not be practically important for the problem.

14 4.2 Least Squares Regression
Plotting the LSRL: using the formula ŷ = a + bx, find 2 points on the line, (x1, ŷ1) and (x2, ŷ2). Make sure x1 and x2 are near opposite ends of the data.
Influential observations and outliers
Influential: extreme in the x-direction. Removing an influential point changes the LSRL significantly.
Outliers: extreme in the y-direction. Removing an outlier does not significantly change the LSRL.

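The difference between an influential point and a y-direction outlier can be illustrated by refitting the line with one extra point added. This is only a sketch on made-up data; `fit_line` is a hypothetical helper, and the exact effect of any single point depends on the data set.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Made-up base data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Add one point that is extreme in x (far to the right of the other data).
a_infl, b_infl = fit_line(xs + [15], ys + [3])

# Add one point that is extreme in y but sits in the middle of the x-range.
a_out, b_out = fit_line(xs + [3], ys + [9])

a_base, b_base = fit_line(xs, ys)
print("base:        ", round(a_base, 3), round(b_base, 3))
print("influential: ", round(a_infl, 3), round(b_infl, 3))
print("y-outlier:   ", round(a_out, 3), round(b_out, 3))
```

In this made-up example the point that is extreme in x pulls the slope from 0.6 to slightly below zero, while the y-direction outlier leaves the slope at 0.6 and only raises the intercept.
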
15 Coefficient of Determination
r²: coefficient of determination.
r describes the strength and direction of a straight-line relationship.
r² is the fraction of the variation in the values of y that is explained by the LSRL of y on x.
r = 1, r² = 1: perfect correlation; 100% of the variation is explained by the LSRL.
r = 0.7, r² = 0.49: about 49% of the variation in y is explained by the LSRL.

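The interpretation of r² as a fraction of explained variation can be checked by comparing r² with 1 - SSE/SST, where SSE is the sum of squared errors about the fitted line and SST is the total squared variation of y about its mean. This is a sketch on made-up data; the variable names are illustrative, not from the slides.

```python
from math import sqrt

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

r = sxy / sqrt(sxx * syy)   # correlation
b = sxy / sxx               # least-squares slope
a = y_bar - b * x_bar       # least-squares intercept

sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
sst = syy                                                  # total variation in y
explained_fraction = 1 - sse / sst

print(round(r ** 2, 4), round(explained_fraction, 4))  # 0.6 0.6
```
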
16 Residuals
Residuals: the difference between the observed value and the predicted value.
Residual = y - ŷ
The mean of the least-squares residuals is 0.
Residual plot: a scatterplot of the regression residuals against the explanatory variable (x).
Useful in assessing the fit of the regression line, i.e., do we have a straight line?
Linear: uniform scatter.
Curved pattern: indicates the relationship is not linear.
Increasing or decreasing spread: increasing spread as x increases indicates that prediction of y will be less accurate for larger x; decreasing spread indicates it will be less accurate for smaller x.
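
The residual definition, and the fact that least-squares residuals average to 0, can be sketched as follows. The data are made up and `fit_line` is a hypothetical helper; the (x, residual) pairs printed at the end are the points a residual plot would show.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)

# Residual = observed y minus predicted y_hat.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
mean_residual = sum(residuals) / len(residuals)

# Points for a residual plot: residual against the explanatory variable x.
for x, res in zip(xs, residuals):
    print(x, round(res, 3))
print("mean residual:", round(mean_residual, 10))
```
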