Chapter 3, Section 3.1: Examining Relationships
Continue to ask the preliminary questions familiar from Chapters 1 and 2: What individuals do the data describe? What are the variables? How are the variables measured? Are all the variables quantitative, or is at least one a categorical variable? Do you want to explore the nature of the relationship, or do you think some of the variables explain or cause the changes in others?
Bivariate: involving two variables. An analysis is said to be bivariate when it examines the relationship between two variables, especially when attempting to show a correlation between them.
When working with bivariate data, each variable plays a different role. One variable is the explanatory or predictor variable, while the other is the response variable.
Bivariate data is graphed on a scatterplot with an x-axis (horizontal) and y-axis (vertical). The explanatory variable is graphed on the horizontal, and the response variable is graphed on the vertical. A scatterplot is a picture of the association between two variables.
Tips for drawing scatterplots: Scale the horizontal and vertical axes; the intervals must be uniform. If the scale does not begin at zero, use the // symbol to indicate a break. Label both axes. If given a grid, choose a scale so that your plot uses the whole grid; don’t compress the plot into one corner of the grid.
To analyze a scatterplot, describe the data in terms of: Direction (positive or negative association). Form (linear, clustered, curved). Strength, or scatter (how closely the points follow the form). Outliers (deviations from the overall pattern).
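Lesson 3.2: Correlation. Correlation measures the direction and strength of the linear relationship between two quantitative variables. It is given by the following equation: $r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$. That is, it is the average (with a divisor of n - 1) of the products of the standardized values.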
The correlation computed from the sample data measures the direction and strength of the linear relationship between two quantitative variables. The symbol for the sample correlation coefficient is r. The correlation ranges from -1 to +1. When r is close to +1, there is a strong positive linear relationship between the variables. When r is close to -1, there is a strong negative linear relationship between the variables. When there is no linear relationship or only a weak one, the value of r will be close to 0. The correlation is not resistant: it is strongly affected by outliers.
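The description of r as an average of products of standardized values can be sketched in a few lines of Python. This is only an illustration: the data are made up, the helper name `correlation` is hypothetical, and the divisor n - 1 follows the usual textbook convention for the sample correlation.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)

# Replace the last y-value with an outlier to see how strongly r reacts.
y_outlier = [2, 4, 5, 4, -10]
r_outlier = correlation(x, y_outlier)

print(round(r, 3), round(r_outlier, 3))
```

With these made-up numbers, a single outlying point is enough to flip the sign of r from positive to negative, which is what "not resistant" means in practice.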
If women always married men who were two years older than themselves, what would be the correlation between the ages of husband and wife?
The gas mileage of an automobile first increases and then decreases as the speed increases. This relationship is very regular, as shown by the following data on speed (miles per hour) and mileage (miles per gallon):
Speed (mph): 20, 30, 40, 50, 60
Mileage (mpg): 24, 28, 30, 28, 24
Make a scatterplot and calculate r.
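Both exercises above can be checked numerically. The sketch below reuses a hand-written correlation helper (a hypothetical name, with the n - 1 divisor assumed); the wives' ages are invented for illustration, while the speed and mileage values are the ones given in the exercise.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Husbands always two years older: the wives' ages here are made up.
wife_ages = [22, 25, 30, 34, 41]
husband_ages = [age + 2 for age in wife_ages]
r_ages = correlation(wife_ages, husband_ages)

# Speed and mileage data from the exercise.
speed = [20, 30, 40, 50, 60]
mpg = [24, 28, 30, 28, 24]
r_mileage = correlation(speed, mpg)

print(round(r_ages, 6), round(r_mileage, 6))
```

The ages fall exactly on a line, so r is 1 (up to floating-point round-off). The mileage data follow a very regular curve, yet r comes out as essentially 0, because r measures only linear association.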
LEAST-SQUARES REGRESSION Given a scatter plot, one must be able to draw the line of best fit. Best fit means that the sum of the squares of the vertical distances from each point to the line is minimized.
When the scatterplot appears linear, the line of best fit is the Least-Squares Regression Line (LSRL).
Equation of the Least-Squares Regression Line (LSRL): $\hat{y} = a + bx$. $\hat{y}$ is read “y-hat” and means the predicted value of y. a is the y-intercept. b is the slope. The point $(\bar{x}, \bar{y})$ is on the LSRL.
Equation for the slope of the LSRL: $b = r \frac{s_y}{s_x}$. r is the correlation coefficient. $s_x$ is the standard deviation of x. $s_y$ is the standard deviation of y.
Equation for the y-intercept of the LSRL: $a = \bar{y} - b\bar{x}$. a is the y-intercept. $\bar{y}$ is the mean of the y-values. $\bar{x}$ is the mean of the x-values.
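As a rough sketch, the slope and intercept can be computed directly from r, the two standard deviations, and the two means, assuming the standard formulas b = r·s_y/s_x and a = ȳ - b·x̄. The function name `lsrl` and the data are hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev

def lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    # Correlation: sum of products of deviations, scaled by (n - 1) * s_x * s_y.
    r = sum((x - x_bar) * (y - y_bar)
            for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x        # slope
    a = y_bar - b * x_bar    # y-intercept
    return a, b

# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = lsrl(x, y)

# The line should pass through the point of means (x-bar, y-bar).
predicted_at_mean = a + b * mean(x)
print(round(a, 3), round(b, 3), round(predicted_at_mean, 3))
```

Because the intercept is defined as ȳ - b·x̄, the prediction at x̄ is always ȳ, so the point of means lies on the line by construction.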
$r^2$: The Coefficient of Determination. The coefficient of determination, $r^2$, is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x. It should not be confused with the coefficient of variation, which is a different quantity (the standard deviation divided by the mean).
When you report r, give $r^2$ as a measure of how successful the regression was in explaining the response. When you see r, square it to get a better feel for the strength of the association. Example: r = 0.7, $r^2$ = 0.49. $r^2$ = 0.49 means that 49% of the variation in y is explained by the least-squares regression of y on x.
The correlation between math and verbal SAT scores for this class was 0.66. What percent of the variation in the verbal scores is explained by the math scores?
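The phrase "fraction of the variation explained" can be made concrete: for a least-squares line, r squared equals 1 minus the ratio of the residual sum of squares to the total sum of squares of y. The sketch below checks this on made-up data; the variable names are hypothetical.

```python
from statistics import mean

# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
s_xx = sum((xi - x_bar) ** 2 for xi in x)   # sum of squared x-deviations
s_yy = sum((yi - y_bar) ** 2 for yi in y)   # total variation in y
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = s_xy / (s_xx * s_yy) ** 0.5             # correlation
b = s_xy / s_xx                             # slope (equal to r * s_y / s_x)
a = y_bar - b * x_bar                       # y-intercept

# Variation left unexplained by the line: sum of squared residuals.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
explained = 1 - sse / s_yy

print(round(r ** 2, 4), round(explained, 4))
```

The two printed values agree, which is why squaring r gives the share of the variation in y accounted for by the regression.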
In a study of the effect of temperature on household heating bills, an investigator said, “Our research shows that about 70% of the variability in the heating units used by a particular house over the years can be explained by outside temperature.” Explain what the investigator meant by this statement. According to this study, what is the correlation between outside temperature and heating bills?
RESIDUALS Residual = observed y - predicted y = $y - \hat{y}$. The mean of the least-squares residuals always equals zero (allowing for round-off error). An effective tool for testing the goodness of fit of a regression line to a bivariate data set is the residual plot.
RESIDUAL PLOT The residual plot displays the scatterplot of the points $(x, y - \hat{y})$, that is, each residual plotted against its x-value. If the residual plot shows a random dispersion with no apparent pattern, the LSRL fits the data well. If the residual plot shows a curved pattern or a fanned pattern, the LSRL is not a good summary of the data.
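A small sketch of the residual calculation, using the speed and mileage data from the earlier exercise. The slope is computed as the sum of cross-deviations over the sum of squared x-deviations, which is algebraically the same as r·s_y/s_x; the variable names are hypothetical, and the residuals are listed as text rather than drawn, to keep the example dependency-free.

```python
from statistics import mean

# Speed (mph) and mileage (mpg) from the earlier exercise.
speed = [20, 30, 40, 50, 60]
mpg = [24, 28, 30, 28, 24]

x_bar, y_bar = mean(speed), mean(mpg)
s_xx = sum((x - x_bar) ** 2 for x in speed)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(speed, mpg))

b = s_xy / s_xx          # slope of the least-squares line
a = y_bar - b * x_bar    # y-intercept

# Residual = observed y - predicted y, one per data point.
residuals = [y - (a + b * x) for x, y in zip(speed, mpg)]
mean_residual = mean(residuals)

# Points of the residual plot: (x, residual).
for x, e in zip(speed, residuals):
    print(x, round(e, 1))
print("mean residual:", round(mean_residual, 6))
```

The residuals average to zero, as they must, but they run negative, positive, positive, positive, negative across the x-values. That curved pattern signals that a straight line is not a good summary of these data.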