 # AP Statistics Chapters 3 & 4 Measuring Relationships Between 2 Variables.


Basic Terms

- Response variable: Measures an outcome of a study.
- Explanatory variable: Helps explain or influences changes in a response variable.
- Scatterplot: Shows the relationship between two quantitative variables measured on the same individuals (one variable on each axis).
- We are examining relationships and associations. DO NOT ASSUME that the explanatory variable causes a change in the response variable.

Interpreting a Scatterplot

- Just like with univariate data, we are looking for an overall pattern and for deviations from that pattern.
- Overall pattern
  - Direction: negative or positive association
  - Form: curved or linear? Are there clusters?
  - Strength: How closely do the points follow a clear form?
- Deviations
  - Outliers: individual values that fall outside of the overall pattern.

Correlation

- Correlation (r) measures the direction and strength of the linear relationship between two quantitative variables.
  - It does not distinguish between explanatory and response variables (i.e., r would stay the same if you switched the x and y axes).
  - r has no units of measurement (the correlation will not change if you change the units for either of the two variables).

Correlation

- A positive r (+) indicates a positive association.
  - As one variable increases, the other tends to increase.
- A negative r (-) indicates a negative association.
  - As one variable increases, the other tends to decrease.
- r is always between -1 and 1.
  - If r is close to zero, then the linear relationship is weak.
  - If r is close to 1 or -1, then the linear relationship is strong.
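
The properties of r listed above can be checked numerically. The following is a minimal sketch, assuming the usual textbook formula (the sum of the products of the z-scores, divided by n - 1); the height and weight values are made up for illustration and are not from the presentation.

```python
import math
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: heights in inches, weights in pounds.
heights_in = [60, 62, 65, 68, 70, 72]
weights_lb = [110, 120, 135, 150, 165, 180]

r = correlation(heights_in, weights_lb)

# Swapping the roles of x and y should leave r unchanged.
r_swapped = correlation(weights_lb, heights_in)

# Changing units (inches to centimeters) should leave r unchanged.
heights_cm = [h * 2.54 for h in heights_in]
r_metric = correlation(heights_cm, weights_lb)

print(round(r, 4), round(r_swapped, 4), round(r_metric, 4))
```

All three printed values should agree, which is what "no units" and "no distinction between explanatory and response" mean in practice.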


Regression line

- A line that describes how a response variable, y, changes as an explanatory variable, x, changes. It is often used to predict y given x.
  - y = a + bx
- b = slope: the amount by which y changes on average when x changes by one unit.
- a = y-intercept

Making predictions with the regression line

- Interpolation
  - Estimating predicted values between known values. (Good)
- Extrapolation
  - Predicting values outside the range of values used to make the regression line. (Bad)
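
One way to make the interpolation/extrapolation distinction concrete is to flag any prediction whose x-value falls outside the range of the data used to fit the line. This is only a sketch: the function name `predict`, the coefficients, and the x-range are hypothetical values chosen for illustration.

```python
def predict(a, b, x, x_min, x_max):
    """Return (prediction, is_extrapolation) for the line y-hat = a + b*x.

    x_min and x_max are the smallest and largest x-values that were
    used to fit the line.
    """
    is_extrapolation = not (x_min <= x <= x_max)
    return a + b * x, is_extrapolation

# Hypothetical line y-hat = 0.05 + 1.99x, fitted on x-values from 1 to 5.
inside = predict(0.05, 1.99, 3, 1, 5)    # interpolation
outside = predict(0.05, 1.99, 50, 1, 5)  # extrapolation

print(inside)
print(outside)
```

The second prediction is computed just as easily as the first, which is exactly the danger: nothing in the arithmetic warns that the linear pattern may not continue that far.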

Least-Squares Regression Line

- The line that makes the sum of the squared vertical distances between the data points and the line as small as possible.
  - ŷ = a + bx (ŷ is read "y-hat")
- Slope: $b = r\,(s_y / s_x)$
- Passes through the point $(\bar{x}, \bar{y})$, so the intercept is $a = \bar{y} - b\bar{x}$.
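
The slope formula b = r(s_y/s_x), together with the fact that the least-squares line passes through the point of means, is enough to compute the line from raw data. The sketch below assumes the standard sample formulas for r and the standard deviations; the data and the function name `least_squares_line` are made up for illustration.

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x using b = r * s_y / s_x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    # Sample correlation coefficient.
    r = sum((x - x_bar) * (y - y_bar)
            for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x        # slope
    a = y_bar - b * x_bar    # intercept: forces the line through (x-bar, y-bar)
    return a, b

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares_line(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f}x")
```

Plugging the mean of x into the fitted line returns the mean of y, which is a quick sanity check on any hand calculation.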

Example

- An SRS of 50 families has provided the following statistics:
  - Number of children in the family: mean 2.1, std dev 1.4
  - Annual gross income: mean \$34,250, std dev \$10,540
  - r = 0.75
- Write the equation for the least-squares regression line that can be used to predict gross income based on the number of children.
  - Be sure to define your variables.
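
One possible worked solution to the example above, using the slope formula b = r(s_y/s_x) and the intercept a = ȳ - b·x̄. Here x is the number of children (explanatory) and ŷ is the predicted annual gross income in dollars (response); the variable names in the code are just illustrative.

```python
# Summary statistics given in the example.
mean_children, sd_children = 2.1, 1.4      # x: number of children
mean_income, sd_income = 34250, 10540      # y: annual gross income ($)
r = 0.75

b = r * sd_income / sd_children            # slope: dollars per additional child
a = mean_income - b * mean_children        # intercept: line passes through the means

print(f"predicted income = {a:.2f} + {b:.2f} * (number of children)")
```

Under these formulas the slope is about 5646.43 and the intercept is 22392.50, so the line is roughly ŷ = 22,392.50 + 5,646.43x. As the earlier slides stress, this describes an association in the sample and does not show that children cause higher income.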

Residuals

- Residual: The difference between an observed value of the response variable and the value predicted by the regression line.
  - Residual = observed y – predicted y = y – ŷ
- Standard deviation of the residuals: $s = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - 2}}$
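
The sketch below computes residuals and their standard deviation for a small made-up data set. The formula for the standard deviation did not survive in the transcript, so the code assumes the usual convention of dividing the sum of squared residuals by n - 2. The helper `fit_line` is hypothetical and uses the equivalent slope formula Sxy/Sxx.

```python
import math
from statistics import mean

def fit_line(xs, ys):
    """Least-squares intercept and slope (slope = Sxy / Sxx)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)

# Residual = observed y minus predicted y.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Standard deviation of the residuals (assumed convention: divide by n - 2).
s = math.sqrt(sum(e ** 2 for e in residuals) / (len(xs) - 2))

print([round(e, 2) for e in residuals])
print(round(s, 4))
```

The residuals from a least-squares line always sum to zero (up to rounding), so their size is judged by s rather than by their total.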

How well does the line fit the data?

To answer this question, you must look at two things.

1. Residual plot: a scatterplot of the regression residuals plotted against (usually) the explanatory variable.
   - If the regression line represents the pattern of the data well, then:
     - The residual plot will show no pattern.
     - The residuals will be relatively small.

How well does the line fit the data?

2. Coefficient of determination: $r^2$
   - The fraction (%) of the variation in the values of y that is explained by the least-squares regression line of y on x.
   - Template: "($r^2$ as a percent)% of the variation in (y-variable) is explained by the least-squares regression line with (x-variable)."
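
"Fraction of variation explained" can be computed directly as 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the sum of squared deviations of y from its mean. The sketch below, on made-up data, also checks that this matches the square of the correlation; the names `sse`, `sst`, and `r_squared` are illustrative.

```python
import math
from statistics import mean

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar, y_bar = mean(xs), mean(ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

b = sxy / sxx              # least-squares slope
a = y_bar - b * x_bar      # least-squares intercept

sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
sst = syy                                                  # total variation in y
r_squared = 1 - sse / sst

r = sxy / math.sqrt(sxx * syy)  # correlation coefficient

print(f"{100 * r_squared:.1f}% of the variation in y is explained "
      f"by the least-squares regression line with x.")
```

For a least-squares line with one explanatory variable, the two routes (1 - SSE/SST and squaring r) give the same number.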

Other Considerations

- Outlier: An observation that lies outside the overall pattern (it may or may not have a large residual).
- Influential point: An observation which, if removed, would greatly change the statistical calculation.
- Lurking variable: An additional variable that may influence the relationship between the explanatory and response variables.
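
A quick way to see what "influential" means is to fit the line with and without a suspect point and compare the slopes. The data below are made up: five points lie exactly on a line with slope 1, and a sixth point sits far to the right in x.

```python
from statistics import mean

def slope(points):
    """Least-squares slope for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data: the last point is extreme in the x-direction.
points = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (20, 3)]

slope_with = slope(points)
slope_without = slope(points[:-1])

print(round(slope_with, 3), round(slope_without, 3))
```

Removing the single extreme point moves the slope from roughly zero back to 1, so that point is influential. Points that are extreme in x tend to have this kind of leverage.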

Correlation v. Causation

- The goal of a study or experiment is often to establish causation: a direct cause-and-effect link.
  - Lurking variables make establishing causation difficult.
- Common response: The observed association between two variables, x and y, is explained by a lurking variable, z. Both x and y change in response to changes in z.

Correlation v. Causation

- Confounding: Occurs when the effects of two or more variables on a response variable cannot be distinguished from each other (this often occurs in an observational study).

Establishing Causation without an Experiment

1. The association is strong.
2. The association is consistent.
3. Larger values of the explanatory variable are associated with stronger responses.
4. The alleged cause precedes the effect in time.
5. The alleged cause is plausible.

Non-linear relationships

- If data follow a non-linear form, we can sometimes transform the data to become linear. By doing so, we can then perform the same analyses that we do for linear data (regression line, correlation, $r^2$, residual plot).
- What are the most common non-linear models for bivariate data?

Transforming non-linear data

- Exponential model
  - $y = ab^x$
  - For each unit increase in x, y is multiplied by the constant b.
- To transform to linearity, plot log y against x on the coordinate plane. Then perform a linear regression.
  - $\log y = \log a + (\log b)\,x$ OR $\ln y = \ln a + (\ln b)\,x$, so the fitted intercept estimates log a and the fitted slope estimates log b.
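
For an exponential model y = ab^x, taking logs gives log y = log a + (log b)x, which is linear in x. The sketch below fits a line to (x, log y) and then back-transforms the intercept and slope to recover a and b. The data are generated exactly from y = 3·2^x, a made-up example, so the recovered values should be 3 and 2; the helper `fit_line` is hypothetical.

```python
import math
from statistics import mean

def fit_line(xs, ys):
    """Least-squares intercept and slope."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical data following y = 3 * 2^x exactly.
xs = [0, 1, 2, 3, 4]
ys = [3 * 2 ** x for x in xs]

# Transform: regress log10(y) on x.
log_ys = [math.log10(y) for y in ys]
intercept, slope = fit_line(xs, log_ys)

# Back-transform: intercept = log10(a), slope = log10(b).
a_hat = 10 ** intercept
b_hat = 10 ** slope

print(round(a_hat, 4), round(b_hat, 4))
```

With real data the points will not fall exactly on the transformed line, so a residual plot of log y against x is still needed to judge whether the exponential model is appropriate.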

Transforming non-linear data

- Power model
  - $y = ax^b$
  - Often used when trying to use a one-dimensional variable (e.g., length) to predict a multi-dimensional variable (e.g., area, volume, weight).
- To transform to linearity, plot log y against log x on the coordinate plane. Then perform a linear regression.
  - $\log y = \log a + b\,(\log x)$ OR $\ln y = \ln a + b\,(\ln x)$, so the fitted slope estimates the power b and the fitted intercept estimates log a.
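
For a power model y = ax^b, taking logs of both sides gives log y = log a + b·log x, which is linear in log x. The sketch below regresses log y on log x for made-up data generated exactly from y = 2x³ (think of a volume-like quantity predicted from a length), so the recovered power should be 3 and the recovered constant 2. The helper `fit_line` is hypothetical.

```python
import math
from statistics import mean

def fit_line(xs, ys):
    """Least-squares intercept and slope."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical data following y = 2 * x^3 exactly.
xs = [1, 2, 3, 4, 5]
ys = [2 * x ** 3 for x in xs]

# Transform both variables, then fit a line to (log x, log y).
log_xs = [math.log10(x) for x in xs]
log_ys = [math.log10(y) for y in ys]
intercept, slope = fit_line(log_xs, log_ys)

power_hat = slope            # slope of the log-log line is the power b
a_hat = 10 ** intercept      # intercept of the log-log line is log10(a)

print(round(a_hat, 4), round(power_hat, 4))
```

A useful rule of thumb: if log y against x looks linear, try an exponential model; if log y against log x looks linear, try a power model.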

Analyzing the relationship between categorical variables

- A two-way table is used to compare categorical variables.
- Marginal distribution: Analyzing the totals for one of the variables by itself.
- Conditional distribution: The distribution of the response variable for each value of the explanatory variable.
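
The two kinds of distribution can be read off a two-way table of counts. The sketch below uses a made-up table in which grade level is the explanatory variable and how a student gets to school is the response; all names and counts are hypothetical.

```python
# Hypothetical two-way table: rows are explanatory groups, columns are responses.
counts = {
    "9th grade": {"bus": 40, "car": 10},
    "12th grade": {"bus": 15, "car": 35},
}
responses = ["bus", "car"]

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of the response: column totals over the grand total.
marginal = {
    resp: sum(row[resp] for row in counts.values()) / grand_total
    for resp in responses
}

# Conditional distributions: within each explanatory group,
# each count divided by that group's row total.
conditional = {
    group: {resp: row[resp] / sum(row.values()) for resp in responses}
    for group, row in counts.items()
}

print(marginal)
print(conditional)
```

Comparing the conditional distributions across groups (here 80% bus for 9th grade versus 30% for 12th grade) is what reveals an association; the marginal distribution alone cannot.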
