AP Statistics Chapters 3 & 4 Measuring Relationships Between 2 Variables.

Presentation transcript:

Basic Terms
Response Variable: Measures an outcome of a study.
Explanatory Variable: Helps explain or influences changes in a response variable.
Scatterplot: Shows the relationship between two quantitative variables measured on the same individuals (one variable on each axis).
We are examining relationships and associations. DO NOT ASSUME that the explanatory variable causes a change in the response variable.

Interpreting a Scatterplot
Just like with univariate data, we are looking for an overall pattern and for deviations from that pattern.
Overall pattern
–Direction: Negative or positive association?
–Form: Curved or linear? Are there clusters?
–Strength: How closely do the points follow a clear form?
Deviations
–Outliers: Individual values that fall outside the overall pattern.

Correlation
Correlation (r) measures the direction and strength of the linear relationship between two quantitative variables.
–Does not distinguish between explanatory and response variables (i.e., r stays the same if you switch the x and y axes).
–r has no units of measurement (the correlation does not change if you change the units of either variable).

Correlation
A positive r indicates a positive association.
–As one variable increases, the other tends to increase as well.
A negative r indicates a negative association.
–As one variable increases, the other tends to decrease.
r is always between -1 and 1.
–If r is close to zero, the linear relationship is weak.
–If r is close to 1 or -1, the linear relationship is strong.
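
The properties of r listed on the two correlation slides can be checked numerically. The sketch below is only an illustration: the height and weight values are made up, and the helper name `correlation` is an assumption, not something from the slides. It computes r from z-scores, then recomputes it with the axes swapped and with heights converted to centimeters.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r, computed as the average product of z-scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Sum the products of z-scores and divide by n - 1.
    products = [((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

# Hypothetical data (not from the slides): heights in inches, weights in pounds.
heights_in = [60, 62, 65, 68, 70, 72]
weights_lb = [110, 120, 140, 155, 165, 180]

r = correlation(heights_in, weights_lb)

# Swapping the roles of x and y should leave r unchanged.
r_swapped = correlation(weights_lb, heights_in)

# Changing units (inches to centimeters) should also leave r unchanged.
heights_cm = [h * 2.54 for h in heights_in]
r_metric = correlation(heights_cm, weights_lb)

print(round(r, 4), round(r_swapped, 4), round(r_metric, 4))
```

All three printed values should agree, which is what "r has no units" and "r does not distinguish explanatory from response" mean in practice.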

Regression line
A line that describes how a response variable, y, changes as an explanatory variable, x, changes. It is often used to predict y given x.
–y = a + bx
b = slope: the amount by which y changes on average when x changes by one unit.
a = y-intercept: the predicted value of y when x = 0.

Making predictions with the regression line
Interpolation
–Estimating predicted values between known values. (Good)
Extrapolation
–Predicting values outside the range of values used to make the regression line. (Bad)

Least-Squares Regression Line
The line that makes the sum of the squared vertical distances between the data points and the line as small as possible.
–$\hat{y} = a + bx$ ($\hat{y}$ is read "y-hat")
Slope: $b = r(s_y / s_x)$
Passes through the point $(\bar{x}, \bar{y})$, so the intercept is $a = \bar{y} - b\bar{x}$.

Example
An SRS of 50 families has provided the following statistics:
–Number of children in the family: mean 2.1, standard deviation 1.4
–Annual gross income: mean $34,250, standard deviation $10,540
–r = 0.75
Write the equation of the least-squares regression line that can be used to predict gross income from the number of children.
–Be sure to define your variables.
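
One way to work this example is sketched below. It uses only the summary statistics given on the slide, together with two standard facts about the least-squares line: the slope is r times the ratio of the standard deviations (s_y over s_x), and the line passes through the point of means, which fixes the intercept. The variable and function names are illustrative choices, not part of the original slides.

```python
# Summary statistics from the example slide.
# x = number of children in the family (explanatory variable)
# y = annual gross income in dollars (response variable)
mean_x, s_x = 2.1, 1.4
mean_y, s_y = 34250.0, 10540.0
r = 0.75

# Slope: b = r * (s_y / s_x)
b = r * s_y / s_x

# The line passes through (mean_x, mean_y), so a = mean_y - b * mean_x.
a = mean_y - b * mean_x

def predict_income(children):
    """Predicted gross income (y-hat) for a given number of children."""
    return a + b * children

print(f"y-hat = {a:.2f} + {b:.2f}x")
```

With these inputs the slope comes out near 5646.43 and the intercept at 22392.50, so predicted income rises by roughly $5,646 for each additional child. As the Basic Terms slide warns, that is a description of association and not evidence that children cause income to rise.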

Residuals
Residual: The difference between an observed value of the response variable and the value predicted by the regression line.
–Residual = observed y – predicted y = $y - \hat{y}$
Standard deviation of the residuals: $s = \sqrt{\sum (y - \hat{y})^2 / (n - 2)}$
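
A minimal sketch of the residual calculation follows, assuming a small made-up data set (the x and y values and the helper name `least_squares` are illustrative, not from the slides). It fits the least-squares line, lists the residuals, and computes their standard deviation using the usual n - 2 divisor.

```python
import math

def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data, chosen to be roughly linear.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares(xs, ys)

# Residual = observed y minus predicted y.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Standard deviation of the residuals, dividing by n - 2.
s = math.sqrt(sum(e ** 2 for e in residuals) / (len(xs) - 2))

print([round(e, 2) for e in residuals], round(s, 4))
```

The residuals from a least-squares line always sum to zero (up to rounding), which is a quick sanity check on the arithmetic.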

How well does the line fit the data?
To answer this question, you must look at two things.
1. Residual plot: a scatterplot of the regression residuals against (usually) the explanatory variable.
–If the regression line represents the pattern of the data well, then:
The residual plot will show no pattern.
The residuals will be relatively small.

How well does the line fit the data?
2. Coefficient of determination: $r^2$
–The fraction (often stated as a percent) of the variation in the values of y that is explained by the least-squares regression line of y on x.
Template:
–"($r^2$ as a percent)% of the variation in (y-variable) is explained by the least-squares regression line with (x-variable)."
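
To make "fraction of variation explained" concrete, the sketch below computes it directly as 1 minus the ratio of the squared residuals to the total squared deviations of y from its mean, and compares that with the square of the correlation. The data are made up for illustration and the variable names are assumptions.

```python
import math

# Hypothetical data, chosen to be roughly linear.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Least-squares line y-hat = a + b*x.
b = sxy / sxx
a = mean_y - b * mean_x

# Variation left over after fitting the line (sum of squared residuals).
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Fraction of the total variation in y explained by the line.
r_squared = 1 - sse / syy

# Correlation, for comparison.
r = sxy / math.sqrt(sxx * syy)

print(round(r_squared, 4), round(r ** 2, 4))
```

The two printed numbers should match, which is why the coefficient of determination for a least-squares line can be read off by squaring r.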

Other Considerations
Outlier: Observation that lies outside the overall pattern (may or may not have a large residual).
Influential Point: Observation which, if removed, would greatly change the statistical calculation.
Lurking variable: An additional variable that may influence the relationship between the explanatory and response variables.
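
The idea of an influential point can be illustrated by fitting the line twice, once with and once without a suspect observation. The sketch below uses made-up data in which five points lie exactly on a line with slope 2 and a sixth point sits far to the right; the helper name `slope` is an illustrative choice.

```python
def slope(xs, ys):
    """Slope of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical data: the first five points follow y = 2x exactly;
# the last point (20, 5) is extreme in the x direction.
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 4, 6, 8, 10, 5]

slope_with = slope(xs, ys)
slope_without = slope(xs[:-1], ys[:-1])

print(round(slope_with, 3), round(slope_without, 3))
```

Removing the single extreme point changes the slope from nearly flat to 2, so that point is influential. Points that are extreme in the x direction are the usual suspects.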

Correlation v. Causation
The goal of a study or experiment is often to establish causation: a direct cause-and-effect link.
–Lurking variables make establishing causation difficult.
Common response: Observed association between two variables, x and y, is explained by a lurking variable, z. Both x and y change in response to changes in z.

Correlation v. Causation
Confounding: Occurs when the effects of two or more variables on a response variable cannot be distinguished from each other (often occurs in an observational study).

Establishing Causation w/o an Experiment
1. The association is strong.
2. The association is consistent.
3. Larger values of the explanatory variable are associated with stronger responses.
4. The alleged cause precedes the effect in time.
5. The alleged cause is plausible.

Non-linear relationships
If the data follow a non-linear form, we can sometimes transform the data to become linear. We can then perform the same analyses that we do for linear data (regression line, correlation, $r^2$, residual plot).
What are the most common non-linear models for bivariate data?

Transforming non-linear data
Exponential model
–$y = ab^x$
–For each unit increase in x, y is multiplied by a constant, b.
To transform to linearity, plot log y against x on the coordinate plane. Then perform a linear regression.
–$\log y = \log a + (\log b)x$ OR $\ln y = \ln a + (\ln b)x$
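
The exponential transformation can be sketched as follows. The data here are generated exactly from a made-up exponential model with a = 3 and b = 2, so the fit should recover those values; real data would only do so approximately. The helper name `least_squares` is an illustrative assumption.

```python
import math

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data generated exactly from y = 3 * 2**x.
xs = [0, 1, 2, 3, 4, 5]
ys = [3 * 2 ** x for x in xs]

# Transform: regress ln y on x.
log_ys = [math.log(y) for y in ys]
intercept, slope = least_squares(xs, log_ys)

# Undo the transformation: intercept = ln a and slope = ln b.
a_hat = math.exp(intercept)
b_hat = math.exp(slope)

print(round(a_hat, 4), round(b_hat, 4))
```

Note that the intercept and slope of the transformed line are logarithms of the original a and b, so both must be exponentiated before writing the model in its original form.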

Transforming non-linear data
Power model
–$y = ax^b$
–Often used when trying to use a one-dimensional variable (e.g., length) to predict a multi-dimensional variable (e.g., area, volume, weight).
To transform to linearity, plot log y against log x on the coordinate plane. Then perform a linear regression.
–$\log y = \log a + b\log x$ OR $\ln y = \ln a + b\ln x$
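
The power-model transformation works the same way, except that both variables are logged. In the sketch below the data are generated exactly from a made-up power model with a = 0.5 and b = 3 (for instance, a weight that grows with the cube of a length), and the helper name `least_squares` is again an illustrative assumption.

```python
import math

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data generated exactly from y = 0.5 * x**3.
xs = [1, 2, 3, 4, 5]
ys = [0.5 * x ** 3 for x in xs]

# Transform: regress ln y on ln x.
log_xs = [math.log(x) for x in xs]
log_ys = [math.log(y) for y in ys]
intercept, slope = least_squares(log_xs, log_ys)

# Undo the transformation: intercept = ln a, and the slope is the power b.
a_hat = math.exp(intercept)
b_hat = slope

print(round(a_hat, 4), round(b_hat, 4))
```

Here only the intercept needs to be exponentiated; the slope of the log-log line is the exponent of the power model directly.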

Analyzing the relationship between categorical variables
A two-way table is used to compare categorical variables.
Marginal distribution: Analyzing the totals for one of the variables by itself.
Conditional distribution: The distribution of the response variable for each value of the explanatory variable.
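
As a rough illustration of these two definitions, the sketch below builds a small two-way table from made-up counts, with gender as the explanatory variable and a yes/no answer as the response. The category names and counts are hypothetical.

```python
# Hypothetical two-way table of counts.
# Rows: explanatory variable (gender). Columns: response variable (answer).
table = {
    "female": {"yes": 30, "no": 20},
    "male": {"yes": 10, "no": 40},
}
responses = ["yes", "no"]

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of the response: column totals over the grand total.
marginal = {
    resp: sum(table[group][resp] for group in table) / grand_total
    for resp in responses
}

# Conditional distribution of the response within each explanatory group:
# each cell divided by its own row total.
conditional = {
    group: {resp: count / sum(row.values()) for resp, count in row.items()}
    for group, row in table.items()
}

print(marginal)
print(conditional)
```

In this made-up table 40% of all respondents answered yes, but the conditional distributions differ sharply (60% of females versus 20% of males), which is the kind of difference that signals an association between the two categorical variables.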