Sections 3.1 - 3.3 Review.

Presentation transcript:

Bivariate data: the relationship between two variables

What three shapes are possible for a bivariate data relationship? Linear, curved, or no shape.

Shape: Linear

Shape: Curved

Shape: None

The line on the plot is the least squares regression line, LSRL, or regression line.

Two main reasons to fit a line to a set of data:
1) to find a summary or model that describes the relationship between two variables
2) to use the line to predict the value of y when you know the value of x

To make a reasonable prediction, what needs to be true about:
A) shape of data? Linear.
B) strength of relationship? The stronger the better.

Usually, the independent variable, x, is on the horizontal axis. The dependent variable, y, is on the vertical axis.

Statistics, not Algebra! The variable on the x-axis is called the predictor or explanatory variable. The variable on the y-axis is called the predicted or response variable.

Which is correct? Year vs Minimum Wage or Minimum Wage vs Year?

Two types of predictions:
1) interpolation: making a prediction when the value of x falls within the range of the data
2) extrapolation: making a prediction when the value of x falls outside the range of the actual data
Interpolation is fairly safe. Extrapolation is risky, especially the further the x-value is outside the range of the actual data.
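
The distinction between the two types of prediction can be sketched as a simple range check. This is only an illustration: the function name `prediction_type` and the hours-of-study values are invented here and do not come from the slides.

```python
def prediction_type(x_new, x_observed):
    """Classify a prediction by whether x_new lies inside the observed x range."""
    lo, hi = min(x_observed), max(x_observed)
    return "interpolation" if lo <= x_new <= hi else "extrapolation"


# Hypothetical explanatory values (hours of study), invented for illustration.
hours = [1, 2, 3, 4, 5, 6]

print(prediction_type(4.5, hours))  # inside the range 1 to 6
print(prediction_type(10, hours))   # outside the range, so the prediction is risky
```

A check like this says nothing about how risky an extrapolation is; it only flags that the x-value lies outside the data used to fit the line.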

Prediction error: the difference between the actual value of y and the value of y predicted from a regression line. It is usually unknown, except for the points used to construct the regression line, whose prediction errors are called residuals.

Residual = observed value of y − predicted value of y
Residual = y − ŷ

Residual is the signed vertical distance from an observed data point to the regression line: positive if the point is above the line, negative if the point is below the line, and 0 if the point is on the line.
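
A minimal sketch of the sign convention, assuming a made-up line ŷ = 2 + 3x and three made-up points (none of these numbers come from the slides):

```python
def residual(x, y, a, b):
    """Observed y minus the y predicted by the line y-hat = a + b*x."""
    return y - (a + b * x)


# Hypothetical line and points, invented for illustration.
a, b = 2.0, 3.0
points = [(1, 6.0), (2, 7.0), (3, 11.0)]

for x, y in points:
    e = residual(x, y, a, b)
    position = "above" if e > 0 else "below" if e < 0 else "on"
    print(f"x={x}, y={y}: residual {e:+.1f}, point is {position} the line")
```

The three points were chosen so that one lies above the line, one below it, and one exactly on it.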

Least squares regression line, also called least squares line or regression line, is the line for which the sum of the squared errors (SSE) is as small as possible. SSE = Σ(residuals)²
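
To make "as small as possible" concrete, the sketch below fits a line with the standard closed-form least squares formulas and compares its SSE with the SSE of a few nearby lines. The data set and the helper names `fit_lsrl` and `sse` are invented for illustration.

```python
def fit_lsrl(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return a, b


def sse(xs, ys, a, b):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_lsrl(xs, ys)
best = sse(xs, ys, a, b)
print(f"LSRL: y-hat = {a:.1f} + {b:.1f}x, SSE = {best:.2f}")

# Any other line has a larger SSE.
for da, db in [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.2), (0.0, -0.2)]:
    print(f"a={a + da:.1f}, b={b + db:.1f}: SSE = {sse(xs, ys, a + da, b + db):.2f}")
```

Nudging the intercept or the slope in either direction raises the SSE, which is what makes the fitted line the least squares line.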

Find the least squares line for this passenger jets data.

Put explanatory values in L1 and response values in L2.
STAT, CALC, 8: LinReg(a+bx)
LinReg(a+bx) L1, L2, Y1 (Y1 is needed if you want to show the LSRL on the graph)
To get Y1, go to VARS, Y-VARS, 1: Function, ENTER, 1: Y1, ENTER

LinReg
y = a + bx
a = 366.6666667
b = 16
r² = .9795918367 (Turn Diagnostic On)
r = .9897433186
So, what is the equation for the LSRL?

y = 367 + 16x
Is this it? No! Need the equation in context!
Cost = 367 + 16(seats)

Cost = 367 + 16(seats)
Interpret the slope and y-intercept.
Slope: For each additional seat, the predicted cost increases by about $16 per hour.
y-intercept: If a passenger jet had 0 seats, it would be predicted to cost $367 per hour to operate.
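
The two interpretations can be read straight off the equation. The sketch below encodes Cost = 367 + 16(seats) as a function; the function name and the seat counts 150 and 151 are hypothetical choices for illustration.

```python
def predicted_cost(seats):
    """Predicted operating cost in dollars per hour, from Cost = 367 + 16(seats)."""
    return 367 + 16 * seats


# Slope: one additional seat changes the prediction by 16 dollars per hour.
print(predicted_cost(151) - predicted_cost(150))

# y-intercept: the prediction at 0 seats.
print(predicted_cost(0))
```

The slope is the difference between predictions one seat apart, and the y-intercept is simply the prediction at seats = 0.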

Correlation What do you recall about correlation?

Correlation
Measures strength and direction of a linear relationship between two variables
Numerical value between -1 and 1, inclusive
How tightly packed the points of the scatterplot are about the LSRL
Correlation and slope always have the same sign
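
A short sketch of these properties, using the usual formula r = Sxy / √(Sxx·Syy) and the least squares slope b = Sxy / Sxx. Because both share the numerator Sxy, they always have the same sign. The data and the helper name `r_and_slope` are invented for illustration.

```python
import math


def r_and_slope(xs, ys):
    """Return (r, b): the correlation and the least squares slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy), sxy / sxx


# Hypothetical data, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys_up = [2, 4, 5, 4, 5]    # generally increasing
ys_down = [5, 4, 5, 4, 2]  # generally decreasing

r_up, b_up = r_and_slope(xs, ys_up)
r_down, b_down = r_and_slope(xs, ys_down)
print(f"increasing: r = {r_up:.3f}, slope = {b_up:.2f}")
print(f"decreasing: r = {r_down:.3f}, slope = {b_down:.2f}")
```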

Sketch ellipse around points in scatterplot. If ellipse has points scattered throughout and points appear to follow a linear trend, then correlation is a reasonable measure of strength of the relationship.

No shape

Does a higher correlation mean the relationship is more like a line, less like a line, or neither? Neither, if misused.

r = 0.91 for this data but a linear model is not appropriate as growth is exponential.

Here r = 0.48. In spite of the scatter, a linear model is appropriate because there is no curvature in the pattern of data points.

Moral of this story: Always plot your data before deciding a linear model is appropriate for your data. Correlation is only meaningful if a linear model is appropriate for your data.
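
The point can be checked numerically. The sketch below uses invented data that grow by 20% per step (not the data behind the r = 0.91 slide), fits a line to them, and prints the correlation together with the signs of the residuals. The helper name `fit_lsrl` is an assumption of this example.

```python
import math


def fit_lsrl(xs, ys):
    """Return (a, b, r) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


# Invented data: exponential growth of 20% per step.
xs = list(range(10))
ys = [1.2 ** x for x in xs]

a, b, r = fit_lsrl(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(f"r = {r:.3f}")
print("residual signs:", "".join("+" if e > 0 else "-" for e in residuals))
```

The correlation comes out above 0.9, yet the residuals are positive at both ends and negative in the middle. That systematic pattern is the curvature a plot would reveal and that r alone hides.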

When the correlation is small in absolute value, what does it mean for the prediction error?

When the correlation is small in absolute value, the error in prediction will be larger than if the correlation were larger. A larger correlation (near 1 or -1) means the points are generally closer to the LSRL, and predictions using the line will be relatively close to the observed values.

True or false: A high correlation means that a change in the explanatory variable causes a change in the response variable. False. Correlation does not imply causation as there may be a lurking variable involved.

r² is the coefficient of determination. This tells us the proportion of total variation in the y-variable that is “explained” by the variation in the x-variable.
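
One way to see "proportion of variation explained" is the standard identity for a least squares line, r² = 1 − SSE/SST, where SST is the total squared variation of y about its mean. The sketch below computes it for a small invented data set; the helper name `r_squared` is an assumption of this example.

```python
def r_squared(xs, ys):
    """Coefficient of determination for the least squares line, as 1 - SSE/SST."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    sst = sum((y - y_bar) ** 2 for y in ys)                    # total variation in y
    return 1 - sse / sst


# Hypothetical data, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print(f"r^2 = {r_squared(xs, ys):.2f}")
```

For these invented data SSE = 2.4 and SST = 6, so r² = 0.6: the line accounts for about 60% of the variation in y.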

Enter the information about fat and calories for 7 kinds of pizza in the calculator. Find the LSRL equation, r, and r².

Calories = 112 + 14.9(fat)
r = 0.908
r² = 0.824
Interpret slope, intercept, and r².

Calories = 112 + 14.9(fat)
Slope: For each 1 gram increase in fat, the calories increase by about 14.9.
Intercept: If there were 0 grams of fat in a pizza, there would be 112 calories.
r² = 0.824: About 82% of the variation in calories among these brands of pizza can be attributed to fat content.
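
The reported values can be cross-checked with a few lines of arithmetic. The sketch below squares the reported r and uses the fitted equation for a prediction; the function name and the fat value of 10 grams are hypothetical, chosen only to illustrate the calculation.

```python
def predicted_calories(fat_grams):
    """Predicted calories from the fitted line Calories = 112 + 14.9(fat)."""
    return 112 + 14.9 * fat_grams


r = 0.908
print(f"r squared = {r ** 2:.3f}")  # agrees with the reported r² after rounding

# Hypothetical fat content of 10 grams, invented for illustration.
print(f"predicted calories at 10 g of fat: {predicted_calories(10):.0f}")
```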

Both plots have a correlation of 0.26. For each plot, is fitting a regression line appropriate? Why or why not?

The left plot has strong curvature, so the LSRL is not appropriate. The right plot is linear, as the cloud of points is roughly elliptical.

Residual plots may help you uncover more detailed patterns. Nearly random scatter, with no obvious trends, is the ideal shape for a residual plot. This indicates that a line is a reasonable model for the trend in the original data.

These data look nearly linear, but is a line a suitable model?

Residual plot dramatically reveals the trend is not as linear as first thought.

Curvature in residual plot mimics curvature in original scatterplot, which is harder to see. So line is not a good model for these data.

Create residual plot for this data.

Compute LSRL

To get RESID, select 2nd, LIST, 7: RESID

No obvious trends, so line is reasonable model for this data.

Questions?