Linear Regression Day 1 (pg 176-184)

Presentation transcript:

Chapter 7: Linear Regression Day 1 (pg 176-184)

Correlation and Regression. Correlation measures the linear relationship between two variables. We summarize that relationship with a line, called the regression line. The explanatory variable is x; the response variable is y.

Regression line. Explains how the response variable (y) changes in relation to the explanatory variable (x). Use the line to predict the value of y for a given value of x. The predicted values are called $\hat{y}$; they are the values on the regression line. The observed values are called $y$; they are the points in the scatterplot. Look at 2 examples.

Regression line. Residual: the difference between the observed value and its associated predicted value. To find the residuals, we always subtract the predicted value from the observed one: residual $= y - \hat{y}$.

Least squares regression line (LSRL). The most commonly used regression line. It puts the line where the sum of the squared errors (residuals) is as small as possible; that is, it minimizes $\sum (y - \hat{y})^2$. It is calculated from the summary statistics of x and y.

Regression line equation: $\hat{y} = b_0 + b_1 x$, where $b_1 = r\,\dfrac{s_y}{s_x}$ and $b_0 = \bar{y} - b_1\bar{x}$.

Regression line equation. $b_1$ = slope of the line. For every one-unit increase in x, $\hat{y}$ changes by the amount of the slope. Very important for interpreting data. $b_0$ = y-intercept of the line: the value of $\hat{y}$ when x = 0. Usually not important for interpreting data, because values of x are usually not close to 0.
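As a rough sketch of how the slope and intercept follow from summary statistics, the snippet below assumes the usual least-squares formulas $b_1 = r \cdot s_y / s_x$ and $b_0 = \bar{y} - b_1\bar{x}$. The function name and the numbers passed to it are hypothetical, chosen only for illustration; they do not come from the slides.

```python
def lsrl_from_summary(x_mean, y_mean, s_x, s_y, r):
    """Return (b0, b1) for y-hat = b0 + b1*x, given summary statistics."""
    b1 = r * s_y / s_x          # slope: correlation times ratio of standard deviations
    b0 = y_mean - b1 * x_mean   # intercept: forces the line through (x_mean, y_mean)
    return b0, b1


# Hypothetical summary statistics, for illustration only
b0, b1 = lsrl_from_summary(x_mean=10.0, y_mean=50.0, s_x=2.0, s_y=8.0, r=0.5)
print(f"y-hat = {b0:.2f} + {b1:.2f}x")
```

Because the slope is r scaled by a ratio of two positive standard deviations, it always carries the same sign as r.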

Calculating the regression line. Degree Days vs. Gas Usage

Calculating the regression line. Don't forget to write the equation: $\hat{y} = 1.07 + 0.19x$, where $\hat{y}$ = predicted gas usage and x = degree days. ALWAYS IDENTIFY THE VARIABLES.

Interpretations. Slope: For every one-unit increase in degree days, the predicted gas usage increases by 0.19. Intercept: When the degree days value is 0, the predicted gas usage is 1.07.

Prediction. Use the regression equation to predict y from x. Ex. Predicted gas consumption when degree days = 40? $\hat{y} = 1.07 + 0.19(40) = 8.67$. Ex. Predicted gas consumption when degree days = 20? $\hat{y} = 1.07 + 0.19(20) = 4.87$.
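The two predictions can be checked with a short Python sketch. It uses the intercept 1.07 and slope 0.19 quoted in the interpretation of the degree days example; the constant and function names are assumptions made for illustration.

```python
INTERCEPT = 1.07  # predicted gas usage at 0 degree days (from the slides)
SLOPE = 0.19      # change in predicted gas usage per degree day (from the slides)


def predict_gas_usage(degree_days):
    """Predicted gas usage (y-hat) for a given number of degree days."""
    return INTERCEPT + SLOPE * degree_days


for degree_days in (20, 40):
    print(degree_days, round(predict_gas_usage(degree_days), 2))
```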

Plotting the regression line. Find two points on the line: pick two x values and find the predicted $\hat{y}$ for each. Ex. x = 20, $\hat{y}$ = 4.87 and x = 40, $\hat{y}$ = 8.67. Plot the two points on the graph and draw the line through them.

Finding the LSRL in the calculator. Stat → Edit → enter data in L1 and L2. Stat → Calc → 8: LinReg (y = a + bx).

x    16   24   42   60   75   102   120
y    24   30   35   40   48   56    60

LSRL: $\hat{y} \approx 20.40 + 0.342x$
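For readers without the calculator, the same fit can be sketched in Python. This is only an illustration under the standard least-squares formulas (slope $= r \cdot s_y / s_x$, intercept $= \bar{y} - b_1\bar{x}$); the variable names are assumptions, and only the seven data pairs come from the slide.

```python
import statistics

xs = [16, 24, 42, 60, 75, 102, 120]
ys = [24, 30, 35, 40, 48, 56, 60]

n = len(xs)
x_mean, y_mean = statistics.mean(xs), statistics.mean(ys)
s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # sample standard deviations

# Correlation: average product of z-scores, dividing by n - 1
r = sum(((x - x_mean) / s_x) * ((y - y_mean) / s_y) for x, y in zip(xs, ys)) / (n - 1)

b1 = r * s_y / s_x          # slope
b0 = y_mean - b1 * x_mean   # intercept

print(f"r = {r:.3f}")
print(f"y-hat = {b0:.2f} + {b1:.3f}x")
```

The calculator's LinReg (y = a + bx) output corresponds to a = `b0` and b = `b1` here.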

Example: A random sample of records of sales of homes from Feb. 15 to Apr. 30, 1993, from the files maintained by the Albuquerque Board of Realtors gives the price (in thousands of dollars) and size (in square feet) of 117 homes. A regression analysis gives us the model: predicted price = 47.82 + 0.061 × size. a. What does the slope of this line say about housing prices and house size? b. What price would you predict to pay for a 3000 square foot home? c. A real estate agent shows a potential buyer a 1200 sq-ft home with an asking price that is $6000 less than one would expect to pay for a house of that size. What is the asking price?

Example. a. Each additional square foot of size increases the predicted price by 0.061 × $1000, or $61, on average. b. $230,820 (230.82 thousand). c. $115,020.
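The arithmetic behind these answers can be sketched as follows. The slope 0.061 appears in the answer above; the intercept 47.82 is an inference, back-calculated from the stated prediction of 230.82 thousand dollars at 3000 square feet (230.82 − 0.061 × 3000), so treat it as an assumption. The names are hypothetical.

```python
SLOPE = 0.061      # thousands of dollars per square foot (from the slide's answer)
INTERCEPT = 47.82  # thousands of dollars; inferred from 230.82 - 0.061 * 3000


def predicted_price(size_sqft):
    """Predicted price in thousands of dollars for a home of the given size."""
    return INTERCEPT + SLOPE * size_sqft


price_3000 = predicted_price(3000)            # part b
asking_price_1200 = predicted_price(1200) - 6 # part c: $6000 below the prediction

print(f"3000 sq ft: {price_3000:.2f} thousand dollars")
print(f"1200 sq ft asking price: {asking_price_1200:.2f} thousand dollars")
```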

Homework: Read Chapter 7, pg 184-196. Ch 7 Day 1 WS.

Properties of regression line. The regression line always goes through the point $(\bar{x}, \bar{y})$. r is connected to the value of $b_1$: r has the same sign as $b_1$. (If the slope is negative, the correlation is negative, and vice versa.)
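A short derivation shows why the line passes through the point of means. It assumes the standard least-squares intercept formula $b_0 = \bar{y} - b_1\bar{x}$; substituting $x = \bar{x}$ into the regression equation gives:

```latex
\hat{y}\,\big|_{x=\bar{x}}
  = b_0 + b_1\bar{x}
  = (\bar{y} - b_1\bar{x}) + b_1\bar{x}
  = \bar{y}
```

So the predicted value at the mean of x is exactly the mean of y, whatever the slope is.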

Properties of regression line. The values of the y variable vary. The regression line tries to explain the variation of y through its relationship with x, but not perfectly: the points are not exactly on the line. Points close to the line = regression explains the variation of y well. Points far from the line = regression does not explain the variation of y well.

Properties of regression line. How much variation can we explain with our regression? Answer: $R^2$, the percent of variation in y that is explained by the linear relationship with x. Higher values of $R^2$ mean the regression line explains more of the variation in the y variable.

Degree Days vs. Gas Usage. $R^2 = r^2$. Ex. r = 0.9953, so $R^2$ = 0.9906: 99.06% of the variation in gas usage can be explained by the linear relationship with the number of degree days.

Example: Roller Coasters. People who responded to a July 2004 Discovery Channel poll named the 10 best roller coasters in the United States. A table in Chapter 7, Exercise 33 shows the length of the initial drop (in feet) and the duration of the ride (in seconds). A regression to predict duration from drop has $R^2$ = 12.4%. a. What are the variables and units in this regression? b. Write a sentence (in context) summarizing what the $R^2$ says about this regression. c. What is the correlation between drop and duration?

Example: Roller Coasters. a. y: duration (in seconds); x: drop (in feet). b. Differences in the length of the initial drop explain 12.4% of the variability in the duration of the ride, or better: "12.4% of the variation in y (the duration of the ride) can be explained by the least squares regression with x (length of the initial drop)." c. $r = \sqrt{0.124} \approx 0.352$
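The relationship $R^2 = r^2$ used in the last two examples can be checked numerically. The sketch below uses only the values quoted in the slides (r = 0.9953 for gas usage, $R^2$ = 12.4% for the roller coasters); the variable names are assumptions. Note that taking a square root recovers only the magnitude of r; its sign has to come from the direction of the association.

```python
import math

# Degree days vs. gas usage: from r to R^2
r_gas = 0.9953
r_squared_gas = r_gas ** 2
print(f"R^2 = {r_squared_gas:.4f}")  # fraction of variation in gas usage explained

# Roller coasters: from R^2 back to the magnitude of r
r_squared_coasters = 0.124
r_magnitude_coasters = math.sqrt(r_squared_coasters)
print(f"|r| = {r_magnitude_coasters:.3f}")
```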

Residuals. The variation in y not accounted for by the regression line. Formula: $e = y - \hat{y}$. There is a residual for each data point. Mean of residuals = 0.
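The claim that the residuals average to zero can be sketched in one line of algebra. This assumes the least-squares line $\hat{y} = b_0 + b_1 x$ with the standard intercept $b_0 = \bar{y} - b_1\bar{x}$, and writes $e_i$ for the residual of the $i$-th of $n$ points:

```latex
\sum_{i=1}^{n} e_i
  = \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)
  = n\bar{y} - n b_0 - n b_1 \bar{x}
  = n\left( \bar{y} - b_0 - b_1 \bar{x} \right)
  = 0
```

Since the sum is zero, the mean of the residuals is zero as well.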

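Example: Calculating Residuals. Degree Days vs. Gas Usage. Find the residual for the point (30, 6.4). Using the rounded equation, $\hat{y} = 1.07 + 0.19(30) = 6.77$, so the residual is $6.4 - 6.77 = -0.37$.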

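Calculating Residuals. Find the residual for the point (13, 4.0). With the same gas usage equation, $\hat{y} = 1.07 + 0.19(13) = 3.54$, so the residual is $4.0 - 3.54 = 0.46$.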
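A residual calculation like the two above can be sketched in Python. It uses the rounded intercept 1.07 and slope 0.19 from the degree days example, and it assumes the point (13, 4.0) belongs to the same data set, which the slide does not state outright. The function names are hypothetical.

```python
INTERCEPT = 1.07  # from the degree days vs. gas usage example
SLOPE = 0.19


def predict(x):
    """Predicted value (y-hat) on the regression line."""
    return INTERCEPT + SLOPE * x


def residual(x, y_observed):
    """Residual = observed value minus predicted value."""
    return y_observed - predict(x)


for x, y in [(30, 6.4), (13, 4.0)]:
    print(f"point ({x}, {y}): y-hat = {predict(x):.2f}, residual = {residual(x, y):+.2f}")
```

A negative residual means the point lies below the line (the line over-predicts); a positive residual means the point lies above it.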

Residual Plots. A residual plot is a scatterplot with the explanatory variable (x) on the horizontal axis, the residuals (e) on the vertical axis, and a horizontal line at residual = 0. A good residual plot has no pattern or shape and no outliers.

Interpreting Residual Plots. The residual plot should show about the same amount of scatter throughout! If it has a curved pattern, the relationship is not linear. If the spread about the line increases as x increases, predictions of y for larger x will be less accurate. If the spread about the line decreases as x increases, predictions of y for smaller x will be less accurate.

Example: Children’s Ages and Heights. The ages (in months) and heights (in inches) of seven children are given.

x (age, months)      16   24   42   60   75   102   120
y (height, inches)   24   30   35   40   48   56    60

Find the LSRL. Interpret the slope and correlation coefficient in the context of the problem.

Correlation coefficient: There is a strong, positive, linear association between the age and height of children. Slope: For each increase in age of one month, the predicted height of a child increases by approximately 0.34 inches.

Using the ages (in months) and heights (in inches) of the seven children above: Predict the height of a child who is 4.5 years old (54 months). Predict the height of someone who is 20 years old (240 months). Graph the data, find the LSRL, and also examine the means of x and y.

Create the residual plot of the data in your calculator. Stat → Edit → enter data in L1 and L2. Stat → Calc → 8: LinReg (y = a + bx). 2nd → Y= (Stat Plot) → Plot 1. Type: Scatterplot. Xlist = L1. Ylist = 2nd → Stat (List) → 7: Residuals. Enter. Zoom → 9. Is a LSRL appropriate here? Why?
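The numbers behind that residual plot can also be sketched in Python, without a calculator. This is an illustration only: it fits the line to the seven (age, height) pairs with the standard least-squares formulas and lists each residual next to its x value. The function and variable names are assumptions.

```python
ages_months = [16, 24, 42, 60, 75, 102, 120]
heights_in = [24, 30, 35, 40, 48, 56, 60]


def fit_lsrl(xs, ys):
    """Least-squares intercept and slope for y-hat = b0 + b1*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


b0, b1 = fit_lsrl(ages_months, heights_in)
residuals = [y - (b0 + b1 * x) for x, y in zip(ages_months, heights_in)]

# Residuals listed against x: the same information the residual plot displays
for x, e in zip(ages_months, residuals):
    print(f"x = {x:>3}   residual = {e:+.2f}")

print(f"sum of residuals = {sum(residuals):.6f}")
```

Reading down the list, look for the same things as in the plot: whether the signs and sizes of the residuals follow a pattern as x increases.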

Extrapolation The LSRL should not be used to predict y for values of x outside the data set. It is unknown whether the pattern observed in the scatterplot continues outside this range.
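One way to make the extrapolation warning concrete is to flag any prediction whose x value falls outside the observed range. The sketch below uses the children's ages and heights example; the coefficients 20.40 and 0.342 are rounded values from a least-squares fit of those seven data pairs (the slides themselves quote only a slope of about 0.34), and the names are hypothetical.

```python
B0 = 20.40   # intercept, rounded from a least-squares fit of the seven data pairs
B1 = 0.342   # slope, rounded from the same fit
AGE_MIN, AGE_MAX = 16, 120  # smallest and largest observed ages, in months


def predict_height(age_months):
    """Return (predicted height in inches, True if this is an extrapolation)."""
    prediction = B0 + B1 * age_months
    is_extrapolation = not (AGE_MIN <= age_months <= AGE_MAX)
    return prediction, is_extrapolation


for age in (54, 240):  # 4.5 years and 20 years
    height, extrapolated = predict_height(age)
    note = " (extrapolation: outside the data range)" if extrapolated else ""
    print(f"age {age} months: predicted height {height:.1f} in{note}")
```

The prediction at 240 months comes out to more than 100 inches, which illustrates why a line fitted to children's growth should not be carried forward to adults.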

Reading Computer Output Exercise physiologists are investigating the relationship between lean body mass (in kilograms) and the resting metabolic rate (in calories per day) in sedentary males. What is the LSRL?

The correlation coefficient and the LSRL are both non-resistant measures.

You should be able to: Calculate a regression line given summary statistics. Interpret the slope and intercept of the regression line. Find predictions and residuals for points. Interpret a residual plot. Interpret the $R^2$ value for a regression. Understand the limitations of regression.