
Simple Linear Regression 1

2 I want to start this section with a story. Imagine we take everyone in the class and line them up from shortest to tallest. As you look to the front of the class from your seat, the shortest person will be on the left and the tallest on the right. In fact, in a face-to-face class we will actually line you up: compare yourself to the people around you, and if you are taller than someone, move to the right; if you are shorter, move to the left. Now imagine everyone is lined up in order from shortest to tallest. If you are back in your seat looking at the line-up (you have to use your imagination, because you cannot be both in the line-up and in your seat), I bet the line-up looks like the following when you think about people's heights:
[Figure: the line-up by height, with 5’6” and 6’1” marked]

3 On the previous screen you see that most people are between 5’6” and 6’1”. Some are shorter and some are taller. This is not rocket science, right? From the line-up we could calculate the average height for the group. Now, instead of looking at people's height, let’s look at the size of their feet. Keeping the same order as for height, I would venture that foot size gets larger as we go from left to right across the room. Imagine walking across the room looking down at people's feet. I say the feet probably look like the following (I only show three, but I want you to fill in the rest):

Overview 4 Imagine you are the first person to get into the room each day. Say you have a class roster, so you know the names of everyone else in the class. Also say that each day you will try to guess the height of the first person who comes into the room after you. At this point in the story you have to guess without any clue about who will come in. I tell you that the best guess you can make each day is simply the average height. While you would likely be wrong each day, your misses would even out: on some days you would guess too low and on others too high. Other guessing methods might have you consistently guess too high or too low.
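The short Python sketch below shows why the average is a sensible blind guess. It uses made-up heights in inches, which are hypothetical data and not the class. It checks two things. First, the misses from guessing the mean cancel out. Second, the mean beats other fixed guesses such as 66 or 71 inches, assuming "best" is measured by the total squared miss (the same yardstick the least squares method uses later).

```python
from statistics import fmean

# Hypothetical heights in inches (made-up data for illustration only).
heights = [64, 66, 67, 68, 69, 70, 71, 73]

def total_squared_miss(guess, values):
    """Sum of squared differences between each value and one fixed guess."""
    return sum((v - guess) ** 2 for v in values)

mean_height = fmean(heights)

# Misses above and below the mean cancel out.
signed_misses = sum(h - mean_height for h in heights)

print(f"mean guess: {mean_height:.2f} in")
print(f"sum of signed misses: {signed_misses:.10f}")
for guess in (mean_height, 66.0, 71.0):
    print(f"guess {guess:.2f}: total squared miss = {total_squared_miss(guess, heights):.2f}")
```

With these numbers the mean is 68.5 inches and the signed misses sum to zero. The mean's total squared miss (58.0) is smaller than that of either other guess (108.0 each).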

Overview 5 Now, let’s change the story somewhat. Say that before the person enters the room, and before you have to guess the height, you can see the person’s feet. Would knowing the size of their feet help you guess the person's height? Since people with larger feet tend to be taller, you could guess a height above average when the feet are above average in size, and a height below average when the feet are below average. While you would probably still not guess the height exactly, you would improve on just guessing the average height. So, since foot size and height are related, knowing foot size can help us predict height.

Overview 6 Note that in this example I am not saying that foot size causes height, only that foot size and height are related. Regression analysis is a method that helps us see whether variables are related. In this context, when we say related we often say that the variables are correlated. Also note that correlation is not causation. Foot size does not cause height; in fact, both are driven by other variables such as nutrition and family genes. In business we often look for relationships between variables to help us make sense of the world. The aim is to come up with stories similar to the foot size/height story.

Overview 7 Consider an example about a group of college graduates. Not every graduate has the same starting salary, so one might investigate why. In the investigation one might think about other variables that could influence starting salary. Starting salaries could be influenced by, among other things, the student's GPA, the number of student groups the student was in, or the graduate's work experience. GPA might be important because the higher the GPA, the higher the starting salary may be.

Overview 8 In the example so far, starting salary is called the response variable because its values are thought to respond to the values of the other variables. The response variable is often called the y variable and is plotted on the vertical axis. GPA, student groups, and work experience are all examples of explanatory variables. When we use just one explanatory variable with the response variable, we can conduct SIMPLE LINEAR REGRESSION. The explanatory variable is then called the x variable and is plotted on the horizontal axis. When two or more explanatory variables are used, we can do MULTIPLE REGRESSION. For now we stick with simple linear regression.

Using a Sample to Estimate the Model 9 On the next slide I show some data and a scatterplot for the example we have been developing. Note that a sample of 7 graduates has been taken, and the table lists each graduate's GPA and starting salary. Each point in the scatterplot is a (GPA, starting salary) pair for one graduate. With a sample of data we can estimate the regression line as $\hat{y} = b_0 + b_1 x$. Here $b_0$ is the y intercept of the line, that is, the value of $\hat{y}$ when x is 0. The slope $b_1$ is the change in $\hat{y}$ when x increases by 1 unit. By the way, $\hat{y}$ is read "y hat"; the hat tells us that the value comes from the regression line.
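Here is a minimal Python sketch of the line $\hat{y} = b_0 + b_1 x$. It uses made-up coefficients ($b_0 = 10$, $b_1 = 7$), not the ones estimated from the graduate data, and the helper name `predict` is hypothetical. It illustrates the two interpretations above: raising x by 1 unit changes $\hat{y}$ by exactly $b_1$, and at x = 0 the prediction equals $b_0$.

```python
def predict(b0: float, b1: float, x: float) -> float:
    """Return y-hat = b0 + b1 * x for a simple linear regression line."""
    return b0 + b1 * x

# Hypothetical coefficients, NOT the estimates from the graduate sample.
b0, b1 = 10.0, 7.0

y_hat_at_3 = predict(b0, b1, 3.0)
y_hat_at_4 = predict(b0, b1, 4.0)

# Increasing x by 1 unit changes y-hat by b1.
print(y_hat_at_3, y_hat_at_4, y_hat_at_4 - y_hat_at_3)
# At x = 0 the prediction is the intercept b0.
print(predict(b0, b1, 0.0))
```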

10 [Table and scatterplot: Graduate, GPA, and Start Salary (in thousands of dollars) for the 7 sampled graduates]

Least Squares Method 11 [Figure: scatterplot of Y against X with three candidate lines, labeled Line 1, Line 2, and Line 3]

Least Squares 12 On the previous slide I show a more generic scatterplot with three lines drawn in the graph. All three lines are decent in the sense that, with their upward slope, they all show the same basic idea as the dots: as x rises, y rises (meaning x and y are positively related). In principle we could find the equation of each line with algebra, and each line would then have its own $b_0$ and $b_1$ values. Line 1 is bad because it is too high: if we used it to predict y, we would always predict too high a number. Similarly, with line 3 we would be too low all the time.

Least Squares 13 Line 2 runs “among” the data points, so when you make predictions with it, sometimes you will be too high and sometimes too low. But no straight line can be perfect (unless all the points truly lie on a straight line, which is unlikely in business and social research). Line 2 is my interpretation of the line that would be picked by what is called the least squares method. For each x, the value on the line is called $\hat{y}$. The least squares line is placed so that the sum of the squared vertical differences between each dot and the line is minimized. Since each dot has a y, the least squares method picks $b_0$ and $b_1$ so that the differences $y - \hat{y}$, squared and then summed across all points, are as small as possible. In other words, it minimizes $\sum_i (y_i - \hat{y}_i)^2$.
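The least squares choice of $b_0$ and $b_1$ has a standard closed form: $b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $b_0 = \bar{y} - b_1 \bar{x}$. The Python sketch below applies it to a small hypothetical data set. It then compares the resulting SSE with two made-up competitors: the same line shifted up by 1 (playing the role of Line 1) and shifted down by 1 (playing Line 3). The function names are illustrative only.

```python
def least_squares(xs, ys):
    """Closed-form least squares estimates (b0, b1) for y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sse(xs, ys, b0, b1):
    """Sum of squared errors: the sum of (y - y_hat)^2 over all points."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data with a positive relationship.
xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

b0, b1 = least_squares(xs, ys)
candidates = {
    "Line 1 (shifted up)": (b0 + 1.0, b1),
    "Line 2 (least squares)": (b0, b1),
    "Line 3 (shifted down)": (b0 - 1.0, b1),
}
for name, (a, b) in candidates.items():
    print(f"{name}: b0={a:.3f}, b1={b:.3f}, SSE={sse(xs, ys, a, b):.3f}")
```

Here the fit gives $b_0 = 0.09$ and $b_1 = 1.97$ with SSE ≈ 0.091. Each shifted line has SSE ≈ 5.091.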

14 [Figure: Excel output showing the data set and the regression results, with $b_0$ and $b_1$ labeled]

Least Squares 15 For now we will assume Microsoft Excel or some other program can give us the estimated regression line using least squares; we just want to use what we get. On the previous slide I show the Excel output. Note that cell B25 contains the word Coefficients. Cells A26:A27 contain the words Intercept and GPA, and the estimated coefficients are in cells B26:B27. This means $\hat{y} = b_0 + b_1 x$ has been estimated as Starting salary = $b_0 + b_1(\text{GPA})$, with $b_0$ and $b_1$ read from those cells. Note that starting salary is measured in thousands of dollars: for example, a value of 29.8 in the data means 29,800 dollars.

Prediction with least squares 16 Remember that our estimated line is Starting salary = $b_0 + b_1(\text{GPA})$. Say we want to predict the salary for a GPA of 2.7. Substituting gives Starting salary = $b_0 + b_1(2.7) = 30.22382$ (in thousands), so the predicted starting salary is 30,223.82 dollars.

Interpolation and Extrapolation 17 You will notice in our example data set that the smallest value of x was 2.21 and the largest was 3.82. Suppose we want to predict a value of y, $\hat{y}$, for a given x. If x is within the range of the data values for x (2.21 to 3.82 in our example), we are interpolating. But if x is outside that range, we are extrapolating. Extrapolation should be used with a great deal of caution. The relationship between x and y may be different outside the range of our data, and if so, predictions from the estimated line could be way off. Note that the intercept has to be interpreted with similar caution. Unless the range of our x data includes zero, the relationship between x and y near x = 0 could be very different from the one suggested by least squares.
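Here is a small Python sketch of the interpolation check, using the GPA range 2.21 to 3.82 from our example. The function name `classify_prediction` is hypothetical. It only flags whether x falls inside the observed range and does not make a prediction itself.

```python
# Observed GPA range in the sample: smallest 2.21, largest 3.82.
GPA_MIN, GPA_MAX = 2.21, 3.82

def classify_prediction(x: float, x_min: float = GPA_MIN, x_max: float = GPA_MAX) -> str:
    """Label a prediction at x as interpolation (inside the observed range) or extrapolation."""
    if x_min <= x <= x_max:
        return "interpolation"
    return "extrapolation: use with caution"

# 2.7 is inside the range; 3.9 is above it; 0.0 is the intercept case.
for gpa in (2.7, 3.9, 0.0):
    print(gpa, classify_prediction(gpa))
```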

Variation 18 Remember that to calculate the standard deviation of a variable, we take each value, subtract off the mean, and then square the result. (We also sum these squares and divide by something, but that is not important in this discussion.) In a regression setting, for the response variable Y we define the total sum of squares as $SST = \sum (Y_i - \bar{Y})^2$. SST can be rewritten as $SST = \sum (Y_i - \hat{Y}_i + \hat{Y}_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2 = SSR + SSE$. Note: you may recall from algebra that $(a + b)^2 = a^2 + 2ab + b^2$. Here $a = \hat{Y}_i - \bar{Y}$ and $b = Y_i - \hat{Y}_i$, and the cross terms add up to zero: $2\sum (\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i) = 0$. This is not true for arbitrary numbers in algebra, but it is true for the least squares line (with an intercept). If this note makes no sense to you, do not worry; just use SST = SSR + SSE.

Variation 19 So we have $SST = \sum (Y_i - \bar{Y})^2$, $SSR = \sum (\hat{Y}_i - \bar{Y})^2$, and $SSE = \sum (Y_i - \hat{Y}_i)^2$. On the next slide I have a graph of the data with the regression line and a horizontal line at the mean of Y. For each point we can look at how far it is from the mean line; this is what SST measures. SSR captures the part of that difference that the regression line accounts for, namely the distance from the mean line to the regression line. The rest of the difference, the distance from the regression line to the point, goes into SSE.

Variation 20 [Figure: scatterplot of Y against X with the least squares regression line $\hat{Y}_i$ and a horizontal line at $\bar{Y}$; two examples of what goes into SSR and two examples of what goes into SSE are marked]

The Coefficient of Determination 21 The coefficient of determination, often denoted $r^2$, measures the proportion of the variation in Y that is explained by the explanatory variable X in the regression model: $r^2 = SSR/SST$. In our example, $r^2 = SSR/SST = 0.98$ rounded to 2 decimals. This means that 98% of the variation in starting salary is explained by the variation in students' GPA, and only 2% of the variation in starting salary is left to other factors.

Coefficient of Determination 22 Say we didn’t have an X variable to help us predict the Y variable. Then a reasonable way to predict Y would be to just use its mean. With a regression, by using an X variable we hope to do better than using the mean of Y as the predictor. In simple linear regression, $r^2$ indicates the strength of the relationship between the two variables. This is because $r^2$ is the percentage by which the regression model reduces the variability of the prediction errors (the sum of squared errors) compared with just using the mean of Y. What counts as a good $r^2$ varies across fields of study (marketing, management, and so on). But as a rule of thumb, an $r^2$ of 0.8 or above indicates a strong relationship.

Correlation 23 Remember that the correlation coefficient r was used to understand the direction and strength of the linear relationship between two variables. In simple linear regression, squaring r gives the $r^2$ of the regression; this is how regression and correlation are related.

Residuals 24 A residual is the observed value minus the predicted value: $y - \hat{y}$. Back on slide 14 I had the data set. There we see, for example, the individual with GPA = 2.6 and starting salary = 29.8, so y = 29.8. Plugging GPA = 2.6 into the estimated equation gives $\hat{y} = b_0 + b_1(2.6)$, and the residual is $29.8 - \hat{y} = 0.15$. Individual points with large residuals are potential outliers. A large residual alone does not make a point influential, however; an influential point is one whose removal would noticeably change the fitted line.