Linear regression involves finding the equation of the line of best fit on a scatter graph. The equation obtained can then be used to make an estimate.

Presentation transcript:

1 Regression. Linear regression involves finding the equation of the line of best fit on a scatter graph. The equation obtained can then be used to estimate the value of one variable given the value of the other. There are two cases to consider, depending on whether: 1. we wish to find a value of y given a value of x, or 2. we want to estimate x given y. S1 deals with the first situation.


3 Regression. The best fitting line is the one that minimizes the sum of the squared deviations, $\sum d_i^2$, where $d_i$ is the vertical distance between the $i$th point and the line.
[Figure: scatter graph showing the vertical distances $d_1$ to $d_6$ between six points and the line.]
The distances $d_i$ are sometimes referred to as residuals.
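To make the quantity being minimized concrete, here is a minimal Python sketch that computes $\sum d_i^2$ for a few candidate lines. The six data points and the candidate lines are made up for illustration; the last candidate is the least squares line for these points (obtained from the formulas for a and b on slide 5), and it gives the smallest total of the three.

```python
def sum_squared_residuals(xs, ys, a, b):
    """Return the sum of d_i**2, where d_i = y_i - (a + b*x_i) is the vertical
    distance from the i-th point to the line y = a + bx."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, made up purely for illustration
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 5, 4, 5, 7]

candidates = {
    "y = 1 + 1x": (1.0, 1.0),
    "y = 2 + 0.8x": (2.0, 0.8),
    "y = 1.8 + 0.771x (least squares)": (1.8, 13.5 / 17.5),
}
for label, (a, b) in candidates.items():
    # Smaller totals mean the line sits closer (vertically) to the points
    print(f"{label}: sum of d_i^2 = {sum_squared_residuals(xs, ys, a, b):.3f}")
```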

4 Regression. As stated previously, the best fitting line should pass through the mean point, $(\bar{x}, \bar{y})$.

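5 Regression. The line that minimizes the sum of squared deviations is formally known as the least squares regression line of y on x. The equation of the least squares regression line of y on x is:
$y = a + bx$, where $b = \frac{S_{xy}}{S_{xx}}$ and $a = \bar{y} - b\bar{x}$.
Recall: $S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$ and $S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$.
b is sometimes referred to as the regression coefficient.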
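The formulas above translate directly into code. The following is a minimal Python sketch; the function name `least_squares_line` and the sample data are hypothetical, chosen for illustration. It also checks numerically that the fitted line passes through the mean point $(\bar{x}, \bar{y})$, which follows from $a = \bar{y} - b\bar{x}$.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares regression line of y on x, y = a + bx,
    using b = S_xy / S_xx and a = y_bar - b * x_bar."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    if s_xx == 0:
        raise ValueError("all x values are equal, so the gradient is undefined")
    b = s_xy / s_xx
    a = sum_y / n - b * sum_x / n
    return a, b


# Hypothetical sample data, for illustration only
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 5, 4, 5, 7]
a, b = least_squares_line(xs, ys)
print(f"y = {a:.3f} + {b:.3f}x")

# Since a = y_bar - b * x_bar, the line passes through the mean point (x_bar, y_bar)
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
print(abs(a + b * x_bar - y_bar) < 1e-9)
```

The explicit checks guard against the two cases where the formulas break down: mismatched or too few pairs, and $S_{xx} = 0$ (a vertical spread of points with no variation in x).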

6 Regression. Example: The table shows the latitude, x, and mean January temperature (°C), y, for a sample of 10 cities in the northern hemisphere. Calculate the equation of the regression line of y on x and use it to predict the mean January temperature for the city of Los Angeles, which has a latitude of 34°N.

City           Latitude (°N), x    Mean Jan. temp. (°C), y
Belgrade       45                  1
Bangkok        14                  32
Cairo          30                  14
Dublin         50                  3
Havana         23                  22
Kuala Lumpur   3                   27
Madrid         40                  5
New York       41                  0
Reykjavik      30                  −1
Tokyo          36                  5

7 Regression. Example: the table from slide 6, extended with a TOTALS row.

8 Regression. We begin by finding summary statistics for the table: $n = 10$, $\sum x = 312$, $\sum y = 108$, $\sum x^2 = 11636$, $\sum xy = 2000$. We then use these to calculate the gradient (b) and y-intercept (a) of the regression line.
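As a check on the arithmetic, the summary statistics can be computed directly from the table. This short Python sketch assumes the values have been transcribed exactly as printed in the table on slide 6.

```python
# Latitude (x, degrees N) and mean January temperature (y, degrees C), as printed in the table
cities = {
    "Belgrade": (45, 1), "Bangkok": (14, 32), "Cairo": (30, 14), "Dublin": (50, 3),
    "Havana": (23, 22), "Kuala Lumpur": (3, 27), "Madrid": (40, 5), "New York": (41, 0),
    "Reykjavik": (30, -1), "Tokyo": (36, 5),
}
xs = [x for x, _ in cities.values()]
ys = [y for _, y in cities.values()]

n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_x2 = sum(x * x for x in xs)                 # sum of x squared
sum_xy = sum(x * y for x, y in zip(xs, ys))     # sum of products xy
print(n, sum_x, sum_y, sum_x2, sum_xy)
```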

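9 Regression. To find the gradient, we need $S_{xy}$ and $S_{xx}$:
$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 2000 - \frac{312 \times 108}{10} = -1369.6$
$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 11636 - \frac{312^2}{10} = 1901.6$
Therefore: $b = \frac{S_{xy}}{S_{xx}} = \frac{-1369.6}{1901.6} = -0.720$ (to 3 s.f.)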
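The gradient calculation can be reproduced in a few lines. This sketch simply hard-codes the summary statistics computed from the table on slide 6 ($n = 10$, $\sum x = 312$, $\sum y = 108$, $\sum x^2 = 11636$, $\sum xy = 2000$).

```python
# Summary statistics computed from the city table (n = 10 cities)
n, sum_x, sum_y, sum_x2, sum_xy = 10, 312, 108, 11636, 2000

s_xy = sum_xy - sum_x * sum_y / n    # 2000 - 312 * 108 / 10
s_xx = sum_x2 - sum_x ** 2 / n       # 11636 - 312**2 / 10
b = s_xy / s_xx                      # gradient of the regression line of y on x
print(f"S_xy = {s_xy:.1f}, S_xx = {s_xx:.1f}, b = {b:.3f}")
```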

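10 Regression. To find the y-intercept we also need $\bar{x}$ and $\bar{y}$: $\bar{x} = \frac{312}{10} = 31.2$ and $\bar{y} = \frac{108}{10} = 10.8$. So: $a = \bar{y} - b\bar{x} = 10.8 - (-0.720)(31.2) = 33.3$ (to 3 s.f.). Therefore, the equation of the regression line is: $y = 33.3 - 0.720x$. So, when x = 34, $y = 33.3 - 0.720 \times 34 = 8.82$°C. This is our estimate of the mean January temperature in Los Angeles.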
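The intercept and the Los Angeles prediction can be sketched the same way. Here b is kept at full precision ($b = -1369.6/1901.6$, from the $S_{xy}$ and $S_{xx}$ values for this table), and the result is compared with the calculation using the rounded coefficients 33.3 and −0.720. Rounding accounts for the small difference: full precision gives about 8.78°C, while the rounded coefficients give 8.82°C.

```python
n, sum_x, sum_y = 10, 312, 108
b = -1369.6 / 1901.6                 # gradient S_xy / S_xx, kept at full precision

x_bar, y_bar = sum_x / n, sum_y / n  # mean point (31.2, 10.8)
a = y_bar - b * x_bar                # y-intercept


def predict(x):
    """Estimated mean January temperature (degrees C) at latitude x (degrees N)."""
    return a + b * x


print(f"a = {a:.2f}, b = {b:.4f}")
print(f"full precision: {predict(34):.2f} C")
print(f"rounded coefficients: {33.3 - 0.720 * 34:.2f} C")
```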

11 Regression. This prediction for the mean January temperature in Los Angeles is based purely on the city’s latitude. There are likely to be additional factors that can affect the climate of a city, for example: altitude; proximity to the coast; ocean currents; prevailing winds. The concept of regression we have considered here can be extended to incorporate other relevant factors, producing a new formula. This allows for more accurate prediction.
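One common way to carry out such an extension is multiple linear regression, where y is modelled as $c_0 + c_1x_1 + c_2x_2 + \dots$ and the coefficients are again chosen to minimize the sum of squared residuals. This goes beyond S1, which uses a single explanatory variable. The sketch below uses NumPy's `np.linalg.lstsq`; the function name `fit_multiple` is hypothetical, and the data are invented so that the true coefficients (1, 2, 3) are known in advance.

```python
import numpy as np


def fit_multiple(predictors, y):
    """Least squares fit of y = c0 + c1*x1 + ... + ck*xk.

    predictors: array of shape (n, k), one column per explanatory variable.
    Returns the coefficients [c0, c1, ..., ck].
    """
    # A leading column of 1s lets the fit include the intercept c0
    X = np.column_stack([np.ones(len(y)), predictors])
    coeffs, _residuals, _rank, _sv = np.linalg.lstsq(X, y, rcond=None)
    return coeffs


# Invented data generated exactly from y = 1 + 2*x1 + 3*x2, so the fit should recover (1, 2, 3)
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
y = 1 + 2 * x1 + 3 * x2
print(np.round(fit_multiple(np.column_stack([x1, x2]), y), 6))
```

With a single predictor column, the same function reduces to the line of y on x: applied to the latitude data alone, it gives the same gradient as the $S_{xy}/S_{xx}$ calculation.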

12 The dangers of extrapolation. A regression equation can only be used with confidence to predict values of y that correspond to x values lying within the range of the available data. It is reasonably safe to make predictions within the range of the data. It can be dangerous to extrapolate, i.e. to use the line to predict a value of y for a value of x that lies beyond the range of the values in the data set. This is because we cannot be sure that the relationship between the two variables will continue to hold.
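One way to respect this in practice is to make a prediction function warn whenever it is asked to extrapolate. The sketch below is a hypothetical helper (`make_predictor` is not from any library) built on Python's standard `warnings` module; it fits the line with the formulas from slide 5 and treats the interval from the smallest to the largest observed x as the safe range. For the city data, a latitude of 34°N lies within the observed range of 3°N to 50°N, so no warning is issued, while a latitude of 70°N triggers one.

```python
import warnings


def make_predictor(xs, ys):
    """Fit y = a + bx by least squares and return a function that predicts y,
    warning when x lies outside the observed range of the data."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    b = s_xy / s_xx
    a = sum_y / n - b * sum_x / n
    lo, hi = min(xs), max(xs)

    def predict(x):
        if not lo <= x <= hi:
            warnings.warn(f"x = {x} is outside the data range [{lo}, {hi}]: this is extrapolation")
        return a + b * x

    return predict


# City data: latitude (x) and mean January temperature (y)
xs = [45, 14, 30, 50, 23, 3, 40, 41, 30, 36]
ys = [1, 32, 14, 3, 22, 27, 5, 0, -1, 5]
predict = make_predictor(xs, ys)
predict(34)   # inside 3..50: interpolation, no warning
predict(70)   # beyond 50: a warning is issued
```

Warning rather than raising an error keeps the decision with the user, who may still want the number but should treat it with caution.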

13 Examination-style question: regression. The average weight and wingspan of 9 species of British birds are given in the table.

Bird           Weight (g)   Wingspan (cm)
Wren           10           15
Robin          18           21
Chaffinch      18           24
Cuckoo         57           33
Blackbird      100          37
Pigeon         300          67
Lapwing        220          70
Crow           500          99
Common gull    …            …

a) Plot the data on a scatter graph. Comment on the relationship between the variables.
b) Calculate the regression line of wingspan on weight.
c) Use your regression line to estimate the wingspan of a jay, if its average weight is 160 g.
d) Explain why it would be inappropriate to use your line to estimate the wingspan of a duck, if the average weight of a duck is 1 kg.

14 Examination-style question: regression. a) The scatter graph indicates that there is fairly strong positive correlation between weight and wingspan: wingspan tends to be longer in heavier birds.

15 Examination-style question: regression. b) Let x = weight and y = wingspan. Summary values for the paired data are: … These can be used to find the gradient of the regression line, $b = \frac{S_{xy}}{S_{xx}}$. Therefore: $b = 0.176$ (to 3 s.f.).

16 Examination-style question: regression. To find the y-intercept we also need $\bar{x}$ and $\bar{y}$. So: $a = \bar{y} - b\bar{x}$. Therefore, the equation of the regression line is: $y = a + 0.176x$, where y = wingspan (cm) and x = weight (g).

17 Examination-style question: regression. c) When the weight is 160 g, we can predict the wingspan to be: $y = a + 0.176x = a + (0.176 \times 160) = 48.2$ cm (to 3 s.f.).
d) The average weight of a duck (1 kg = 1000 g) is outside the range of weights provided in the data. It would therefore be inappropriate to use the regression line to predict the wingspan of a duck, as we cannot be certain that the same relationship will continue to hold at higher weights.
Note: The regression coefficient (0.176) can be interpreted here as follows: as the weight increases by 1 g, the wingspan increases by 0.176 cm, on average.

