1 Diploma in Statistics, Introduction to Regression, Lecture 2.1. 1. Review of Lecture 1.1; 2. Correlation; 3. Pitfalls with Regression and Correlation; 4. Introducing Multiple Linear Regression (Job times case study, Stamp sales case study); 5. Homework

2 Review of Lecture 1.1: Scatter plot of US mail handling data, exceptions deleted

3 Always look at your data! "Although regression can be done without ever looking at a scatter plot, that is the statistical equivalent of flying blind" Amy Lap Mui Choi, JF MSISS, 1993/94. "Decision-making under risk is when you know what will probably happen and decision-making under uncertainty is when you probably know what will happen." Anon., JF MSISS 1995/96

4 Simple linear regression model with Normal model for chance variation: $Y = \alpha + \beta X + \varepsilon$

5 The prediction formula. Prediction equation: $\hat{Y} = \hat{\alpha} + \hat{\beta} X$. Prediction equation allowing for chance variation: $\hat{Y} = \hat{\alpha} + \hat{\beta} X \pm 2s$

6 Homework: Use the prediction formula to predict the extra manpower requirement during the Christmas period, based on the experience of Period 7, Fiscal 1963, when Y was 1,070 and X was 270. Compare with the actual value. Comment.

7 Application 1: Confidence interval for marginal change. Recall the general form of a confidence interval, estimate ± 2 standard errors. Confidence interval for $\beta$: $\hat{\beta} \pm 2\,SE(\hat{\beta})$. Small sample: $\hat{\beta} \pm t\,SE(\hat{\beta})$

9 Application 2: Testing the statistical significance of the intercept. Formal test: $H_0: \alpha = 0$. Test statistic: $Z = \hat{\alpha} / SE(\hat{\alpha})$ (or $t$ in small samples). Critical value: 2 (or $t_{21,\,.05} = 2.08$). Calculated value: 0.848. Comparison: $Z < 2$ (or $t < 2.08$). Conclusion: Accept $H_0$

10 Testing the statistical significance of the intercept. Informal test: $\hat{\alpha}$ is less than its standard error. Draw a picture!
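
The two applications can be checked numerically. The sketch below recomputes the slope interval and the intercept test from the Minitab coefficients reported for the Manhours versus Volume regression (Constant 50.44 with SE 59.46, Volume 3.3454 with SE 0.3401) and the critical value 2.08 quoted on the slide. The function and variable names are illustrative and not part of the lecture.

```python
# Sketch: confidence interval for the slope and test of the intercept,
# using the Minitab output for Manhours versus Volume.
T_CRIT = 2.08  # t critical value quoted on the slide (21 degrees of freedom)


def conf_interval(estimate, std_error, crit=T_CRIT):
    """Return (lower, upper) = estimate -/+ crit * std_error."""
    half_width = crit * std_error
    return estimate - half_width, estimate + half_width


def t_ratio(estimate, std_error):
    """Test statistic for H0: parameter = 0."""
    return estimate / std_error


slope_ci = conf_interval(3.3454, 0.3401)
intercept_t = t_ratio(50.44, 59.46)
intercept_significant = abs(intercept_t) > T_CRIT

print(f"Slope CI: ({slope_ci[0]:.3f}, {slope_ci[1]:.3f})")
print(f"Intercept t = {intercept_t:.3f}, significant: {intercept_significant}")
```

The intercept ratio reproduces the calculated value 0.848 on the slide, and since it is below the critical value the intercept is not statistically significant.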

11 More on Minitab results. Regression Analysis: Manhours versus Volume
The regression equation is
Manhours = 50.4 + 3.35 Volume
Predictor     Coef  SE Coef     T      P
Constant     50.44    59.46  0.85  0.406
Volume      3.3454   0.3401  9.84  0.000
S = 18.9300
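
The fitted equation can be applied to the earlier homework (Period 7, Fiscal 1963, with X = 270 and actual Y = 1,070). The sketch below is one possible working, under two assumptions that are not stated on this slide: the allowance for chance variation is taken as plus or minus 2S, and X = 270 is taken to be on the same scale as Volume in the output.

```python
# Sketch: homework prediction from the Minitab output above.
# Assumption: chance variation is allowed for with a band of +/- 2S.
INTERCEPT, SLOPE, S = 50.44, 3.3454, 18.93


def predict(volume):
    """Point prediction of Manhours for a given Volume."""
    return INTERCEPT + SLOPE * volume


def prediction_band(volume, multiplier=2.0):
    """Informal prediction band: prediction -/+ multiplier * S."""
    centre = predict(volume)
    return centre - multiplier * S, centre + multiplier * S


predicted = predict(270)
low, high = prediction_band(270)
actual = 1070

print(f"Predicted: {predicted:.1f}, band: ({low:.1f}, {high:.1f})")
print(f"Actual {actual} exceeds the band: {actual > high}")
```

Under these assumptions the actual value lies well above the band, which is the kind of discrepancy the homework asks you to comment on.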

12 Homework: In a study of a wholesaler's distribution costs, undertaken with a view to cost control, the volume of goods handled and the overall costs were recorded for one month in each of ten depots in a distribution network. The results are presented in the following table.

13 Homework: The simple linear regression of costs (Y) on volume (X) was calculated, and resulted in the following numerical summary.
Regression Analysis: Costs versus Volume
The regression equation is
Costs = 2.98 + 0.332 Volume
Predictor     Coef  SE Coef      T      P
Constant     2.982    1.646   1.81  0.108
Volume     0.33174  0.03182  10.42  0.000
S = 0.667603

14 Homework
(i) Draw a scatter plot for these data. Comment. Interpret the numerical summary in context.
(ii) Calculate a prediction interval for costs next month when Volume in Depot 1 is planned to be £40,000, and Volume in Depot 2 is planned to be £51,000.
(iii) Next month, when the two depots recorded volumes of £40,000 and £51,000 as planned, costs were £1,700 and £2,300 respectively. Comment on each case. Illustrate with an enhancement of your scatter plot.

15 Homework Solution (i): There appears to be a strong positive relationship between Costs and Volume.

16 Homework Solution (i): Costs increase approximately linearly with Volume, by around £33.20 for every £1,000 increase in Volume, from a base of around £300. (Costs = 2.98 + 0.332 Volume) The cost for a given volume is subject to chance variation with a standard deviation of around £67. (S = 0.667603)

17 Homework Solution
(ii) Volume = £40,000, Costs ∈ (£1,491, £1,759); Volume = £51,000, Costs ∈ (£1,857, £2,124)
(iii) £1,700 is within the corresponding prediction interval, satisfactory. £2,300 is outside the corresponding prediction interval, too high. An investigation is needed. Illustrate
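
The intervals in part (ii) are consistent with the simple rule "prediction plus or minus 2S". The sketch below reproduces them to within rounding, assuming (as the solution text implies, but the output does not state) that Volume is recorded in £1,000s and Costs in £100s. The function names are illustrative.

```python
# Sketch: simple +/- 2S prediction intervals for the depot costs homework.
# Assumed units: Volume in £1,000s, Costs in £100s (inferred from the solution).
INTERCEPT, SLOPE, S = 2.982, 0.33174, 0.667603


def prediction_band(volume, multiplier=2.0):
    """Return (lower, upper) costs band for a given volume."""
    centre = INTERCEPT + SLOPE * volume
    return centre - multiplier * S, centre + multiplier * S


def assess(volume, actual_cost):
    """Classify an actual cost as below, inside or above the band."""
    low, high = prediction_band(volume)
    if actual_cost < low:
        return "below"
    if actual_cost > high:
        return "above"
    return "inside"


for volume, actual in ((40, 17.0), (51, 23.0)):
    low, high = prediction_band(volume)
    print(f"Volume £{volume},000: (£{low * 100:,.0f}, £{high * 100:,.0f}); "
          f"actual £{actual * 100:,.0f} is {assess(volume, actual)}")
```

Small differences from the slide's end points come from rounding of the coefficients and of S.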

18 More precise formulas: confidence interval for mean response; prediction interval for next response. (ii) Volume = £40,000, Costs ∈ (£1,444, £1,807); Volume = £51,000, Costs ∈ (£1,829, £2,151)

19 Standard error of prediction and of estimation. Ref: "The Standard Error of Prediction", Extra Notes folder in mstuart/get or Diploma webpage

20 Homework Solution

21 2. Correlation: the correlation coefficient formula; r and reduction of prediction error; positive and negative correlation; perfect correlation; conventional interpretations of r

22 The correlation coefficient formula, recalled in two equivalent forms
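
The slide's formulas did not survive in this transcript. As a reminder of what the correlation coefficient computes, here is a minimal sketch using the standard Pearson definition, $r = S_{XY} / \sqrt{S_{XX} S_{YY}}$. The data values are made up for illustration and do not come from the lecture.

```python
# Sketch: Pearson correlation coefficient from sums of squares and products.
import math


def correlation(xs, ys):
    """Return r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
r_scattered = correlation(xs, [2, 1, 4, 3, 5])
r_perfect_pos = correlation(xs, [3, 5, 7, 9, 11])   # exact rising line
r_perfect_neg = correlation(xs, [10, 8, 6, 4, 2])   # exact falling line

print(r_scattered, r_perfect_pos, r_perfect_neg)
```

The last two calls illustrate perfect positive and perfect negative correlation, where every point lies exactly on a line.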

23 Scatter plot showing zero correlation

24 Correlation r = 0.1 to r = 0.9 (Data Desk)

25 r and reduction in prediction error

26 r and reduction in prediction error

27 Positive and negative correlation

28 Perfect correlation, positive and negative

29 Conventional interpretations of r. Science / Engineering: r > 0.9 is "interesting". Econometrics: r > 0.7 is "interesting", otherwise r > 0.5 is "interesting". Sociology: r > 0.3 is "interesting". Recommendation: compare s to $S_Y$
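
The recommendation to compare s with $S_Y$ can be made concrete. In simple linear regression the residual standard deviation satisfies $s^2 = (1 - r^2)\,S_Y^2\,(n-1)/(n-2)$, so for large n the ratio $s / S_Y$ is roughly $\sqrt{1 - r^2}$. The sketch below tabulates that ratio for the conventional thresholds. Whether this matches the plots on the "reduction in prediction error" slides cannot be recovered from the transcript, so treat it as an illustration only.

```python
# Sketch: how much a given correlation r shrinks the prediction error,
# measured by the ratio s / S_Y in simple linear regression.
import math


def residual_sd_ratio(r, n=None):
    """Return s / S_Y implied by r.

    With n given, uses the exact identity
    s^2 = (1 - r^2) * S_Y^2 * (n - 1) / (n - 2);
    without n, uses the large-sample approximation sqrt(1 - r^2).
    """
    ratio = math.sqrt(1 - r * r)
    if n is not None:
        ratio *= math.sqrt((n - 1) / (n - 2))
    return ratio


for r in (0.3, 0.5, 0.7, 0.9):
    ratio = residual_sd_ratio(r)
    print(f"r = {r}: s is about {ratio:.2f} x S_Y "
          f"({100 * (1 - ratio):.0f}% reduction)")
```

Even r = 0.7 reduces the prediction error by less than a third, which is why comparing s with $S_Y$ is more informative than a threshold on r.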

30 3. Pitfalls with regression and correlation

31 Anscombe's data summary

32 Anscombe's scatter plots
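
Anscombe's point is that very different scatter plots can share the same numerical summary. The sketch below fits a least squares line to the first two of Anscombe's four data sets, using the values as commonly published (check them against the original source before reuse), and shows that the fitted lines agree even though the first set is roughly linear and the second is curved. The helper name fit_line is illustrative.

```python
# Sketch: two of Anscombe's data sets give almost the same fitted line.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
Y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def fit_line(xs, ys):
    """Least squares intercept and slope for simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept1, slope1 = fit_line(X, Y1)
intercept2, slope2 = fit_line(X, Y2)

print(f"Set I:  Y = {intercept1:.2f} + {slope1:.3f} X")
print(f"Set II: Y = {intercept2:.2f} + {slope2:.3f} X")
```

Only a scatter plot reveals the difference between the two sets, which is the reason for the earlier advice to always look at your data.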

33 Homework: The shelf life of packaged foods depends on many factors. Dry cereal (such as corn flakes) is considered to be a moisture-sensitive product, with the shelf life determined primarily by moisture. In a study of the shelf life of one brand of cereal, packets of cereal were stored in controlled conditions (23°C and 50% relative humidity) for a range of times, and moisture content was measured. The results were as follows. Draw a scatter diagram. Comment. What action is suggested? Why?

34 Following appropriate action, the following regression was computed.
The regression equation is
Moisture Content = 2.86 + 0.0417 Storage Time
Predictor         Coef   SE Coef       T      P
Constant       2.86122   0.02488  115.01  0.000
Storage Time  0.041660  0.001177   35.40  0.000
S = 0.0493475
Calculate a 95% confidence interval for the daily change in moisture content; show details.

35 Was the action you suggested on studying the scatter diagram in part (a) justified? Explain. Predict the moisture content of a packet of cereal stored under these conditions for 3 weeks; calculate a prediction interval. What would be the effect on your interval of not taking the action you suggested on studying the scatter diagram? Why? Taste tests indicate that this brand of cereal is unacceptably soggy when the moisture content exceeds 4. Based on your prediction interval, do you think that a box of cereal that has been on the shelf for 3 weeks will be acceptable? Explain. What about 4 weeks? 5 weeks? What is acceptable?
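
One way to organise the calculations for this homework is sketched below. It rests on several assumptions not confirmed by the slides: Storage Time is measured in days (suggested by "daily change"), the critical value is taken as 2 because the sample size is not given, and the prediction interval uses the simple plus or minus 2S rule. Predictions beyond the range of storage times in the data, which is not shown here, would be extrapolations.

```python
# Sketch: slope interval and simple prediction intervals for the cereal data.
# Assumptions: Storage Time in days; critical value 2; band of +/- 2S.
INTERCEPT, SLOPE, SE_SLOPE, S = 2.86122, 0.041660, 0.001177, 0.0493475
SOGGY_LIMIT = 4.0  # moisture content above which the cereal is unacceptable


def slope_interval(crit=2.0):
    """Approximate 95% confidence interval for the daily change."""
    return SLOPE - crit * SE_SLOPE, SLOPE + crit * SE_SLOPE


def predict(days):
    """Point prediction of moisture content after a given storage time."""
    return INTERCEPT + SLOPE * days


def prediction_band(days, multiplier=2.0):
    """Informal prediction interval: prediction -/+ multiplier * S."""
    centre = predict(days)
    return centre - multiplier * S, centre + multiplier * S


ci_low, ci_high = slope_interval()
print(f"Daily change: ({ci_low:.4f}, {ci_high:.4f})")

for weeks in (3, 4, 5):
    low, high = prediction_band(7 * weeks)
    print(f"{weeks} weeks: ({low:.2f}, {high:.2f}); "
          f"whole band below {SOGGY_LIMIT}: {high < SOGGY_LIMIT}")
```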

36 Reading: SA Sections 6.4, 6.5

37 4. Introducing Multiple Linear Regression. SLR: explaining variation in Y in terms of variation in X. MLR: explaining variation in Y in terms of variation in several X's

38 Example 1: What determines the taste of mature cheese? $X_1$ = Acetic Acid, $X_2$ = Hydrogen Sulphide, $X_3$ = Lactic Acid, Y = Taste Score

39 Example 2: Explaining crime rates. Variables and descriptions:
M: percentage of males aged 14–24
So: indicator variable for a southern state
Ed: mean years of schooling
Po1: police expenditure in 1960
Po2: police expenditure in 1959
LF: labour force participation rate
M.F: number of males per 1000 females
Pop: state population
NW: number of nonwhites per 1000 people
U1: unemployment rate of urban males 14–24
U2: unemployment rate of urban males 35–39
GDP: gross domestic product per head
Ineq: income inequality
Prob: probability of imprisonment
Time: average time served in state prisons
Crime: rate of crimes in a particular category per head of population

40 Example 3: Estimating tree volume / timber yield. For a sample of 31 black cherry trees in the Allegheny National Forest, Pennsylvania, measure Y = volume (cubic feet), $X_1$ = height (feet), $X_2$ = diameter (inches) (at 54 inches above ground)

41 Example 4: The Stamp Sales Case Study. The problem: January 1984, An Post established; new business plan; sales forecasts required; historical sales data available; bring in a consultant!

42 Example 5: A production prediction problem. The problem; the data; initial data analysis (dotplots, lineplots (time series plots), scatterplot matrix); model fitting / estimation; model criticism; application

43 Erie Metal Products: The problem. Metal products fabrication: customers order varying quantities of products of varying complexity; customers demand accurate and precise order delivery times.

44 Stephan Clark Metal Products: A specially designed cabinet; rear view

45 Stephan Clark Metal Products: Instrument casing; another view

46 Stephan Clark Metal Products: Instrument casing, oblique view; lockers

47 Stephan Clark Metal Products: "One customer is an international manufacturer of petrochemical equipment." "Stephen Clark supplies painted metalwork components, panels and fabrications, which are used throughout the customer's product range." "Stephen Clark plays an important part in them being able to cope with frequent scheduling changes." "Through careful program management, we are able to offer excellent flexibility of supply, delivering finished product against weekly call-offs."

48 Erie Metal Products: The data

49 The variables
Response:
– Jobtime, time (hours) to complete an order
Explanatory:
– Units, the number of units ordered
– Operations per Unit, the number of operations involved in manufacturing a unit
– Rushed, indicator of "rushed" priority status
– Total Operations = Units × Operations per Unit

50 Initial data analysis, dotplots

51 Initial data analysis, lineplots

52 Initial data analysis, scatterplot matrix

53 The multiple linear regression model: $\text{Jobtime} = \alpha + \beta_{\text{Units}} \times \text{Units} + \beta_{\text{Ops}} \times \text{Ops} + \beta_{\text{T\_Ops}} \times \text{T\_Ops} + \beta_{\text{Rushed}} \times \text{Rushed} + \varepsilon$

54 Model parameters. The regression coefficients: $\beta_{\text{Units}}$, $\beta_{\text{Ops}}$, $\beta_{\text{T\_Ops}}$, $\beta_{\text{Rushed}}$. The "uncertainty" parameter: $\sigma$, the standard deviation of $\varepsilon$

55 Parameter estimates. Prediction formula: Jobtime = 44 - 0.07 × Units + 9.8 × Ops + 0.1 × T_Ops - 38 × Rushed ± 15. Exercise: Job 9, a rushed job with 21 units and 9 operations per unit, took 260 hours to complete. Was this reasonable?
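
A quick way to approach the exercise is to evaluate the prediction formula for Job 9 and compare the result with the 260 hours actually taken. The sketch below uses the formula exactly as quoted on the slide, with T_Ops computed as Units × Operations per Unit and the ± 15 taken as the allowance for chance variation. The function name is illustrative.

```python
# Sketch: apply the slide's prediction formula to Job 9.
ALLOWANCE = 15  # the "+/- 15" quoted in the prediction formula


def predict_jobtime(units, ops_per_unit, rushed):
    """Jobtime = 44 - 0.07*Units + 9.8*Ops + 0.1*T_Ops - 38*Rushed."""
    total_ops = units * ops_per_unit
    rushed_flag = 1 if rushed else 0
    return (44 - 0.07 * units + 9.8 * ops_per_unit
            + 0.1 * total_ops - 38 * rushed_flag)


job9_prediction = predict_jobtime(units=21, ops_per_unit=9, rushed=True)
job9_actual = 260
job9_excess = job9_actual - job9_prediction

print(f"Predicted: {job9_prediction:.1f} hours, plus or minus {ALLOWANCE}")
print(f"Actual exceeds prediction by {job9_excess:.1f} hours")
```

The actual time is far outside the allowance around the prediction, which suggests Job 9 was exceptional rather than reasonable.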

56 Choosing values for the regression coefficients, SLR. Find values for $\alpha$ and $\beta$ that minimise the deviations $Y_1 - \alpha - \beta X_1$, $Y_2 - \alpha - \beta X_2$, $Y_3 - \alpha - \beta X_3$, $\ldots$, $Y_n - \alpha - \beta X_n$

57 The method of least squares, SLR. Find values for $\alpha$ and $\beta$ that minimise the sum of the squared deviations: $(Y_1 - \alpha - \beta X_1)^2 + (Y_2 - \alpha - \beta X_2)^2 + (Y_3 - \alpha - \beta X_3)^2 + \cdots + (Y_n - \alpha - \beta X_n)^2$
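
For simple linear regression the minimising values have a closed form: the slope is $S_{XY}/S_{XX}$ and the intercept is $\bar{Y} - \hat{\beta}\bar{X}$. The sketch below computes them for a small made-up data set (not from the lecture) and checks that nudging either coefficient increases the sum of squared deviations.

```python
# Sketch: least squares for simple linear regression on hypothetical data.
def sum_squared_deviations(alpha, beta, xs, ys):
    """Sum over observations of (Y - alpha - beta * X)^2."""
    return sum((y - alpha - beta * x) ** 2 for x, y in zip(xs, ys))


def least_squares(xs, ys):
    """Closed-form minimiser: slope = Sxy / Sxx, intercept = ybar - slope * xbar."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha_hat, beta_hat = least_squares(xs, ys)
best = sum_squared_deviations(alpha_hat, beta_hat, xs, ys)

print(f"alpha = {alpha_hat:.3f}, beta = {beta_hat:.3f}, minimum = {best:.4f}")
for d_alpha, d_beta in ((0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)):
    nudged = sum_squared_deviations(alpha_hat + d_alpha, beta_hat + d_beta, xs, ys)
    print(f"nudged by ({d_alpha}, {d_beta}): {nudged:.4f}")
```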

58 The method of least squares, MLR. Find values for $\alpha$ and $\beta_1, \beta_2, \beta_3$, etc. that minimise the sum of the squared deviations: $(Y_1 - \alpha - \beta_1 X_{11} - \beta_2 X_{21} - \beta_3 X_{31} - \text{etc.})^2 + (Y_2 - \alpha - \beta_1 X_{12} - \beta_2 X_{22} - \beta_3 X_{32} - \text{etc.})^2 + (Y_3 - \alpha - \beta_1 X_{13} - \beta_2 X_{23} - \beta_3 X_{33} - \text{etc.})^2 + \cdots + (Y_n - \alpha - \beta_1 X_{1n} - \beta_2 X_{2n} - \beta_3 X_{3n} - \text{etc.})^2$. Minitab!
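
In practice the minimisation is left to software such as Minitab. To show what such software does in principle, the sketch below solves the normal equations $(X^T X)\,b = X^T Y$ by Gaussian elimination in plain Python. This is an illustration on made-up data, not Minitab's actual algorithm, and production software uses more numerically stable methods.

```python
# Sketch: multiple linear regression by solving the normal equations.
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x


def least_squares(x_rows, ys):
    """Return [alpha, beta_1, beta_2, ...] minimising the squared deviations.

    x_rows holds one list of explanatory values per observation.
    """
    design = [[1.0] + list(row) for row in x_rows]  # leading 1 for alpha
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2.
x_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ys = [9, 8, 19, 18, 26]

coefficients = least_squares(x_rows, ys)
print([round(c, 6) for c in coefficients])
```

Because the made-up responses contain no chance variation, the fitted coefficients recover the generating values.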

59 Regression of Jobtime on other variables
Predictor     Coef  SE Coef      T      P
Constant     77.24    44.76   1.73  0.105
Units      -0.1507   0.1121  -1.34  0.199
Ops          7.152    4.305   1.66  0.117
T_Ops      0.11460  0.01322   8.67  0.000
Rushed      -24.94    19.11  -1.31  0.211
S = 37.4612

60 Exercise: From the computer output, write down the parameter estimates and the prediction formula. Predict job times for a typical job, say 300 units requiring 10 operations per unit, both normal and rushed.

61 Exercise (continued): Is this a useful prediction? What is S? What is 2S? When will my order arrive? NEXT: Diagnostics; analysis of residuals

62 Homework: Predict job times for small (U=100, O=5), medium (U=300, O=10) and large (U=500, O=15) jobs, both normal and rushed. Present the results in tabular form.
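
The exercise and homework both amount to plugging values into the prediction formula read off the computer output. The sketch below is one possible layout, assuming T_Ops is Units × Operations per Unit as defined in the variables slide. The names COEF, JOBS and predict_jobtime are illustrative.

```python
# Sketch: job time predictions from the regression output for Jobtime.
COEF = {"Constant": 77.24, "Units": -0.1507, "Ops": 7.152,
        "T_Ops": 0.11460, "Rushed": -24.94}
S = 37.4612


def predict_jobtime(units, ops, rushed):
    """Prediction formula built from the fitted coefficients."""
    rushed_flag = 1 if rushed else 0
    return (COEF["Constant"] + COEF["Units"] * units + COEF["Ops"] * ops
            + COEF["T_Ops"] * units * ops + COEF["Rushed"] * rushed_flag)


JOBS = {"small": (100, 5), "medium": (300, 10), "large": (500, 15)}

print(f"{'Job':<8}{'Normal':>10}{'Rushed':>10}")
for name, (units, ops) in JOBS.items():
    normal = predict_jobtime(units, ops, rushed=False)
    rushed = predict_jobtime(units, ops, rushed=True)
    print(f"{name:<8}{normal:>10.1f}{rushed:>10.1f}")
print(f"2S = {2 * S:.1f} hours")
```

Comparing the predictions with 2S helps answer whether the prediction is useful for telling a customer when an order will arrive.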

63 Return to The Stamp Sales Case Study. The problem: January 1984, An Post established; new business plan; sales forecasts required; historical sales data available; bring in a consultant!

64 Historical data

65 Trend projection?

66 Factors influencing sales: economic growth; stamp prices; alternative product prices; measurement problems!

67 Project: develop a sales forecasting system for An Post. Terms of reference:
1. Identify and collect the relevant macro-economic data.
2. Establish a database containing the data needed for model building.
3. Identify, estimate and check a dynamic regression model suitable for the purposes outlined below:

68 (a) medium-term (one to five years) forecasting of aggregate demand for postal services;
(b) analysis of the effects of levels of general economic activity, postal prices and the prices of competing services, on aggregate demand for postal services;
(c) use as a benchmark for the analysis of the effects of demand stimulation activities.

71 Reading: SA Sections 1.6, 8.1, 8.2

