1 Chapter 16 Linear regression is a procedure that identifies the relationship between independent variables and a dependent variable. This relationship helps reduce the unexplained variation in the dependent variable's behavior, thus providing better predictions of its future values.

2 The simple linear regression model The model is: y = β0 + β1x + ε, where y is the dependent variable, x is the independent variable, and ε is a random error term.

3 The simple linear regression model The model is y = β0 + β1x + ε. We try to estimate the deterministic part of it by developing the line with the best fit, where best fit is defined as the minimum sum of squared errors. An error is the difference between the value on the line and the actual value for a given x. The analysis yields the estimates b1 = cov(x,y)/s_x² and b0 = ȳ − b1·x̄.

4 Problem 6 Are the costs of welding machine breakdowns related to their age? From the data, answer the following:
–Find the sample regression line.
–What is the coefficient of determination? Interpret it.
–Are machine age and monthly repair costs linearly related?
–Is the fit good enough to use the model to predict the monthly repair cost of a 120-month-old machine?
–Make the prediction.

5 From Excel we get:
–Cov(age, cost), the mean age (x̄), and the mean cost (ȳ); the numerical values appear in the spreadsheet output.
–b1 = cov(x,y)/s_x²; b0 = ȳ − b1·x̄ = ȳ − b1(113.35).
The regression line: ŷ = b0 + b1x.
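A minimal sketch of the same computation in Python is shown below. The CSV file name and the column names "Age" and "Cost" are illustrative assumptions, not the layout of the original Excel workbook.

```python
import numpy as np
import pandas as pd

df = pd.read_csv("welding_machines.csv")     # hypothetical export of the Excel data
x = df["Age"].to_numpy(dtype=float)
y = df["Cost"].to_numpy(dtype=float)

# Sample covariance and variance (ddof=1 matches Excel's COVARIANCE.S / VAR.S)
cov_xy = np.cov(x, y, ddof=1)[0, 1]
var_x = np.var(x, ddof=1)

b1 = cov_xy / var_x              # slope: b1 = cov(x, y) / s_x^2
b0 = y.mean() - b1 * x.mean()    # intercept: b0 = y-bar - b1 * x-bar
print(f"Cost = {b0:.2f} + {b1:.2f} * Age")
```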

6 Problem 6 Coefficient of determination:
–In this case R² = .5659.
–56.59% of the variation in costs is explained by the model (by the different ages).
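Continuing the sketch above (it reuses cov_xy, var_x, and y from that code), the coefficient of determination for a simple regression can be computed directly from the covariance and the variances; the number it prints depends on the data file, which is not reproduced here.

```python
var_y = np.var(y, ddof=1)
r_squared = cov_xy**2 / (var_x * var_y)   # R^2 = cov(x, y)^2 / (s_x^2 * s_y^2)
print(f"R^2 = {r_squared:.4f}")           # fraction of cost variation explained by age
```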

7 Problem 6 Is there a linear relationship between monthly costs and machine age? We need to test whether β1 differs from zero.
–H0: β1 = 0; H1: β1 ≠ 0.
–The test statistic is t = (b1 − β1)/s_b1, where s_b1 (the standard error of b1) can be calculated separately.
–In this case t = [2.47 − 0]/.5106 ≈ 4.84.
–The rejection region is t > t_α/2 or t < −t_α/2, with n − 2 degrees of freedom.
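A hand computation of this t-test, continuing from the earlier sketch (it reuses x, y, b0, b1, and numpy as np), is given below; none of the numbers come from the original Excel output.

```python
from scipy import stats

n = len(x)
residuals = y - (b0 + b1 * x)
s_eps = np.sqrt(np.sum(residuals**2) / (n - 2))      # standard error of estimate
s_b1 = s_eps / np.sqrt(np.sum((x - x.mean())**2))    # standard error of the slope

t_stat = (b1 - 0) / s_b1
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)      # two-tailed p-value
print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")
```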

8 Problem 6 The p-value < α, so H0 is rejected: there is evidence of a linear relationship between machine age and monthly repair costs.

9 Problem 6 We need to forecast the expected cost for a 120-month-old machine. The equation provides a point prediction: Cost = b0 + b1(120); the dollar value appears in the output. The prediction interval (use Data Analysis Plus): LCL = $318.12; the UCL also appears in the output. What is the prediction for the average monthly repair cost of all machines that are 120 months old? To answer this question, construct the confidence interval for the expected value (notice, not the prediction interval!).
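A sketch of how both intervals could be obtained with statsmodels is shown below, continuing with the x and y arrays from the earlier sketch; it is not the Data Analysis Plus output from the original problem.

```python
import numpy as np
import statsmodels.api as sm

X = sm.add_constant(x)                  # design matrix with an intercept column
model = sm.OLS(y, X).fit()

X_new = sm.add_constant(np.array([120.0]), has_constant='add')
pred = model.get_prediction(X_new).summary_frame(alpha=0.05)

# mean_ci_*  -> confidence interval for the average cost of all 120-month-old machines
# obs_ci_*   -> prediction interval for the cost of one 120-month-old machine
print(pred[["mean", "mean_ci_lower", "mean_ci_upper", "obs_ci_lower", "obs_ci_upper"]])
```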

10 Chapter 17 The multiple regression model allows more than one independent variable to explain the values of the dependent variable. We assess the model as before using:
–a t-test for the linear relationship between each independent variable and the dependent variable (tested one at a time),
–an F-test for the overall usefulness of the model,
–the coefficient of determination for the fit.
A sketch of how these three checks can be read from regression output follows below.
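The sketch below shows one way to obtain the three quantities from a fitted multiple regression in statsmodels. The data file and the column names x1, x2, x3, y are placeholders, not the variables from the textbook data files.

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("some_data.csv")               # hypothetical data set
X = sm.add_constant(df[["x1", "x2", "x3"]])     # independent variables plus intercept
y = df["y"]                                     # dependent variable

result = sm.OLS(y, X).fit()

print(result.tvalues, result.pvalues)   # t-tests, one coefficient at a time
print(result.fvalue, result.f_pvalue)   # F-test for the overall usefulness of the model
print(result.rsquared)                  # coefficient of determination
```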

11 Problem 7 When a company buys another company, it is not unusual that some workers are terminated. A buyout contract between Laurier Company and Western Company required that Laurier provide terminated Western workers a severance package equivalent to the packages offered to Laurier workers. It is suggested that severance is determined by three factors: age, length of service, and pay. Bill Smith, a Western employee, is offered 5 weeks of severance pay when his employment is terminated. Based on the data provided by Laurier (Xr19-05.xls) about severance offered to 50 of its employees in the past, answer the following questions:

12 Problem 7 (continued) Determine the regression equation. Interpret the coefficients. Comment on how well the model fits the data. Do all the independent variables belong in the model? Does Laurier meet its obligation to Bill Smith?

13 Problem 8 A linear regression model for longevity:
–Insurance companies are interested in predicting the longevity of their customers.
–Data for 100 deceased male customers were collected, and a regression model was run.
–The model studied was: Longevity = β0 + β1 MotherAge + β2 FatherAge + β3 GrandM + β4 GrandF + ε

14 Problem 8 The estimated regression equation and the coefficient of determination (regression output shown on the slide).

15 Problem 8 Overall usefulness: H0: all βi = 0; H1: at least one βi ≠ 0. Significance F (the p-value) = 4.86(…), which is smaller than α, so we reject H0. The model is useful.

16 Problem 8 Mother's age and father's age at death have strong linear relationships to an individual's age at death. Grandparents' ages at death are not good predictors of an individual's age at death. The t-test for βi: H0: βi = 0, H1: βi ≠ 0; t = (bi − βi)/s_bi; rejection region: t > t_α/2,n−k−1 or t < −t_α/2,n−k−1.

17 Chapter 18.2 Dummy variables help include qualitative data in a regression model. If the qualitative data fall into n categories, n − 1 dummy variables are needed to express all the categories. Dummy variables take on the values 0 or 1:
–Xi = 0 if the data point in question does not belong to category i,
–Xi = 1 if the data point in question belongs to category i.
A sketch of building such dummy columns appears below.
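The sketch below builds the two 0/1 dummy columns for Problem 9's machine types (coded 1 = Welding, 2 = Lathe, 3 = Stamping), echoing the W and L variables used on the later slide. The rows of the DataFrame are made up for illustration and are not taken from the data file.

```python
import pandas as pd

df = pd.DataFrame({"Age":  [95, 102, 110],
                   "Cost": [310, 295, 340],
                   "MachineType": [1, 2, 3]})   # made-up rows, for illustration only

df["W"] = (df["MachineType"] == 1).astype(int)  # 1 if welding machine, else 0
df["L"] = (df["MachineType"] == 2).astype(int)  # 1 if lathe machine, else 0
# With 3 categories only 3 - 1 = 2 dummies are needed; stamping (code 3) is the
# reference category, represented by W = L = 0.
print(df[["Age", "W", "L", "Cost"]])
```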

18 Problem 9 In Problem 6 we studied the relationship between the age of welding machines and breakdown costs. This study was expanded: it now also includes lathe machines and stamping machines (see the data file; machine type code: 1 = Welding, 2 = Lathe, 3 = Stamping). Answer the following:
–Develop a regression model.
–Interpret the coefficients.
–Can we conclude that welding machines cost more to repair than stamping machines?
–Predict the monthly repair cost of an 85-month-old lathe machine.

19 Problem 9 First we need to prepare the input data (the original data and the recoded dummy-variable columns are shown on the slide).

20 Problem 9 Run the multiple regression (regression output shown on the slide).

21 Problem 9 Run the multiple regression. Note the reference line (for the stamping machine): Cost = b0 + b1·Age; the full model is Cost = b0 + b1·Age + b2·W + b3·L (the coefficient values appear in the output). Repair cost increases on average by $2.53 a month. The monthly repair cost for a welding machine is $11.75 lower than for a stamping machine of the same age; however, this result is not significant (p-value = .55). There is insufficient evidence in the sample to support the hypothesis that there is any difference between the repair costs of welding machines and stamping machines. The monthly repair cost for a lathe machine is lower (by the amount shown in the output) than for a stamping machine of the same age; this result is significant.

22 Chapter 15 We test hypotheses that a set of data comes from certain distributions:
–the multinomial distribution,
–the normal distribution.
We also study whether two variables are dependent or not. We apply a tool called the chi-squared test.

23 The multinomial experiment The multinomial experiment is an extension of the binomial experiment. Characteristics:
–There are n independent trials.
–Each trial can result in one of k possible outcomes.
–In each trial, outcome i occurs with probability pi (i = 1, …, k).
We test whether the sample gathered supports the hypothesis that p1, p2, …, pk are equal to specified values. The test is called the goodness-of-fit test.

24 Problem 1 To determine whether a single die is balanced, or fair, the die was rolled 600 times (see Xr15-09.xls). Is there sufficient evidence at the 5% significance level to allow you to conclude that the die is not fair?

25 Problem 1 The hypotheses:
–H0: p1 = p2 = … = p6 = 1/6; H1: at least one pi is not 1/6.
–Build a rejection region: χ² > χ²_α,k−1.
–In our case: χ² > χ²_α,5 (k − 1 = 5 degrees of freedom).
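The critical value χ²_α,5 can be looked up in a table or obtained with scipy, as in the short sketch below (α = .05, as stated in the problem).

```python
from scipy import stats

alpha, dof = 0.05, 5
critical_value = stats.chi2.ppf(1 - alpha, df=dof)
print(critical_value)   # reject H0 if the computed chi-squared statistic exceeds this
```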

26 Problem 1
–We calculate χ² as follows: χ² = Σ (fi − ei)²/ei.
–In our case: e1 = e2 = … = e6 = 600(1/6) = 100. From the file we have: f1 = 114; f2 = 92; f3 = 84; f4 = 101; f5 = 107; f6 = 103.
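A quick check of this computation in Python, mirroring the formula above and using the observed frequencies quoted from the file:

```python
import numpy as np
from scipy import stats

f = np.array([114, 92, 84, 101, 107, 103])   # observed frequencies from the file
e = np.full(6, 100.0)                        # expected frequencies: 600 * (1/6)

chi2 = np.sum((f - e) ** 2 / e)              # chi-squared goodness-of-fit statistic
p_value = stats.chi2.sf(chi2, df=len(f) - 1) # k - 1 = 5 degrees of freedom
print(chi2, p_value)                         # compare the p-value with alpha = 0.05
```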

27 Contingency table Here we test the relationship between two variables: are they dependent? We build a contingency table, with the r categories of one variable as rows and the c categories of the other as columns, and compute a chi-squared statistic with (r − 1)(c − 1) degrees of freedom.
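A minimal sketch of a test of independence on an r x c table with scipy is shown below. The counts are purely hypothetical placeholders, not the music-preference data used in the next slides.

```python
import numpy as np
from scipy import stats

table = np.array([[30, 20, 10],
                  [25, 35, 30]])          # rows: categories of one variable

chi2, p_value, dof, expected = stats.chi2_contingency(table)
print(chi2, p_value, dof)                 # dof = (r - 1)(c - 1)
print(expected)                           # e_ij = (row total)(column total) / n
```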

28 A sample problem Contingency table: type of music vs. geographic location.
–A group of 30-year-old people is interviewed to determine whether the type of music they prefer is related to the geographic location of their residence.
–From the data presented, can we infer that music preference is affected by geographic location? Use α = .10.
H0: Type of music and geographic location are independent. H1: Type of music and geographic location are dependent.

29 A sample problem (contd.) The expected frequencies: e11 = (195)(428)/632 = 129.59; e12 = (195)(100)/632 = 30.85; …; e23 = (235)(65)/632 = 24.16. Then χ² = (f11 − 129.59)²/129.59 + … + (f23 − 24.16)²/24.16 + … = 64.92. The critical value is χ²_.10,(3−1)(4−1) = 10.64; since 64.92 > 10.64, reject the null hypothesis. Type of music and geographic location are not independent. (The contingency table, with columns Rock, R&B, Country, Classical and rows Northeast, South, West, is shown on the slide.)

30 A sample problem (contd.) Using Data Analysis Plus (output shown on the slide).

31 Chi-squared test for normality
–Hypothesize the values of μ and σ (μ0 and σ0).
–Divide the Z axis into equal-size subintervals [e.g. (−2, −1); (−1, 0); (0, 1); (1, 2), plus the two tails].
–Determine the probability covered by each subinterval [e.g. p1 = P(Z < −2); p2 = P(−2 < Z < −1); …].
–Translate the Z scores to the associated X values [e.g. x1 = μ0 + (−2)σ0; x2 = μ0 + (−1)σ0; …].
–Find the actual frequency for each subinterval [e.g. f1 for the interval below x1; f2 for the interval (x1, x2); …].
–Calculate the expected frequency for each interval: e1 = np1; e2 = np2; …
–Build a chi-squared statistic and perform the test.
A sketch of these steps in code follows below.
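The sketch below walks through these steps under assumed values of μ0 and σ0 and a made-up sample; it is not tied to any of the textbook data files.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=200)   # placeholder sample, for illustration only
mu0, sigma0 = 50.0, 10.0                        # hypothesized mean and standard deviation
n = len(data)

z_cuts = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # equal-size Z subintervals (plus two tails)
x_cuts = mu0 + z_cuts * sigma0                  # translate the Z cut points to X values

# Interval probabilities: p1 = P(Z < -2), p2 = P(-2 < Z < -1), ..., p6 = P(Z > 2)
cdf = stats.norm.cdf(z_cuts)
p = np.diff(np.concatenate(([0.0], cdf, [1.0])))

# Observed frequencies in the corresponding X intervals, and expected frequencies e_i = n * p_i
idx = np.digitize(data, x_cuts)
f = np.bincount(idx, minlength=len(p))
e = n * p

chi2 = np.sum((f - e) ** 2 / e)
p_value = stats.chi2.sf(chi2, df=len(p) - 1)    # k - 1 df when mu0 and sigma0 are specified
print(chi2, p_value)
```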