1 BA 555 Practical Business Analysis Review of Statistics Confidence Interval Estimation Hypothesis Testing Linear Regression Analysis Introduction Case.


1 BA 555 Practical Business Analysis Review of Statistics Confidence Interval Estimation Hypothesis Testing Linear Regression Analysis Introduction Case Study: Cost of Manufacturing Computers Simple Linear Regression Agenda

2 The Empirical Rule (p.5)

3 Review Example Suppose that the average hourly earnings of production workers over the past three years were reported to be $12.27, $12.85, and $13.39, with standard deviations of $0.15, $0.18, and $0.23, respectively. The average hourly earnings of the production workers in your company also continued to rise over the past three years, from $12.72 in 2002 and $13.35 in 2003 to $13.95 in 2004. Assume that the distribution of hourly earnings for all production workers is mound-shaped. Have the earnings in your company become less and less competitive? Why or why not?

4 Review Example
Year | Industry average | Industry std. | % increase | Company average | % increase | Z score
2002 | $12.27 | $0.15 | – | $12.72 | – | 3.00
2003 | $12.85 | $0.18 | 4.7% | $13.35 | 5.0% | 2.78
2004 | $13.39 | $0.23 | 4.2% | $13.95 | 4.5% | 2.43
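The z-scores in this table can be reproduced from the definition z = (x − μ)/σ. A minimal sketch in plain Python, using the figures quoted in the example:

```python
# z-score of the company's average relative to the industry distribution,
# computed for each of the three years in the example.
industry_mean = [12.27, 12.85, 13.39]   # industry average hourly earnings ($)
industry_sd = [0.15, 0.18, 0.23]        # industry standard deviations ($)
company_mean = [12.72, 13.35, 13.95]    # this company's averages ($)

for year, mu, sigma, x in zip((2002, 2003, 2004),
                              industry_mean, industry_sd, company_mean):
    z = (x - mu) / sigma                # standardized distance above the mean
    print(f"{year}: z = {z:.2f}")       # 3.00, 2.78, 2.43 -- steadily falling
```

The falling z-scores are the sense in which the company's pay, while rising in dollars, is becoming less exceptional relative to the industry.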

5 The Empirical Rule Generalize the results from the empirical rule. Justify the use of the mound-shaped distribution.

6 Sampling Distribution (p.6) The sampling distribution of a statistic is the probability distribution of all possible values of the statistic that result when random samples of size n are repeatedly drawn from the population. When the sample size is large, what is the sampling distribution of the sample mean / sample proportion / the difference of two sample means / the difference of two sample proportions? → NORMAL!!!

7 Central Limit Theorem (CLT) (p.6)

8 CLT
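The CLT can be seen by simulation. The sketch below (illustrative, not from the notes) draws repeated samples from a strongly skewed exponential population and checks that the sample means cluster around the population mean with spread near σ/√n:

```python
import random
import statistics

random.seed(1)                          # reproducible demo
n, reps = 50, 2000                      # sample size and number of samples

# Exponential(rate=1) population: mean 1, sd 1 -- clearly not normal.
sample_means = [statistics.mean(random.expovariate(1.0) for _ in range(n))
                for _ in range(reps)]

# CLT prediction: means center near 1 with spread near 1/sqrt(50) ~ 0.141.
print(round(statistics.mean(sample_means), 2))
print(round(statistics.stdev(sample_means), 2))
```

A histogram of `sample_means` would look mound-shaped even though the population itself is heavily skewed.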

9 Summary: Sampling Distributions The sampling distribution of a sample mean The sampling distribution of a sample proportion The sampling distribution of the difference between two sample means The sampling distribution of the difference between two sample proportions

10 Standard Deviations

11 Statistical Inference: Estimation Research Question: What is the parameter value? Sample of size n Population Tools (i.e., formulas): Point Estimator Interval Estimator

12 Confidence Interval Estimation (p.7)

13 Example 1: Estimation for the population mean. A random sampling of a company’s weekly operating expenses for a sample of 48 weeks produced a sample mean of $5474 and a standard deviation of $764. Construct a 95% confidence interval for the company’s mean weekly expenses. Example 2: Estimation for the population proportion.
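Example 1 can be worked through numerically. A sketch using the large-sample z-interval (with n = 48, the z value 1.96 is a standard approximation to the t critical value):

```python
import math

n, xbar, s = 48, 5474.0, 764.0          # weeks sampled, sample mean, sample sd
se = s / math.sqrt(n)                   # standard error of the sample mean
margin = 1.96 * se                      # 95% margin of error
print(f"${xbar - margin:.0f} to ${xbar + margin:.0f}")   # $5258 to $5690
```

So we are 95% confident that mean weekly operating expenses fall between roughly $5258 and $5690.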

14 Statistical Inference: Hypothesis Testing Research Question: Is the claim supported? Sample of size n Population Tools (i.e., formulas): z or t statistic

15 Hypothesis Testing (p.9)

16 Example A bank has set up a customer service goal that the mean waiting time for its customers will be less than 2 minutes. The bank randomly samples 30 customers and finds that the sample mean is 100 seconds. Assuming that the sample is from a normal distribution and the standard deviation is 28 seconds, can the bank safely conclude that the population mean waiting time is less than 2 minutes?
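A worked sketch of the bank example as a one-sided z-test of H0: μ = 120 seconds against Ha: μ < 120 seconds (σ = 28 seconds treated as known):

```python
import math

n, xbar = 30, 100.0                     # sample size and sample mean (seconds)
mu0, sigma = 120.0, 28.0                # hypothesized mean and known sd

z = (xbar - mu0) / (sigma / math.sqrt(n))         # test statistic, about -3.91
p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # left-tail normal probability
print(round(z, 2), p_value)
```

The p-value is far below any conventional α, so the bank can safely conclude that the mean waiting time is under 2 minutes.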

17 Setting Up the Rejection Region Type I Error: If we reject H0 (accept Ha) when in fact H0 is true, this is a Type I error. False alarm.

18 The P-Value of a Test (p.11) The p-value, or observed significance level, is the smallest value of α for which the test results are statistically significant, i.e., for which “the conclusion of rejecting H0 can be reached.”

19 Regression Analysis A technique to examine the relationship between an outcome variable (dependent variable, Y) and a group of explanatory variables (independent variables, X1, X2, …, Xk). The model allows us to understand (quantify) the effect of each X on Y. It also allows us to predict Y based on X1, X2, …, Xk.

20 Types of Relationship Linear Relationship. Simple linear relationship: Y = β0 + β1X + ε. Multiple linear relationship: Y = β0 + β1X1 + β2X2 + … + βkXk + ε. Nonlinear Relationship, e.g., Y = β0 exp(β1X + ε), Y = β0 + β1X + β2X² + … etc. We will focus only on linear relationships.

21 Simple Linear Regression Model Population: Y = β0 + β1X + ε (the true effect of X on Y). Sample: Ŷ = b0 + b1X (the estimated effect of X on Y). Key questions: 1. Does X have any effect on Y? 2. If yes, how large is the effect? 3. Given X, what is the estimated Y?

22 Least Squares Method The least squares line is found by a statistical procedure for fitting the “best-fitting” straight line: it minimizes the sum of squares of the deviations of the observed values of Y from those predicted by the line.
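The least squares computation itself is short. A self-contained sketch on a small made-up dataset (the case data are not reproduced here):

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]           # toy predictor values
y = [2.1, 3.9, 6.2, 7.8, 10.1]          # toy responses, roughly y = 2x

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = Sxy / Sxx                          # slope minimizing squared deviations
b0 = ybar - b1 * xbar                   # line passes through (xbar, ybar)
print(round(b1, 2), round(b0, 2))       # 1.99 0.05
```

The closed-form solution b1 = Sxy/Sxx, b0 = ȳ − b1·x̄ is exactly what regression software computes for the fitted line.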

23 Case: Cost of Manufacturing Computers (pp.13 – 45) A manufacturer produces computers. The goal is to quantify cost drivers and to understand the variation in production costs from week to week. The following production variables were recorded:
COST: the total weekly production cost (in $millions).
UNITS: the total number of units (in 000s) produced during the week.
LABOR: the total weekly direct labor cost (in $10K).
SWITCH: the total number of times that the production process was re-configured for different types of computers.
FACTA: = 1 if the observation is from factory A; = 0 if from factory B.

24 Raw Data (p. 14) How many possible regression models can we build?

25 Simple Linear Regression Model (pp. 17 – 26) Question 1: Is Labor a significant cost driver? This question leads us to think about the following model: Cost = f(Labor) + ε. Specifically, Cost = β0 + β1·Labor + ε. Question 2: How well does this model perform? (How accurately can Labor predict Cost?) This question leads us to try other regression models and make comparisons.

26 Initial Analysis (pp. 15 – 16) Summary statistics + plots (e.g., histograms + scatter plots) + correlations. Things to look for: Features of the data (e.g., data range, outliers); we do not want to extrapolate outside the data range because the relationship there is unknown (or un-established). Use summary statistics and graphs. Is the assumption of linearity appropriate? Is there inter-dependence among variables? Any potential problems? Use scatter plots and correlations.

27 Correlation (p. 15) ρ (rho): population correlation (its value is most likely unknown). r: sample correlation (its value can be calculated from the sample). Correlation is a measure of the strength of linear relationship. Correlation falls between –1 and 1. There is no linear relationship if the correlation is close to 0; but a correlation near 0 does not rule out a strong nonlinear relationship. Possible cases: ρ = –1, –1 < ρ < 0, ρ = 0, 0 < ρ < 1, ρ = 1 (and for the sample: r = –1, –1 < r < 0, r = 0, 0 < r < 1, r = 1).
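The sample correlation r can be computed directly from its definition; a self-contained sketch on made-up data:

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]           # nearly, but not perfectly, linear in x

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
Syy = sum((yi - ybar) ** 2 for yi in y)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

r = Sxy / math.sqrt(Sxx * Syy)          # always falls between -1 and 1
print(round(r, 3))                      # close to 1: strong positive linear link
```

Swapping x and y leaves r unchanged, which is one way correlation differs from regression.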

28 Correlation (p. 15) In the output: is it ρ or r? Note the sample size, and the p-value for H0: ρ = 0 vs. Ha: ρ ≠ 0.

29 Fitted Model (Least Squares Line) (p.18) H0: β1 = 0 vs. Ha: β1 ≠ 0. β1 or b1? β0 or b0? The output reports the estimates b0 and b1 and their standard errors Sb0 and Sb1. Degrees of freedom = n – k – 1, where n = sample size, k = # of Xs. ** Divide the p-value by 2 for a one-sided test. Make sure there is at least weak evidence for doing this step.

30 Hypothesis Testing and Confidence Interval Estimation for β (pp. 19 – 20) The output reports b0 and b1 with standard errors Sb0 and Sb1; degrees of freedom = n – k – 1, where k = # of independent variables. Q1: Does Labor have any impact on Cost? → Hypothesis testing. Q2: If so, how large is the impact? → Confidence interval estimation.
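The slope inference reduces to t = b1/Sb1 with n − k − 1 degrees of freedom. A sketch with hypothetical output values (b1, Sb1, and the sample size below are illustrative placeholders, not the case's regression output):

```python
n, k = 52, 1                  # e.g., 52 weekly observations, one predictor
b1, s_b1 = 1.85, 0.25         # hypothetical slope estimate and standard error

t = b1 / s_b1                 # test statistic for H0: beta1 = 0
df = n - k - 1                # 50 degrees of freedom
t_crit = 2.009                # ~97.5th percentile of t with 50 df
ci = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)   # 95% CI for beta1

print(round(t, 1))                              # 7.4: reject H0: beta1 = 0
print(tuple(round(v, 3) for v in ci))           # (1.348, 2.352)
```

The CI answers Q2: with these placeholder numbers, each extra unit of X adds between about 1.35 and 2.35 units to Y.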

31 Analysis of Variance (p. 21) - Not very useful in simple regression. - Useful in multiple regression.

32 Sum of Squares (p.22) Syy = total variation in Y. SSE = remaining variation that cannot be explained by the model. SSR = Syy – SSE = variation in Y that has been explained by the model.
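The decomposition can be verified numerically. A self-contained sketch on made-up data: fit the least squares line, then split Syy into SSR + SSE and form R² = SSR/Syy:

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]       # fitted values

Syy = sum((yi - ybar) ** 2 for yi in y)                  # total variation
SSE = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))     # unexplained
SSR = Syy - SSE                                          # explained
print(round(SSR / Syy, 3))              # R^2, close to 1 for this tidy data
```

R² is the fit statistic the next slide refers to: the fraction of the variation in Y that the model explains.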

33 Fit Statistics (pp. 23 – 24)

34 Prediction (pp. 25 – 26) What is the predicted production cost of a given week, say, Week 21 of the year, for which Labor = 5 (i.e., $50,000)? Point estimate: predicted cost = b0 + b1(5) (in million dollars). Margin of error? → Prediction interval. What is the average production cost of a typical week in which Labor = 5? Point estimate: estimated cost = b0 + b1(5) (in million dollars). Margin of error? → Confidence interval.

35 Prediction vs. Confidence Intervals (pp. 25 – 26) Variation (margin of error) at both ends of the X range seems larger. Implication?
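The widths of the two intervals come from their standard errors: the prediction interval carries an extra "1 +" term for the individual week's own variation, and both intervals widen as x0 moves away from x̄, which is why the margin of error grows at both ends of the plot. A sketch with hypothetical regression summaries:

```python
import math

# Hypothetical summaries (illustrative, not the case's actual output):
n = 52                        # number of observations
s = 0.70                      # residual standard error
xbar, Sxx = 5.5, 120.0        # mean and spread of the predictor
x0 = 5.0                      # Labor value at which we predict

extra = (x0 - xbar) ** 2 / Sxx              # grows as x0 leaves xbar
se_mean = s * math.sqrt(1 / n + extra)      # CI for the mean response
se_pred = s * math.sqrt(1 + 1 / n + extra)  # PI for an individual response
print(se_pred > se_mean)                    # True: the PI is always wider
```

Multiplying each standard error by the t critical value gives the half-widths of the confidence and prediction intervals, respectively.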

36 Another Simple Regression Model: Cost = β0 + β1·Units + ε (p. 27) A better model? Why?

37 Statgraphics Simple Regression Analysis: Relate / Simple Regression. X = independent variable, Y = dependent variable. For prediction, click on the Tabular option icon and check Forecasts; right-click to change X values. Multiple Regression Analysis: Relate / Multiple Regression. For prediction, enter the values of the Xs in the Data Window and leave the corresponding Y blank, then click on the Tabular option icon and check Reports.

38 Normal Probabilities

39 Critical Values of t