Experimental Statistics - Week 10. Chapter 11: Linear Regression and Correlation.

Presentation transcript:

1 Experimental Statistics - week 10 Chapter 11: Linear Regression and Correlation

2 Example Probability Plots for Various Data Shapes

3 December 2000 Unemployment Rates in the 50 States

4 Distribution of Monthly Returns for all U.S. Common Stocks from

5 Distribution of Individual Salaries of Cincinnati Reds Players on Opening Day of the 2000 Season

6 Back to Correlation and Regression

7 Association Between Two Variables. Regression analysis -- we want to predict the dependent variable using the independent variable. Correlation analysis -- measures the strength of the linear association between two quantitative variables.

8 Calculating the Correlation Coefficient
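The formula on this slide is not reproduced in the transcript; a standard form of the sample correlation coefficient, consistent with the notation on the next slide, is:

r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}
  = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \;\sum_{i=1}^{n}(y_i-\bar{y})^2}}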

9 Notation: S_xx = Σ(x_i - x̄)², S_yy = Σ(y_i - ȳ)², and S_xy = Σ(x_i - x̄)(y_i - ȳ). So r = S_xy / √(S_xx S_yy).

10 Study Time (X, in hours) and Exam Score (Y). The data are the study times and the test scores on an exam given over the material covered during the two weeks. Find r. (The data table appears on the slide.)

11 r calculated from a set of data is an estimate of a theoretical parameter ρ (also written ρ_yx). Population parameter ρ: if ρ = 0, then there is no linear relationship between the two variables. In the same way, the sample average is an estimate of the population mean μ.

12 Testing Statistical Significance of the Correlation Coefficient. Test statistic: t = r√(n - 2) / √(1 - r²). Rejection region: t > t_{α/2} or t < -t_{α/2}, with df = n - 2.

13 Study Time (X, in hours) and Exam Score (Y). The data are the study times and the test scores on an exam given over the material covered during the two weeks. Test H0: ρ = 0 versus Ha: ρ ≠ 0.

14 Correlation Between Study Time and Score. Test H0: ρ = 0 versus Ha: ρ ≠ 0. Rejection region, test statistic, P-value, and conclusion are worked out on the slide.

15 Study Time by Score -- output from the CORR Procedure for the 2 variables score and time: Simple Statistics (N, Mean, Std Dev, Sum, Minimum, Maximum for each variable) and Pearson Correlation Coefficients, N = 8, with Prob > |r| under H0: Rho = 0. (Numeric values appear on the slide.)
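This output could be produced with PROC CORR (a sketch; the data set name study is an assumption -- the slides do not show the DATA step):

PROC CORR DATA=study;
  VAR score time;   * prints simple statistics and the Pearson correlation with its p-value;
RUN;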


17 Regression Analysis

18 Notation. Theoretical model: y = β0 + β1 x + ε. Fitted regression line: ŷ = b0 + b1 x, where b0 and b1 are evaluated from the data.

19 Data: (x_1, y_1), (x_2, y_2), ..., (x_n, y_n). For each observation we write y_i = β0 + β1 x_i + ε_i.


21 Least Squares Estimates -- computation formula: b1 = S_xy / S_xx and b0 = ȳ - b1 x̄, where S_xy = Σ(x_i - x̄)(y_i - ȳ) and S_xx = Σ(x_i - x̄)².
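One standard way to arrive at the computation formula (a sketch of the usual derivation, not shown explicitly in the transcript): choose b0 and b1 to minimize the sum of squared errors

SSE(b_0, b_1) = \sum_{i=1}^{n}\bigl(y_i - b_0 - b_1 x_i\bigr)^2 .

Setting the partial derivatives with respect to b0 and b1 equal to zero gives the normal equations, whose solution is

b_1 = \frac{S_{xy}}{S_{xx}} = \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sum_i (x_i-\bar{x})^2},
\qquad b_0 = \bar{y} - b_1\bar{x}.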

22 Study Time (X, in hours) and Exam Score (Y). The data are the study times and the test scores on an exam given over the material covered during the two weeks. Find the equation of the regression line for predicting exam score from study time.

23 Calculations for the study time data and the resulting equation of the regression line (worked out on the slide).

24 Output from the GLM Procedure, Dependent Variable: score -- analysis of variance table (Source: Model, Error, Corrected Total; DF, Sum of Squares, Mean Square, F Value, Pr > F), followed by R-Square, Coeff Var, Root MSE, and the score Mean; Type I SS and Type III SS for time; and parameter estimates for Intercept and time with Standard Error, t Value, and Pr > |t| (Intercept: Pr > |t| < .0001). Code: PROC REG; MODEL score=time; RUN;

25 To predict Y for a given x, plug x into the regression equation and compute ŷ. Example: if a student studied 10 hours, the predicted score is found by substituting x = 10 into the fitted line.
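One way to get predicted values from SAS is the OUTPUT statement of PROC REG (a sketch; the input data set name study is an assumption -- the slides do not show it):

PROC REG DATA=study;
  MODEL score=time;
  OUTPUT OUT=pred P=yhat;   * yhat holds the predicted score for each observation;
RUN;
PROC PRINT DATA=pred;
RUN;

To predict at an x value not in the data (such as 10 hours), a common approach is to append an observation with time = 10 and a missing score before running PROC REG; the missing response is not used in fitting but still receives a predicted value.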

26 Notes: SSE = Σ(y_i - ŷ_i)² is called the sum of squared residuals, SS(Residuals). s² = SSE / (n - 2) is the estimate of the error variance σ².

27 Testing for Significance of the Regression. If knowing x is of absolutely no help in predicting Y, then it seems reasonable that the regression line for predicting Y from x should have slope zero. That is, to test for a "significant regression" we test H0: β1 = 0 versus Ha: β1 ≠ 0. Test statistic: t = b1 / (s / √S_xx). Rejection region: |t| > t_{α/2}, where t has n - 2 df.

28 Study Time Data

29 Output from the GLM Procedure, Dependent Variable: score -- the same analysis of variance table, R-Square, Coeff Var, Root MSE, score Mean, Type I and Type III SS for time, and parameter estimates (Intercept and time) with Standard Error, t Value, and Pr > |t| (Intercept: Pr > |t| < .0001). Code: PROC GLM; MODEL score=time; RUN;

30 Study Time by Score -- the CORR Procedure output for score and time (Simple Statistics; Pearson Correlation Coefficients, N = 8; Prob > |r| under H0: Rho = 0), shown again alongside the regression results. (Numeric values appear on the slide.)

31 Note: the t values for testing H0: β1 = 0 and for testing H0: ρ = 0 are the same. Both tests depend on the normality assumption.
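A brief sketch of why the two t statistics coincide (standard algebra, not shown on the slide): using b1 = S_xy / S_xx, r = S_xy / √(S_xx S_yy), and s² = S_yy(1 - r²)/(n - 2),

t = \frac{b_1}{s/\sqrt{S_{xx}}}
  = \frac{S_{xy}}{s\,\sqrt{S_{xx}}}
  = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}\cdot\frac{\sqrt{n-2}}{\sqrt{1-r^2}}
  = \frac{r\,\sqrt{n-2}}{\sqrt{1-r^2}} .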

32 Recall: one-sample test about a mean. In general, to test H0: μ = μ0 we use t = (x̄ - μ0) / (s / √n), with df = n - 1.

33 A (1 - α)100% confidence interval for μ is x̄ ± t_{α/2} · s / √n, with df = n - 1.

34 Similarly, a (1 - α)100% confidence interval for the slope β1 is b1 ± t_{α/2} · SE(b1).

35 Here df = n - 2. A confidence interval for β0 can also be found, but it is not as useful. An alternative form of the interval is also shown on the slide.
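In SAS, confidence limits for the intercept and slope can be requested with the CLB option of PROC REG (a sketch; the data set name study is an assumption):

PROC REG DATA=study;
  MODEL score=time / CLB;   * CLB prints 95% confidence limits for the parameter estimates;
RUN;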

36 Prediction. Setting: the regression line has been fit from the data (x_1, y_1), ..., (x_n, y_n), and we want to predict the response at a new value x_{n+1}.

37 Two intervals: 1. Confidence interval on μ_{Y|x_{n+1}}, the mean of Y at x = x_{n+1}.
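The interval formula itself is not reproduced in the transcript; the standard (1 - α)100% confidence interval for the mean response at x_{n+1}, in the notation above, is:

\hat{y}_{n+1} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x_{n+1}-\bar{x})^2}{S_{xx}}},
\qquad \hat{y}_{n+1} = b_0 + b_1 x_{n+1}, \quad df = n - 2 .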



40 2. Prediction interval for y_{n+1}, a single new observation at x = x_{n+1}. Notes:
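Again the slide's formula is not in the transcript; the standard (1 - α)100% prediction interval is:

\hat{y}_{n+1} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_{n+1}-\bar{x})^2}{S_{xx}}} .

The extra 1 under the square root is what makes the prediction interval wider than the confidence interval on the mean: it accounts for the variability of the individual observation in addition to the uncertainty in the estimated line.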

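Both intervals can be requested in SAS with the CLM and CLI options of PROC REG (a sketch; the data set name study is an assumption):

PROC REG DATA=study;
  MODEL score=time / CLM CLI;   * CLM gives limits for the mean response, CLI for an individual prediction;
RUN;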

42 Extrapolation -- predicting beyond the range of the predictor variables.

43 Predict the price of a car that weighs 3500 lbs. - extrapolation would say it’s about $16,000

44 Predict the price of a car that weighs 3500 lbs. - extrapolation would say it’s about $16,000 oops!!!

45 Extrapolation -- predicting beyond the range of the predictor variables is NOT a good idea.

46 Analysis of Variance Approach. Mathematical fact (p. 649): SS(Total) = SS(Regression) + SS(Residuals), where SS(Total) = S_yy, SS(Regression) is the SS "explained" by the model, and SS(Residuals) is the SS "unexplained" by the model.
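Written out in the notation above (a standard identity, consistent with the slide's statement):

\underbrace{\sum_{i=1}^{n}(y_i-\bar{y})^2}_{SS(\text{Total}) \,=\, S_{yy}}
 = \underbrace{\sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2}_{SS(\text{Regression})}
 + \underbrace{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}_{SS(\text{Residuals})} .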

47 Plot of Production vs Cost

48 SS(???)

49 SS(???)

50 SS(???)

51 R² = SS(Regression) / SS(Total) measures the proportion of the variability in Y that is explained by the regression on X.


53 The REG Procedure, Dependent Variable: y -- analysis of variance table (Source, DF, Sum of Squares), with the sums of squares labeled: Model = SS(Regression), Error = SS(Residuals), Corrected Total = SS(Total). (Numeric values appear on the slide.)

54 RECALL -- theoretical model: y_i = β0 + β1 x_i + ε_i; fitted regression line: ŷ = b0 + b1 x; residuals: e_i = y_i - ŷ_i.

55 Residual Analysis -- examination of the residuals to help determine whether: (1) the assumptions are met, and (2) the regression model is appropriate. Residual plot: plot of the residuals versus x.



58 Study Time Data
PROC REG;
  MODEL score=time;
  OUTPUT OUT=new R=resid;
RUN;
PROC GPLOT;
  TITLE 'Plot of Residuals';
  PLOT resid*time;
RUN;

59 Average Height of Girls by Age

60 Average Height of Girls by Age

61 Residual Plot

62 Residual Analysis -- examination of the residuals to help determine whether: (1) the assumptions are met, and (2) the regression model is appropriate. Residual plot: plot of the residuals versus x. Normality of residuals: probability plot, histogram.
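A sketch of how these normality checks could be run in SAS on the residuals created above (the output data set new and the variable resid come from the PROC REG step on the earlier slide):

PROC UNIVARIATE DATA=new NORMAL;   * NORMAL requests tests of normality;
  VAR resid;
  HISTOGRAM resid / NORMAL;        * histogram of the residuals with a normal curve overlaid;
  PROBPLOT resid;                  * normal probability plot of the residuals;
RUN;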