Lecture 4: SIMPLE LINEAR REGRESSION

Leaning Tower of Pisa

Example (1)
- Response variable: the lean (Y)
- Explanatory variable: time (X)
- Plot the data
- Fit a line
- Predict the future lean

Example (2)
- Basic SAS: see program p1.sas on the class website
- Simple linear regression model
- Estimation of regression parameters

SAS Data Step

data a1;
  input year lean @@;  * @@ reads several (year, lean) pairs per line;
  * year 100 has lean = . (missing) so that PROC REG will predict it later;
  cards;
75 642 76 644 77 656 78 667 79 673 80 688 81 696
82 698 83 713 84 717 85 725 86 742 87 757 100 .
;

data a1p;              * copy without the missing case, for plotting;
  set a1;
  if lean ne .;
run;

SAS Proc Print

proc print data=a1;
run;

OBS  YEAR  LEAN
  1    75   642
  2    76   644
  3    77   656
  4    78   667
  5    79   673
  6    80   688
  7    81   696
  8    82   698
  9    83   713
 10    84   717
 11    85   725
 12    86   742
 13    87   757
 14   100     .

SAS Proc Gplot

symbol1 v=circle i=sm70s;  * circles, joined by a smoothing spline;
proc gplot data=a1p;
  plot lean*year;
run;

symbol1 v=circle i=rl;     * rerun the plot with i=rl to show the least squares line;

SAS Proc Reg

proc reg data=a1;
  model lean=year / p r;         * p prints predicted values, r requests residual analysis;
  output out=a2 p=pred r=resid;  * save predictions and residuals in data set a2;
  id year;
run;

                 Parameter     Standard      T for H0:
Variable    DF   Estimate      Error         Parameter=0   Prob > |T|
INTERCEP     1   -61.120879    25.12981850     -2.432        0.0333
YEAR         1     9.318681     0.30991420     30.069        0.0001

                Dep Var   Predict
Obs   YEAR      LEAN      Value     Residual
  1     75      642.0     637.8       4.2198
  2     76      644.0     647.1      -3.0989
  3     77      656.0     656.4      -0.4176
  4     78      667.0     665.7       1.2637
  5     79      673.0     675.1      -2.0549
  6     80      688.0     684.4       3.6264
  7     81      696.0     693.7       2.3077
  8     82      698.0     703.0      -5.0110
  9     83      713.0     712.3       0.6703
 10     84      717.0     721.6      -4.6484
 11     85      725.0     731.0      -5.9670
 12     86      742.0     740.3       1.7143
 13     87      757.0     749.6       7.3956
 14    100        .       870.7
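
Note the 14th case: year 100 has a missing lean, so it gets no residual, but PROC REG still reports its predicted value. As a quick check of the fitted equation, Ŷ = –61.1209 + 9.3187(100) = 870.7, the forecast lean for year 100.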

Data for Simple Linear Regression
- Yi, the response variable
- Xi, the explanatory variable
for cases i = 1 to n

Simple Linear Regression Model
Yi = β0 + β1Xi + εi
- Yi is the value of the response variable for the ith case
- β0 is the intercept
- β1 is the slope

Simple Linear Regression Model (2)
- Xi is the value of the explanatory variable for the ith case
- εi is a normally distributed random error with mean 0 and variance σ2

Simple Linear Regression Model (3)
Parameters:
- β0, the intercept
- β1, the slope
- σ2, the variance of the error term

Features of the Simple Linear Regression Model
Yi = β0 + β1Xi + εi
E(Yi|Xi) = β0 + β1Xi
Var(Yi|Xi) = Var(εi) = σ2

Fitted Regression Equation and Residuals
Ŷi = b0 + b1Xi
ei = Yi – Ŷi (the residual)
ei = Yi – (b0 + b1Xi)

Plot the Residuals

proc gplot data=a2;
  plot resid*year;
  where lean ne .;  * skip the year 100 case, which has no residual;
run;

Least Squares
- Minimize Σ(Yi – (b0 + b1Xi))2 = Σei2
- Use calculus: take the derivative with respect to b0 and with respect to b1
- Set the two resulting equations equal to zero and solve for b0 and b1 (see the sketch below)
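
Filling in the calculus step (standard algebra, not shown on the slide): setting the two partial derivatives to zero gives the normal equations
∂/∂b0: –2 Σ(Yi – b0 – b1Xi) = 0   →   ΣYi = n b0 + b1 ΣXi
∂/∂b1: –2 ΣXi(Yi – b0 – b1Xi) = 0   →   ΣXiYi = b0 ΣXi + b1 ΣXi2
Solving this pair of linear equations for b0 and b1 yields the solution on the next slide.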

Least Squares Solution
b1 = Σ(Xi – X̄)(Yi – Ȳ) / Σ(Xi – X̄)2
b0 = Ȳ – b1X̄
These are also the maximum likelihood estimators.
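
As a worked check with the Pisa data (computed from the data listed above): X̄ = 81, Ȳ = 9018/13 = 693.69, Σ(Xi – X̄)2 = 182, and Σ(Xi – X̄)(Yi – Ȳ) = 1696, so b1 = 1696/182 = 9.3187 and b0 = 693.69 – 9.3187(81) = –61.12, matching the SAS output above.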

Maximum Likelihood
Under the normal error model, the likelihood of (β0, β1, σ2) is
L = Πi (2πσ2)–1/2 exp(–(Yi – β0 – β1Xi)2/(2σ2))
Maximizing L over β0 and β1 is equivalent to minimizing Σ(Yi – β0 – β1Xi)2, so for the regression coefficients the maximum likelihood and least squares estimators coincide.

Estimation of σ2
s2 = MSE = SSE/(n – 2) = Σei2/(n – 2) = Σ(Yi – Ŷi)2/(n – 2)
s = Root MSE = √MSE

                 Parameter     Standard
Variable    DF   Estimate      Error
INTERCEP     1   -61.120879    25.12981850
YEAR         1     9.318681     0.30991420

                  Sum of        Mean
Source      DF    Squares       Square
Model        1    15804.48352   15804.48352
Error       11      192.28571      17.48052
C Total     12    15996.76923

Root MSE     4.18097
Dep Mean   693.69231
C.V.         0.60271
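
Check against this output: SSE = 192.28571 with n – 2 = 13 – 2 = 11 degrees of freedom, so s2 = 192.28571/11 = 17.48052 (the Error Mean Square) and s = √17.48052 = 4.18097, the Root MSE.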

Theory for β1 Inference
b1 ~ Normal(β1, σ2(b1)), where σ2(b1) = σ2/Σ(Xi – X̄)2
t = (b1 – β1)/s(b1), where s2(b1) = s2/Σ(Xi – X̄)2
t ~ t(n–2)

Confidence Interval for β1
b1 ± tc s(b1), where tc = t(1–α/2, n–2), the (1–α/2)100th percentile of the t distribution with n–2 degrees of freedom
1–α is the confidence level
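
A worked check with the Pisa output: s(b1) = √(17.48052/182) = 0.3099 and tc = t(.975, 11) ≈ 2.201, so the 95% interval is 9.3187 ± 2.201(0.3099) = (8.637, 10.001), agreeing with the confidence limits SAS reports later (the slide with the clb option).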

Significance Tests for β1
H0: β1 = 0, Ha: β1 ≠ 0
t = (b1 – 0)/s(b1)
Reject H0 if |t| ≥ tc, where tc = t(1–α/2, n–2)
P = Prob(|z| ≥ |t|), where z ~ t(n–2)
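
Applied to the Pisa data: t = 9.3187/0.3099 = 30.07 ≥ 2.201 = tc, so H0 is rejected (reported P = 0.0001): strong evidence that the lean is increasing over time.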

Theory for β0 Inference
b0 ~ Normal(β0, σ2(b0)), where σ2(b0) = σ2(1/n + X̄2/Σ(Xi – X̄)2)
t = (b0 – β0)/s(b0); for s2(b0), replace σ2 by s2
t ~ t(n–2)
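
A worked check, up to rounding: s2(b0) = 17.48052(1/13 + 812/182) = 631.5, so s(b0) = √631.5 = 25.13, matching the standard error 25.12982 in the SAS output.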

Confidence Interval for β0
b0 ± tc s(b0), where tc = t(1–α/2, n–2), the (1–α/2)100th percentile of the t distribution with n–2 degrees of freedom
1–α is the confidence level
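
For the Pisa data: –61.121 ± 2.201(25.130) = (–116.43, –5.81), agreeing with the 95% confidence limits in the clb output below.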

Significance Tests for β0
H0: β0 = β00, Ha: β0 ≠ β00
t = (b0 – β00)/s(b0)
Reject H0 if |t| ≥ tc, where tc = t(1–α/2, n–2)
P = Prob(|z| ≥ |t|), where z ~ t(n–2)
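
Testing H0: β0 = 0 in the Pisa example: t = –61.121/25.130 = –2.43 with P = 0.0333, so H0 is rejected at α = .05 (though, as noted below, this test is usually not of interest).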

Notes (1)
The normality of b0 and b1 follows from the fact that each is a linear combination of the Yi, which are independent normal random variables.

Notes (2)
- Usually the CI and significance test for β0 are not of interest
- If the εi are not normal but are approximately normal, then the CIs and significance tests are generally reasonable approximations

Notes (3)
- These procedures can easily be modified to produce one-sided significance tests
- Because σ2(b1) = σ2/Σ(Xi – X̄)2, we can make this quantity small by making Σ(Xi – X̄)2 large, i.e., by spreading out the Xi (see the illustration below)
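
A hypothetical illustration (numbers not from the slides): with n = 4 observations at X = (4, 5, 5, 6), Σ(Xi – X̄)2 = 2, while at X = (0, 0, 10, 10) it is 100; for the same σ2 the second design gives a variance of b1 that is 50 times smaller.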

SAS Proc Reg

proc reg data=a1;
  model lean=year / clb;  * clb requests confidence limits for the parameter estimates;
run;

                 Parameter   Standard
Variable    DF   Estimate    Error      t Value   Pr > |t|   95% Confidence Limits
Intercept    1   -61.12088   25.12982     -2.43     0.0333   -116.43124   -5.81052
year         1     9.31868    0.30991     30.07     <.0001      8.63656   10.00080

Power
The power of a significance test is the probability that the null hypothesis will be rejected when, in fact, it is false. This probability depends on the particular value of the parameter in the alternative space.

Power for β1 (1)
H0: β1 = 0, Ha: β1 ≠ 0
t = b1/s(b1), tc = t(1–α/2, n–2) for α = .05
We reject H0 when |t| ≥ tc, so we need to find P(|t| ≥ tc) for arbitrary values of β1 ≠ 0
When β1 = 0, this calculation gives α

Power for β1 (2)
t ~ t(n–2, δ), the noncentral t distribution
δ = β1/σ(b1), the noncentrality parameter
We need to assume values for σ2(b1) = σ2/Σ(Xi – X̄)2 (i.e., for σ2 and Σ(Xi – X̄)2) and for n

Example of Power Calculations for β1
We assume σ2 = 2500, n = 25, and Σ(Xi – X̄)2 = 19800
So we have σ2(b1) = σ2/Σ(Xi – X̄)2 = 2500/19800 = 0.1263

Example of Power (2)
Consider β1 = 1.5; we can now calculate δ = β1/σ(b1)
t ~ t(n–2, δ); we want to find P(|t| ≥ tc)
We use a function that calculates the cumulative distribution function of the noncentral t distribution (see the program below)
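
Working the arithmetic before running SAS: δ = 1.5/√0.12626 = 1.5/0.3553 = 4.221 and tc = t(.975, 23) = 2.0687; the noncentral t probability then gives power ≈ 0.981, which the program on the next slide confirms.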

data a1;
  n=25; sig2=2500; ssx=19800; alpha=.05;
  sig2b1=sig2/ssx;            * variance of b1;
  df=n-2;
  beta1=1.5;
  delta=beta1/sqrt(sig2b1);   * noncentrality parameter;
  tc=tinv(1-alpha/2,df);      * critical value;
  power=1-probt(tc,df,delta)+probt(-tc,df,delta);
  output;
run;

proc print data=a1;
run;

Obs   n    sig2   ssx     alpha   sig2b1    df   beta1   delta     tc        power
 1    25   2500   19800   0.05    0.12626   23   1.5     4.22137   2.06866   0.98121

data a2;
  n=25; sig2=2500; ssx=19800; alpha=.05;
  sig2b1=sig2/ssx;
  df=n-2;
  tc=tinv(1-alpha/2,df);
  do beta1=-2.0 to 2.0 by .05;  * power over a grid of slope values;
    delta=beta1/sqrt(sig2b1);
    power=1-probt(tc,df,delta)+probt(-tc,df,delta);
    output;
  end;
run;

title1 'Power for the slope in simple linear regression';
symbol1 v=none i=join;
proc gplot data=a2;
  plot power*beta1;
run;

proc print data=a2;
run;