Topic 6: Estimation and Prediction of Y_h

Outline
- Estimation and inference of E(Y_h)
- Prediction of a new observation
- Construction of a confidence band for the entire regression line

Estimation of E(Y_h)
E(Y_h) = μ_h = β_0 + β_1 X_h, the mean value of Y for the subpopulation with X = X_h.
We will estimate E(Y_h) by Ŷ_h = b_0 + b_1 X_h; KNNL use Ŷ_h for this estimate (see equation (2.28) on page 52).

Theory for Estimation of E(Y_h)
Ŷ_h is Normal with mean μ_h and variance
  σ²(Ŷ_h) = σ²[1/n + (X_h − X̄)² / Σ(X_i − X̄)²].
The Normality is a consequence of the fact that b_0 + b_1 X_h is a linear combination of the Y_i's. See KNNL for details.

Application of the Theory
We estimate σ²(Ŷ_h) by
  s²(Ŷ_h) = MSE[1/n + (X_h − X̄)² / Σ(X_i − X̄)²].
It then follows that (Ŷ_h − E(Y_h)) / s(Ŷ_h) ~ t(n−2). Details for confidence intervals and significance tests are consequences of this distributional result.

95% Confidence Interval for E(Y_h)
Ŷ_h ± t_c s(Ŷ_h), where t_c = t(0.975, n−2).
NOTE: significance tests can be constructed, but they are rarely used in practice.
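For concreteness, the interval can be assembled by hand from the fitted model's summary quantities. This is a minimal SAS sketch; the numeric inputs are the Toluca summaries as reported in KNNL (b_0 = 62.37, b_1 = 3.5702, MSE = 2384, n = 25, X̄ = 70, Σ(X_i − X̄)² = 19,800), so check them against your own output before relying on the numbers.

***Hand computation of a 95% CI for E(Y_h) at X_h = 65***;
data ci_mean;
  b0 = 62.37;  b1 = 3.5702;   * fitted intercept and slope;
  mse = 2384;  n = 25;        * mean squared error and sample size;
  xbar = 70;   ssx = 19800;   * mean of X and sum of squared deviations;
  xh = 65;                    * X value of interest;
  yhat  = b0 + b1*xh;                     * point estimate of E(Y_h);
  s2hat = mse*(1/n + (xh-xbar)**2/ssx);   * s2(Yhat_h);
  tc    = tinv(0.975, n-2);               * t(.975, n-2);
  lower = yhat - tc*sqrt(s2hat);          * CI limits;
  upper = yhat + tc*sqrt(s2hat);
  output;
proc print data=ci_mean;
run;

At X_h = 65 this gives s(Ŷ_h) ≈ 9.92, matching the value KNNL report for the Toluca data.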

Toluca Company Example (p. 19)
Manufactures refrigeration equipment. One replacement part is manufactured in lots of varying sizes. The company wants to determine the optimum lot size; to do this, it first needs to describe the relationship between work hours and lot size.

Scatterplot with regression line

SAS CODE

***Generating the data set***;
data toluca;
  infile '../data/CH01TA01.txt';
  input lotsize hours;
data other;            * two new X values at which to estimate and predict;
  lotsize=65; output;
  lotsize=100; output;
data toluca1;          * append the new X values (hours left missing);
  set toluca other;
proc print data=toluca1;
run;

SAS CODE

***Generating the confidence intervals for all values of X in the data set***;
proc reg data=toluca1;
  model hours=lotsize/clm;
  id lotsize;
run;

The clm option generates confidence intervals for the mean.

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              62.37            26.18       2.38      0.0259
lotsize      1               3.5702           0.3470    10.29      <.0001

Output Statistics
Obs   lotsize   Dependent Variable   Predicted Value   Std Error Mean Predict   95% CL Mean
[table values not preserved in the transcript; the appended observations with lotsize 65 and 100 have missing hours but still receive predicted values and confidence limits]

Notes
The standard error is affected by how far X_h is from X̄ (see Figure 2.6). Recall the teeter-totter idea: a change in the slope has a bigger impact on Y as you move away from X̄.

Prediction of Y_h(new)
We want to predict the value of a new observation at X = X_h.
Model: Y_h(new) = β_0 + β_1 X_h + ε.
Since E(ε) = 0, the point prediction Ŷ_h is the same value as the estimate of E(Y_h). The prediction interval, however, relies heavily on the assumption that the ε are Normally distributed. Note!!

Prediction of Y_h(new)
Var(pred) = Var(Ŷ_h) + Var(ε), so
  s²(pred) = MSE[1 + 1/n + (X_h − X̄)² / Σ(X_i − X̄)²] = MSE + s²(Ŷ_h).
It then follows that (Y_h(new) − Ŷ_h) / s(pred) ~ t(n−2), and the prediction interval is Ŷ_h ± t_c s(pred).
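The same hand computation works for the prediction interval. A sketch under the same illustrative Toluca summaries as the earlier CI example; the only change is the extra leading 1 (i.e., the added MSE) in the variance:

***Hand computation of a 95% prediction interval at X_h = 65***;
data pi_new;
  b0 = 62.37;  b1 = 3.5702;  mse = 2384;
  n = 25;  xbar = 70;  ssx = 19800;  xh = 65;
  yhat   = b0 + b1*xh;                         * same point value as for E(Y_h);
  s2pred = mse*(1 + 1/n + (xh-xbar)**2/ssx);   * s2(pred) = MSE + s2(Yhat_h);
  tc     = tinv(0.975, n-2);
  lower  = yhat - tc*sqrt(s2pred);             * PI limits;
  upper  = yhat + tc*sqrt(s2pred);
  output;
proc print data=pi_new;
run;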

Notes
The procedure can be modified for the mean of m observations at X = X_h (see 2.39a and 2.39b on page 60). The standard error is affected by how far X_h is from X̄ (see Figure 2.6).

SAS CODE

***Generating the prediction intervals for all values of X in the data set***;
proc reg data=toluca1;
  model hours=lotsize/cli;
  id lotsize;
run;

The cli option generates prediction intervals for a new observation.

Output Statistics
Obs   lotsize   Dependent Variable   Predicted Value   Std Error Mean Predict   95% CL Predict
[table values not preserved in the transcript]
The Std Error Mean Predict values are wrong for prediction…they are the same as before and do not include the variability about the regression line.

Notes
The standard error (Std Error Mean Predict) given in this output is the standard error of Ŷ_h, not s(pred). The prediction interval itself is correct and wider than the previous confidence interval.

Notes
To get the correct standard error, add the variance about the regression line (MSE) to the squared standard error of Ŷ_h: s(pred) = sqrt(MSE + s²(Ŷ_h)).
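That correction can be applied directly to the printed output. A small sketch, where the value of se_mean is illustrative (it is s(Ŷ_h) at X_h = 65 for the Toluca data) and mse comes from the ANOVA table:

***Converting the reported standard error of Yhat_h into s(pred)***;
data fixse;
  se_mean = 9.918;                   * Std Error Mean Predict from the output;
  mse     = 2384;                    * MSE from the ANOVA table;
  s_pred  = sqrt(mse + se_mean**2);  * correct standard error for prediction;
  output;
proc print data=fixse;
run;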

Confidence band for the regression line
Ŷ_h ± W s(Ŷ_h), where W² = 2 F(1−α; 2, n−2).
This gives simultaneous "confidence intervals" for all X_h at once (the Working-Hotelling band). The boundary values of the confidence band define a hyperbola, and the band is wider at any X_h than the corresponding single CI.
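The band limits can be computed explicitly by saving Ŷ_h and s(Ŷ_h) with PROC REG's OUTPUT statement and applying W by hand. A minimal sketch, assuming the toluca data set from earlier (so n = 25 and n − 2 = 23) and a 90% band:

***Working-Hotelling 90% confidence band from saved predictions***;
proc reg data=toluca noprint;
  model hours=lotsize;
  output out=pred p=yhat stdp=se_mean;  * Yhat_h and s(Yhat_h) at each X;
run;
data band;
  set pred;
  w    = sqrt(2*finv(0.90, 2, 23));     * W = sqrt(2 F(1-alpha; 2, n-2));
  low  = yhat - w*se_mean;              * lower band boundary;
  high = yhat + w*se_mean;              * upper band boundary;
proc print data=band;
run;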

Confidence band for the regression line
The theory comes from the joint confidence region for (β_0, β_1), which is an ellipse (Stat 524). We can find an alpha for t_c that gives the same results: we find W² and then find the alpha for t_c that will give W = t_c.

SAS CODE

data a1;
  n=25;
  alpha=.10;
  dfn=2;
  dfd=n-2;
  tsingle=tinv(1-alpha/2,dfd);   * t for a single 90% CI;
  w2=2*finv(1-alpha,dfn,dfd);    * W**2 = 2F(1-alpha; 2, n-2);
  w=sqrt(w2);
  alphat=2*(1-probt(w,dfd));     * alpha at which t_c equals W;
  t_c=tinv(1-alphat/2,dfd);
  output;
proc print data=a1;
run;

SAS OUTPUT
n    alpha   dfn   dfd   tsingle   w2      w       alphat   t_c
25   0.1     2     23    1.714     5.098   2.258   0.034    2.258
tsingle is used for a single 90% CI; w (= t_c) is used for the 90% confidence band.

SAS CODE

symbol1 v=circle i=rlclm97;   * 97 = 100(1-alphat), rounded, from the output above;
proc gplot data=toluca;
  plot hours*lotsize;
run;

The rlclm97 interpolation draws the regression line with 97% single-CI limits, which reproduces the 90% confidence band.

Estimation of E(Y_h) and Prediction of Y_h(new): Ŷ_h = b_0 + b_1 X_h

SAS CODE

***Confidence intervals***;
symbol1 v=circle i=rlclm95;
proc gplot data=toluca;
  plot hours*lotsize;
***Prediction intervals***;
symbol1 v=circle i=rlcli95;
proc gplot data=toluca;
  plot hours*lotsize;
run;

Confidence band

Confidence intervals

Prediction intervals

Background Reading
Program topic6.sas has the code for the various plots and calculations. See Sections 2.7, 2.8, and 2.9.