Inferences in Regression and Correlation Analysis Ayona Chatterjee Spring 2008 Math 4803/5803.

Topics to be covered
–Interval estimation for regression parameters
–Tests for regression parameters
–Prediction intervals
–Analysis of variance
–Correlation coefficients

Remembering β1
Here β1 is the slope of the regression line. The main interest is to test:
–H0: β1 = 0
–H1: β1 ≠ 0
The main reason to test whether β1 = 0 is that, when β1 = 0, there is no linear association between Y and X.

Sampling Distribution of b1
For the normal error regression model, the sampling distribution of b1 is also normal, with
E{b1} = β1 and σ²{b1} = σ² / Σ(Xi − X̄)²
This is because b1 is a linear combination of the observations Yi, and since the Yi are normally distributed, so is b1. Remember we can use MSE to estimate σ².

Properties of ki
Writing b1 = Σ ki Yi with ki = (Xi − X̄) / Σ(Xj − X̄)², the coefficients ki have the following properties:
–Σ ki = 0
–Σ ki Xi = 1
–Σ ki² = 1 / Σ(Xi − X̄)²
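These properties are easy to check numerically. The sketch below (Python with NumPy, which is not part of the course material; the data set is made up purely for illustration) verifies each property and the fact that b1 is a linear combination of the Yi:

```python
# Numerical check of the k_i properties; X and Y are illustrative values.
import numpy as np

X = np.array([1., 2., 3., 4., 5.])
Y = np.array([2., 4., 5., 4., 5.])

Sxx = np.sum((X - X.mean())**2)   # sum of squared deviations of X
k = (X - X.mean()) / Sxx          # k_i = (X_i - Xbar) / Sxx

b1 = np.sum(k * Y)                # the slope as a linear combination of Y_i

print(np.sum(k))                  # property 1: sum of k_i is 0
print(np.sum(k * X))              # property 2: sum of k_i X_i is 1
print(np.sum(k**2))               # property 3: sum of k_i^2 is 1/Sxx
print(b1)                         # equals the least-squares slope
```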

The t-distribution
Note: (b1 − β1) / s{b1} is distributed as t(n − 2) for the normal error regression model. The 1 − α confidence limits for β1 are:
b1 ± t(1 − α/2; n − 2) s{b1}
Let's do an example.
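A minimal sketch of this confidence-interval computation, assuming Python with NumPy and SciPy (the course itself uses Minitab) and a small made-up data set:

```python
# 95% CI for beta_1 via b1 +/- t(1-alpha/2; n-2) * s{b1}; data are illustrative.
import numpy as np
from scipy import stats

X = np.array([1., 2., 3., 4., 5.])
Y = np.array([2., 4., 5., 4., 5.])
n = len(X)

Sxx = np.sum((X - X.mean())**2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
b0 = Y.mean() - b1 * X.mean()

resid = Y - (b0 + b1 * X)
MSE = np.sum(resid**2) / (n - 2)        # estimates sigma^2
s_b1 = np.sqrt(MSE / Sxx)               # estimated standard deviation of b1

t_crit = stats.t.ppf(0.975, df=n - 2)   # t(1 - alpha/2; n - 2)
ci = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)
print(ci)                               # contains zero for this data set
```

Because this interval contains zero, at the 95% level we could not conclude that β1 ≠ 0 for these particular data.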

Interpretation
If the confidence interval does not include zero, we can conclude that β1 ≠ 0; the association between Y and X is then sometimes described as a linear statistical association.

Sampling Distribution of b0
For a normal error regression model, the sampling distribution of b0 is normal with mean and variance:
E{b0} = β0 and σ²{b0} = σ² [1/n + X̄² / Σ(Xi − X̄)²]

Confidence Interval for β0
Similar to the previous confidence interval, the CI for β0 is:
b0 ± t(1 − α/2; n − 2) s{b0}
Let us find the 90% confidence interval for β0 for the ampules data discussed in Chapter 1.

Interval Estimation of E{Yh}
Let Xh denote the level of X for which we wish to estimate the mean response; E{Yh} denotes the mean response when X = Xh. The estimator Ŷh = b0 + b1Xh is normally distributed with mean E{Yh} and variance:
σ²{Ŷh} = σ² [1/n + (Xh − X̄)² / Σ(Xi − X̄)²]

Confidence Interval for E{Yh}
The 1 − α confidence limits for E{Yh} are:
Ŷh ± t(1 − α/2; n − 2) s{Ŷh}
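The mean-response interval can be sketched the same way as the slope interval, again assuming Python with NumPy/SciPy and illustrative data (Xh = 4 is an arbitrary choice):

```python
# CI for the mean response E{Y_h} using s^2{Yh_hat} = MSE*(1/n + (X_h - Xbar)^2/Sxx).
import numpy as np
from scipy import stats

X = np.array([1., 2., 3., 4., 5.])
Y = np.array([2., 4., 5., 4., 5.])
n = len(X)

Sxx = np.sum((X - X.mean())**2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
b0 = Y.mean() - b1 * X.mean()
MSE = np.sum((Y - (b0 + b1 * X))**2) / (n - 2)

Xh = 4.0
Yh_hat = b0 + b1 * Xh                                   # point estimate of E{Y_h}
s_Yh = np.sqrt(MSE * (1/n + (Xh - X.mean())**2 / Sxx))  # estimated s.d. of Yh_hat

t_crit = stats.t.ppf(0.975, df=n - 2)
ci = (Yh_hat - t_crit * s_Yh, Yh_hat + t_crit * s_Yh)
print(ci)
```

Note that s{Ŷh} grows as Xh moves away from X̄, so the interval is narrowest at the mean of the observed X values.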

Prediction Interval for Yh(new)
For now we only look at predicting a new Yh when the parameters are known. We denote the level of X for the new trial as Xh and the new observation on Y as Yh(new). Assume the regression model applicable to the basic data is still appropriate. Note the distinction between estimating the mean E{Yh} and predicting the individual outcome Yh(new).

Prediction Interval: Example
Suppose for the GPA example we know that β0 = 0.1 and β1 = 0.95, so E{Y} = 0.1 + 0.95X. The value of σ is also known. The admission office is considering an applicant whose high school GPA is Xh = 3.5. Thus for this student E{Yh} = 0.1 + 0.95(3.5) = 3.425.

Example continued
With the parameters known, Yh(new) has mean E{Yh} and standard deviation σ. With the assumption that this is a normal error regression model, we find the prediction interval as:
–E{Yh} ± 3σ
–The probability is .997 that this prediction interval will give a correct prediction for the applicant with a high school GPA of 3.5.

Modification
As noted, the previous PI was quite wide. In general, when the regression parameters of the normal error regression model are known, the 1 − α prediction limits are:
–E{Yh} ± z(1 − α/2) σ
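As a sketch of these known-parameter limits in Python with SciPy: β0 = 0.1, β1 = 0.95, and Xh = 3.5 come from the GPA example above, but σ = 0.12 is an assumed value chosen purely for illustration (the slide's value was not recovered).

```python
# Prediction limits E{Y_h} +/- z(1 - alpha/2) * sigma with known parameters.
from scipy.stats import norm

beta0, beta1 = 0.1, 0.95   # known parameters from the GPA example
sigma = 0.12               # assumed value, for illustration only
Xh = 3.5

EYh = beta0 + beta1 * Xh                     # mean response: 3.425
z = norm.ppf(0.975)                          # z(1 - alpha/2) for alpha = 0.05
limits = (EYh - z * sigma, EYh + z * sigma)  # 95% prediction limits
print(limits)
```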

Comment
Prediction intervals resemble confidence intervals but are conceptually different. A confidence interval is for a parameter and gives a range in which the parameter plausibly lies. A prediction interval, on the other hand, gives a range in which a future observation will lie.

Analysis of Variance
Partitioning sums of squares: the total deviation can be partitioned into the deviation of the fitted regression values around the mean plus the deviation around the fitted regression line:
Σ(Yi − Ȳ)² = Σ(Ŷi − Ȳ)² + Σ(Yi − Ŷi)²
SSTO = SSR + SSE

ANOVA Table

Source of Variation   df      SS      MS
Regression            1       SSR     MSR = SSR/1
Error                 n − 2   SSE     MSE = SSE/(n − 2)
Total                 n − 1   SSTO

Modified ANOVA Table
The total sums of squares can instead be partitioned in two:
–Total uncorrected sum of squares SSTOU = ΣYi²
–Correction for the mean sum of squares = nȲ²
This splits the degrees of freedom into 1 for the correction for the mean and n for SSTOU.

F-Tests
Using the ANOVA table we can test:
–H0: β1 = 0
–H1: β1 ≠ 0
–We use F* = MSR/MSE as the test statistic.
–Output from Minitab gives a p-value; if the p-value is less than 0.05, we reject the null hypothesis and conclude that β1 ≠ 0, i.e. there is a significant linear relationship between X and Y.
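The ANOVA decomposition and F-test can be sketched as follows, assuming Python with NumPy/SciPy in place of Minitab and the same illustrative data as earlier:

```python
# ANOVA for simple linear regression: SSTO = SSR + SSE, F* = MSR/MSE.
import numpy as np
from scipy import stats

X = np.array([1., 2., 3., 4., 5.])
Y = np.array([2., 4., 5., 4., 5.])
n = len(X)

Sxx = np.sum((X - X.mean())**2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
b0 = Y.mean() - b1 * X.mean()

SSTO = np.sum((Y - Y.mean())**2)           # total sum of squares, df = n - 1
SSE = np.sum((Y - (b0 + b1 * X))**2)       # error sum of squares, df = n - 2
SSR = SSTO - SSE                           # regression sum of squares, df = 1

MSR = SSR / 1
MSE = SSE / (n - 2)
F_star = MSR / MSE
p_value = stats.f.sf(F_star, 1, n - 2)     # upper-tail p-value of F(1, n-2)
print(F_star, p_value)
```

For simple linear regression, F* equals the square of the t statistic for b1, so the F-test and the two-sided t-test of β1 = 0 are equivalent.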

Coefficient of Determination
A descriptive measure of the linear association between X and Y. SSTO measures the uncertainty in predicting Y without using information on X. SSE measures the uncertainty remaining in Y when X is used. Thus SSTO − SSE = SSR is the reduction in uncertainty due to X.

R²
We define the coefficient of determination R² as the proportionate reduction in total variation associated with using the predictor variable X:
R² = SSR / SSTO
The larger R² is, the greater the reduction in the variation of Y due to X.

Remember
R² only measures the degree of linear relation between X and Y. An R² close to one does not imply a good fit:
–the model may not capture the true nature of the relation.
An R² close to zero does not imply no relation between X and Y:
–the relation can be curvilinear.

Coefficient of Correlation
The measure of linear association between X and Y is given by
r = ± √R²
The range is −1 ≤ r ≤ 1. The sign depends upon the slope of the regression line: if the slope is positive, r is positive.
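The relationship r = ±√R², with the sign taken from the slope, can be confirmed numerically against NumPy's built-in Pearson correlation (illustrative data again, Python assumed):

```python
# r computed from R^2 and the slope's sign matches the Pearson correlation.
import numpy as np

X = np.array([1., 2., 3., 4., 5.])
Y = np.array([2., 4., 5., 4., 5.])

Sxx = np.sum((X - X.mean())**2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
b0 = Y.mean() - b1 * X.mean()

SSTO = np.sum((Y - Y.mean())**2)
SSE = np.sum((Y - (b0 + b1 * X))**2)
SSR = SSTO - SSE

R2 = SSR / SSTO                        # coefficient of determination
r = np.sign(b1) * np.sqrt(R2)          # correlation takes the sign of the slope

print(R2, r, np.corrcoef(X, Y)[0, 1])  # r agrees with the Pearson correlation
```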

Correlation Analysis
When the X values may not be known constants, or when we want to study both the effect of X on Y and of Y on X, we use correlation analysis instead of regression analysis. Examples:
–The relation between blood pressure and age in humans.
–The height and weight of a person.

Bivariate Normal Distribution
We will consider a correlation model between two variables Y1 and Y2 using the bivariate normal distribution. If Y1 and Y2 are jointly normally distributed, then the marginal distributions of Y1 and Y2 are also normal, and the conditional distribution of Y1 given Y2 is also normal.

Bivariate normal parameters
Note μ1 and μ2 are the means of Y1 and Y2 respectively. Similarly σ1 and σ2 are the standard deviations of Y1 and Y2 respectively. ρ12 is the correlation coefficient between Y1 and Y2.

Parameters of Interest
For the regression of Y1 on Y2, the first parameter, the intercept of the line of regression of Y1 on Y2, is μ1 − β12 μ2. The second parameter, the slope of the regression line, is β12 = ρ12 σ1/σ2.

Comments
Two distinct regression lines are of interest: one when Y1 is regressed on Y2 and the other when Y2 is regressed on Y1. In general the regression lines are not the same; the two lines coincide only when the correlation is ±1.

Point estimator for ρ12
The maximum likelihood estimator of ρ12 is denoted by r12 and is given by:
r12 = Σ(Yi1 − Ȳ1)(Yi2 − Ȳ2) / √[Σ(Yi1 − Ȳ1)² Σ(Yi2 − Ȳ2)²]

To test if ρ12 = 0
Note that, for the bivariate normal model, ρ12 = 0 implies that Y1 and Y2 are independent. We apply a t-test with the test statistic
t* = r12 √(n − 2) / √(1 − r12²)
Reject the null hypothesis if |t*| > t(1 − α/2; n − 2).
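A sketch of this test, assuming Python with NumPy/SciPy and made-up data; SciPy's built-in `pearsonr` uses the same t statistic, so the p-values agree:

```python
# t-test of rho_12 = 0: t* = r * sqrt(n-2) / sqrt(1 - r^2), df = n - 2.
import numpy as np
from scipy import stats

Y1 = np.array([2., 4., 5., 4., 5.])
Y2 = np.array([1., 2., 3., 4., 5.])
n = len(Y1)

r12 = np.corrcoef(Y1, Y2)[0, 1]
t_star = r12 * np.sqrt(n - 2) / np.sqrt(1 - r12**2)
p_value = 2 * stats.t.sf(abs(t_star), df=n - 2)   # two-sided p-value

r_check, p_check = stats.pearsonr(Y1, Y2)         # same statistic under the hood
print(t_star, p_value, p_check)
```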

Spearman Rank Correlation Coefficient
Suppose that Y1 and Y2 are not bivariate normal. A nonparametric rank correlation method is then applied to make inferences about Y1 and Y2. Define Ri1 as the rank of Yi1 and Ri2 as the rank of Yi2.

Spearman Rank Correlation Coefficient
The rank correlation coefficient is the Pearson coefficient computed on the ranks:
rs = Σ(Ri1 − R̄1)(Ri2 − R̄2) / √[Σ(Ri1 − R̄1)² Σ(Ri2 − R̄2)²]
In case of ties among some data values, each of the tied values is given the average of the ranks involved.

Spearman Rank Correlation Coefficient
H0: There is no association between Y1 and Y2.
H1: There is an association between Y1 and Y2.
We use a t-test similar to before, with n − 2 degrees of freedom.
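The rank-then-correlate recipe, including average ranks for ties, can be sketched as follows (Python with NumPy/SciPy assumed, illustrative data; SciPy's `spearmanr` gives the same coefficient):

```python
# Spearman rank correlation: rank the data (averaging ties), then apply
# the Pearson formula to the ranks.
import numpy as np
from scipy import stats

Y1 = np.array([2., 4., 5., 4., 5.])   # contains ties (two 4s, two 5s)
Y2 = np.array([1., 2., 3., 4., 5.])
n = len(Y1)

R1 = stats.rankdata(Y1)               # tied values get the average rank
R2 = stats.rankdata(Y2)
rs_manual = np.corrcoef(R1, R2)[0, 1]

rs, p_value = stats.spearmanr(Y1, Y2)                # the built-in agrees
t_star = rs * np.sqrt(n - 2) / np.sqrt(1 - rs**2)    # test statistic, df = n - 2
print(rs_manual, rs, p_value)
```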