
1 Inferences in Regression and Correlation Analysis Ayona Chatterjee Spring 2008 Math 4803/5803

2 Topics to be covered –Interval estimation for regression parameters. –Tests for regression parameters. –Prediction intervals. –Analysis of variance. –Correlation coefficients.

3 Remembering β₁ Here β₁ is the slope of the regression line. The main interest is to test: –H₀: β₁ = 0 –H₁: β₁ ≠ 0 The main reason to test whether β₁ = 0 is that, when β₁ = 0, there is no linear association between Y and X.

4 Sampling Distribution of b₁ For the normal error regression model, the sampling distribution of the estimator b₁ is also normal, with –E{b₁} = β₁ –σ²{b₁} = σ² / Σ(Xᵢ − X̄)² This is because b₁ is a linear combination of the observations Yᵢ, namely b₁ = Σ kᵢYᵢ with kᵢ = (Xᵢ − X̄) / Σ(Xᵢ − X̄)², and since the Yᵢ are normally distributed, so is b₁. Remember that we can use MSE to estimate σ².

5 Properties of kᵢ The coefficients kᵢ have the following properties: –Σ kᵢ = 0 –Σ kᵢXᵢ = 1 –Σ kᵢ² = 1 / Σ(Xᵢ − X̄)²

6 The t-distribution Note: (b₁ − β₁) / s{b₁} is distributed as t(n − 2) for the normal error regression model, where s{b₁} = √(MSE / Σ(Xᵢ − X̄)²). The 1 − α confidence limits for β₁ are: –b₁ ± t(1 − α/2; n − 2) s{b₁} Let's do an example.
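For instance, here is a minimal Python sketch of these limits; the data and all variable names are made up for illustration, not taken from the lecture:

```python
import numpy as np
from scipy import stats

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.3, 8.8])
n = len(x)

# Least-squares estimates of the intercept b0 and slope b1.
sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()

# MSE estimates sigma^2; s_b1 is the estimated standard error of b1.
resid = y - (b0 + b1 * x)
mse = np.sum(resid ** 2) / (n - 2)
s_b1 = np.sqrt(mse / sxx)

# 1 - alpha confidence limits: b1 +/- t(1 - alpha/2; n - 2) * s{b1}.
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, n - 2)
print(f"b1 = {b1:.4f}, 95% CI: ({b1 - t_crit * s_b1:.4f}, {b1 + t_crit * s_b1:.4f})")
```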

7 Interpretation If the confidence interval does not include zero, we conclude that β₁ ≠ 0; the association between Y and X is then described as a linear statistical association.

8 Sampling Distribution of b₀ For a normal error regression model, the sampling distribution of the estimator b₀ is normal with mean and variance: –E{b₀} = β₀ –σ²{b₀} = σ² [1/n + X̄² / Σ(Xᵢ − X̄)²]

9 Confidence Interval for β₀ As with the previous confidence interval, the 1 − α confidence limits for β₀ are: –b₀ ± t(1 − α/2; n − 2) s{b₀} Let us find the 90% confidence interval for β₀ for the ampules data discussed in chapter 1.
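The ampules data are not reproduced in this transcript, but continuing the hypothetical sketch above, the 90% limits would be computed the same way:

```python
# Continuing the sketch: standard error and 90% CI for beta_0.
s_b0 = np.sqrt(mse * (1 / n + x.mean() ** 2 / sxx))
t_crit = stats.t.ppf(1 - 0.10 / 2, n - 2)  # 90% interval, so alpha = 0.10
print(f"b0 = {b0:.4f}, 90% CI: ({b0 - t_crit * s_b0:.4f}, {b0 + t_crit * s_b0:.4f})")
```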

10 Interval Estimation of E{Yₕ} Let Xₕ denote the level of X for which we wish to estimate the mean response, and let E{Yₕ} denote the mean response when X = Xₕ. The point estimator Ŷₕ = b₀ + b₁Xₕ is normally distributed with mean and variance: –E{Ŷₕ} = E{Yₕ} –σ²{Ŷₕ} = σ² [1/n + (Xₕ − X̄)² / Σ(Xᵢ − X̄)²]

11 Confidence Interval for E{Yₕ} The 1 − α confidence limits for E{Yₕ} are: –Ŷₕ ± t(1 − α/2; n − 2) s{Ŷₕ}
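Continuing the same hypothetical sketch, with s{Ŷₕ} estimated from MSE:

```python
# Continuing the sketch: 95% CI for the mean response at a chosen X_h.
x_h = 4.5  # hypothetical level of X
y_h_hat = b0 + b1 * x_h
s_yh = np.sqrt(mse * (1 / n + (x_h - x.mean()) ** 2 / sxx))
t_crit = stats.t.ppf(0.975, n - 2)
lo, hi = y_h_hat - t_crit * s_yh, y_h_hat + t_crit * s_yh
print(f"Yh_hat = {y_h_hat:.4f}, 95% CI for E{{Yh}}: ({lo:.4f}, {hi:.4f})")
```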

12 Prediction Interval for Yₕ(new) For now we only look at predicting a new Yₕ when the parameters are known. We denote the level of X for the new trial as Xₕ and the new observation on Y as Yₕ(new). Assume the regression model appropriate for the basic data still applies. Note the distinction between estimating the mean E{Yₕ} and predicting an individual outcome Yₕ(new).

13 Prediction Interval: Example Suppose for the GPA example we know that β₀ = 0.1 and β₁ = 0.95, so E{Y} = 0.1 + 0.95 X. It is known that σ = 0.12. The admissions office is considering an applicant whose high school GPA is Xₕ = 3.5. Thus for this student E{Yₕ} = 0.1 + 0.95(3.5) = 3.425.

14 Example continued Here Yₕ(new) has standard deviation σ = 0.12 around its mean. Under the assumption that this is a normal error regression model, we find the prediction interval as: –E{Yₕ} ± 3σ, i.e., 3.425 ± 3(0.12), giving 3.065 ≤ Yₕ(new) ≤ 3.785. –The probability is 0.997 that this prediction interval will give a correct prediction for the applicant with a high school GPA of 3.5.

15 Modification As noted, the previous PI was quite wide. In general, when the regression parameters of the normal error regression model are known, the 1 − α prediction limits are: –E{Yₕ} ± z(1 − α/2) σ
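A short sketch of slides 13–15 in Python, using only the numbers given on the slides:

```python
from scipy import stats

# GPA example with known parameters (values from the slides).
beta0, beta1, sigma = 0.1, 0.95, 0.12
x_h = 3.5
mean_y_h = beta0 + beta1 * x_h               # E{Yh} = 3.425

# Rough 3-sigma limits (coverage about 0.997), as on slide 14 ...
pi_3sigma = (mean_y_h - 3 * sigma, mean_y_h + 3 * sigma)

# ... and the z-based 1 - alpha limits from slide 15 (here alpha = 0.05).
z = stats.norm.ppf(1 - 0.05 / 2)
pi_z = (mean_y_h - z * sigma, mean_y_h + z * sigma)
print(mean_y_h, pi_3sigma, pi_z)
```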

16 Comment Prediction intervals resemble confidence intervals but are conceptually different. A confidence interval estimates a parameter and gives a range in which the parameter lies. A prediction interval, on the other hand, gives a range in which a new observation will lie.

17 Analysis of Variance Partitioning sums of squares: the total deviation Yᵢ − Ȳ can be partitioned as the deviation of the fitted regression value around the mean, Ŷᵢ − Ȳ, plus the deviation around the fitted regression line, Yᵢ − Ŷᵢ. Summing the squared deviations gives SSTO = SSR + SSE.

18 ANOVA Table

Source of Variation   Degrees of Freedom (df)   Sums of Squares (SS)   Mean Sum of Squares (MS)
Regression            1                         SSR                    MSR = SSR/1
Error                 n − 2                     SSE                    MSE = SSE/(n − 2)
Total                 n − 1                     SSTO
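Continuing the earlier hypothetical sketch, the decomposition can be verified numerically:

```python
# Continuing the sketch: the ANOVA decomposition SSTO = SSR + SSE.
y_hat = b0 + b1 * x
ssto = np.sum((y - y.mean()) ** 2)       # total sum of squares, n - 1 df
ssr = np.sum((y_hat - y.mean()) ** 2)    # regression sum of squares, 1 df
sse = np.sum((y - y_hat) ** 2)           # error sum of squares, n - 2 df
assert np.isclose(ssto, ssr + sse)
msr, mse = ssr / 1, sse / (n - 2)        # mean squares
```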

19 Modified ANOVA Table The total sums of squares can instead be partitioned in two: –Total uncorrected sums of squares SSTOU = Σ Yᵢ² –Correction for mean sums of squares = (Σ Yᵢ)² / n = nȲ² This splits the n degrees of freedom of SSTOU into 1 for the correction for the mean and n − 1 for SSTO.

20 F-Tests Using the ANOVA table we can test: –H₀: β₁ = 0 –H₁: β₁ ≠ 0 –We use F* = MSR/MSE as the test statistic. –Output from Minitab gives a p-value; if the p-value is less than 0.05, we reject the null hypothesis and conclude that β₁ ≠ 0, i.e., there is a significant linear relationship between X and Y.
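The same test can be reproduced by hand, continuing the sketch (no Minitab output is shown in this transcript):

```python
# F-test of H0: beta_1 = 0, continuing the sketch.
f_star = msr / mse                        # F* = MSR / MSE
p_value = stats.f.sf(f_star, 1, n - 2)    # upper-tail p-value, (1, n-2) df
print(f"F* = {f_star:.2f}, p = {p_value:.4g}")  # reject H0 if p < 0.05
```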

21 Coefficient of Determination A descriptive measure of the linear association between X and Y. SSTO measures the uncertainty in predicting Y without using information on X. SSE measures the uncertainty that remains in Y when X is used. Thus SSTO − SSE = SSR is the reduction in uncertainty due to X.

22 R² We define the coefficient of determination R² as the proportionate reduction in total variation associated with using the predictor variable X: –R² = SSR / SSTO The larger R² is, the greater the reduction in the variation of Y due to X.
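In the running sketch this is one line:

```python
# Coefficient of determination, continuing the sketch.
r_squared = ssr / ssto  # proportionate reduction in total variation
print(f"R^2 = {r_squared:.4f}")
```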

23 Remember R² only gives the degree of linear relation between X and Y. An R² close to one does not imply a good fit. –The fitted line may not capture the true nature of the relation. An R² close to zero does not imply that there is no relation between X and Y. –The relation can be curvilinear.

24 Coefficient of Correlation The measure of linear association between X and Y is given by –r = Σ(Xᵢ − X̄)(Yᵢ − Ȳ) / √[Σ(Xᵢ − X̄)² Σ(Yᵢ − Ȳ)²] = ±√R² The range is −1 ≤ r ≤ 1. The sign depends upon the slope of the regression line: if the slope is positive, r is positive.
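Continuing the sketch, r can be checked against R²:

```python
# Pearson correlation coefficient, continuing the sketch.
syy = np.sum((y - y.mean()) ** 2)
r = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(sxx * syy)
assert np.isclose(abs(r), np.sqrt(r_squared))  # |r| = sqrt(R^2)
print(f"r = {r:.4f}")  # sign matches the sign of the slope b1
```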

25 Correlation Analysis When the X values are not known constants, or when we want to study both the effect of X on Y and of Y on X, we use correlation analysis instead of regression analysis. Examples: –The relation between blood pressure and age in humans. –The height and weight of a person.

26 Bivariate Normal Distribution We will consider a correlation model between two variables Y₁ and Y₂ using the bivariate normal distribution. If Y₁ and Y₂ are jointly normally distributed, then the marginal distributions of Y₁ and Y₂ are also normal. The conditional distribution of Y₁ given Y₂ is also normal.

27 Bivariate normal parameters Note that μ₁ and μ₂ are the means of Y₁ and Y₂ respectively. Similarly, σ₁ and σ₂ are the standard deviations of Y₁ and Y₂ respectively. ρ₁₂ is the correlation coefficient between Y₁ and Y₂.

28 Parameters of Interest The conditional mean of Y₁ given Y₂ is linear in Y₂: E{Y₁ | Y₂} = α₁.₂ + β₁.₂ Y₂. The first parameter, α₁.₂ = μ₁ − μ₂ ρ₁₂ σ₁/σ₂, represents the intercept of the line of regression of Y₁ on Y₂. The second parameter, β₁.₂ = ρ₁₂ σ₁/σ₂, is the slope of that regression line.

29 Comments Two distinct regression lines are of interest: one when Y₁ is regressed on Y₂ and the other when Y₂ is regressed on Y₁. In general the two regression lines are not the same; they coincide only in the degenerate case ρ₁₂ = ±1.

30 Point estimator for ρ₁₂ The maximum likelihood estimator of ρ₁₂ is denoted by r₁₂ and is given by: –r₁₂ = Σ(Yᵢ₁ − Ȳ₁)(Yᵢ₂ − Ȳ₂) / √[Σ(Yᵢ₁ − Ȳ₁)² Σ(Yᵢ₂ − Ȳ₂)²]

31 To test if ρ₁₂ = 0 Note that if ρ₁₂ = 0, this implies that Y₁ and Y₂ are independent (under the bivariate normal model). We apply a t-test with test statistic –t* = r₁₂ √(n − 2) / √(1 − r₁₂²) Reject the null hypothesis if |t*| > t(1 − α/2; n − 2).
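A small sketch of this test; the bivariate sample is made up for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical bivariate sample, for illustration only.
y1 = np.array([5.1, 6.2, 4.8, 7.0, 6.5, 5.9, 7.4, 6.1])
y2 = np.array([2.0, 2.6, 2.1, 3.3, 2.8, 2.5, 3.6, 2.9])
n = len(y1)

r12 = np.corrcoef(y1, y2)[0, 1]                       # sample correlation
t_star = r12 * np.sqrt(n - 2) / np.sqrt(1 - r12**2)   # test statistic
p = 2 * stats.t.sf(abs(t_star), n - 2)                # two-sided p-value
print(f"r12 = {r12:.3f}, t* = {t_star:.3f}, p = {p:.4g}")
```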

32 Spearman Rank Correlation Coefficient Suppose that Y₁ and Y₂ are not bivariate normal. A non-parametric rank correlation method is applied to make inferences about Y₁ and Y₂. Define Rᵢ₁ as the rank of Yᵢ₁ and Rᵢ₂ as the rank of Yᵢ₂.

33 Spearman Rank Correlation Coefficient The rank correlation coefficient is the ordinary correlation coefficient computed on the ranks: –r_s = Σ(Rᵢ₁ − R̄₁)(Rᵢ₂ − R̄₂) / √[Σ(Rᵢ₁ − R̄₁)² Σ(Rᵢ₂ − R̄₂)²] In case of ties among some data values, each of the tied values is given the average of the ranks involved.

34 Spearman Rank Correlation Coefficient H₀: There is no association between Y₁ and Y₂. H₁: There is an association between Y₁ and Y₂. A similar t-test as before is applied, with n − 2 degrees of freedom.
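For reference, scipy computes this coefficient directly (using the y1, y2 sample from the sketch above):

```python
from scipy.stats import spearmanr

# Spearman rank correlation; scipy assigns tied values
# the average of their ranks automatically.
r_s, p = spearmanr(y1, y2)
print(f"r_s = {r_s:.3f}, p = {p:.4g}")
```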

