 # Lecture 5 Correlation and Regression

Dr Peter Wheale

A Scatter Plot of Monthly Returns

Interpretation of Correlation Coefficient
| Correlation coefficient (r) | Interpretation |
|---|---|
| r = +1 | perfect positive correlation |
| 0 < r < +1 | positive linear relationship |
| r = 0 | no linear relationship |
| r = -1 | perfect negative correlation |
| -1 < r < 0 | negative linear relationship |

Scatter Plots and Correlation

Covariance of Rates of Return
Example: Calculate the covariance between the returns on the two stocks indicated below:

Covariance Using Historical Data
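The sample covariance is the sum of the cross products of deviations from the means, divided by n - 1:

$$\text{Cov}_{1,2} = \frac{\sum_{t=1}^{n} (R_{1,t} - \bar{R}_1)(R_{2,t} - \bar{R}_2)}{n - 1}$$

Here $\bar{R}_1 = 0.05$, $\bar{R}_2 = 0.07$ and $n - 1 = 2$.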

Sample Correlation Coefficient
Correlation, ρ, is a standardized measure of covariance and is bounded by +1 and -1:

$$\rho_{1,2} = \frac{\text{Cov}_{1,2}}{\sigma_1 \sigma_2}$$

Example: Given the covariance of returns on two assets, with σ1 = 7% and σ2 = 11%, calculate ρ1,2.
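As a rough illustration of how covariance is standardized into a correlation, the sketch below computes both from two return series. The function names and the return figures are hypothetical, chosen only for illustration; they are not the data from the lecture example.

```python
import math

def sample_covariance(x, y):
    """Sample covariance: sum of cross products of deviations, divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def sample_correlation(x, y):
    """Correlation: covariance divided by the product of the standard deviations."""
    sd_x = math.sqrt(sample_covariance(x, x))  # the covariance of x with itself is its variance
    sd_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (sd_x * sd_y)

# Hypothetical returns, for illustration only (not the lecture's data).
returns_a = [0.04, -0.02, 0.07, 0.01]
returns_b = [0.06, 0.00, 0.09, 0.01]

cov_ab = sample_covariance(returns_a, returns_b)
rho_ab = sample_correlation(returns_a, returns_b)
print(f"covariance = {cov_ab:.6f}, correlation = {rho_ab:.4f}")
```

Dividing by the two standard deviations removes the units of the covariance, which is why the result always lies between -1 and +1.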

Testing H0: Correlation = 0
The test of whether the true correlation between two random variables is zero (i.e., there is no correlation) is a t-test based on the sample correlation coefficient, r. With n (pairs of) observations the test statistic is:

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$

The degrees of freedom are n - 2.

Example
Data: n = 10, r = 0.475. Determine if the sample correlation is significant at the 5% level of significance.

$$t = \frac{0.475\,(8)^{0.5}}{[1 - (0.475)^2]^{0.5}} = \frac{1.3435}{0.88} = 1.5267$$

The two-tailed critical t-values at a 5% level of significance with df = 8 (n - 2) are ±2.306. Since -2.306 ≤ 1.5267 ≤ 2.306, the null hypothesis cannot be rejected, i.e., the correlation between variables X and Y is not significantly different from zero at a 5% significance level.
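The significance test for n = 10 and r = 0.475 can be sketched in a few lines of Python. The function name is hypothetical, and the critical value 2.306 is taken from the example rather than looked up by the code.

```python
import math

def correlation_t_stat(r, n):
    """t-statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

n = 10              # number of paired observations
r = 0.475           # sample correlation coefficient
t_critical = 2.306  # two-tailed 5% critical value for df = 8, as quoted in the example

t_stat = correlation_t_stat(r, n)
reject_h0 = abs(t_stat) > t_critical  # two-tailed decision rule

print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
```

Comparing the absolute value of t with the critical value is equivalent to checking whether t falls outside the interval between the negative and positive critical values.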

Linear Regression
- Dependent variable: the variable whose changes you are trying to explain
- Independent variable: the variable being used to explain the changes in the dependent variable

Example: You want to predict housing starts using mortgage interest rates.
- Independent variable = mortgage interest rates
- Dependent variable = housing starts

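Regression Equation

$$Y_i = b_0 + b_1 X_i + \varepsilon_i$$

where $Y_i$ is the dependent variable, $b_0$ is the y-intercept, $b_1$ is the slope coefficient, $X_i$ is the independent variable and $\varepsilon_i$ is the error term.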

Assumptions of Linear Regression
- Linear relation between dependent and independent variables
- Independent variable uncorrelated with error term
- Expected value of error term is zero
- Variance of the error term is constant
- Error term is independently distributed
- Error term is normally distributed

Estimated Regression Coefficients
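The estimated regression line is:

$$\hat{Y}_i = \hat{b}_0 + \hat{b}_1 X_i$$

where $\hat{b}_1$ is the estimated slope and $\hat{b}_0$ is the estimated y-intercept.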

Estimating the slope coefficient
b1 = cov(X,Y) / var(X)

Example: Compute the slope coefficient and intercept term for the least squares regression equation using the following information: the sum of $(X - \bar{X})(Y - \bar{Y})$ is 445, the sum of $(X - \bar{X})^2$ is 374.5, and the sample means of X and Y are 25 and 75, respectively.

The slope coefficient is b1 = 445 / 374.5 = 1.188. The intercept term is b0 = 75 - 1.188(25) = 45.3.
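The slope and intercept calculation in this example can be checked with a short Python sketch. The function and argument names are hypothetical; the inputs (445, 374.5, 25 and 75) are the figures given in the example. Because covariance and variance share the same n - 1 divisor, the slope reduces to the ratio of the two sums.

```python
def ols_slope_intercept(sum_cross_dev, sum_sq_dev_x, mean_x, mean_y):
    """Least squares slope and intercept from summary statistics.

    sum_cross_dev: sum of (X - mean_x) * (Y - mean_y)
    sum_sq_dev_x:  sum of (X - mean_x) ** 2
    """
    slope = sum_cross_dev / sum_sq_dev_x  # the n - 1 divisors of cov and var cancel
    intercept = mean_y - slope * mean_x   # the fitted line passes through the point of means
    return slope, intercept

b1, b0 = ols_slope_intercept(445, 374.5, 25, 75)
print(f"b1 = {b1:.3f}, b0 = {b0:.1f}")
```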

Calculating the Standard Error of the Estimate (SEE)
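SEE measures the accuracy of the prediction from a regression equation. It is the standard deviation of the error term; for a simple linear regression with n observations, $\text{SEE} = \sqrt{\text{SSE} / (n - 2)}$, where SSE is the sum of squared errors. The lower the SEE, the greater the accuracy.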
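A minimal sketch of the SEE calculation for a simple linear regression is given below, assuming the usual n - 2 degrees of freedom (two estimated coefficients). The function name and the actual and predicted values are hypothetical, used only to show the mechanics.

```python
import math

def standard_error_of_estimate(y_actual, y_predicted):
    """SEE for a simple linear regression: sqrt(SSE / (n - 2))."""
    n = len(y_actual)
    # Sum of squared errors (residuals) between actual and predicted values.
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(y_actual, y_predicted))
    return math.sqrt(sse / (n - 2))

# Hypothetical actual and predicted values, for illustration only.
y_actual = [2.0, 4.1, 5.9, 8.2]
y_predicted = [2.0, 4.0, 6.0, 8.0]

see = standard_error_of_estimate(y_actual, y_predicted)
print(f"SEE = {see:.4f}")
```

Smaller residuals give a smaller SEE, which is the sense in which a lower SEE indicates a more accurate fit.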

Interpreting the Coefficient of Determination (R2)
R2 measures the percentage of the variation in the dependent variable that can be explained by the independent variable. An R2 of 0.25 means the independent variable explains 25% of the variation in the dependent variable. Caution: You cannot conclude causation.

Calculating the Coefficient of Determination (R2)
For simple linear regression, R2 is the correlation coefficient (r) squared. Example: The correlation coefficient between X and Y is r = 0.50, so the coefficient of determination = 0.50² = 0.25.

Coefficient of Determination (R2)
R2 can also be calculated with SST and SSR: R2 = SSR / SST.
SS Total = SS Regression + SS Error
Total variation = explained variation + unexplained variation
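The decomposition of total variation can be illustrated with a small Python sketch that fits a least squares line and then computes SST, SSR and SSE. The function names and the data are hypothetical, chosen only for illustration.

```python
def fit_line(x, y):
    """Least squares intercept and slope for a simple linear regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_sq_x = sum((a - mean_x) ** 2 for a in x)
    slope = sum_cross / sum_sq_x
    intercept = mean_y - slope * mean_x
    return intercept, slope

def variation_decomposition(x, y):
    """Return (SST, SSR, SSE) for the least squares line through (x, y)."""
    intercept, slope = fit_line(x, y)
    mean_y = sum(y) / len(y)
    fitted = [intercept + slope * a for a in x]
    sst = sum((b - mean_y) ** 2 for b in y)              # total variation
    ssr = sum((f - mean_y) ** 2 for f in fitted)         # explained variation
    sse = sum((b - f) ** 2 for b, f in zip(y, fitted))   # unexplained variation
    return sst, ssr, sse

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

sst, ssr, sse = variation_decomposition(x, y)
r_squared = ssr / sst
print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}, R2 = {r_squared:.4f}")
```

Because SST = SSR + SSE for a least squares fit with an intercept, the same R2 is obtained from 1 - SSE / SST.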