Correlations and simple regression analysis


1 Correlations and simple regression analysis
Data analysis and information management EUZC405 M.Bazarov

2 Today’s Agenda
- Measuring association between the variables (covariance and coefficient of correlation)
- Simple regression analysis
- Summary in Excel

3 Learning Objectives
After completion of this lecture you will be able to:
- Define and calculate the correlation coefficient;
- Find the regression line and use it for regression analysis;
- Define and calculate the coefficient of determination (R-squared);
- Understand and interpret regression output from Excel.

4 Measuring association between the variables
Use of the term correlation implies:
- That there are two or more entities under consideration.
- That there is some common link which makes them related to a greater or lesser degree.
Consider:
- CA1 assessment scores and final exam results.
- Height and weight.
- Price of goods and wages paid to the producers.

5 Measuring association between the variables
Consider an example: Tim Newton is the sales manager of a firm which manufactures meat products and sells a large part of them directly to retail food stores via a large force of sales representatives. Recently, as the recession has begun to affect the business, Mr. Newton has become aware of the need to monitor representatives’ performance more closely, but the trouble is that he has little idea of what factors may influence that performance.

6 Measuring association between the variables
Rep. no. Value of last quarter’s sales ($000s) Number of retail outlets visited regularly Area covered (square miles) 1 2 3 4 5 6 7 8 9 10 25 29 31 42 44 45 47 57 50 12 17 21 26 34 30 38 61 450 500 350 250 150 420 275 200 400 300

7 Measuring association between the variables

8 Measuring association between the variables
What can we say about this relationship? Note the outlier marked on the graph!

9 Measuring association between the variables
In general, one can observe that when the number of outlets visited (variable X) is above its mean, sales (variable Y) are also above their mean. (The graph marks the mean of X and the mean of Y.)

10 Measuring association between the variables
The covariance measures linear dependence between two variables:
$$\mathrm{Cov}(x,y) = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})$$
Cov>0 indicates that the two variables move in the same direction (when x is above its mean, y tends to be above its mean too).
Cov<0 indicates that the two variables move in opposite directions (when x is above its mean, y tends to be below its mean).

11 Measuring association between the variables
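To standardize the covariance we need to divide it by the product of the two separate standard deviations:
$$r = \frac{\mathrm{Cov}(x,y)}{s_x\, s_y}$$
where R or r is also known as Pearson’s product moment correlation coefficient, $s_x = \sqrt{\frac{\sum x^2}{n}-\bar{x}^2}$ and $s_y = \sqrt{\frac{\sum y^2}{n}-\bar{y}^2}$ are the standard deviations of x and y, and the covariance can be computed as
$$\mathrm{Cov}(x,y) = \frac{\sum xy}{n} - \bar{x}\,\bar{y}$$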

12 The sales data revisited
y = value of last quarter's sales ($000s); x = number of retail outlets visited regularly.

Rep No    y     x     y^2     x^2     xy
2         25    12    625     144     300
3         29    17    841     289     493
4         31    21    961     441     651
5         31    26    961     676     806
6         42    34    1764    1156    1428
7         44    30    1936    900     1320
8         45    38    2025    1444    1710
9         47    45    2209    2025    2115
10        57    61    3249    3721    3477
Total     351   284   14571   10796   12300

13 Finding the coefficient of correlation
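$\bar{y} = 351/9 = 39$, $\bar{x} = 284/9 \approx 31.56$
Covariance $= \frac{12300}{9} - \frac{284}{9}\cdot 39 = 136$
The standard deviations are $s_y = \sqrt{14571/9 - 39^2} = \sqrt{98} \approx 9.90$ and $s_x = \sqrt{10796/9 - (284/9)^2} \approx 14.28$, so $r = 136/(14.28 \times 9.90) \approx 0.96$.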
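The calculation on this slide can be checked with a short script. This is a minimal sketch, assuming the nine observations from the table on slide 12 and the population (divide by n) form of the covariance, which is the form that reproduces the value 136. The function names are illustrative, not part of the lecture.

```python
import math

# Sales data for representatives 2-10 (table on slide 12).
sales = [25, 29, 31, 31, 42, 44, 45, 47, 57]      # y, in $000s
outlets = [12, 17, 21, 26, 34, 30, 38, 45, 61]    # x


def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Population covariance: average product of deviations from the means."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)


def std_dev(values):
    """Population standard deviation (divides by n, like covariance above)."""
    return math.sqrt(covariance(values, values))


def pearson_r(x, y):
    """Covariance standardized by the product of the two standard deviations."""
    return covariance(x, y) / (std_dev(x) * std_dev(y))


cov_xy = covariance(outlets, sales)
r = pearson_r(outlets, sales)
print(f"covariance = {cov_xy:.2f}")
print(f"r = {r:.4f}")
```

A value of r close to 1 indicates a strong positive linear association between outlets visited and sales.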

14 Simple regression analysis
Hence, if a relationship between the variables exists (as we can see from the correlation coefficient), we would be interested in predicting the behaviour of one variable, say y, from the behaviour of the other, say x:
- the predictor or independent variable, denoted by x;
- the dependent variable, denoted by y.

15 Simple regression analysis
For example, the relationship between sales and the number of outlets visited could be well approximated by the line: Sales = a + b * number of outlets visited, or y = a + bx, where a is the level of sales when no outlet is visited (x=0) and b is the change in sales for each additional outlet visited.

16 Simple regression analysis
The problem is we could draw many possible lines. Which one to choose?

17 Simple regression analysis
Well, try to find a line that minimizes the sum of squared distances between the data and the line (see the graph!) to ensure a better fit!
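To make the criterion concrete, the sketch below computes the sum of squared vertical distances for a few candidate lines. The data are the nine observations from slide 12. The three candidate lines are hypothetical choices made only for illustration, and the function name is an assumption of this example.

```python
# Sales data for representatives 2-10 (table on slide 12).
sales = [25, 29, 31, 31, 42, 44, 45, 47, 57]      # y, in $000s
outlets = [12, 17, 21, 26, 34, 30, 38, 45, 61]    # x


def sum_squared_errors(a, b, x, y):
    """Sum of squared vertical distances between the data and y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical candidate lines as (intercept a, slope b), for illustration only.
candidates = {
    "flat line at the mean": (39.0, 0.0),
    "steeper guess": (10.0, 0.9),
    "close to least squares": (17.94, 0.667),
}

sse = {
    name: sum_squared_errors(a, b, outlets, sales)
    for name, (a, b) in candidates.items()
}
for name, value in sse.items():
    print(f"{name}: SSE = {value:.2f}")

# The preferred line is the one with the smallest sum of squared distances.
best = min(sse, key=sse.get)
print("best candidate:", best)
```

Among these candidates, the line with the smallest sum is the one to prefer. Least squares finds the line that minimizes this quantity over all possible values of a and b.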

18 Simple regression analysis
For example, let’s estimate the regression line for our data on sales by minimizing the sum of squared differences between the data and the line: Sales = a + b * number of outlets visited.
Coefficient b of such a line could be found using the following formula:
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$
Coefficient a of such a line could be found using the following formula:
$$a = \bar{y} - b\,\bar{x}$$

19 Simple regression analysis
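Hence,
$$b = \frac{9 \times 12300 - 284 \times 351}{9 \times 10796 - 284^2} = \frac{11016}{16508} \approx 0.667$$
$$a = 39 - 0.667 \times \frac{284}{9} \approx 17.94$$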

20 Simple regression analysis
sales = 17.94 + 0.667x
Wow, we can now predict sales by looking at the number of outlets visited by a sales representative! In our case, if a sales representative visits one more outlet, predicted sales increase by about 0.667 thousand dollars, or roughly $667.
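The coefficients above can be reproduced from the column totals of the slide 12 table. The following is a minimal sketch: `fit_line` is an illustrative helper name, and the formula used is the standard least-squares solution for a single predictor.

```python
# Sales data for representatives 2-10 (table on slide 12).
sales = [25, 29, 31, 31, 42, 44, 45, 47, 57]      # y, in $000s
outlets = [12, 17, 21, 26, 34, 30, 38, 45, 61]    # x


def fit_line(x, y):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


a, b = fit_line(outlets, sales)
print(f"sales = {a:.2f} + {b:.3f} * outlets")

# Predicted change in sales (in dollars) for one additional outlet visited.
extra_sales_dollars = b * 1000
print(f"one more outlet adds about ${extra_sales_dollars:.0f} in sales")
```

The slope is the quantity to read when interpreting the model: it is the predicted change in y for a one-unit change in x.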

21 Simple regression analysis
After we have derived the regression line, we have to ask how well the line actually fits the data, that is, the “goodness-of-fit” of the regression. Consider an example. The average sales are: 351/9=39. Take any one value, say representative #8 (y=45, x=38). The regression predicts: y = 17.94 + 0.667x = 17.94 + 0.667*38 = 43.29. (The slide repeats the sales (y) and outlets (x) columns of the data table from slide 12.)

22 Simple regression analysis
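Look at the graph (sales against number of outlets visited, with the fitted line a=17.94, b=0.667, shown for representative #8 at X=38):
- Observed sales: Y=45
- Mean sales: 39
- Total deviation: dt = Y - mean = 45 - 39 = 6
- Explained deviation: de = 43.29 - 39 = 4.29
- Unexplained deviation: du = 45 - 43.29 = 1.71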

23 Simple regression analysis
Hence, we could say that on average we generate 39 thousand dollars in sales. When representative #8 visits 38 outlets, we use the regression to predict sales of 43.29 thousand dollars. Our regression therefore explains part of the deviation from the mean: de (explained deviation) is the part accounted for by the regression, and du (unexplained deviation) is the part that is left unexplained. The total deviation (dt) is simply the sum of both: dt=de+du! Summing such deviations across all observations gives us:
$$\sum (y_i-\bar{y}) = \sum (\hat{y}_i-\bar{y}) + \sum (y_i-\hat{y}_i)$$
As you probably remember from our previous lectures, deviations from the mean sum to zero.
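The decomposition dt = de + du can be verified for every representative. The sketch below assumes the slide 12 data and refits the line with unrounded coefficients, so the values for representative #8 differ from the slide's 4.29 and 1.71 in the second decimal place. Variable names are illustrative.

```python
# Sales data for representatives 2-10 (table on slide 12).
sales = [25, 29, 31, 31, 42, 44, 45, 47, 57]      # y, in $000s
outlets = [12, 17, 21, 26, 34, 30, 38, 45, 61]    # x

n = len(sales)
mean_x = sum(outlets) / n
mean_y = sum(sales) / n

# Least-squares slope and intercept (unrounded).
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(outlets, sales))
s_xx = sum((x - mean_x) ** 2 for x in outlets)
b = s_xy / s_xx
a = mean_y - b * mean_x

rows = []
for rep, (x, y) in enumerate(zip(outlets, sales), start=2):
    predicted = a + b * x
    dt = y - mean_y           # total deviation
    de = predicted - mean_y   # explained deviation
    du = y - predicted        # unexplained deviation
    rows.append((rep, dt, de, du))

rep8 = next(row for row in rows if row[0] == 8)
print(f"rep 8: dt={rep8[1]:.2f}, de={rep8[2]:.2f}, du={rep8[3]:.2f}")

# Each kind of deviation sums to (numerically) zero across observations.
sum_dt = sum(row[1] for row in rows)
sum_de = sum(row[2] for row in rows)
sum_du = sum(row[3] for row in rows)
print(f"sums: dt={sum_dt:.6f}, de={sum_de:.6f}, du={sum_du:.6f}")
```

Because all three sums vanish, raw deviations cannot serve as a measure of fit.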

24 Simple regression analysis
Hence, we use the sums of squared deviations instead to see how well our regression fits the data. We denote:
- Total Sum of Squares (TSS): $\sum (y_i-\bar{y})^2$
- Regression (Explained) Sum of Squares (ESS): $\sum (\hat{y}_i-\bar{y})^2$
- Residual (Unexplained) Sum of Squares (RSS): $\sum (y_i-\hat{y}_i)^2$
so that TSS = ESS + RSS. The coefficient of determination (R-squared) is:
$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}$$
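As an illustration, the three sums of squares and R-squared can be computed for the sales data. This is a sketch under the same assumptions as before (slide 12 data, least-squares line fitted with unrounded coefficients). The numbers are recomputed here, not copied from the Excel output.

```python
# Sales data for representatives 2-10 (table on slide 12).
sales = [25, 29, 31, 31, 42, 44, 45, 47, 57]      # y, in $000s
outlets = [12, 17, 21, 26, 34, 30, 38, 45, 61]    # x

n = len(sales)
mean_x = sum(outlets) / n
mean_y = sum(sales) / n

# Least-squares slope and intercept.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(outlets, sales))
s_xx = sum((x - mean_x) ** 2 for x in outlets)
b = s_xy / s_xx
a = mean_y - b * mean_x
predicted = [a + b * x for x in outlets]

tss = sum((y - mean_y) ** 2 for y in sales)                   # total
ess = sum((p - mean_y) ** 2 for p in predicted)               # explained
rss = sum((y - p) ** 2 for y, p in zip(sales, predicted))     # residual

r_squared = ess / tss
print(f"TSS={tss:.2f}  ESS={ess:.2f}  RSS={rss:.2f}")
print(f"R-squared = {r_squared:.3f}")
```

In simple regression, R-squared equals the square of the correlation coefficient r, so a value near 0.93 is consistent with r of about 0.96.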

25 Simple regression analysis
Now, look at the regression output (from Excel) below:

26 Simple regression analysis
As you have probably noticed, the good thing is that we do not need to do all these calculations manually: Excel reports them to us! You can easily identify all the components we looked at today: the correlation coefficient (Multiple R), R-squared, and the regression coefficients (a=17.94 and b≈0.67). The only part left to explain to finalize our discussion today is what the reported t-statistic means.

27 Simple regression analysis
As you have probably noticed, the estimated coefficients (a=17.94 and b≈0.67), or estimates, are obtained from the sample! The t-statistic tests the hypothesis that the population regression coefficient β is 0, that is, H0: β=0. There is also the alternative hypothesis H1: β≠0. So the t-statistic shows us whether the estimate of β is significantly different from zero. In our example the t-statistic for β is computed as
$$t = \frac{b - 0}{SE(b)}$$
where SE(b) is the standard error of the sample coefficient b reported in the Excel output.
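The t-statistic for the slope can be recomputed from the data. The sketch below assumes the usual simple-regression standard error, SE(b) = sqrt(RSS / (n - 2) / Sxx), and uses the two-tailed 5% critical value for 7 degrees of freedom taken from a standard t-table. The transcript does not include the Excel output, so these figures are recomputed, not quoted.

```python
import math

# Sales data for representatives 2-10 (table on slide 12).
sales = [25, 29, 31, 31, 42, 44, 45, 47, 57]      # y, in $000s
outlets = [12, 17, 21, 26, 34, 30, 38, 45, 61]    # x

n = len(sales)
mean_x = sum(outlets) / n
mean_y = sum(sales) / n

# Least-squares slope and intercept.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(outlets, sales))
s_xx = sum((x - mean_x) ** 2 for x in outlets)
b = s_xy / s_xx
a = mean_y - b * mean_x

# Residual variance uses n - 2 degrees of freedom (two estimated coefficients).
rss = sum((y - (a + b * x)) ** 2 for x, y in zip(outlets, sales))
residual_variance = rss / (n - 2)

se_b = math.sqrt(residual_variance / s_xx)   # standard error of the slope
t_stat = b / se_b                            # tests H0: beta = 0

# Two-tailed 5% critical value for 7 degrees of freedom (standard t-table).
T_CRITICAL = 2.365
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"SE(b) = {se_b:.4f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```

A t-statistic well beyond the critical value means the slope is unlikely to be zero in the population.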

28 Simple regression analysis
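Please note, the only difference from the z-statistic we used in our previous example is that we are using the SD of the sample coefficient (its standard error) in the formula above! Should we reject H0 or not at the 5% level of significance? To decide, you could look either at the p-value or at the confidence interval reported in the Excel regression output. Using p-values: p = 2*P(T > |t-statistic|), the probability of observing a t value at least as extreme as ours if H0 were true. In other words, a p-value less than the significance level leads us to reject the null hypothesis H0: β=0. Using the confidence interval: if the 95% interval for β does not contain zero, we reject H0 at the 5% level.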
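Both decision rules can be sketched with the standard library alone. The p-value below is obtained by numerically integrating the Student t density with Simpson's rule. This is an approach chosen for this example, not necessarily how Excel computes it. The critical value 2.3646 is the standard t-table entry for 7 degrees of freedom, and all function names are illustrative.

```python
import math

# Sales data for representatives 2-10 (table on slide 12).
sales = [25, 29, 31, 31, 42, 44, 45, 47, 57]      # y, in $000s
outlets = [12, 17, 21, 26, 34, 30, 38, 45, 61]    # x


def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + u * u / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df, steps=2000):
    """p = 2 * P(T > |t|), using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t


# Slope, its standard error and the t-statistic, recomputed from the data.
n = len(sales)
mean_x, mean_y = sum(outlets) / n, sum(sales) / n
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(outlets, sales))
s_xx = sum((x - mean_x) ** 2 for x in outlets)
b = s_xy / s_xx
a = mean_y - b * mean_x
rss = sum((y - (a + b * x)) ** 2 for x, y in zip(outlets, sales))
se_b = math.sqrt(rss / (n - 2) / s_xx)
t_stat = b / se_b
df = n - 2

p_value = two_sided_p_value(t_stat, df)

# 95% confidence interval for the slope, using the t-table value for 7 df.
T_CRITICAL = 2.3646
ci_low, ci_high = b - T_CRITICAL * se_b, b + T_CRITICAL * se_b

print(f"p-value = {p_value:.6f}")
print(f"95% CI for b: ({ci_low:.3f}, {ci_high:.3f})")
print("reject H0 at 5%:", p_value < 0.05)
```

The two rules agree: the p-value is below 0.05 exactly when the 95% confidence interval excludes zero.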

29 Further Reading and Reference
- Chapter S3, L. Swift and S. Piff, Quantitative Methods for Business, Management and Finance ( edition), Palgrave
- Chapters 15 & 16, Curwin, J. & Slater, R. (2002, 5th edition) Quantitative Methods for Business Decisions, Thomson
- Chapter 3, Burton, G., Carrol, G. & Wall, S. (2002, 2nd edition) Quantitative Methods for Business & Economics, Financial Times / Prentice Hall
- Chapter 11, Bancroft and O’Sullivan (2000) Foundations in Quantitative Business Techniques, McGraw-Hill Publishing

