Dr. Mario Mazzocchi, Research Methods & Data Analysis: Correlation and regression analysis (Week 8)


1 Correlation and regression analysis
Week 8, Research Methods & Data Analysis
Dr. Mario Mazzocchi

2 Lecture outline
– Correlation
– Regression analysis
– The least squares estimation method
– SPSS and regression output
– Task overview

3 Correlation
Correlation measures to what extent two (or more) variables are related
– Correlation expresses a relationship that is not necessarily precise (e.g. height and weight)
– Positive correlation indicates that the two variables move in the same direction
– Negative correlation indicates that they move in opposite directions

4 Covariance
Covariance measures the “joint variability” of two variables:
$\mathrm{Cov}(X, Y) = E[(X - E(X))(Y - E(Y))]$
where E(…) indicates the expected value (i.e. average value)
If two variables are independent, then the covariance is zero (however, Cov = 0 does not mean that two variables are independent)
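The caveat that zero covariance does not imply independence can be illustrated with a small sketch. The data are hypothetical and the function name `covariance` is illustrative; it uses the sample (n - 1) denominator.

```python
def covariance(xs, ys):
    """Sample covariance of two equally long lists (n - 1 denominator)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: y is fully determined by x (y = x squared),
# yet the covariance is zero because the relationship is not linear.
x = [-2, -1, 0, 1, 2]
y = [value ** 2 for value in x]
cov_xy = covariance(x, y)
print(cov_xy)
```

Covariance only picks up the linear part of a relationship, so a perfectly dependent but symmetric pattern like this one goes undetected.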

5 Correlation coefficient
The correlation coefficient r gives a measure (in the range from -1 to +1) of the relationship between two variables
– r=0 means no correlation
– r=+1 means perfect positive correlation
– r=-1 means perfect negative correlation
Perfect correlation indicates an exact linear relationship: a given variation in x always corresponds to a proportional variation in y

6 Correlation coefficient and covariance
The Pearson correlation coefficient is defined from the covariance, both for the population and for the sample
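The formulas on this slide did not survive in the transcript. The standard definitions of the Pearson coefficient, which are presumably what the slide showed, are sketched below. The symbols ($\rho$, $\sigma$, $s$, $\bar{x}$, $\bar{y}$) are conventional notation assumed here, not taken from the slide.

```latex
% Population
\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y}

% Sample
r = \frac{s_{xy}}{s_x \, s_y}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Dividing the covariance by the two standard deviations removes the units of measurement, which is why the coefficient always falls between -1 and +1.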

7 Bivariate and multivariate correlation
Bivariate correlation
– 2 variables
– Pearson correlation coefficient
Partial correlation
– The correlation between two variables after allowing for the effect of other “control” variables
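For a single control variable z, the partial correlation can be computed from the three bivariate coefficients with the standard first-order formula, r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz²)(1 - r_yz²)). The sketch below assumes hypothetical data and illustrative function names; it is not the SPSS procedure. With more than one control variable, the same formula is applied recursively.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

def partial_correlation(xs, ys, zs):
    """Correlation between xs and ys after controlling for zs (one control variable)."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical data: both x and y tend to grow with the control variable z
x = [2, 3, 5, 4, 6, 8]
y = [1, 3, 2, 5, 4, 6]
z = [1, 2, 3, 4, 5, 6]
bivariate_xy = pearson(x, y)
partial_xy = partial_correlation(x, y, z)
print(round(bivariate_xy, 3), round(partial_xy, 3))
```

When the control variable is uncorrelated with both x and y, the partial and bivariate coefficients coincide.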

8 Significance level in correlation
Level of correlation (value of the correlation coefficient): indicates to what extent the two variables “move together”
Significance of correlation (p value): given that the correlation coefficient is computed on a sample, indicates whether the relationship appears to be statistically significant
Examples
– Correlation is 0.50, but not significant: the sampling error is so high that the actual correlation could even be 0
– Correlation is 0.10 and highly significant: the level of correlation is very low, but we can be confident in the value of such correlation
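Both examples come down to sample size. A common test of the null hypothesis of zero correlation uses the statistic t = r sqrt((n - 2) / (1 - r²)) with n - 2 degrees of freedom. The sketch below uses hypothetical sample sizes (10 and 2000) chosen only to reproduce the two situations; the function name is illustrative.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for H0: population correlation = 0 (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical sample sizes mirroring the two examples
t_small = correlation_t_statistic(0.50, 10)    # high correlation, very small sample
t_large = correlation_t_statistic(0.10, 2000)  # low correlation, large sample
print(round(t_small, 2), round(t_large, 2))
```

At the 5% level (two-tailed) the critical value is about 2.31 with 8 degrees of freedom and about 1.96 for large samples, so the first correlation is not significant while the second one is.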

9 Correlation and covariance in SPSS
Choose between bivariate & partial

10 Bivariate correlation
Select the variables you want to analyse
Request the significance level (two-tailed)
Ask for additional statistics (if necessary)

11 Bivariate correlation output

12 Partial correlations
List of variables to be analysed
Control variables

13 Partial correlation output

- - - P A R T I A L   C O R R E L A T I O N   C O E F F I C I E N T S - - -

Controlling for..    SIZE    STYLE

             AMTSPENT    USECOUP     ORG
AMTSPENT     1.0000      .2677       -.0116
             (    0)     (  775)     (  775)
             P= .        P= .000     P= .746

USECOUP      .2677       1.0000      .0500
             (  775)     (    0)     (  775)
             P= .000     P= .        P= .164

ORG          -.0116      .0500       1.0000
             (  775)     (  775)     (    0)
             P= .746     P= .164     P= .

(Coefficient / (D.F.) / 2-tailed Significance)
" . " is printed if a coefficient cannot be computed

Partial correlations still measure the correlation between two variables, but eliminate the effect of other variables, i.e. the correlations are computed on consumers shopping in stores of identical size and with the same shopping style

14 Bivariate and partial correlations
Correlation between amount spent and use of coupons
– Bivariate correlation: 0.291 (p value 0.00)
– Partial correlation: 0.268 (p value 0.00)
The amount spent is positively correlated with the use of coupons (0=no use, 1=from newspaper, 2=from mailing, 3=both)
The level of correlation does not change much after accounting for different shop sizes and shopping styles

15 Linear regression analysis
The regression equation has five labelled components:
– Dependent variable
– Intercept
– Regression coefficient
– Independent variable (explanatory variable, regressor…)
– Error
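The equation itself is missing from the transcript. A bivariate linear regression model with these five components is conventionally written as below; the Greek symbols are an assumption based on standard notation, not taken from the slide.

```latex
y_i = \alpha + \beta x_i + \varepsilon_i
```

Here $y_i$ is the dependent variable, $\alpha$ the intercept, $\beta$ the regression coefficient, $x_i$ the independent variable and $\varepsilon_i$ the error.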

16 Regression analysis
[Plot: y (vertical axis) against x (horizontal axis)]

17 Example
We want to investigate whether there is a relationship between cholesterol and age in a sample of 18 people
The dependent variable is the cholesterol level
The explanatory variable is age

18 What regression analysis does
Determines whether a relationship exists between the dependent and explanatory variables
Determines how much of the variation in the dependent variable is explained by the independent variable (goodness of fit)
Allows the values of the dependent variable to be predicted

19 Regression and correlation
Correlation: there is no causal relationship assumed
Regression: we assume that the explanatory variables “cause” the dependent variable
– Bivariate: one explanatory variable
– Multivariate: two or more explanatory variables

20 How to estimate the regression coefficients
The objective is to estimate the population parameters α and β from our data sample
A good way to estimate them is by minimising the error e_i, which represents the difference between the actual observation and the estimated (predicted) one

21 The objective is to identify the line (i.e. the a and b coefficients) that minimises the distance between the actual points and the fitted line

22 The least squares method
This is based on minimising the sum of the squared distances (errors) rather than the distances themselves
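With a single explanatory variable, minimising the sum of squared errors has a well-known closed-form solution: the slope is b = S_xy / S_xx and the intercept is a = mean(y) - b mean(x). The sketch below assumes hypothetical ages and cholesterol levels (the lecture's 18-person sample is not reproduced in the transcript), and the function name `least_squares` is illustrative.

```python
def least_squares(xs, ys):
    """Return (a, b) minimising the sum of squared errors of y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx            # slope (regression coefficient)
    a = mean_y - b * mean_x    # intercept
    return a, b

# Hypothetical data, not the lecture's sample
ages = [25, 35, 45, 55, 65]
cholesterol = [180, 195, 210, 220, 245]
a, b = least_squares(ages, cholesterol)
print(round(a, 2), round(b, 2))
```

Squaring the errors stops positive and negative deviations from cancelling out and penalises large deviations more heavily than small ones.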

23 Bivariate regression in SPSS

24 Regression dialog box
Dependent variable
Explanatory variable
Leave this unchanged!

25 Regression output
Value of the coefficients
Statistical significance: is the coefficient different from 0?

26 Model diagnostics: goodness of fit
The value of the R-square lies between 0 and 1 and represents the proportion of total variation that is explained by the regression model

27 R-square
Total variation
Variation explained by regression
Residual variation
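For a least squares fit with an intercept, the total variation splits exactly into the part explained by the regression plus the residual part, and the R-square is the explained share. A minimal sketch on hypothetical data (the function name `variation_decomposition` is illustrative):

```python
def variation_decomposition(xs, ys):
    """Fit y = a + b*x by least squares; return (total, explained, residual) variation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    fitted = [a + b * x for x in xs]
    total = sum((y - mean_y) ** 2 for y in ys)                 # total sum of squares
    explained = sum((f - mean_y) ** 2 for f in fitted)         # explained by regression
    residual = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # left unexplained
    return total, explained, residual

# Hypothetical data, not the lecture's sample
ages = [25, 35, 45, 55, 65]
cholesterol = [180, 195, 210, 220, 245]
total, explained, residual = variation_decomposition(ages, cholesterol)
r_square = explained / total
print(round(r_square, 3))
```

Equivalently, R-square can be computed as 1 minus the ratio of residual to total variation.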

28 Multivariate regression
The principle is identical to bivariate regression, but there are more explanatory variables
The goodness of fit can be measured through the adjusted R-square, which takes into account the number of explanatory variables
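The slide does not give a formula for the adjusted R-square. The usual definition, shown here as an assumption about what is meant, with $n$ observations and $k$ explanatory variables, is:

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
```

Because the correction factor grows with $k$, adding an explanatory variable raises the adjusted R-square only if it improves the fit by more than would be expected by chance.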

29 Multivariate regression in SPSS
Analyze / Regression / Linear
Simply select more than one explanatory variable

30 Output

31 Coefficient interpretation
The constant (£296.5) represents the amount spent when all other variables are 0
The coefficients for health food stores, size of store and being vegetarian are not significantly different from 0
Gender coeff = -69.6: on average, being a woman (G=1) implies spending £69.6 less
Shopping style coeff = +22.8 S
– S=1 (shops for himself) = +22.8
– S=2 (shops for himself & spouse) = +45.6
– S=3 (shops for himself & family) = +68.4
Coupon use coeff = +30.4 C
– C=1 (does not use coupons) = +30.4
– C=2 (coupons from newspapers) = +60.8
– C=3 (coupons from mailings) = +91.2
– C=4 (coupons from both) = +121.6
Categorization problems?

32 Prediction
On average, how much will someone with the following characteristics spend:
– Male (G=0)
– Shopping for family (S=3)
– Not using coupons (C=1)
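The prediction can be sketched by plugging these values into the fitted equation, using the coefficients reported on the coefficient interpretation slide. The sketch assumes that the non-significant terms (health food store, size of store, vegetarian) are left out, since their estimates are not given in the transcript; the variable and function names are illustrative.

```python
# Coefficients as reported on the coefficient interpretation slide
CONSTANT = 296.5
COEF_GENDER = -69.6   # G: 0 = male, 1 = female
COEF_STYLE = 22.8     # S: 1, 2 or 3
COEF_COUPON = 30.4    # C: 1 to 4

def predicted_spend(gender, style, coupon):
    """Predicted amount spent; non-significant terms are omitted (assumption)."""
    return (CONSTANT
            + COEF_GENDER * gender
            + COEF_STYLE * style
            + COEF_COUPON * coupon)

# Male (G=0), shopping for family (S=3), not using coupons (C=1)
prediction = predicted_spend(gender=0, style=3, coupon=1)
print(round(prediction, 1))
```

Under these assumptions the calculation is 296.5 + 0 + 3 × 22.8 + 1 × 30.4 = 395.3, i.e. roughly £395.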

33 How good is the model?
The regression model explains less than 19% of the total variation in the amount spent

34 Task A
Examine the relationship between the amount spent and the following customer characteristics:
– Being male/female
– Being vegetarian
– Shopping for himself / for himself and others
– Shopping style (weekly, bi-weekly, etc.)
Potential methods:
– Battery of hypothesis tests & analysis of variance
– Regression analysis

35 Task B
Examine the relationship between the amount spent and the following customer characteristics:
– Hypothesis: the average amount spent in health-oriented shops is higher than in other shops. True or false?
– Test the same hypothesis accounting for different shop sizes
Potential methods:
– Battery of hypothesis tests & analysis of variance
– Regression analysis

36 Task C
Find a relationship between the average amount spent per store and the following store characteristics:
– Size of store
– Health-oriented store
– Store organisation
Potential methods:
– Transform the customer data set into a store data set
– Battery of ANOVA
– Regression analysis
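Transforming the customer data set into a store data set means aggregating customer records by store. A minimal sketch of that step outside SPSS, assuming hypothetical field names (`store`, `amount_spent`) and made-up records:

```python
from collections import defaultdict

# Hypothetical customer records; the field names are illustrative
customers = [
    {"store": "A", "amount_spent": 300.0},
    {"store": "A", "amount_spent": 340.0},
    {"store": "B", "amount_spent": 410.0},
    {"store": "B", "amount_spent": 390.0},
    {"store": "B", "amount_spent": 400.0},
]

def average_spend_per_store(rows):
    """Collapse customer-level rows into one average amount spent per store."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row["store"]] += row["amount_spent"]
        counts[row["store"]] += 1
    return {store: totals[store] / counts[store] for store in totals}

store_means = average_spend_per_store(customers)
print(store_means)
```

Each store then becomes one observation, so store characteristics such as size or organisation can be used as explanatory variables for the store-level average.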

37 Task D
Hypothesis: is the amount spent by those who use coupons significantly higher?
What is the most effective way of distributing coupons:
– By mail
– In newspapers
– Both
Potential methods:
– Recode the variable into 1=not using coupons and 2=using coupons
– Hypothesis testing
– Analysis of variance

