12-2 Relationships between Variables Is there a relationship between the two variables we are interested in? How strong is the relationship? How can that relationship be best described?
12-3 Relationships between Variables – Linear relationship: The strength and nature of the relationship remain the same over the range of both variables – Curvilinear relationship: The strength and/or direction of the relationship changes over the range of both variables
12-4 Covariation and Variable Relationships Covariation: The amount of change in one variable that is consistently related to the change in another variable of interest – Scatter diagram: A graphic plot of the relative position of two variables using a horizontal and a vertical axis to represent the values of the respective variables A way of visually describing the covariation between two variables
12-9 Correlation Analysis Pearson correlation coefficient: Statistical measure of the strength and direction of a linear relationship between two numerical variables – Varies between -1.00 and +1.00 – 0 represents no linear association between two variables – -1.00 or +1.00 represents a perfect association between two variables – -1.00 represents a perfect negative (inverse) association – +1.00 represents a perfect positive (direct) association
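As a minimal sketch of how the Pearson coefficient can be computed, the Python below uses only the standard library. The function name `pearson_r` and the small data lists are illustrative assumptions, not values from the SPSS example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: co-deviation scaled by the spread of both variables."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    co_dev = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return co_dev / math.sqrt(ss_x * ss_y)

# Made-up illustrative scores (not the Santa Fe Grill data).
x = [1, 2, 3, 4, 5]
r_perfect_positive = pearson_r(x, [2, 4, 6, 8, 10])  # every point on a rising line
r_perfect_negative = pearson_r(x, [10, 8, 6, 4, 2])  # every point on a falling line
r_scattered = pearson_r(x, [2, 1, 4, 3, 5])          # rising trend with some scatter

# A perfectly U-shaped (curvilinear) pattern can still give r = 0,
# because r only measures the linear part of a relationship.
r_curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
```

The last case is why a scatter diagram is worth inspecting before reading a coefficient near 0 as "no relationship".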
12-10 Rules of Thumb about the Strength of Correlation Coefficients
12-11 Assumptions for Calculating Pearson’s Correlation Coefficient The two variables have been measured using interval- or ratio-scaled measures Relationship is linear Variables come from a normally distributed population
12-12 SPSS Pearson Correlation Example What is the extent of the relationship between ‘Satisfaction’ and ‘Likelihood of Recommending’?
12-13 Coefficient of Determination Coefficient of determination (r²): A number measuring the proportion of variation in one variable accounted for by the variation in another variable – Can be expressed as a proportion (0.0 to 1.0) or as a percentage (0% to 100%) – The larger the coefficient of determination: the stronger the linear relationship between the two variables being examined, and the greater the proportion of variation in the DV that is explained by variation in the IV – How is this the same as / different from the correlation coefficient (r)?
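One way to approach the closing question is to compute both numbers on the same data. The sketch below assumes made-up scores and a hypothetical helper `pearson_r`; it suggests that r² keeps the strength of the linear relationship but drops its direction.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    co_dev = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return co_dev / math.sqrt(ss_x * ss_y)

# Made-up illustrative data.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = pearson_r(x, y)           # about 0.8: strong positive association
r_squared = r ** 2            # about 0.64: roughly 64% of variation accounted for

# Reversing the direction of y flips the sign of r but leaves r squared unchanged.
r_flipped = pearson_r(x, [-value for value in y])
r_squared_flipped = r_flipped ** 2
```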
12-14 Correlating Rank Data Spearman rank order correlation coefficient: A statistical measure of the association between two variables where both have been measured using rank order (ordinal) scales – Measures essentially the same thing as the Pearson correlation coefficient, but is computed on ranks, so it captures monotonic rather than strictly linear association
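A minimal sketch of the Spearman calculation follows. It assumes the textbook shortcut formula, which holds only when there are no tied ranks; the function name and the rankings are hypothetical, not the Santa Fe Grill data.

```python
def spearman_rho_no_ties(rank_x, rank_y):
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1)).

    Assumption: both inputs are already ranks 1..n with no ties.
    """
    n = len(rank_x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

# Hypothetical rankings of five attributes by one group of customers.
food_quality_rank = [1, 2, 3, 4, 5]
service_rank = [2, 1, 4, 3, 5]

rho = spearman_rho_no_ties(food_quality_rank, service_rank)
rho_identical = spearman_rho_no_ties(food_quality_rank, food_quality_rank)
rho_reversed = spearman_rho_no_ties(food_quality_rank, [5, 4, 3, 2, 1])
```

With tied ranks, a common alternative is to assign average ranks and apply the Pearson formula to the ranks.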
12-15 SPSS Spearman Rank Order Correlation What is the extent of the relationship between ‘Food Quality’ and ‘Service’ rankings by customers of Santa Fe Grill?
12-16 What is Regression Analysis? A method for arriving at more mathematically detailed relationships (predictions) than those provided by the correlation coefficient Allows numerical predictions of DVs from IVs Assumptions – Variables are measured on interval or ratio scales – Variables come from a normal population – Error terms are normally and independently distributed
12-17 Bivariate regression analysis Bivariate regression analysis: A statistical technique that analyzes the linear relationship between two variables by estimating coefficients for an equation of a straight line – One variable is designated as the dependent variable (DV) – The other is designated the independent or predictor variable (IV)
12-18 Fundamentals of Bivariate Regression General formula for a straight line: Y = a + bX + eᵢ Where, – Y = The dependent variable – a = The intercept (point where the straight line intersects the Y-axis when X = 0) – b = The slope (the change in Y for every 1 unit change in X) – X = The independent variable used to predict Y – eᵢ = The error of the prediction
12-19 The Straight Line Relationship in Regression
12-20 Fitting the Regression Line Using the “Least Squares” Procedure
12-21 Ordinary Least Squares A statistical procedure that estimates regression equation coefficients that produce the lowest sum of squared differences between the actual and predicted values of the dependent variable – Regression coefficient: Same as “slope coefficient”; an indicator of the importance of an independent variable in predicting a dependent variable – Large coefficients are good predictors and small coefficients are weak predictors (note: this rule only applies to bivariate regression)
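The least squares idea can be sketched in a few lines of Python. The closed-form expressions for the slope and intercept are the standard bivariate ones; the function names and the data are illustrative assumptions.

```python
def fit_line(x, y):
    """Return (a, b) for Y = a + b*X that minimise the sum of squared errors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    co_dev = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    b = co_dev / ss_x            # slope
    a = mean_y - b * mean_x      # intercept
    return a, b

def sum_squared_errors(x, y, a, b):
    """Sum of squared differences between actual and predicted Y."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Made-up illustrative ratings.
reasonable_prices = [1, 2, 3, 4, 5]
satisfaction = [2, 1, 4, 3, 5]

a, b = fit_line(reasonable_prices, satisfaction)
sse_fitted = sum_squared_errors(reasonable_prices, satisfaction, a, b)

# Any other line, for example one with a slightly steeper slope, has a larger SSE.
sse_nudged = sum_squared_errors(reasonable_prices, satisfaction, a, b + 0.1)
```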
12-22 SPSS Results for Bivariate Regression What is the mathematical relationship between ‘Satisfaction’ and customers’ perception of ‘Reasonable Prices’ at Santa Fe Grill?
12-23 Multiple Regression Analysis A statistical technique that analyzes the linear relationship between a dependent variable and multiple independent variables by estimating multiple slope coefficients for the equation of a straight line – Each IV has a slope coefficient that partially predicts the DV – Much more complicated than bivariate regression
12-24 Fundamentals of Multiple Regression General formula for a straight line: Y = a + b₁X₁ + b₂X₂ + b₃X₃ + … + eᵢ Where, – Y = The dependent variable – a = The intercept (the value of Y when all X = 0) – b₁ = The slope (the change in Y for every 1 unit change in X₁) – X₁ = The first independent variable used to predict Y – b₂ = The slope (the change in Y for every 1 unit change in X₂) – X₂ = The second independent variable used to predict Y – eᵢ = The error of the prediction
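To make the multiple regression formula concrete, here is a rough sketch that estimates the intercept and slopes by solving the normal equations with plain Python lists. The helper names are hypothetical, and the data are constructed so that Y = 1 + 2·X₁ + 3·X₂ exactly (no error term), which lets the recovered coefficients be checked by eye.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back-substitution on the upper-triangular system.
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution

def fit_multiple_regression(x_rows, y):
    """Return [a, b1, b2, ...] from the normal equations (X'X) beta = X'y."""
    design = [[1.0] + list(row) for row in x_rows]  # leading 1 models the intercept
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Constructed data: Y = 1 + 2*X1 + 3*X2 with no error term.
x_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in x_rows]

a, b1, b2 = fit_multiple_regression(x_rows, y)
```

In practice a statistics package would be used instead; the point of the sketch is only that each IV receives its own slope while the intercept is shared.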
12-25 Standardized Beta Coefficient An estimated regression coefficient that has been recalculated as if all variables had been standardized to a mean of 0 and a standard deviation of 1 – Enables independent variables with different units of measurement to be directly compared on the strength of their association with the dependent variable
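A small sketch of why standardization helps, under the assumption of a bivariate model where the standardized beta equals the raw slope multiplied by sd(X) / sd(Y). The variable names and figures are made up for illustration.

```python
import statistics

def slope(x, y):
    """Unstandardized least squares slope of y on x."""
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    co_dev = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    return co_dev / ss_x

def standardized_beta(x, y):
    """Slope rescaled to standard deviation units of x and y."""
    return slope(x, y) * statistics.stdev(x) / statistics.stdev(y)

# The same predictor expressed in two different units.
satisfaction = [2, 1, 4, 3, 5]
price_dollars = [1, 2, 3, 4, 5]
price_cents = [100 * value for value in price_dollars]

b_dollars = slope(price_dollars, satisfaction)           # about 0.8
b_cents = slope(price_cents, satisfaction)               # about 0.008
beta_dollars = standardized_beta(price_dollars, satisfaction)
beta_cents = standardized_beta(price_cents, satisfaction)
```

The raw slope changes by a factor of 100 with the unit of measurement, while the standardized beta stays the same.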
12-26 Examining the Omnibus Statistical Significance of the Regression Model Model F statistic: The magnitude of the “Model F” is used to determine whether the entire regression is significant – A significant F statistic indicates that the regression model as a whole is “significant” (i.e. can be trusted!) – Look for a p-value of the F statistic less than .05
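As a hedged illustration, the model F statistic can be written in terms of R², the number of observations n, and the number of predictors k as F = (R² / k) / ((1 − R²) / (n − k − 1)). The function name and the input values below are assumptions for the sketch. The p-value is not computed, because the F distribution is not in Python's standard library; software such as SPSS reports it directly.

```python
def model_f(r_squared, n_obs, n_predictors):
    """Model F: explained variation per predictor over unexplained variation per residual df."""
    explained_per_df = r_squared / n_predictors
    unexplained_per_df = (1 - r_squared) / (n_obs - n_predictors - 1)
    return explained_per_df / unexplained_per_df

# Same R squared, different sample sizes (illustrative numbers only).
f_small_sample = model_f(r_squared=0.64, n_obs=5, n_predictors=1)
f_large_sample = model_f(r_squared=0.64, n_obs=50, n_predictors=1)
```

The same R² yields a much larger F with more observations, which is one reason significance and strength of relationship are assessed separately.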
12-27 Substantive Significance The multiple R² (coefficient of determination) describes the strength of the relationship between all the independent variables as a group and the dependent variable – The larger the R² measure, the more of the behavior of the dependent measure is explained by the group of independent variables – 1 - R² = “coefficient of alienation”, or the portion of DV variation that remains unexplained.
12-28 Examining the Statistical Significance of Each Coefficient Each regression coefficient is divided by its standard error to produce a t statistic – P-values less than .05 for the t-test of the null hypothesis that a coefficient equals 0 are typically regarded as “significant” – Significantly different from “0” – If “significant”, we are confident in the coefficient’s (i.e. variable’s) mathematical effect on the DV.
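The division described on this slide can be sketched for the bivariate case, where the standard error of the slope is sqrt(MSE / Σ(X − mean of X)²) and MSE = SSE / (n − 2). The function name and data are illustrative assumptions, and no p-value is computed here.

```python
import math

def slope_t_statistic(x, y):
    """Return (b, standard error of b, t) for a bivariate regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / ss_x
    a = mean_y - b * mean_x
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)              # residual variance with n - 2 degrees of freedom
    se_b = math.sqrt(mse / ss_x)     # standard error of the slope
    return b, se_b, b / se_b

# Made-up illustrative data.
b, se_b, t = slope_t_statistic([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

In a bivariate model the square of this t statistic equals the model F statistic, so the two tests agree.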
12-29 Multiple Regression Assumptions – Linear relationship between DV and IVs – Homoskedasticity: The pattern of the covariation is constant (the same) around the regression line, whether the values are small, medium, or large – Heteroskedasticity: The pattern of covariation around the regression line is not constant, and varies in some way when the values change from smaller to larger – Normal distribution: All variables are normally distributed
12-31 Example of a Normally Distributed Variable
12-32 SPSS Results for Multiple Regression What is the mathematical relationship between ‘Satisfaction’ and customers’ perception of ‘Fresh Food’, ‘Food Taste’ and ‘Proper Food Temperature’ at Santa Fe Grill? Which is the best regression model?
12-33 Evaluating a Regression Analysis - Summary – Assess the statistical significance of the overall (omnibus) regression model using the “Model F” statistic and its associated p-value – Evaluate the regression’s adjusted multiple R-squared (i.e. coefficient of determination) – Examine the individual regression coefficients and their p-values to see which are statistically significant (look for p < .05, but consider “marginal” cases) – Look at values of the standardized beta coefficients to assess the relative influence of each predictor (IV) on the dependent variable (DV)
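The summary refers to the adjusted R-squared without defining it. A minimal sketch, assuming the commonly used adjustment 1 − (1 − R²)(n − 1) / (n − k − 1), where n is the number of observations and k the number of predictors; the function name and inputs are illustrative.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Penalise R squared for the number of predictors relative to sample size."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Same raw R squared, one versus two predictors, in a very small sample.
adj_one_predictor = adjusted_r_squared(0.64, n_obs=5, n_predictors=1)
adj_two_predictors = adjusted_r_squared(0.64, n_obs=5, n_predictors=2)
```

Adding a predictor that does not raise R² lowers the adjusted value, which makes it a more cautious basis for comparing candidate models.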
12-34 Multicollinearity A situation in which several independent variables are highly correlated with each other Can result in difficulty in estimating independent regression coefficients for the correlated variables Standard errors of Beta coefficients become unreasonably high Beta coefficients will typically not be significant
12-35 Multicollinearity – How to Avoid or Fix it!! – Eliminate or replace highly correlated IVs: compute a correlation matrix and typically search for correlations higher than .5 in absolute value – Factor analysis techniques (such as “Principal Components Analysis”) – “Live with it”
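The correlation-matrix screen can be sketched as follows. The predictor names echo the multiple regression example, but the scores are made up, and the helper names are hypothetical; the .5 cutoff is the rule of thumb from the slide.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    co_dev = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return co_dev / math.sqrt(ss_x * ss_y)

def flag_correlated_pairs(predictors, threshold=0.5):
    """List pairs of IVs whose correlation exceeds the threshold in absolute value."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        r = pearson_r(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 2)))
    return flagged

# Made-up ratings for three candidate IVs.
predictors = {
    "fresh_food": [1, 2, 3, 4, 5],
    "food_taste": [2, 1, 4, 3, 5],
    "food_temperature": [4, 1, 0, 1, 4],
}

flagged = flag_correlated_pairs(predictors)
```

Here only the first two predictors are flagged, so one of them would be a candidate for removal or replacement.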