
1 Chapter 16 Data Analysis: Testing for Associations

2 Relationships
Direction: positive or negative
Strength: weak, moderate, or strong

3 Relationships: Two Types
Curvilinear relationship between two variables: the strength and/or direction of the relationship changes over the range of both variables.
Linear relationship between two variables: the strength and nature of the relationship remain the same over the range of both variables.

4 Relationships between Variables: Three Questions
Is there a relationship between the two variables we are interested in?
How strong is the relationship?
How can that relationship best be described?

5 No Relationship between X and Y

6 Positive Relationship between X and Y

7 Negative Relationship between X and Y

8 Curvilinear Relationship between X and Y

9 Pearson Correlation Coefficient
... a statistical measure of the strength of a linear relationship between two metric (interval- or ratio-level) variables.

10 Pearson Correlation Coefficient
It varies between -1.00 and +1.00, with 0 representing no linear association between the two variables and -1.00 or +1.00 representing perfect association.
The higher the absolute value of the correlation coefficient, the stronger the association.
The size of the correlation coefficient can therefore be used to quantitatively describe the strength of the association between the two variables.

11 Pearson Correlation Coefficient
The null hypothesis states that there is no association between the two variables in the population, i.e., that the correlation coefficient is zero.
If the correlation coefficient is statistically significant, the null hypothesis is rejected and we conclude that the two variables do share some association in the population.
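A minimal sketch of this test in Python, assuming SciPy is available; the variable names and data below are hypothetical, made up purely for illustration.

```python
from scipy import stats

satisfaction = [3.2, 4.1, 5.0, 2.8, 4.4, 3.9]  # hypothetical interval-scaled data
spend = [120, 180, 220, 90, 200, 170]          # hypothetical ratio-scaled data

# pearsonr returns the coefficient r and the p-value for H0: r = 0.
r, p = stats.pearsonr(satisfaction, spend)
print(f"r = {r:.2f}, p = {p:.4f}")
# If p < .05, reject the null hypothesis of no association in the population.
```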

12 Spearman Rank Order Correlation
... a statistical measure of the association between two variables where both have been measured using ordinal (rank order) scales. Because it operates on ranks, it captures monotonic rather than strictly linear association.

13 Spearman Rank Order Correlation
If either one of the variables is represented by rank order (ordinal) data, use the Spearman rank order correlation coefficient.
The Spearman coefficient tends to produce a lower value than the Pearson coefficient and is considered the more conservative measure.
Choose the Pearson correlation whenever the data permit, i.e., whenever both variables are metric.
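A parallel sketch with SciPy's Spearman routine; again the variables and values are hypothetical.

```python
from scipy import stats

preference_rank = [1, 2, 3, 4, 5, 6]            # ordinal (rank order) data
quality_score = [9.1, 8.4, 7.7, 8.0, 6.2, 5.5]  # the second variable may be metric

# spearmanr ranks the data internally and returns (rho, p-value).
rho, p = stats.spearmanr(preference_rank, quality_score)
print(f"rho = {rho:.2f}, p = {p:.4f}")
```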

14 Bivariate Regression Analysis
... a statistical technique that analyzes the linear relationship between two variables by estimating coefficients for the equation of a straight line. One variable is designated the dependent variable; the other is the independent, or predictor, variable.

15 Regression Assumptions
The relationship is linear.
The variables of interest are measured on interval or ratio scales (except in the case of dummy variables).
The variables come from a bivariate normal population (distribution).
The error terms associated with making predictions are normally and independently distributed.

16 What is Regression Analysis?
Regression fits the formula for a straight line:

y = a + bX + e

where
y = the dependent variable
a = the intercept (the point where the straight line intersects the y-axis when X = 0)
b = the slope (the change in y for every 1-unit change in X)
X = the independent variable used to predict y
e = the error of the prediction
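A minimal sketch of estimating a and b by ordinary least squares with NumPy; the data points are made up for illustration.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)  # independent variable
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])    # dependent variable

# For deg=1, np.polyfit returns the coefficients [slope, intercept].
b, a = np.polyfit(X, y, deg=1)
print(f"y = {a:.2f} + {b:.2f}X")

e = y - (a + b * X)  # the error (residual) for each prediction
```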

17 Regression Analysis Terms
Adjusted r-square: an adjustment that reduces r2 to take into account the sample size and the number of independent variables in the regression equation; it signals when a multiple regression equation has too many independent variables.
Explained variance: the amount of variation in the dependent variable that can be accounted for by the combination of independent variables (represented by r2 in a bivariate regression or adjusted r2 in a multivariate regression).
Unexplained variance: the amount of variation in the dependent variable that cannot be accounted for by the combination of independent variables.
Regression coefficient: an indicator of the importance of an independent variable in predicting the dependent variable; large (standardized) coefficients indicate strong predictors, small coefficients weak ones.
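For reference, a sketch of the standard adjusted r-square formula, on the assumption that this is the adjustment the slide describes; the example numbers are invented.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """r2: unadjusted r-squared; n: sample size; k: number of independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# The penalty grows as more independent variables are added for a fixed n.
print(adjusted_r2(0.80, n=50, k=5))  # ~0.777
```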

18 Statistical Significance in Regression
Significant model? Answers the first question about the relationship: is there a relationship between the dependent and independent variable?
How strong is the relationship? The size of the coefficient of determination (r2) tells what percentage of the total variation in the dependent variable is explained.
r2 varies between .00 and 1.00; its size indicates the strength of the relationship, and the closer it is to 1.00, the stronger the relationship.
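A hedged sketch of reading both answers from a bivariate fit with scipy.stats.linregress; the example data are hypothetical.

```python
from scipy import stats

X = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

res = stats.linregress(X, y)
print(f"p-value:   {res.pvalue:.4f}")        # question 1: is there a relationship?
print(f"r-squared: {res.rvalue ** 2:.3f}")   # question 2: how strong is it?
```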

19 Multiple Regression Analysis
... a statistical technique that analyzes the linear relationship between a dependent variable and MULTIPLE independent variables by estimating coefficients for a linear equation.

20 A Multiple Regression Concern
If the independent variables are measured on different scales (e.g., 1-5 for one and 1-10 for another), the differing scales do not permit relative comparisons between regression coefficients to see which independent variable has the most influence on the dependent variable.

21 Standardized regression coefficients (beta coefficients) correct this problem. A beta coefficient is a regression coefficient estimated after the variables have been recalculated (standardized) to have a mean of 0 and a standard deviation of 1. Standardization removes the effect of differing scales and enables independent variables with different units of measurement to be directly compared for their predictive ability.
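A hedged sketch contrasting raw and beta coefficients, assuming statsmodels is available; the two predictors and their scales are fabricated for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1 = rng.uniform(1, 5, 100)   # e.g., a 1-5 rating scale
x2 = rng.uniform(1, 10, 100)  # e.g., a 1-10 rating scale
y = 2.0 + 0.8 * x1 + 0.3 * x2 + rng.normal(0, 1, 100)

# Raw coefficients depend on each variable's measurement scale.
X = sm.add_constant(np.column_stack([x1, x2]))
print(sm.OLS(y, X).fit().params)

# Standardize everything to mean 0, sd 1; the slopes are now beta coefficients.
z = lambda v: (v - v.mean()) / v.std()
Xz = np.column_stack([z(x1), z(x2)])
print(sm.OLS(z(y), Xz).fit().params)  # directly comparable across predictors
```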

22 When Evaluating Regression Analysis Results
Assess the statistical significance of the overall regression model using the F statistic and its associated probability.
Examine r2 to see how large it is.
FOR MULTIVARIATE: Evaluate the individual regression coefficients and their t-test statistics to see which are statistically significant.
FOR MULTIVARIATE: Look at the variables' beta coefficients to assess relative influence (standardized if the variable scales differ).

23 Dummy Variables
Used when the independent variables you want to use to predict a dependent variable are not measured on interval or ratio scales.
Dummy variables are artificial variables introduced into a regression equation to represent the categories of a nominally scaled variable, with values typically coded 0 or 1. A dummy variable corresponds to each nominal category, though in estimation one category is usually omitted (leaving k - 1 dummies for k categories) to serve as the reference and avoid perfect collinearity with the intercept.
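A small sketch of dummy coding with pandas; the "region" variable and its categories are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"region": ["north", "south", "west", "south", "north"]})

# drop_first=True yields k - 1 dummies; "north" becomes the reference category.
dummies = pd.get_dummies(df["region"], prefix="region", drop_first=True)
print(dummies)  # columns region_south and region_west, coded 0/1
```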

24 Multicollinearity
Multicollinearity means the independent variables are highly correlated with each other. It:
Can result in difficulty estimating independent regression coefficients for the correlated variables.
Inflates the standard error of the coefficient and lowers the t statistic associated with it (making the variables in question less likely to be significant).
Impacts the individual regression coefficients (the independent variables).
Does not impact the size of r2 or the ability to predict values of the dependent variable.
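A hedged sketch of diagnosing multicollinearity with variance inflation factors (VIF), a diagnostic not named on the slide, assuming statsmodels is available; the data are fabricated so that x2 nearly duplicates x1.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # nearly a copy of x1
X = sm.add_constant(np.column_stack([x1, x2]))

# A VIF above ~10 is a common rule-of-thumb flag for problematic collinearity.
for i in (1, 2):
    print(f"VIF for x{i}: {variance_inflation_factor(X, i):.1f}")
```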

