PPA 501 – Analytical Methods in Administration Lecture 8 – Linear Regression and Correlation.
Scattergrams The first step in examining a relationship between interval-ratio variables is to prepare a scattergram. A scattergram, like a bivariate table, has two dimensions. The scores on the independent variable (X) are arrayed along the horizontal axis. The scores on the dependent variable (Y) are arrayed along the vertical axis. Each dot in the scattergram represents the X and Y scores for a case.
Scattergrams The pattern can be enhanced by drawing a straight line through the data that comes as close as possible to all of the points in the scattergram. Two variables are associated if the values of Y are conditional on the values of X (increase or decrease depending on X).
Scattergrams The strength of the relationship can be visualized by examining how tightly the points fit around the line. Positive relationships will slope up; negative relationships will slope down. Zero relationships will appear to be a cloud of random points.
Regression and Prediction One key assumption underlying the statistical techniques to be discussed here is that the two variables have an essentially linear relationship. That is, the scattergram approximates a straight line. Non-linearity requires adjustments to the model that will not be discussed in this course.
The mean of any distribution of scores is the point around which the variation of the scores (the sum of squared deviations) is at a minimum; no other point yields a smaller value. The same is true of conditional means, the means of Y computed for each value of X.
Regression and Prediction The best fitting line will go as close as possible to the conditional means since the conditional means will rarely lie in a straight line. The formula for the best fitting line is:
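The formula itself did not survive in this transcript. The standard least-squares form, which is presumably what the slide showed, is given below; the symbols a (Y intercept), b (slope), and the bar notation for means are the usual textbook conventions and are an assumption here.

```latex
Y = a + bX,
\qquad
b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^{2}},
\qquad
a = \bar{Y} - b\,\bar{X}
```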
Regression and Prediction The Y intercept is the point at which the regression line crosses the Y axis. The slope of the least-squares regression line is the amount of change in the dependent variable (Y) that is produced by a unit change in the independent variable (X). You can use the formula for the regression line to predict scores not in the data set.
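As a minimal sketch of how the intercept and slope are obtained and then used for prediction, the following Python computes the least-squares line Y = a + bX. The function names `fit_line` and `predict` and the five data points are hypothetical, invented for illustration; they are not the lecture's data.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariation of X and Y divided by the variation of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the line always passes through (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, x):
    """Predicted Y score for a given X score."""
    return a + b * x


# Hypothetical scores, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
print(round(a, 2), round(b, 2))      # 2.2 0.6
print(round(predict(a, b, 6), 2))    # 5.8, a prediction for an X not in the data
```

Here a slope of 0.6 means each one-unit increase in X is associated with a 0.6-unit increase in predicted Y. Predictions far outside the observed range of X should be treated with caution.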
The Correlation Coefficient (Pearson’s r) The slope of the regression line is a measure of the effect of X on Y. Because it is measured in the units of measurement of Y, it is not restricted to fall between zero and one. As a result, researchers rely on Pearson’s r as a measure of interval-ratio association. Pearson’s r varies from -1 to +1, with 0 indicating no association.
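A short sketch of the computation, assuming the usual definition of Pearson's r as the covariation of X and Y divided by the square root of the product of their variations. The function name `pearson_r` and the data are hypothetical and are not taken from the lecture.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of interval-ratio scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(round(r, 4))                          # 0.7746
# Reversing the order of Y flips the direction of the relationship.
print(round(pearson_r(xs, ys[::-1]), 4))    # -0.7746
```

Unlike the slope, r does not depend on the units of measurement, which is why it always stays between -1 and +1.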
The r suggests a moderately strong, positive relationship.
Interpreting the Correlation Coefficient: r² The interpretation of Pearson’s r focuses on r², the coefficient of determination. The coefficient of determination is based on the following formula for r-squared.
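The formula is missing from this transcript. A standard way to write the coefficient of determination is shown below, assuming Y' denotes the score predicted by the regression line and the bar denotes the mean of Y; neither symbol appears in the transcript, so the notation is an assumption.

```latex
r^{2}
= \frac{\text{explained variation}}{\text{total variation}}
= \frac{\sum (Y' - \bar{Y})^{2}}{\sum (Y - \bar{Y})^{2}}
```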
Interpreting the Correlation Coefficient: r² R-squared can be interpreted in two ways: as the percentage of total variation explained by the independent variable, or as the proportional reduction in error. Thus, the r-squared in the ideology-party ID problem calculated on slide 19 is .2565. Ideology explains 25.65% of the variation in party identification. Knowing a person’s ideology reduces the error in predicting their party identification by 25.65%. The unexplained variation is represented by the scatter of points around the regression line.
The Five-Step Model for Testing Pearson’s r. Step 1. Making assumptions. Random sampling. Interval-ratio measurement. Bivariate normal distribution. Linear relationship. Homoscedasticity. Normal sampling distribution.
The Five-Step Model for Testing Pearson’s r. Step 2. Stating the null hypothesis. H0: ρ = 0.0. H1: ρ > 0.0. Step 3. Selecting the sampling distribution and establishing the critical region. Sampling distribution = t distribution. Alpha = .05, one-tailed. Degrees of freedom = N - 2 = 10 - 2 = 8. t(critical) = +1.860.
The Five-Step Model for Testing Pearson’s r. Step 4. Computing the test statistic.
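The computation for this step is not preserved in the transcript. As a hedged sketch, the usual test statistic for Pearson's r is t = r × sqrt((N − 2) / (1 − r²)) with N − 2 degrees of freedom. The code below applies it to the figures the lecture does report (r² = .2565, N = 10, t(critical) = +1.860). Taking the positive square root of r² is an assumption based on the relationship being described as positive, and the function name `t_for_r` is hypothetical.

```python
import math


def t_for_r(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r_squared = 0.2565            # reported in the lecture
n = 10                        # sample size used in Step 3
r = math.sqrt(r_squared)      # assumption: positive root (positive relationship)

t_obtained = t_for_r(r, n)
t_critical = 1.860            # alpha = .05, one-tailed, df = 8 (from Step 3)

print(round(t_obtained, 3))        # 1.661
print(t_obtained > t_critical)     # False: the statistic is not in the critical region
```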
The Five-Step Model for Testing Pearson’s r. Step 5. Making a decision. t(obtained) is less than t(critical). Therefore, we cannot reject the null hypothesis that the relationship between ideology and party ID is zero in the population.
Multiple Regression Multiple regression analysis has wide application in political, economic, social, and education research. It can be used with either continuous data or categorical data. It is an extension of linear regression to several independent variables.
Multiple Regression The basis for multiple regression is the correlation matrix among the independent and dependent variables (the matrix of intercorrelations among all variables).
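To make the idea of a correlation matrix concrete, here is a minimal sketch that computes Pearson's r for every pair of variables. The variable names (`x1`, `x2`, `y`), the data, and the helper functions are all hypothetical, invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(variables):
    """Map each pair of variable names to their Pearson's r."""
    names = list(variables)
    return {
        row: {col: pearson_r(variables[row], variables[col]) for col in names}
        for row in names
    }


# Hypothetical data: two independent variables and one dependent variable.
data = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [5, 3, 4, 2, 1],
    "y": [2, 4, 5, 4, 5],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, [round(value, 3) for value in row.values()])
```

The diagonal is always 1 (each variable correlates perfectly with itself) and the matrix is symmetric. A strong correlation between two independent variables, such as `x1` and `x2` here, signals that their separate effects on Y will be hard to disentangle.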