
1 School of Information - The University of Texas at Austin
LIS 397.1, Introduction to Research in Library and Information Science
Working with Two Variables: Correlation, Regression, and Chi-Square
R. E. Wyllys
Copyright 2003 by R. E. Wyllys
Last revised 2003 Jan 15

2 Standardized Tests of Statistical Hypotheses
To each type of statistical hypothesis there corresponds a particular standardized test procedure or procedures.
Each test procedure includes a formula, the “test statistic.”
You
– place data from the observed sample or samples into the test statistic
– obtain a number, the observed value of the test statistic

3 Standardized Tests of Statistical Hypotheses
Traditional Method: Compare the absolute value of the observed value of the test statistic against the threshold value from the pertinent table
– If |test statistic| ≤ tabled threshold: Accept H0
– If |test statistic| > tabled threshold: Reject H0
Computer-Era Method: Use the probability of getting the observed value of the test statistic (or a more extreme one) when the null hypothesis H0 is true (OVTSWNHT)
– If P(OVTSWNHT) ≥ α: Accept H0
– If P(OVTSWNHT) < α: Reject H0
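The two decision rules can be sketched in a few lines of Python. This is only an illustration: the function names and the sample numbers are made up for the example and do not come from the slides, and the default significance level of 0.05 is an assumption.

```python
def decide_by_threshold(observed_statistic, tabled_threshold):
    """Traditional method: compare |test statistic| with the tabled threshold."""
    if abs(observed_statistic) <= tabled_threshold:
        return "accept H0"
    return "reject H0"


def decide_by_probability(p_value, alpha=0.05):
    """Computer-era method: compare P(OVTSWNHT) with the significance level."""
    if p_value >= alpha:
        return "accept H0"
    return "reject H0"


# Hypothetical values, chosen only to exercise both branches.
print(decide_by_threshold(-2.31, 1.96))  # absolute value exceeds the threshold
print(decide_by_threshold(1.20, 1.96))   # absolute value is within the threshold
print(decide_by_probability(0.021))      # probability is below alpha
print(decide_by_probability(0.300))      # probability is at least alpha
```

Both rules lead to the same decision when the tabled threshold and the significance level correspond to each other; the second simply skips the table lookup.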

4 Common Types of Two-Variable Statistical Hypotheses
H0: ρ_XY = 0
– Interval variables X and Y are not correlated in the population: “There is no correlation between the age and the salary of a typical librarian”
H0: Categorical variables X and Y are not associated in the population
– “There is no association between the sex of a library patron and the type of book the patron prefers”

5 H0: ρ_XY = 0
The test statistic r_XY can be as large as +1 and as small as -1. It expresses the tendency, if any, toward "preferential co-occurrence": the tendency of certain values of X to "prefer" to occur together with certain values of Y.
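The formula for r_XY did not survive in this transcript. A standard way to write the sample correlation coefficient, offered here as a likely but unconfirmed match for the slide, is the following, where n is the number of observed pairs and s_X and s_Y are the sample standard deviations (these symbols are assumptions, not taken from the slide):

```latex
r_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{(n - 1)\, s_X\, s_Y},
\qquad
s_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}},
\quad
s_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}
```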

6 How Correlation Works
The numerator of r_XY is where paired behavior is analyzed: each term of the sum is a product of two parenthesized deviations, one for X and one for Y, which may have like or unlike signs.
Pairs (+)(+) and (-)(-) yield positive products; pairs (+)(-) and (-)(+) yield negative products.
If large values of X tend to occur along with large values of Y, and small values of X along with small values of Y, then most of the pairs will be either (+)(+) or (-)(-) and will contribute positive numbers to the sum.
If large values of X tend to occur along with small values of Y, and small values of X along with large values of Y, then most of the pairs will be either (+)(-) or (-)(+) and will contribute negative numbers to the sum.
The first situation is called positive correlation; the second, negative correlation.
The numbers in the denominator of r_XY simply adjust for the units in which the variables are observed and for the sample size, so as to yield a range from -1 to +1.
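The sign argument can be checked numerically. The sketch below assumes the usual deviation-from-the-mean form of the correlation coefficient; the function name and the small data set are invented for illustration and are not the Stephens data used in the slides.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return (r, products), where products are the paired deviation products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: one product of deviations per observed (X, Y) pair.
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    # Denominator: rescales the sum so the result lies between -1 and +1.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sum(products) / sqrt(ss_x * ss_y), products


# Hypothetical data: larger X values tend to go with larger Y values.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r, products = pearson_r(xs, ys)
positive = sum(1 for p in products if p > 0)
negative = sum(1 for p in products if p < 0)
print(f"r = {r:.4f}; positive products: {positive}; negative products: {negative}")
```

Reversing the order of `ys` flips the signs of the products that matter and turns the positive correlation into a negative one.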

7 H0: ρ_XY = 0, Example 1 (adapted from Stephens, p. 320, Example 13.19)
Excel's output for its Correlation procedure is rather brief, as shown next. However, the same information, and much more, is supplied by its Regression procedure.

8 H0: ρ_XY = 0, Example 1 (adapted from Stephens, p. 320, Example 13.19)
Figure: Output of Excel's Correlation procedure

9 H0: ρ_XY = 0, Example 1 (adapted from Stephens, p. 320, Example 13.19)
Figure: Partial output of Excel's Regression procedure

10 Linear Regression
Linear regression is applicable only when you have a pair of correlated variables.
Regression allows you to use an observed value of one of the variables to provide an estimated corresponding value for the other variable. This is especially valuable when one of the variables can be observed easily and/or now, and the other variable can be observed only with difficulty and/or in the future.
The "predictor" variable is denoted by X; the "predicted," or "criterion," variable, by Y.
The LR equation is Y = B0 + B1 X, where Y is the estimated value; the coefficients B0 and B1 are calculated from the observed pairs of values of X and Y.

11 Linear Regression
The LR equation is the algebraic equation of a straight line, written in “slope-intercept” form.
The coefficient B0 is called the “intercept coefficient” because it tells us where the line intercepts the vertical axis (the Y axis).
The coefficient B1 is called the “slope coefficient” because it tells us the slope of the line: e.g., a line that slopes up to the right at a 45° angle has a slope of 1; a line that slopes down to the right at an angle of about 26.6° has a slope of -0.5.
Programs that calculate the coefficients B_i often label B0 as “Intercept” and label B1 with the name of the X variable.
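The link between the angle of a line and its slope is the tangent function (slope = tan of the angle, when both axes use the same scale). A minimal check, with helper names invented for this example:

```python
import math


def slope_from_angle(angle_degrees):
    """Slope of a line that makes the given angle with the horizontal axis."""
    return math.tan(math.radians(angle_degrees))


def angle_from_slope(slope):
    """Angle, in degrees, of a line with the given slope."""
    return math.degrees(math.atan(slope))


print(f"slope at 45 degrees: {slope_from_angle(45):.3f}")
print(f"angle for slope -0.5: {angle_from_slope(-0.5):.2f} degrees")
print(f"slope at -22.5 degrees: {slope_from_angle(-22.5):.3f}")
```

Halving the angle does not halve the slope, which is why a slope of -0.5 corresponds to roughly 26.6 degrees rather than 22.5 degrees.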

12 Calculation of the Linear Regression Coefficients
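The formulas on this slide are not preserved in the transcript. A common way to compute the coefficients, assumed here to be what the slide showed, is ordinary least squares: B1 is the sum of the paired deviation products divided by the sum of the squared X deviations, and B0 is the mean of Y minus B1 times the mean of X. The function names and data below are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept B0 and slope B1 for Y = B0 + B1 * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_squares_x = sum((x - mean_x) ** 2 for x in xs)
    b1 = sum_products / sum_squares_x   # slope coefficient
    b0 = mean_y - b1 * mean_x           # intercept coefficient
    return b0, b1


def predict(b0, b1, x):
    """Estimated Y for an observed X, using the fitted line."""
    return b0 + b1 * x


# Hypothetical observations, not the data in Stephens's Table 13.2.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)
print(f"Intercept B0 = {b0:.3f}, slope B1 = {b1:.3f}")
print(f"Estimated Y at X = 6: {predict(b0, b1, 6):.3f}")
```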

13 Figure: Partial output of Excel's Regression procedure for data in Stephens's Table 13.2

14 Figure: Excel plot of observed points (X, Y) and trendline for data in Stephens's Table 13.2

15 Figure: SPSS plot of observed points (X, Y) and trendline for data in Stephens's Table 13.2

16 Multilinear Regression
Multilinear regression extends the idea of linear regression. It permits the estimation of a predicted variable on the basis of observations of 2 or more predictor variables.
It is a very powerful tool, widely used in research and in modern management, for analyzing complicated situations and figuring out which factors are important to some result (the predicted variable) and which factors are not important.
In essence, the important factors will have large (positive or negative) values for their associated B_i coefficients; the unimportant factors, small (near zero) values for their B_i coefficients.
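A small pure-Python sketch of the idea follows. It fits Y = B0 + B1 X1 + B2 X2 by solving the least-squares normal equations, which is one standard approach and not necessarily what Excel or SPSS does internally. The data are invented so that Y depends on X1 but not on X2, and the sketch assumes both predictors are measured on comparable scales, since coefficient size otherwise depends on the units.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_multilinear(rows, ys):
    """Least-squares coefficients [B0, B1, ..., Bk] via the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 carries B0
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: Y = 1 + 2 * X1, with X2 playing no role at all.
rows = [(1, 5), (2, 3), (3, 8), (4, 1), (5, 6), (6, 2)]
ys = [3, 5, 7, 9, 11, 13]
coefficients = fit_multilinear(rows, ys)
for name, value in zip(["B0", "B1", "B2"], coefficients):
    print(f"{name} = {value:.4f}")
```

The coefficient for the unimportant predictor X2 comes out at essentially zero, while B1 recovers the slope built into the data.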

17 Chi-Square Test of Association
The chi-square test of association is applicable when you have 2 categorical variables and wonder whether there is any association between them. This is analogous to the search for a tendency (correlation) for preferential co-occurrence of values of a pair of interval variables.
You cross-tabulate the elements of your sample according to the 2 variables, thus obtaining the observed frequencies of occurrence of the various combinations of values of the variables.
You calculate the expected frequencies, each E_i = (row total)(column total) / n, and then calculate χ² = Σ (O_i - E_i)² / E_i, summed over all cells of the table.

18 Chi-Square Test of Association
The observed value of χ² is compared with a tabled threshold value, chosen according to
– the chosen level of significance, α
– the number of degrees of freedom: df = (# of rows - 1)(# of columns - 1)
Most statistical processing programs, including SPSS, also calculate the level of significance of the observed value of χ².
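The whole procedure (cross-tabulated observed frequencies, expected frequencies, χ², and degrees of freedom) fits in a short function. The sketch below uses the standard formulas, with expected frequency = row total × column total / n; the table is made up for illustration and is not the example from the slides.

```python
def chi_square_association(observed):
    """Return (chi2, df, expected) for a table of observed frequencies."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected frequency of each cell if the two variables were unassociated.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected


# Hypothetical 2 x 3 cross-tabulation (rows: one variable; columns: the other).
observed = [
    [30, 20, 10],
    [20, 20, 20],
]
chi2, df, expected = chi_square_association(observed)
threshold = 5.991  # tabled chi-square threshold for df = 2 at alpha = 0.05
print(f"chi-square = {chi2:.3f}, df = {df}")
print("reject H0" if chi2 > threshold else "accept H0")
```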

19 Example of χ² Association Test
Observed values are shown in blue; expected values, in red.

20 Example of χ² Association Test
Figure: Output from SPSS's Crosstabs procedure with Chi-Square option

21 Special Notes re the χ² Test of Association
Don't use χ² if any E_i < 5 (the size of the O_i values does not matter)
– If that happens, try collapsing categories (i.e., merging rows or columns) until every expected frequency is at least 5
In the 2x2 case, use Yates's correction for continuity (SPSS uses it automatically in the 2x2 case): χ² = n(|ad - bc| - n/2)² / [(a + b)(c + d)(a + c)(b + d)]. Here a and b represent the observed frequencies in the top row of the 2x2 table; c and d represent the observed frequencies in the bottom row; and n is the size of the sample.
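Both notes can be turned into quick checks. The sketch below assumes the usual shortcut form of Yates's corrected χ² for a 2x2 table, χ² = n(|ad - bc| - n/2)² / [(a + b)(c + d)(a + c)(b + d)]; the function names and the frequencies are hypothetical.

```python
def smallest_expected_2x2(a, b, c, d):
    """Smallest expected frequency in a 2x2 table with rows (a, b) and (c, d)."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    return min(r * col / n for r in row_totals for col in col_totals)


def yates_chi_square(a, b, c, d):
    """Chi-square for a 2x2 table with Yates's correction for continuity."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Hypothetical observed frequencies: top row (a, b), bottom row (c, d).
a, b, c, d = 20, 10, 10, 20
if smallest_expected_2x2(a, b, c, d) < 5:
    print("An expected frequency is below 5; collapse categories first.")
else:
    chi2 = yates_chi_square(a, b, c, d)
    threshold = 3.841  # tabled chi-square threshold for df = 1 at alpha = 0.05
    print(f"corrected chi-square = {chi2:.3f}")
    print("reject H0" if chi2 > threshold else "accept H0")
```

The correction always pulls the statistic toward zero; for these frequencies the uncorrected value would be about 6.67.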

22 Two-Variable Problems Usually Concern Interactions and Patterns of Co-Occurrence

